Commit Graph

22457 Commits (f041dc80de4abbdd0909d871bf64f3f87d2350ff)

Author SHA1 Message Date
Linus Torvalds f3be0c984e Quick follow up, nothing super urgent here. Main reason I'm sending
this out is because the IPsec and Bluetooth PRs did not make it
 yesterday. I don't want to have to send you all of this + whatever
 comes next week, for rc7. The fixes under "Previous releases -
 regressions" are for real user-reported regressions from v7.0.
 
 Previous releases - regressions:
 
  - Revert "ipv6: preserve insertion order for same-scope addresses"
 
  - xfrm: move policy_bydst RCU sync, a fix which added a sync RCU
    on netns exit got backported to stable and was causing serious
    accumulation of dying netns's for real workloads
 
  - pcs-mtk-lynxi: fix bpi-r3 serdes configuration
 
 Previous releases - always broken:
 
  - usual grab bag of race, locking and leak fixes for Bluetooth
 
  - handful of page handling fixes for IPsec
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-7.1-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull more networking fixes from Jakub Kicinski:
 "Quick follow up, nothing super urgent here. Main reason I'm sending
  this out is because the IPsec and Bluetooth PRs did not make it
  yesterday. I don't want to have to send you all of this + whatever
  comes next week, for rc7. The fixes under "Previous releases -
  regressions" are for real user-reported regressions from v7.0.

  Previous releases - regressions:

   - Revert "ipv6: preserve insertion order for same-scope addresses"

   - xfrm: move policy_bydst RCU sync, a fix which added a sync RCU on
     netns exit got backported to stable and was causing serious
     accumulation of dying netns's for real workloads

   - pcs-mtk-lynxi: fix bpi-r3 serdes configuration

  Previous releases - always broken:

   - usual grab bag of race, locking and leak fixes for Bluetooth

   - handful of page handling fixes for IPsec"

* tag 'net-7.1-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (36 commits)
  wireguard: send: append trailer after expanding head
  Revert "ipv6: preserve insertion order for same-scope addresses"
  net: skbuff: fix pskb_carve leaking zcopy pages
  ipv6: fix possible infinite loop in fib6_select_path()
  ipv6: fix possible infinite loop in rt6_fill_node()
  bpf: sockmap: fix tail fragment offset in bpf_msg_push_data
  vsock/virtio: bind uarg before filling zerocopy skb
  Revert "esp: fix page frag reference leak on skb_to_sgvec failure"
  net: pcs: pcs-mtk-lynxi: fix bpi-r3 serdes configuration
  sctp: fix race between sctp_wait_for_connect and peeloff
  net: mana: Skip redundant detach on already-detached port
  net: mana: Add NULL guards in teardown path to prevent panic on attach failure
  Bluetooth: hci_sync: Reset device counters in hci_dev_close_sync()
  Bluetooth: hci_sync: Set HCI_CMD_DRAIN_WORKQUEUE during device close
  Bluetooth: hci_core: Rework hci_dev_do_reset() to use hci_sync functions
  Bluetooth: ISO: serialize iso_sock_clear_timer with socket lock
  Bluetooth: ISO: fix UAF in iso_recv_frame
  Bluetooth: L2CAP: Fix possible crash on l2cap_ecred_conn_rsp
  Bluetooth: l2cap: clear chan->ident on ECRED reconfiguration success
  Bluetooth: hci_qca: Use 100 ms SSR delay for rampatch and NVM loading
  ...
2026-05-29 15:46:40 -07:00
Linus Torvalds d0ee290071 Arm:
- Restore CONFIG_PKVM_DISABLE_STAGE2_ON_PANIC to its former glory by
   making sure the config symbol is correctly spelled out in the code
 
 - Don't reset the AArch32 view of the PMU counters to zero when the
   guest is writing to them
 
 - Fix an assorted collection of memory leaks in the newly added tracing
   code
 
 - Fix the capping of ZCR_EL2 which could be used in an unsanitised way
   by an L2 guest
 
 x86:
 
 - Include the kernel's linux/mman.h in KVM selftests to ensure MADV_COLLAPSE
   is defined, as older libc versions may not provide it.
 
 - Include execinfo.h if and only if KVM selftests are building against glibc,
   and provide a test_dump_stack() for non-glibc builds.
 
 - Silence an annoying RCU splat on (even non-KVM-related) panics.  The splat
   is technically legit, but in practice not an issue.  To have a race, you
   would need to unload the KVM modules at exactly the time a panic happens;
   and speaking of incredibly rare races, taking the locks risks introducing
   a deadlock if the module unload code took the lock on a CPU that has been
   halted.  Which seems possibly more likely than the RCU grace period issue,
   so just shut it up.  This code used to be in KVM but is now outside it;
   but the x86 maintainers haven't picked it up, so here we are.
 
 - Rate-limit global clock updates once again (but without delayed work), as
   KVM was subtly relying on the old rate-limiting for NPT correction to guard
   against "update storms" when running without a master clock on systems with
   overcommitted CPUs.
 
 - Fix a brown paper bag goof where KVM checked if ERAPS is "dirty" instead of
   marking it dirty when emulating INVPCID.
 
 - Flush the TLB when transitioning from xAVIC => x2AVIC to ensure the CPU TLB
   doesn't contain AVIC-tagged entries for the APIC base GPA.
 
 - The top 10 commits fix buffer overflow (and potential TOC/TOU) flaws in the
   page state change protocol for encrypted VMs.  AI models find it quite
   easily given it was reported three times, but aren't as good at writing
   a comprehensive fix.  There's more to clean up in the area, which will
   come in 7.2.

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "arm64:

   - Restore CONFIG_PKVM_DISABLE_STAGE2_ON_PANIC to its former glory by
     making sure the config symbol is correctly spelled out in the code

   - Don't reset the AArch32 view of the PMU counters to zero when the
     guest is writing to them

   - Fix an assorted collection of memory leaks in the newly added
     tracing code

   - Fix the capping of ZCR_EL2 which could be used in an unsanitised
     way by an L2 guest

  x86:

   - Include the kernel's linux/mman.h in KVM selftests to ensure
     MADV_COLLAPSE is defined, as older libc versions may not provide
     it.

   - Include execinfo.h if and only if KVM selftests are building
     against glibc, and provide a test_dump_stack() for non-glibc
     builds.

   - Silence an annoying RCU splat on (even non-KVM-related) panics.

     The splat is technically legit, but in practice not an issue. To
     have a race, you would need to unload the KVM modules at exactly
     the time a panic happens; and speaking of incredibly rare races,
     taking the locks risks introducing a deadlock if the module unload
     code took the lock on a CPU that has been halted. Which seems
     possibly more likely than the RCU grace period issue, so just shut
     it up. This code used to be in KVM but is now outside it; but the
     x86 maintainers haven't picked it up, so here we are.

   - Rate-limit global clock updates once again (but without delayed
     work), as KVM was subtly relying on the old rate-limiting for NPT
     correction to guard against "update storms" when running without a
     master clock on systems with overcommitted CPUs.

   - Fix a brown paper bag goof where KVM checked if ERAPS is "dirty"
     instead of marking it dirty when emulating INVPCID.

   - Flush the TLB when transitioning from xAVIC => x2AVIC to ensure the
     CPU TLB doesn't contain AVIC-tagged entries for the APIC base GPA.

   - The top 10 commits fix buffer overflow (and potential TOC/TOU)
     flaws in the page state change protocol for encrypted VMs. AI
     models find it quite easily given it was reported three times, but
     aren't as good at writing a comprehensive fix. There's more to
     clean up in the area, which will come in 7.2"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
  KVM: SEV: Use READ_ONCE() when reading entries/indices from PSC buffer
  KVM: SEV: Check PSC request indices against the actual size of the buffer
  KVM: SEV: Don't explicitly pass PSC buffer to snp_begin_psc()
  KVM: SEV: WARN if KVM attempts to setup scratch area with min_len==0
  KVM: SEV: Compute the correct max length of the in-GHCB scratch area
  KVM: SEV: Use the size of the PSC header as the minimum size for PSC requests
  KVM: SEV: Ignore Port I/O requests of length '0'
  KVM: SEV: Reject MMIO requests larger than 8 bytes with GHCB v2+
  KVM: SEV: Ignore MMIO requests of length '0'
  KVM: SEV: Require in-GHCB scratch area if GHCB v2+ is in use
  KVM: arm64: Correctly cap ZCR_EL2 provided by a guest hypervisor
  KVM: arm64: Fix memory leak in hyp_trace_unload()
  KVM: arm64: Fix rollback in hyp_trace_buffer_share_hyp()
  KVM: arm64: Fix meta-page unsharing in pKVM hyp tracing
  KVM: arm64: PMU: Preserve AArch32 counter low bits
  KVM: SVM: Flush the current TLB when transitioning from xAVIC => x2AVIC
  KVM: x86: Fix ERAPS RAP clear on INVPCID single-context invalidation
  KVM: arm64: Fix CONFIG_PKVM_DISABLE_STAGE2_ON_PANIC
  KVM: selftests: Guard execinfo.h inclusion for non-glibc builds
  KVM: x86: Rate-limit global clock updates on vCPU load
  ...
2026-05-29 13:47:55 -07:00
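The SEV page state change fixes in the pull above rely on a common defensive pattern for guest-writable buffers: read each index once into a local, then validate and use only the local copy. A minimal Python sketch of that pattern follows. It is a user-space model, not the KVM code; the names `shared`, `cur_entry`, `end_entry`, `snapshot_range` and `process` are hypothetical here, and treating the end index as inclusive is an assumption of the sketch.

```python
# Hypothetical model: `shared` stands for memory the guest can rewrite at
# any time. A Python dict cannot race, so this only shows the structure
# of the pattern (single read, then validate the local copy).


def snapshot_range(shared, capacity):
    """Read both indices exactly once and validate the local copies."""
    cur = shared["cur_entry"]  # single read, kept in a local
    end = shared["end_entry"]  # single read, kept in a local
    if cur > end or end >= capacity:
        raise ValueError("indices outside the buffer")
    return cur, end


def process(shared, entries):
    """Walk entries[cur..end] using only the validated snapshot."""
    cur, end = snapshot_range(shared, len(entries))
    # Later writes to `shared` can no longer move the loop bounds.
    return [entries[i] for i in range(cur, end + 1)]


print(process({"cur_entry": 1, "end_entry": 2}, ["a", "b", "c", "d"]))
```

The point of the design is that the bounds check and the loop see the same values, and that the upper bound comes from the real buffer length rather than from a value the other side supplies.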
Fernando Fernandez Mancera 072aa0f5c3 Revert "ipv6: preserve insertion order for same-scope addresses"
Chris Adams reported that preserving insertion order for same-scope
addresses is causing SSH connections to be dropped after stopping a VM
while running NetworkManager.

NetworkManager caches the IPv6 address configuration. When an RA arrives,
it determines the list of addresses to configure and checks if the
addresses are already in the right order in the kernel. If they aren't,
NetworkManager removes and re-adds them to achieve the desired order.

As the order changes, NetworkManager is confused and reconfigures the
addresses on every update. In addition, this would also affect cloud
tooling that relies on IPv6 address order to identify primary and
secondary addresses.

This reverts commit cb3de96eea.

Fixes: cb3de96eea ("ipv6: preserve insertion order for same-scope addresses")
Reported-by: Chris Adams <linux@cmadams.net>
Closes: https://lore.kernel.org/netdev/20260521135310.GC977@cmadams.net/
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Link: https://patch.msgid.link/20260529112357.5079-1-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-29 13:00:54 -07:00
Paolo Bonzini b397897016 KVM x86 fixes for 7.1-rcN
- Include the kernel's linux/mman.h in KVM selftests to ensure MADV_COLLAPSE
    is defined, as older libc versions may not provide it.
 
  - Include execinfo.h if and only if KVM selftests are building against glibc,
    and provide a test_dump_stack() for non-glibc builds.
 
  - Fudge around an RCU splat in the emergency reboot code that is technically
    a legitimate flaw, but in practice is a non-issue and fixing the flaw, e.g.
    by adding locking, would incur meaningful risk, i.e. do more harm than good.
 
  - Rate-limit global clock updates once again (but without delayed work), as
    KVM was subtly relying on the old rate-limiting for NPT correction to guard
    against "update storms" when running without a master clock on systems with
    overcommitted CPUs.
 
  - Fix a brown paper bag goof where KVM checked if ERAPS is "dirty" instead of
    marking it dirty when emulating INVPCID.
 
  - Flush the TLB when transitioning from xAVIC => x2AVIC to ensure the CPU TLB
    doesn't contain AVIC-tagged entries for the APIC base GPA.

Merge tag 'kvm-x86-fixes-7.1-rc6' of https://github.com/kvm-x86/linux into HEAD

KVM x86 fixes for 7.1-rcN

 - Include the kernel's linux/mman.h in KVM selftests to ensure MADV_COLLAPSE
   is defined, as older libc versions may not provide it.

 - Include execinfo.h if and only if KVM selftests are building against glibc,
   and provide a test_dump_stack() for non-glibc builds.

 - Fudge around an RCU splat in the emergency reboot code that is technically
   a legitimate flaw, but in practice is a non-issue and fixing the flaw, e.g.
   by adding locking, would incur meaningful risk, i.e. do more harm than good.

 - Rate-limit global clock updates once again (but without delayed work), as
   KVM was subtly relying on the old rate-limiting for NPT correction to guard
   against "update storms" when running without a master clock on systems with
   overcommitted CPUs.

 - Fix a brown paper bag goof where KVM checked if ERAPS is "dirty" instead of
   marking it dirty when emulating INVPCID.

 - Flush the TLB when transitioning from xAVIC => x2AVIC to ensure the CPU TLB
   doesn't contain AVIC-tagged entries for the APIC base GPA.
2026-05-29 19:28:16 +02:00
Linus Torvalds 3101173200 cxl fixes for v7.1-rc6
cxl/test: Update mock dev array before calling platform_device_add()

Merge tag 'cxl-fixes-7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull Compute Express Link (CXL) fixes from Dave Jiang:

 - cxl/test: update mock dev array before calling platform_device_add()

* tag 'cxl-fixes-7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
  cxl/test: Update mock dev array before calling platform_device_add()
2026-05-29 10:04:09 -07:00
Linus Torvalds 3e20009988 Including fixes from netfilter.
Current release - regressions:
 
   - netfilter: walk fib6_siblings under RCU
 
 Previous releases - regressions:
 
   - netlink: fix sending unassigned nsid after assigned one
 
   - bridge: fix sleep in atomic context in netlink path
 
   - sched: fix ethx:ingress -> ethy:egress -> ethx:ingress mirred loop
 
   - ipv4: fix net->ipv4.sysctl_local_reserved_ports UaF
 
   - eth: tun: free page on short-frame rejection in tun_xdp_one()
 
 Previous releases - always broken:
 
   - skbuff: fix missing zerocopy reference in pskb_carve helpers
 
   - handshake: drain pending requests at net namespace exit
 
   - ethtool:
     - rss: avoid modifying the RSS context response
     - module: avoid leaking a netdev ref on module flash errors
     - coalesce: cap profile updates at NET_DIM_PARAMS_NUM_PROFILES
 
   - netfilter: fix dst corruption in same register operation
 
   - nfc: hci: fix out-of-bounds read in HCP header parsing
 
   - ipv6: exthdrs: refresh nh pointer after ipv6_hop_jumbo()
 
   - eth: vti: use ip6_tnl.net in vti6_changelink().
 
   - eth: vxlan: do not reuse cached ip_hdr() value after skb_tunnel_check_pmtu()
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'net-7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "This is again significantly bigger than the same point in the
  previous cycle, but at least smaller than last week.

  I'm not aware of any pending regression for the current cycle.

  Including fixes from netfilter.

  Current release - regressions:

    - netfilter: walk fib6_siblings under RCU

  Previous releases - regressions:

    - netlink: fix sending unassigned nsid after assigned one

    - bridge: fix sleep in atomic context in netlink path

    - sched: fix ethx:ingress -> ethy:egress -> ethx:ingress mirred loop

    - ipv4: fix net->ipv4.sysctl_local_reserved_ports UaF

    - eth: tun: free page on short-frame rejection in tun_xdp_one()

  Previous releases - always broken:

    - skbuff: fix missing zerocopy reference in pskb_carve helpers

    - handshake: drain pending requests at net namespace exit

    - ethtool:
       - rss: avoid modifying the RSS context response
       - module: avoid leaking a netdev ref on module flash errors
       - coalesce: cap profile updates at NET_DIM_PARAMS_NUM_PROFILES

    - netfilter: fix dst corruption in same register operation

    - nfc: hci: fix out-of-bounds read in HCP header parsing

    - ipv6: exthdrs: refresh nh pointer after ipv6_hop_jumbo()

    - eth:
       - vti: use ip6_tnl.net in vti6_changelink().
       - vxlan: do not reuse cached ip_hdr() value after
         skb_tunnel_check_pmtu()"

* tag 'net-7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (94 commits)
  dpll: zl3073x: make frequency monitor a per-device attribute
  dpll: zl3073x: use __dpll_device_change_ntf() and remove change_work
  dpll: export __dpll_device_change_ntf() for use under dpll_lock
  net/handshake: Drain pending requests at net namespace exit
  net/handshake: Verify file-reference balance in submit paths
  net/handshake: Close the submit-side sock_hold race
  net/handshake: hand off the pinned file reference to accept_doit
  net/handshake: Take a long-lived file reference at submit
  net/handshake: Pass negative errno through handshake_complete()
  nvme-tcp: store negative errno in queue->tls_err
  net/handshake: Use spin_lock_bh for hn_lock
  net: skbuff: fix missing zerocopy reference in pskb_carve helpers
  net: hibmcge: move dma_rmb() after dma_sync_single_for_cpu() in RX path
  net: hibmcge: disable Relaxed Ordering to fix RX packet corruption
  selftests/tc-testing: Add netem test case exercising loops
  selftests/tc-testing: Add mirred test cases exercising loops
  net/sched: act_mirred: Fix return code in early mirred redirect error paths
  net/sched: act_mirred: Fix blockcast recursion bypass leading to stack overflow
  net/sched: Fix ethx:ingress -> ethy:egress -> ethx:ingress mirred loop
  net/sched: fix packet loop on netem when duplicate is on
  ...
2026-05-28 13:13:48 -07:00
Victor Nogueira 0f6e00aa5f selftests/tc-testing: Add netem test case exercising loops
Add a netem nested duplicate test case to validate that it won't
cause an infinite loop

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20260525122556.973584-10-jhs@mojatatu.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-05-28 12:26:37 +02:00
Victor Nogueira d38dc56a02 selftests/tc-testing: Add mirred test cases exercising loops
Add mirred loop test cases to validate that loops are caught, along with
test cases for setups that mirred previously misinterpreted as loops.

This commit adds 12 test cases:

- Redirect multiport: dummy egress -> dev1 ingress -> dummy egress (Loop)
- Redirect singleport: dev1 ingress -> dev1 egress -> dev1 ingress (Loop)
- Redirect multiport: dev1 ingress -> dummy ingress -> dev1 egress (No Loop)
- Redirect multiport: dev1 ingress -> dummy ingress -> dev1 ingress (Loop)
- Redirect multiport: dev1 ingress -> dummy egress -> dev1 ingress (Loop)
- Redirect multiport: dummy egress -> dev1 ingress -> dummy egress, different prios (Loop)
- Redirect multiport: dev1 ingress -> dummy ingress -> dummy egress -> dev1 egress (No Loop)
- Redirect multiport: dev1 ingress -> dummy egress -> dev1 egress (No Loop)
- Redirect multiport: dev1 ingress -> dummy egress -> dummy ingress (No Loop)
- Redirect singleport: dev1 ingress -> dev1 ingress (Loop)
- Redirect singleport: dummy egress -> dummy ingress (No Loop)
- Redirect multiport: dev1 ingress -> dummy ingress -> dummy egress (No Loop)

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20260525122556.973584-9-jhs@mojatatu.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-05-28 12:26:36 +02:00
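The Loop / No Loop labels in the mirred test list above are consistent with one simple rule: a redirect chain loops when the same (device, direction) hook appears twice. The Python sketch below encodes only that reading of the labels. It is an assumption drawn from the list, not a description of how act_mirred detects loops, and the helper names `parse` and `has_loop` are hypothetical.

```python
# Model of the labels in the test list; not the act_mirred algorithm.


def parse(text):
    """Turn 'dev1 ingress -> dummy egress' into [(dev, direction), ...]."""
    return [tuple(step.split()) for step in text.split(" -> ")]


def has_loop(chain):
    """Return True if any (device, direction) hook is visited twice."""
    seen = set()
    for hook in chain:
        if hook in seen:
            return True
        seen.add(hook)
    return False


# A few chains from the list, with the label given in the commit message.
cases = {
    "dummy egress -> dev1 ingress -> dummy egress": True,
    "dev1 ingress -> dummy ingress -> dev1 egress": False,
    "dev1 ingress -> dev1 ingress": True,
    "dummy egress -> dummy ingress": False,
    "dev1 ingress -> dummy ingress -> dummy egress -> dev1 egress": False,
}

for text, expected in cases.items():
    label = "Loop" if has_loop(parse(text)) else "No Loop"
    print(f"{text}: {label} (listed as {'Loop' if expected else 'No Loop'})")
```

Under this rule, returning to the same device in the other direction (for example dev1 ingress and later dev1 egress) is not a loop, which matches the cases the commit message says were previously misinterpreted.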
Jamal Hadi Salim b213a4c607 Revert "selftests/tc-testing: Add tests for restrictions on netem duplication"
This reverts commit ecdec65ec7.

The tests added were related to check_netem_in_tree() which was
just reverted in the previous patch.

Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20260525122556.973584-4-jhs@mojatatu.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-05-28 12:26:36 +02:00
Ido Schimmel 147f3b1f23 selftests: rtnetlink: Add bridge promiscuity tests
Add two test cases that always pass, but trigger sleeping in atomic
context BUGs without "bridge: Fix sleep in atomic context in netlink
path" and "bridge: Fix sleep in atomic context in sysfs path".

Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260526064818.272516-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-27 17:23:05 -07:00
Li Ming d90f236f8b cxl/test: Update mock dev array before calling platform_device_add()
The CXL test environment sometimes hits the following error:

 cxl_mem mem9: endpoint7 failed probe

All mock memdevs are platform firmware devices added by the cxl_test
module, and the cxl_test module also provides a platform device driver
that creates a memdev device in the CXL subsystem for each of them. The
cxl_test module uses the cxl_rcd/mem_single/mem arrays to store the
different types of mock memdevs. CXL drivers call the registered mock
functions for a mock memdev by checking whether a given memdev is in
these arrays.

When the cxl_test module adds these mock memdevs, it always calls
platform_device_add() before adding them to a suitable mock memdev
array. However, there is a small window in which CXL drivers call a mock
function for an added memdev before it has been added to a mock memdev
array. In the above case, the cxl endpoint driver considers the added
memdev not to be a mock memdev and calls
devm_cxl_endpoint_decoders_setup() for it rather than
mock_endpoint_decoders_setup().

An appropriate solution is to add a new mock device to a mock device
array before calling platform_device_add() for it. This guarantees that
the new mock device is visible to the CXL subsystem.

This patch introduces a new helper called cxl_mock_platform_device_add()
to handle the issue, and uses the function for all mock device additions.

Fixes: 3a2b97b321 ("cxl/test: Improve init-order fidelity relative to real-world systems")
Signed-off-by: Li Ming <ming.li@zohomail.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Link: https://patch.msgid.link/20260520121457.234404-1-ming.li@zohomail.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2026-05-26 13:54:17 -07:00
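The ordering problem described in the cxl/test commit above can be modelled in a few lines of Python. This is a simplified user-space sketch, not kernel code: every name in it is a hypothetical stand-in, and the probe runs synchronously inside `device_add()` so that the window, which is only hit sometimes in the real environment, shows up on every run.

```python
# Simplified model of "publish before register" versus "register before
# publish". All names are hypothetical stand-ins for the kernel objects.

mock_devs = []  # plays the role of the mock memdev arrays
probe_log = []  # records which setup path each probe took


def probe(dev):
    """Runs when a device becomes visible, like a driver probe."""
    if dev in mock_devs:
        probe_log.append((dev, "mock setup"))
    else:
        probe_log.append((dev, "real setup"))


def device_add(dev):
    """Publish the device; probe may run before this returns."""
    probe(dev)


def add_then_register(dev):
    # Old order: the device is visible before it is in the array.
    device_add(dev)
    mock_devs.append(dev)


def register_then_add(dev):
    # Fixed order: record the device first, then publish it.
    mock_devs.append(dev)
    device_add(dev)


add_then_register("mem0")
register_then_add("mem1")
print(probe_log)  # [('mem0', 'real setup'), ('mem1', 'mock setup')]
```

With the old order the probe for mem0 takes the non-mock path, which corresponds to the failed endpoint probe in the report; with the fixed order the lookup always succeeds because the entry exists before the device can be probed.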
Jakub Kicinski f6f1bfc198 netfilter pull request nf-26-05-22

Merge tag 'nf-26-05-22' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Florian Westphal says:

====================
netfilter: updates for net

Patches 7+8 fix a regression from 7.1-rc1. Everything else
is from 2.6.x to 5.3 releases.  There are additional known
issues with these patches (drive-by-findings in related code).

There are many old bugs all over netfilter and our ability to review
feature patches has come to a complete halt due to lack of time.
There are further security bugs that we cannot address
due to lack of time, maintainers and reviewers.

Other remarks: The xtables 32bit compat interface is already
off in many vendor kernels, the plan is to remove it soon.

1) Prevent RST packets with invalid sequence numbers from forcing TCP
   connections into the CLOSE state without a direction check.
   From Hamza Mahfooz.
2) Re-derive the TCP header pointer after skb_ensure_writable in
   synproxy_tstamp_adjust. Prevent use-after-free and invalid checksum
   updates caused by stale pointers during buffer expansion.
   From Chris Mason.
3) Fix a race condition causing keymap list corruption in conntrack's gre/pptp
   helper.
4) Use raw_smp_processor_id() in xt_cpu to prevent splats under
   PREEMPT_RCU.
5) Disable netfilter payload mangling in user namespaces (nft_payload.c
   and nf_queue).
   TCP option mangling via nft_exthdr.c remains enabled.
   There will be follow-ups here to restrict or revalidate
   headers.
6) Fix an out-of-bounds read in ebtables's compat_mtw_from_user function.
7) Use list_for_each_entry_rcu() to traverse fib6_siblings in
   nft_fib6_info_nh_uses_dev(). Ensure safe list walking under RCU.
8) Fix an out-of-bounds read in nft_fib_ipv6 caused by incorrect list
   traversal.
9) Add nft_fib_nexthop selftest to netfilter. Cover nexthop enumeration for
    single, group, and multipath route shapes.
    All three nft_fib6 fixes from Jiayuan Chen.
10) Fix destination corruption in shift operations when source and destination
    registers overlap.  Reject partial register overlap for all operations
    from control plane.  From Fernando Fernandez Mancera.

* tag 'nf-26-05-22' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: fix dst corruption in same register operation
  selftests: netfilter: add nft_fib_nexthop test
  netfilter: nft_fib_ipv6: handle routes via external nexthop
  netfilter: nft_fib_ipv6: walk fib6_siblings under RCU
  netfilter: ebtables: fix OOB read in compat_mtw_from_user
  netfilter: disable payload mangling in userns
  netfilter: xt_cpu: prefer raw_smp_processor_id
  netfilter: nf_conntrack_gre: fix gre keymap list corruption
  netfilter: synproxy: refresh tcphdr after skb_ensure_writable
  netfilter: conntrack: tcp: do not force CLOSE on invalid-seq RST without direction check
====================

Link: https://patch.msgid.link/20260522104257.2008-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-25 10:37:28 -07:00
Linus Torvalds 6a97c4d526 Arm:
- Fix ITS EventID sanitisation when restoring an interrupt translation
   table.
 
 - Fix PPI memory leak when failing to initialise a vcpu.
 
 - Correctly return an error when the validation of a hypervisor trace
   descriptor fails, and limit this validation to protected mode only.
 
 RISC-V:
 
 - Fix invalid HVA warning in steal-time recording
 
 - Return SBI_ERR_FAILURE to guest upon OOM in pmu_event_info()
   and pmu_snapshot_set_shmem()
 
 - Fix NULL pointer dereference in SBI v0.1 SEND_IPI handler
 
 - Fix sign extension of value for MMIO loads
 
 s390:
 
 - Fix bugs in vSIE (nested virtualization) and UCONTROL, caused by the page
   table rewrite.
 
 x86:
 
 - Apply erratum #1235 workaround (disable AVIC IPI virtualization) on Hygon
   Family 18h, just like on AMD Family 17h.
 
 - When KVM_CAP_X86_APIC_BUS_CYCLES_NS is queried on a specific VM, return
   the VM's configured APIC bus frequency instead of the default.  This
   is less confusing (read: not wrong) and makes it easier to fill in CPUID
   information that communicates the APIC bus frequency to the guest.
 
 Selftests:
 
 - Do not include glibc-internal <bits/endian.h>; it worked by chance and
   broke building KVM selftests with musl.

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "arm64:

   - Fix ITS EventID sanitisation when restoring an interrupt
     translation table.

   - Fix PPI memory leak when failing to initialise a vcpu.

   - Correctly return an error when the validation of a hypervisor trace
     descriptor fails, and limit this validation to protected mode only.

  RISC-V:

   - Fix invalid HVA warning in steal-time recording

   - Return SBI_ERR_FAILURE to guest upon OOM in pmu_event_info() and
     pmu_snapshot_set_shmem()

   - Fix NULL pointer dereference in SBI v0.1 SEND_IPI handler

   - Fix sign extension of value for MMIO loads

  s390:

   - Fix bugs in vSIE (nested virtualization) and UCONTROL, caused by
     the page table rewrite.

  x86:

   - Apply erratum #1235 workaround (disable AVIC IPI virtualization) on
     Hygon Family 18h, just like on AMD Family 17h.

   - When KVM_CAP_X86_APIC_BUS_CYCLES_NS is queried on a specific VM,
     return the VM's configured APIC bus frequency instead of the
     default. This is less confusing (read: not wrong) and makes it
     easier to fill in CPUID information that communicates the APIC bus
     frequency to the guest.

  Selftests:

   - Do not include glibc-internal <bits/endian.h>; it worked by chance
     and broke building KVM selftests with musl"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: SVM: Disable AVIC IPI virtualization on Hygon Family 18h (erratum #1235)
  KVM: selftests: Verify that KVM returns the configured APIC cycle length
  KVM: x86: Return the VM's configured APIC bus frequency when queried
  KVM: selftests: elf: Include <endian.h> instead of <bits/endian.h>
  KVM: s390: Properly reset zero bit in PGSTE
  KVM: s390: vsie: Fix redundant rmap entries
  KVM: s390: vsie: Fix unshadowing logic
  KVM: s390: Fix leaking kvm_s390_mmu_cache in case of errors
  KVM: s390: vsie: Fix memory leak when unshadowing
  KVM: arm64: Fix nVHE/pKVM hyp tracing error on invalid desc
  KVM: arm64: vgic: Free private_irqs when init fails after allocation
  KVM: arm64: vgic-its: Reject restored DTE with out-of-range num_eventid_bits
  RISC-V: KVM: Fix sign extension for MMIO loads
  RISC-V: KVM: Fix NULL pointer dereference in SBI v0.1 SEND_IPI handler
  riscv: kvm: return SBI_ERR_FAILURE for pmu_event_info() when OOM
  riscv: kvm: return SBI_ERR_FAILURE for pmu_snapshot_set_shmem() when OOM
  RISC-V: KVM: Fix invalid HVA warning in steal-time recording
2026-05-24 12:50:36 -07:00
Linus Torvalds f0e77c598e bpf-fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmoRvpcACgkQ6rmadz2v
 bTow3w/+L3PVujliBpziQFTnHJ2SoTiwUoyjJpsQOc2o1yDEWJXvTcnQ5EURah4t
 aDMjPBBWgDea7HHWvC/vbRf2D8AjCf3gZBBpzW6uTQ2F1whD3DRCZ4O6XPvfdJEQ
 R0JkqZyjBjH9fkKBy30PcF+XM9iJ5pY/mkx6nCrcYvsvbj5cIkZnmP03vBGh1jeI
 yanlYb6N2XHwQp98PKoiN4/BP4ZOQx2HhBX0TmhTcRXVAyyX5SQy4ukrp1y2CSji
 YjpM2qHdEMtMeFFwcy1K2hJwNbjhrvfgHaKbwSuM3eLjug2AMBX0zp/4Zvw7mb2o
 B6zMRo0UgOt+kJzunmqnfNe01YZ+Z+So+FkinLSTba91gwCgxa3Qm3gNsZBtxv5V
 ayrrrFoB1PCxsRJqC0Jio7WXY1JRUkusHOdzR/8pygmwcp+vy6XEzJwhGD+DeMcu
 T4VJj2bp1bCK4iZwqjyxNAoniYSIjwxzwVDw8s0Zz1Bk+92YJEnZatahFTYFzJRK
 G9hnJaht0dK960LnudBUwKXz37dvM3LxAAt0ckAepfHAOwwrdB5XhgLQjfPZejot
 J6FWsxVoS1L+lXV7104QPy2Y9zmJ7ElOzQHWRcoBWs7Srar1a+PUFD0nkuSKmPcu
 7P3ukMr6NyekE0zGlOWSZNetlZpdzvUrpuRY2WOIl+sezwCp2xg=
 =04VP
 -----END PGP SIGNATURE-----

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

 - Fix bpf_throw() and global subprog combination (Kumar Kartikeya
   Dwivedi)

 - Fix out of bounds access in BPF interpreter (Yazhou Tang)

 - Fix potential out of bounds access in inner per-cpu array map
   (Guannan Wang)

 - Reject NULL data/sig in bpf_verify_pkcs7_signature (KP Singh)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  libbpf: fix off-by-one in emit_signature_match jump offset
  bpf: Reject NULL data/sig in bpf_verify_pkcs7_signature
  selftests/bpf: Cover global subprog exception leaks
  bpf: Check global subprog exception paths
  bpf: make bpf_session_is_return() reference optional
  bpf: Use array_map_meta_equal for percpu array inner map replacement
  selftests/bpf: Add test for large offset bpf-to-bpf call
  bpf: Fix s16 truncation for large bpf-to-bpf call offsets
  bpf: Fix out-of-bounds read in bpf_patch_call_args()
2026-05-24 09:53:17 -07:00
Linus Torvalds ab868c1097 RDMA v7.1 first rc window
- syzbot triggered crash in rxe due to concurrent plug/unplug
 
 - Possible non-zero'd memory exposed to userspace in bnxt_re
 
 - Malicious 'magic packet' with SIW causes a buffer overflow
 
 - Tighten the new uAPI validation code to not crash in debugging prints
   and have the right module dependencies in drivers
 
 - mana was missing the max_msg_sz report to userspace
 
 - UAF in rtrs on an error path
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCahDmpgAKCRCFwuHvBreF
 YRRTAP0YXV95Y5gcli55IhetKjJUzQbaREz2NueqIpf1IorMbAD+Lns4DgZCU0KW
 bC81x7cGHBSyCju9zogIdBFJhsbxeQ4=
 =MxJz
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:

 - syzbot triggered crash in rxe due to concurrent plug/unplug

 - Possible non-zero'd memory exposed to userspace in bnxt_re

 - Malicious 'magic packet' with SIW causes a buffer overflow

 - Tighten the new uAPI validation code to not crash in debugging prints
   and have the right module dependencies in drivers

 - mana was missing the max_msg_sz report to userspace

 - UAF in rtrs on an error path

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/rtrs: Fix use-after-free in path file creation cleanup
  RDMA/mana_ib: Report max_msg_sz in mana_ib_query_port
  RDMA/core: Do not read wild stack memory in uverbs_get_handler_fn()
  RDMA/core: Move the _ib_copy_validate_udata* functions to ib_core_uverbs
  RDMA/siw: Reject MPA FPDU length underflow before signed receive math
  RDMA/bnxt_re: zero shared page before exposing to userspace
  selftests/rdma: explicitly skip tests when required modules are missing
  RDMA/nldev: Add mutual exclusion in nldev_dellink()
2026-05-23 07:17:27 -07:00
Sean Christopherson d9c41dc531 KVM: selftests: Verify that KVM returns the configured APIC cycle length
Add checks in the APIC bus clock test to verify that querying
KVM_CAP_X86_APIC_BUS_CYCLES_NS on the VM after changing the frequency
returns the VM's actual APIC cycle length, not KVM's default.  For
giggles, verify that KVM still returns its default frequency for the
system-scoped check.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260522173526.3539407-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-05-23 10:07:40 +02:00
Hisam Mehboob 2d42c7cf1a KVM: selftests: elf: Include <endian.h> instead of <bits/endian.h>
<bits/endian.h> is a glibc-internal header that explicitly states it
should never be included directly:

  #error "Never use <bits/endian.h> directly; include <endian.h> instead."

Replace it with the correct public header <endian.h> which works on
all C libraries including musl. Building KVM selftests with musl-gcc
fails with:

  lib/elf.c:10:10: fatal error: bits/endian.h: No such file or directory
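
A minimal sketch of the portable include, assuming a C library that exposes the le32toh() family through <endian.h> (glibc and musl both do). The helper name read_le32() is hypothetical and is not taken from lib/elf.c:

```c
#define _DEFAULT_SOURCE   /* asks glibc/musl to expose le32toh() and friends */
#include <endian.h>       /* public header; never include <bits/endian.h> */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode a little-endian 32-bit field from a byte buffer. */
uint32_t read_le32(const unsigned char *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));   /* avoids unaligned access */
	return le32toh(v);          /* no-op on little-endian hosts, byte swap on big-endian */
}
```

Because the conversion goes through le32toh(), the same bytes decode to the same value on little-endian and big-endian hosts.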

Fixes: 6089ae0bd5 ("kvm: selftests: add sync_regs_test")
Signed-off-by: Hisam Mehboob <hisamshar@gmail.com>
Message-ID: <20260409164020.1575176-4-hisamshar@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-05-23 10:05:26 +02:00
Ilya Maximets 2e43b64248 selftests: net: add a test case for nsid in all nsid notifications
The test subscribes to link events from all namespaces and makes
sure that local events do not carry NSID in their ancillary data
(even if there is a self-referential NSID allocated for the local
namespace), and remote events do.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://patch.msgid.link/20260520172317.175168-5-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-22 17:11:09 -07:00
Jiayuan Chen a40aaaef2f selftests: netfilter: add nft_fib_nexthop test
Functional coverage of nft_fib6_eval()'s nexthop enumeration over
three route shapes:

  1) single external nexthop (nhid)
  2) external nexthop group (nhid -> group)
  3) old-style multipath (nexthop ... nexthop ...)

Each scenario places one nexthop on the input device (veth0). For
(2) and (3) the matching nexthop is the second member, so the walk
has to traverse beyond the primary nh. Two nft counters on prerouting
verify the data path: one increments only when fib reports veth0 as
the oif, the other counts "missing" results and must stay at zero.

  ./nft_fib_nexthop.sh
  PASS: single external nexthop (nhid -> veth0)
  PASS: nexthop group (dummy0 + veth0)
  PASS: old-style multipath (sibling on veth0)

Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: Florian Westphal <fw@strlen.de>
2026-05-22 12:28:46 +02:00
Linus Torvalds 68993ced0f Including fixes from Bluetooth, wireless and netfilter.
Current release - fix to a fix:
 
  - Bluetooth: btmtk: accept too short WMT FUNC_CTRL events
 
  - vsock/virtio: relax the recently added memory limit a little
 
 Current release - regressions:
 
  - IB/IPoIB: make sure IB drivers always use async set_rx_mode since
    some (mlx5) are now required to use it due to locking changes
 
 Previous releases - regressions:
 
  - udp: fix UDP length on last GSO_PARTIAL segment
 
  - af_unix: fix UAF read of tail->len in unix_stream_data_wait()
 
  - tcp: fix stale per-CPU tcp_tw_isn leak enabling ISN prediction
 
  - mlx5e: fix unlocked writing to ICOSQ, breaking AF_XDP
 
 Previous releases - always broken:
 
  - tap: fix stack info leak in tap_ioctl() SIOCGIFHWADDR
 
  - ipv4: raw: reject IP_HDRINCL packets with ihl < 5
 
  - Bluetooth: a lot of locking and concurrency fixes (as always)
 
  - batman-adv (mesh wireless networking): a lot of random fixes
    for issues reported by security researchers and Sashiko
 
  - netfilter: same thing, a lot of small security-ish fixes
    all over the place, nothing really stands out
 
 Misc:
 
  - bring back the old 3c509 driver, Maciej wants to maintain it
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmoPULUACgkQMUZtbf5S
 IrtAvA//bfjxxazZKkGqL8mp6uMYS5Su81Oh/pBcyEWC7q2xv3ftNp5pt8oCTWYP
 eryKi7XrxfNHrkFcmnH+aWQ431UekZLfAjrSd+5V0YvE1nQDnKrgbat5qx2SYSsr
 ZA7EYnJjvAtPMb0KqUJlYPMSfVdFA0H3gEOdnawkGRnizkKNO5NsNRkC4rHzpCil
 hzW5SCTZWQ0r1Cm3IxcTnSCJEOYRqH0BUBbiSRFCWNMZZpq0xKi3UiJFOdgRvqgc
 VoPz6sMRPxZyL8gW8i2jJVz6vj2yuWifJwbl8y3ZkqJJy4HvNXfcPIBH5+vBIWlB
 hWMuYlUv5F0w+h4+UKeDr789Tdpv12edUIDX+prbsJ8c4bXmBflt069HlFjG9Pto
 /k2e5owR0NYSaLt4WvAM6Tr5j1ralzQjHKVDg8JbPaAD+0dtb+e3dXE8J3MBPrw6
 EWtdg9jX+vqsbVoHwMQO9Xp2waNY9+97L07w+I0nVf7NLJvrvz0lkSjMKfNPNyV1
 C5W7McAbSOx3nJ+XzYwMoVK0wP9OunKA73EhAoEdvQSyOGLqQT+iZzDoTMnwKJFs
 2L3fbc8LQ10WBG2B2rCPB/gaGQ1ZZD8uSlZoS9N31dvUPFDaCnCYgKIze/pdcE/R
 KOQskME2xd61KzpYlJszkrjJIbnppkNt/mBvvfNUP+zJZPFRyuA=
 =ei7U
 -----END PGP SIGNATURE-----

Merge tag 'net-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from Bluetooth, wireless and netfilter.

  Craziness continues with no end in sight. Even discounting the driver
  revert this is a pretty huge PR for standards of the previous era. I'd
  speculate - we haven't seen the worst of it, yet. Good news, I guess,
  is that so far we haven't seen many (any?) cases of "AI reported a
  bug, we fixed it and a real user regressed".

  Current release - fix to a fix:

   - Bluetooth: btmtk: accept too short WMT FUNC_CTRL events

   - vsock/virtio: relax the recently added memory limit a little

  Current release - regressions:

   - IB/IPoIB: make sure IB drivers always use async set_rx_mode since
     some (mlx5) are now required to use it due to locking changes

  Previous releases - regressions:

   - udp: fix UDP length on last GSO_PARTIAL segment

   - af_unix: fix UAF read of tail->len in unix_stream_data_wait()

   - tcp: fix stale per-CPU tcp_tw_isn leak enabling ISN prediction

   - mlx5e: fix unlocked writing to ICOSQ, breaking AF_XDP

  Previous releases - always broken:

   - tap: fix stack info leak in tap_ioctl() SIOCGIFHWADDR

   - ipv4: raw: reject IP_HDRINCL packets with ihl < 5

   - Bluetooth: a lot of locking and concurrency fixes (as always)

   - batman-adv (mesh wireless networking): a lot of random fixes for
     issues reported by security researchers and Sashiko

   - netfilter: same thing, a lot of small security-ish fixes all over
     the place, nothing really stands out

  Misc:

   - bring back the old 3c509 driver, Maciej wants to maintain it"

* tag 'net-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (187 commits)
  net: enetc: avoid VF->PF mailbox timeout during SR-IOV teardown
  net: enetc: fix init and teardown order to prevent use of unsafe resources
  net: enetc: fix unbounded loop and interrupt handling in VF-to-PF messaging
  net: enetc: fix DMA write to freed memory in enetc_msg_free_mbx()
  net: enetc: fix race condition in VF MAC address configuration
  net: enetc: fix TOCTOU race and validate VF MAC address
  net: enetc: add ratelimiting to VF mailbox error messages
  net: enetc: fix missing error code when pf->vf_state allocation fails
  net: enetc: fix incorrect mailbox message status returned to VFs
  net: bridge: prevent too big nested attributes in br_fill_linkxstats()
  l2tp: use list_del_rcu in l2tp_session_unhash
  net: bcmgenet: keep RBUF EEE/PM disabled
  ethernet: 3c509: Fix most coding style issues
  ethernet: 3c509: Update documentation to match MAINTAINERS
  ethernet: 3c509: Add GPL 2.0 SPDX license identifier
  ethernet: 3c509: Fix AUI transceiver type selection
  Revert "drivers: net: 3com: 3c509: Remove this driver"
  tools: ynl: support listening on all nsids
  net: gro: don't merge zcopy skbs
  pds_core: ensure null-termination for firmware version strings
  ...
2026-05-21 14:39:12 -07:00
Nimrod Oren dfc0770433 selftests: net: Fix checksums in xdp_native
Data adjustment cases failed with "Data exchange failed" when using IPv4
because the program did not update the IP and UDP checksums in the IPv4
branch. The issue was masked when both IPv4 and IPv6 were configured,
since the test harness prefers IPv6.

While here, generalize csum_fold_helper() to fold twice so it works for
any 32-bit input.
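
As an illustration of why two folds are needed, here is a sketch of one's-complement folding in plain C. The function name csum_fold32() is hypothetical, and this is not the selftest's actual csum_fold_helper():

```c
#include <stdint.h>

/* Fold a 32-bit one's-complement sum into 16 bits and return its complement. */
uint16_t csum_fold32(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);   /* first fold: result may carry into bit 16 */
	sum = (sum & 0xffff) + (sum >> 16);   /* second fold: absorbs that carry */
	return (uint16_t)~sum;
}
```

With an input such as 0x0001ffff, a single fold leaves 0x10000, which no longer fits in 16 bits; the second fold reduces it to 0x0001 before the final inversion.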

Fixes: 0b65cfcef9 ("selftests: drv-net: Test tail-adjustment support")
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Nimrod Oren <noren@nvidia.com>
Link: https://patch.msgid.link/20260520153928.3371765-1-noren@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-21 07:47:00 -07:00
Matthieu Baerts (NGI0) 92cc6708f4 selftests: rds: config: disable modules
The run.sh script explicitly checks that CONFIG_MODULES is disabled.

By default, this config option is enabled. Explicitly disable it to be
able to run the RDS tests.

Note that writing '# CONFIG_(...) is not set' is usually recommended to
disable an option in the .config, but it looks like selftests usually
set 'CONFIG_(...)=n', which looks clearer.

Fixes: 0f5d680047 ("selftests: rds: add tools/testing/selftests/net/rds/config")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260520-net-rds-config-modules-v1-1-2100df02fe9a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-21 07:39:20 -07:00
Xingwang Xiang 33644bd38a selftests/bpf: add regression test for ktls+sockmap verdict UAF
Test the scenario where a socket is inserted into a sockmap with a
BPF_SK_SKB_VERDICT program before TLS RX is configured.  Previously
sk_psock_verdict_data_ready() would call tcp_read_skb() and drain the
receive queue without advancing copied_seq, causing tls_decrypt_sg()
to walk a dangling frag_list pointer (use-after-free).

The test drives the full vulnerable sequence and verifies that after
the fix recv() returns the correct decrypted data.

Signed-off-by: Xingwang Xiang <v3rdant.xiang@gmail.com>
Link: https://patch.msgid.link/20260517145630.20521-3-v3rdant.xiang@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 17:23:56 -07:00
KP Singh 49b18315be bpf: Reject NULL data/sig in bpf_verify_pkcs7_signature
__bpf_dynptr_data() can return NULL (FILE dynptrs, any non-contiguous
backing). bpf_verify_pkcs7_signature() forwards the pointer to
verify_pkcs7_signature() unchecked, causing a NULL deref in
asn1_ber_decoder() reachable from a sleepable BPF LSM at lsm.s/bpf.

NULL-check both pointers and reject with -EINVAL. Mirrors the guards
already in kernel/bpf/crypto.c.
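
The guard pattern can be sketched in user-space C as follows. Both function names are hypothetical stand-ins, and the stub only models a callee that would dereference its arguments; this is not the kernel code:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a verifier that parses both buffers. */
int verify_stub(const void *data, size_t data_len,
		const void *sig, size_t sig_len)
{
	(void)data;
	(void)sig;
	/* A real verifier would read data[] and sig[] here. */
	return (data_len && sig_len) ? 0 : -EBADMSG;
}

/* Reject NULL pointers before they can reach the parser. */
int checked_verify(const void *data, size_t data_len,
		   const void *sig, size_t sig_len)
{
	if (!data || !sig)
		return -EINVAL;

	return verify_stub(data, data_len, sig, sig_len);
}
```

Putting the check in the wrapper keeps the callee free to assume valid pointers.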

Fixes: 865b0566d8 ("bpf: Add bpf_verify_pkcs7_signature() kfunc")
Reported-by: Xianrui Dong <dongxianrui1@gmail.com>
Signed-off-by: KP Singh <kpsingh@kernel.org>
Reviewed-by: Amery Hung <ameryhung@gmail.com>
Acked-by: Song Liu <song@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20260520024059.313468-1-kpsingh@kernel.org
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
2026-05-20 05:12:05 +02:00
Ido Schimmel ae743a8ca8 selftests: bridge_vlan_mcast: Test toggling of multicast snooping
Test toggling of multicast snooping when per-VLAN multicast snooping is
enabled. The test always passes, but without "bridge: mcast: Fix
possible use-after-free when removing a bridge port" it results in a
splat.

Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260517121122.188333-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-19 18:15:22 -07:00
Linus Torvalds c6e99c10fd 14 hotfixes. 9 are for MM. 10 are cc:stable and the remainder are for
post-7.1 issues or aren't deemed suitable for backporting.
 
 There's a 2 patch MAINTAINERS series from Mike Rapoport which updates us
 for the new KEXEC/KDUMP/crash/LUO/etc arrangements.  And a 2 patch series
 from Muchun Song to fix a couple of memory-hotplug issues.  Otherwise
 singletons, please see the changelogs for details.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCagviKQAKCRDdBJ7gKXxA
 jlbsAP9SEShHxXEYcRMVQtXb+8/iJDe7J3KwVDP4e0VOlQKTPAD/c+C2bx4nllOG
 77wl9Qkr++KqTSmoPbzA7Q02gJC2ngQ=
 =2qN3
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2026-05-18-21-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "14 hotfixes. 9 are for MM. 10 are cc:stable and the remainder are for
  post-7.1 issues or aren't deemed suitable for backporting.

  There's a two-patch MAINTAINERS series from Mike Rapoport which
  updates us for the new KEXEC/KDUMP/crash/LUO/etc arrangements. And
  another two-patch series from Muchun Song to fix a couple of
  memory-hotplug issues. Otherwise singletons, please see the changelogs
  for details"

* tag 'mm-hotfixes-stable-2026-05-18-21-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/memory: fix spurious warning when unmapping device-private/exclusive pages
  mm: fix __vm_normal_page() to handle missing support for pmd_special()/pud_special()
  drivers/base/memory: fix memory block reference leak in poison accounting
  mm/memory_hotplug: fix memory block reference leak on remove
  lib: kunit_iov_iter: fix test fail on powerpc
  mm/page_alloc: fix initialization of tags of the huge zero folio with init_on_free
  MAINTAINERS: add kexec@ list to LIVE UPDATE ENTRY
  MAINTAINERS: add tree for KDUMP and KEXEC
  selftests/mm: run_vmtests.sh: fix destructive tests invocation
  scripts/gdb: slab: update field names of struct kmem_cache
  scripts/gdb: mm: cast untyped symbols in x86_page_ops
  mm/damon: fix damos_stat tracepoint format for sz_applied
  mm/damon/sysfs-schemes: call missing mem_cgroup_iter_break()
  mm/migrate_device: fix spinlock leak in migrate_vma_insert_huge_pmd_page
2026-05-19 07:49:33 -07:00
Matthieu Baerts (NGI0) 01ff78e4b3 selftests: mptcp: drop nanoseconds width specifier
GNU date honours the format specifier +%s%3N and prints only the first
3 digits of the nanoseconds portion of the seconds since epoch, which
corresponds to the milliseconds.

The uutils implementation of date currently does not honour this, and
always prints all 9 digits. This is a known issue [1], but can be worked
around by adapting this test to use nanoseconds instead of milliseconds,
and then dividing the result by 1e6.

This fix is similar to what has been done on systemd side [2], and it is
needed to run the selftests on Ubuntu 26.04, containing uutils 0.8.0.

Note that the Fixes tag is there even if this patch doesn't fix an issue
in the kernel selftests, but it is useful for those using uutils 0.8.0.

Fixes: 048d19d444 ("mptcp: add basic kselftest for mptcp")
Cc: stable@vger.kernel.org
Link: https://github.com/uutils/coreutils/issues/11658 [1]
Link: https://github.com/systemd/systemd/pull/41627 [2]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260515-net-mptcp-misc-fixes-7-1-rc4-v2-6-701e96419f2f@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-05-19 15:36:35 +02:00
Li Xiasong fc5ef43318 selftests: mptcp: join: cover ADD_ADDR tx drop and list progress
Extend add_addr_ports_tests with IPv6 signaling cases that exercise
ADD_ADDR tx-space shortage when tcp_timestamps are enabled.

Add one case to verify PM still progresses to later signal endpoints
after the first one is dropped.

This covers both failure accounting and the non-blocking behavior of
the announce list after a tx-space drop on pure ACK.

Signed-off-by: Li Xiasong <lixiasong1@huawei.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260515-net-mptcp-misc-fixes-7-1-rc4-v2-3-701e96419f2f@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-05-19 15:36:35 +02:00
Paolo Abeni 956aa1ec97 Included fixes:
* fix TCP selftest failures by reducing number of attempted pings
 * fix RCU ptr deref outside of RCU read section
 * fix UAF in case of TCP peer failed to be added to hashtable
 * fix race condition between iface teardown and new peer being added
 * ensure dstats are updated with BH disabled to avoid concurrency
 -----BEGIN PGP SIGNATURE-----
 
 iJEEABYIADkWIQQKU153ubb5unbkl6Gx/ZpNW1HNdwUCagZUrRsUgAAAAAAEAA5t
 YW51MiwyLjUrMS4xMiwyLDIACgkQsf2aTVtRzXcJ3gEAv0uFO2tjJha8KppS8Bpa
 3asGcAcFP686hMYpm5MjrmIA+wcWETrqwtzf1YcfjKyDCVNGVwrHSyQl5dIGNwGD
 nQgN
 =WFT+
 -----END PGP SIGNATURE-----

Merge tag 'ovpn-net-20260514' of https://github.com/OpenVPN/ovpn-net-next

Antonio Quartulli says:

====================
Included fixes:
* fix TCP selftest failures by reducing number of attempted pings
* fix RCU ptr deref outside of RCU read section
* fix UAF in case of TCP peer failed to be added to hashtable
* fix race condition between iface teardown and new peer being added
* ensure dstats are updated with BH disabled to avoid concurrency

* tag 'ovpn-net-20260514' of https://github.com/OpenVPN/ovpn-net-next:
  ovpn: disable BHs when updating device stats
  ovpn: fix race between deleting interface and adding new peer
  ovpn: respect peer refcount in CMD_NEW_PEER error path
  ovpn: tcp - use cached peer pointer in ovpn_tcp_close()
  selftests: ovpn: reduce remaining ping flood counts
====================

Link: https://patch.msgid.link/20260514231544.795993-1-antonio@openvpn.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-05-19 13:51:08 +02:00
Kumar Kartikeya Dwivedi 511a5db3c9 selftests/bpf: Cover global subprog exception leaks
Add a verifier failure case where the caller holds a reference across a
global subprog call that may throw. The program must be rejected because
the exceptional path would skip the caller's reference release.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260517075530.3461166-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-05-17 11:15:05 -07:00
Linus Torvalds d458a24034 block-7.1-20260515
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmoHJWMQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgprUfD/0dTUuuHPbTqqLYgAIbkoKlA9XqG4OOGQnI
 C3Y/3o/GNneRbNrzH3Pi1y9A20s55Xa6f2nh+EmQKiT03dEz5FLhF9jRdpL/uajL
 9pnyG9WgqKX8qUqPKsot9f1i6Sp9BRJJBoQYgh8qabuR2EOXGiE7Y72ODtt0JJv2
 Jfl7RH7g1PaGPgG+Bm8X93WxwXytsZPsaB58VJc3iuPP1KSeAMOiJB8yGrq3fGpK
 41glIcaML1SxobctCPhC8f2Emek+jShmbXpGvnBpRxx96snzIucxjQlrVifLnufH
 S+cdY53rnyQo0CRtU2zfhDsQLRxVvgKZHxnSIT5CXe2/yWS5U+Wa7iCMQgUxWhv4
 yBD7dyZ/W5+U6jRGJtC/IzYGcyiH90XuKDyG5eBy/D7VMzCBxL3If+YEibmEvE9M
 e8PnrnyFHyGxe9mWUCG+rMMRySFmTqscS/bH8my4utJ2bA/F7e87KGuIrMXOUwtr
 S3AzurvUsZJOfiFkh74ly3C9WhEIFo852giM2SiKa5FAgvTaZwHkMybLB7KxtETZ
 GPahd/CKg4RIaoi89hfQ8iY+mNLjykEHdap6y/kCSeOObHGr/KR3DQ7rsePYhC5L
 3EV+Laz8qgrlFglkGhcaDlJGLe0wKnsgJf3HUcA53lNNZjfNq0eCN+aGGfRCYVNm
 5D5IKT/oYg==
 =6rcw
 -----END PGP SIGNATURE-----

Merge tag 'block-7.1-20260515' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull block fixes from Jens Axboe:

 - NVMe merge request via Keith:
     - Fix memory leak on a passthrough integrity mapping failure (Keith)
     - Hide secrets behind debug option (Hannes)
     - Fix pci use-after-free for host memory buffer (Chia-Lin Kao)
     - Fix tcp target use-after-free for data digest (Sagi)
     - Revert a mistaken quirk (Alan Cui)
     - Fix uevent and controller state race condition (Maurizio)
     - Fix apple submission queue re-initialization (Nick Chan)

 - Three fixes for blk-integrity, fixing an issue with the user data
   mapping and two problems with recomputing number of segments

 - Two fixes for the iov_iter bounce buffering

 - Fix for the handling of dead zoned write plugs

 - ublk max_sectors validation fix, with associated selftest addition

* tag 'block-7.1-20260515' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  nvme-apple: Reset q->sq_tail during queue init
  block: align down bounces bios
  block: pass a minsize argument to bio_iov_iter_bounce
  selftests: ublk: cap nthreads to kernel's actual nr_hw_queues
  block: fix handling of dead zone write plugs
  block: bio-integrity: Fix null-ptr-deref in bio_integrity_map_user()
  block: recompute nr_integrity_segments in blk_insert_cloned_request
  block: don't overwrite bip_vcnt in bio_integrity_copy_user()
  nvme: fix race condition between connected uevent and STARTED_ONCE flag
  Revert "nvme: add quirk NVME_QUIRK_IGNORE_DEV_SUBNQN for 144d:a808"
  nvmet-tcp: Fix potential UAF when ddgst mismatch
  nvme-pci: fix use-after-free in nvme_free_host_mem()
  nvmet-auth: Do not print DH-HMAC-CHAP secrets
  nvme: fix bio leak on mapping failure
  nvme: make prp passthrough usage less scary
  ublk: reject max_sectors smaller than PAGE_SECTORS in parameter validation
2026-05-15 12:47:00 -07:00
Linus Torvalds 66182ca873 Including fixes from netfilter.
Previous releases - regressions:
 
   - ethtool: fix NULL pointer dereference in phy_reply_size
 
   - netfilter:
     - allocate hook ops while under mutex
     - close dangling table module init race
     - restore nf_conntrack helper propagation via expectation
 
   - tcp:
     - fix potential UAF in reqsk_timer_handler().
     - fix out-of-bounds access for twsk in tcp_ao_established_key().
 
   - vsock: fix empty payload in tap skb for non-linear buffers
 
   - hsr: fix NULL pointer dereference in hsr_get_node_data()
 
   - eth: cortina: fix RX drop accounting
 
   - eth: ice: fix locking in ice_dcb_rebuild()
 
 Previous releases - always broken:
 
   - napi: avoid gro timer misfiring at end of busypoll
 
   - sched:
     - dualpi2: initialize timer earlier in dualpi2_init()
     - sch_cbs: Call qdisc_reset for child qdisc
 
   - shaper:
     - fix ordering issue in net_shaper_commit()
     - reject handle IDs exceeding internal bit-width
 
   - ipv6: flowlabel: enforce per-netns limit for unprivileged callers
 
   - tls: fix off-by-one in sg_chain entry count for wrapped sk_msg ring
 
   - smc: avoid NULL deref of conn->lnk in smc_msg_event tracepoint
 
   - sctp: revalidate list cursor after sctp_sendmsg_to_asoc() in SCTP_SENDALL
 
   - batman-adv:
     - reject new tp_meter sessions during teardown
     - purge non-released claims
 
   - eth: i40e: cleanup PTP registration on probe failure
 
   - eth: idpf: fix double free and use-after-free in aux device error paths
 
   - eth: ena: fix potential use-after-free in get_timestamp
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmoF2xoSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOk38gP/0aAvFK8YZiEbQ/XtxEHc/SQaOX0EmRI
 6TBAABW+pmf3YNIuMcFbMc4WdTDt2TDfqPONG70reqW/rut4e6kr8qGjY1ABEFYx
 BdMSafs4hxiychSHVWACygSHew8PblfKGkIgmDqGmRWJCHaCRs49m64nhZV3k5mO
 Q1s/O6JGDWmI1Z5/9UNgHefkQnfTRbqiKKFpYFT01yfxpjlqJyfGyXcDZqGS29n0
 f0xfCRrHvoA2DIkyVVZxjLHtqqvTfNyFRdvw+pBEuGvdmjiGkxUoHkHRuwlarJt3
 Ry2QDTakL2qRCrEzcSCbXaDKBcFKkpcQG22QYpl4yYpKR6JKZ+2G0inin1oJ3L77
 OpwY033ksPhdlegybINL5yN98P9Jq+jO3HYJbH9Z/1oInpgrAQJjAiS0LoK+qm37
 D3eag9Qsw+svdZ6HSMjUymS9GdKwShrya7YE8K8IVrTWWeGzUW/uw1hxcdr/5/ZC
 olJRa3aN80YhwlvHkkhOGvZZY1Xz2Vtds6uCY3zh/nFDoYlmJs08C9v7UE5/NH4x
 KpOY6nfa3RBl87ILFpzqpzP5fjHE1NbOJIfBCFdHpvhDWU7yCfmzhRBenrfXpnt9
 9teEalnA3jILlTXanLGxEhWFcMiSk2D+/sHYWYdGUth62YJpP6GkiUQK5OZPHv2x
 zDl02XCxq6Ag
 =l6fA
 -----END PGP SIGNATURE-----

Merge tag 'net-7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from netfilter.

  Previous releases - regressions:

   - ethtool: fix NULL pointer dereference in phy_reply_size

   - netfilter:
      - allocate hook ops while under mutex
      - close dangling table module init race
      - restore nf_conntrack helper propagation via expectation

   - tcp:
      - fix potential UAF in reqsk_timer_handler().
      - fix out-of-bounds access for twsk in tcp_ao_established_key().

   - vsock: fix empty payload in tap skb for non-linear buffers

   - hsr: fix NULL pointer dereference in hsr_get_node_data()

   - eth:
      - cortina: fix RX drop accounting
      - ice: fix locking in ice_dcb_rebuild()

  Previous releases - always broken:

   - napi: avoid gro timer misfiring at end of busypoll

   - sched:
      - dualpi2: initialize timer earlier in dualpi2_init()
      - sch_cbs: Call qdisc_reset for child qdisc

   - shaper:
      - fix ordering issue in net_shaper_commit()
      - reject handle IDs exceeding internal bit-width

   - ipv6: flowlabel: enforce per-netns limit for unprivileged callers

   - tls: fix off-by-one in sg_chain entry count for wrapped sk_msg ring

   - smc: avoid NULL deref of conn->lnk in smc_msg_event tracepoint

   - sctp: revalidate list cursor after sctp_sendmsg_to_asoc() in SCTP_SENDALL

   - batman-adv:
      - reject new tp_meter sessions during teardown
      - purge non-released claims

   - eth:
      - i40e: cleanup PTP registration on probe failure
      - idpf: fix double free and use-after-free in aux device error paths
      - ena: fix potential use-after-free in get_timestamp"

* tag 'net-7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
  net: phy: DP83TC811: add reading of abilities
  net: tls: prevent chain-after-chain in plain text SG
  net: tls: fix off-by-one in sg_chain entry count for wrapped sk_msg ring
  net/smc: reject CHID-0 ACCEPT that matches an empty ism_dev slot
  macsec: use rcu_work to defer TX SA crypto cleanup out of softirq
  macsec: use rcu_work to defer RX SA crypto cleanup out of softirq
  macsec: introduce dedicated workqueue for SA crypto cleanup
  net: net_failover: Fix the deadlock in slave register
  MAINTAINERS: update atlantic driver maintainer
  selftests/tc-testing: Add QFQ/CBS qlen underflow test
  net/sched: sch_cbs: Call qdisc_reset for child qdisc
  FDDI: defza: Sanitise the reset safety timer
  net: ethernet: ravb: Do not check URAM suspension when WoL is active
  ethtool: fix ethnl_bitmap32_not_zero() bit interval semantics
  net/smc: avoid NULL deref of conn->lnk in smc_msg_event tracepoint
  net/smc: fix sleep-inside-lock in __smc_setsockopt() causing local DoS
  net: atm: fix skb leak in sigd_send() default branch
  net: ethtool: phy: avoid NULL deref when PHY driver is unbound
  net: atlantic: preserve PCI wake-from-D3 on shutdown when WOL enabled
  net: shaper: reject QUEUE scope handle with missing id
  ...
2026-05-14 08:57:43 -07:00
Guannan Wang 5939801753 bpf: Use array_map_meta_equal for percpu array inner map replacement
percpu_array_map_ops.map_meta_equal points to the generic
bpf_map_meta_equal(), which does not compare max_entries.  When a
percpu array serves as an inner map, replacing it with one that has
fewer max_entries bypasses the check.  Since percpu_array_map_gen_lookup()
inlines the original template's index_mask as a JIT immediate, a lookup
on the replacement map can access pptrs[] out of bounds.

Point percpu_array_map_ops.map_meta_equal to array_map_meta_equal(),
which already enforces the max_entries equality check.
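
As a rough userspace illustration of why max_entries has to be part of the comparison, the sketch below models an inlined lookup whose bound and mask are baked in from the template map.  The struct and function names are hypothetical, and the power-of-two mask derivation is an assumption made for the example, not a copy of the kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the metadata compared on inner map replacement. */
struct map_meta_sketch {
	uint32_t value_size;
	uint32_t max_entries;
};

/* Assumed mask derivation: next power of two at or above max_entries, minus 1. */
static uint32_t index_mask_for(uint32_t max_entries)
{
	uint64_t mask = 1;

	while (mask < max_entries)
		mask <<= 1;
	return (uint32_t)(mask - 1);
}

/* Generic-style comparison: ignores max_entries. */
static bool generic_meta_equal_sketch(const struct map_meta_sketch *a,
				      const struct map_meta_sketch *b)
{
	return a->value_size == b->value_size;
}

/* Array-style comparison: additionally requires equal max_entries. */
static bool array_meta_equal_sketch(const struct map_meta_sketch *a,
				    const struct map_meta_sketch *b)
{
	return generic_meta_equal_sketch(a, b) &&
	       a->max_entries == b->max_entries;
}

/*
 * Model of an inlined lookup: the bound and mask come from the template,
 * while the array actually indexed belongs to the replacement map.
 */
static bool inlined_lookup_overruns(const struct map_meta_sketch *tmpl,
				    const struct map_meta_sketch *repl,
				    uint32_t index)
{
	if (index >= tmpl->max_entries)
		return false;	/* rejected by the inlined bound */
	index &= index_mask_for(tmpl->max_entries);
	return index >= repl->max_entries;
}
```

With a template of 8 entries and a replacement of 4, index 5 passes the inlined bound but lies past the end of the replacement, which is the situation the stricter comparison rules out.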

Add a selftest to verify that replacing a percpu array inner map with
a differently-sized one is rejected.

Fixes: db69718b8e ("bpf: inline bpf_map_lookup_elem() for PERCPU_ARRAY maps")
Signed-off-by: Guannan Wang <wgnbuaa@gmail.com>
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20260514074454.77491-1-wgnbuaa@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-05-14 08:18:50 -07:00
Ralf Lici 7f11449778 selftests: ovpn: reduce remaining ping flood counts
Commit 201ba70631 ("selftests: ovpn: reduce ping count in test.sh")
lowered the baseline traffic flood ping count to avoid flakes on slower
CI instances.  However, some flood pings were left out.

Apply the same limit to the remaining ovpn selftest flood pings that
still request 500 packets.

Fixes: 201ba70631 ("selftests: ovpn: reduce ping count in test.sh")
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2026-05-14 16:24:45 +02:00
Victor Nogueira 59afae2008 selftests/tc-testing: Add QFQ/CBS qlen underflow test
Since CBS was not calling reset for its child qdisc, there are scenarios
where it could cause an underflow on its parent's qlen/backlog. When the
parent is QFQ, a null-ptr deref could occur.

Add a test case that reproduces the underflow followed by a null-ptr
deref scenario.

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-13 17:53:39 -07:00
Alistair Popple be3f38d05c mm/memory: fix spurious warning when unmapping device-private/exclusive pages
Device private and exclusive entries are only supported for anonymous
folios.  This condition is tested in __migrate_device_pages() and
make_device_exclusive() using folio_test_anon().  However the unmap path
tests this assumption using vma_is_anonymous().

This is wrong because, whilst anonymous VMAs can only contain folios for
which folio_test_anon() is true, the converse does not hold:
folio_test_anon() being true for a folio does not imply that
vma_is_anonymous() is true for the VMA mapping it.  Such a condition can
occur if, for example, a folio is part of a private file-backed mapping.
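
A minimal userspace sketch of how such a mapping comes about, assuming a temporary file is an acceptable stand-in for any file-backed VMA: writing through a MAP_PRIVATE mapping gives the process a private copy of the page, so the write never reaches the file.  The function name is hypothetical, and the sketch only observes the copy-on-write effect; it cannot inspect folio_test_anon() from userspace.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Write through a MAP_PRIVATE mapping of a temporary file and report
 * whether the write became visible in the file itself.
 * Returns 1 if it did, 0 if it did not, -1 on setup failure.
 */
static int private_write_reaches_file(void)
{
	FILE *f = tmpfile();
	long page = sysconf(_SC_PAGESIZE);
	char on_disk = 0;
	char *map;
	int ret = -1;
	int fd;

	if (!f)
		return -1;
	fd = fileno(f);
	if (ftruncate(fd, page) != 0)
		goto out;

	map = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	if (map == MAP_FAILED)
		goto out;

	map[0] = 'X';	/* triggers copy-on-write of the file page */

	if (pread(fd, &on_disk, 1, 0) == 1)
		ret = (on_disk == 'X');

	munmap(map, page);
out:
	fclose(f);
	return ret;
}
```

The VMA here is file-backed, yet the page holding 'X' is a private copy owned by the process rather than by the file.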

In this case vma_is_anonymous() is false as the mapping is file-backed, but
folio_test_anon() may be true, thus permitting devices to migrate the
folio to device private memory.  This can lead to the following spurious
warnings during process teardown:

[  772.737706] ------------[ cut here ]------------
[  772.739201] WARNING: mm/memory.c:1754 at unmap_page_range.cold+0x26/0x18a, CPU#17: hmm-tests/2041
[  772.742050] Modules linked in: test_hmm nvidia_uvm(O) nvidia(O)
[  772.743959] CPU: 17 UID: 0 PID: 2041 Comm: hmm-tests Tainted: G        W  O        7.0.0+ #387 PREEMPT(full)
[  772.747104] Tainted: [W]=WARN, [O]=OOT_MODULE
[  772.748509] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
[  772.752117] RIP: 0010:unmap_page_range.cold+0x26/0x18a
[  772.753780] Code: 7e fe ff ff 48 89 4c 24 78 4c 89 44 24 38 e8 f2 ff b1 00 48 8b 4c 24 78 4c 8b 44 24 38 48 8b 44 24 18 48 83 78 48 00 74 04 90 <0f> 0b 90 48 89 ca b8 ff ff 37 00 48 c1 ea 03 48 c1 e0 2a 80 3c 02
[  772.759602] RSP: 0018:ffff888112607550 EFLAGS: 00010286
[  772.761310] RAX: ffff88811bbf4dc0 RBX: dffffc0000000000 RCX: ffffea03e9bfffd8
[  772.763583] RDX: 1ffff1102377e9c1 RSI: 0000000000000008 RDI: ffff88811bbf4e08
[  772.765914] RBP: 0000000000000006 R08: ffff8881059f7448 R09: ffffed10224c0e68
[  772.768184] R10: ffff888112607347 R11: 0000000000000001 R12: 0000000000000001
[  772.770461] R13: ffffea03e9bfffc0 R14: ffff888112607908 R15: ffffea03e9bfffc0
[  772.772782] FS:  00007f327caa2780(0000) GS:ffff888427b7d000(0000) knlGS:0000000000000000
[  772.775328] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  772.777187] CR2: 00007f327ca89000 CR3: 00000001994d5000 CR4: 00000000000006f0
[  772.779135] Call Trace:
[  772.779792]  <TASK>
[  772.780317]  ? dmirror_interval_invalidate+0x1a3/0x290 [test_hmm]
[  772.781873]  ? vm_normal_page_pud+0x2b0/0x2b0
[  772.782992]  ? __rwlock_init+0x150/0x150
[  772.784006]  ? lock_release+0x216/0x2b0
[  772.785008]  ? __mmu_notifier_invalidate_range_start+0x505/0x6e0
[  772.786522]  ? lock_release+0x216/0x2b0
[  772.787498]  ? unmap_single_vma+0xb6/0x210
[  772.788573]  unmap_vmas+0x27d/0x520
[  772.789506]  ? unmap_single_vma+0x210/0x210
[  772.790607]  ? mas_update_gap.part.0+0x620/0x620
[  772.791834]  unmap_region+0x19e/0x350
[  772.792769]  ? remove_vma+0x130/0x130
[  772.793684]  ? mas_alloc_nodes+0x1f2/0x300
[  772.794730]  vms_complete_munmap_vmas+0x8c1/0xe20
[  772.795926]  ? unmap_region+0x350/0x350
[  772.796917]  do_vmi_align_munmap+0x36a/0x4e0
[  772.798018]  ? lock_release+0x216/0x2b0
[  772.799024]  ? vma_shrink+0x620/0x620
[  772.799983]  do_vmi_munmap+0x150/0x2c0
[  772.800939]  __vm_munmap+0x161/0x2c0
[  772.801872]  ? expand_downwards+0xd60/0xd60
[  772.802948]  ? clockevents_program_event+0x1ef/0x540
[  772.804217]  ? lock_release+0x216/0x2b0
[  772.805158]  __x64_sys_munmap+0x59/0x80
[  772.805776]  do_syscall_64+0xfc/0x670
[  772.806336]  ? irqentry_exit+0xda/0x580
[  772.806976]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
[  772.807772] RIP: 0033:0x7f327cbb2717
[  772.808323] Code: 73 01 c3 48 8b 0d f9 76 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 0b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c9 76 0d 00 f7 d8 64 89 01 48
[  772.811337] RSP: 002b:00007ffde7f57d38 EFLAGS: 00000202 ORIG_RAX: 000000000000000b
[  772.812564] RAX: ffffffffffffffda RBX: 00007f327cc9c000 RCX: 00007f327cbb2717
[  772.813733] RDX: 0000000000000000 RSI: 0000000000400000 RDI: 00007f327c289000
[  772.814867] RBP: 0000000000421360 R08: 000000000000001a R09: 0000000000000000
[  772.815991] R10: 0000000000000003 R11: 0000000000000202 R12: 00007ffde7f57d74
[  772.817121] R13: 00007f327c689010 R14: 0000000000100000 R15: 00007f327c289000
[  772.818272]  </TASK>
[  772.818614] irq event stamp: 0
[  772.819159] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[  772.820174] hardirqs last disabled at (0): [<ffffffff82a57ab3>] copy_process+0x19f3/0x6440
[  772.821511] softirqs last  enabled at (0): [<ffffffff82a57b00>] copy_process+0x1a40/0x6440
[  772.822869] softirqs last disabled at (0): [<0000000000000000>] 0x0
[  772.823871] ---[ end trace 0000000000000000 ]---

Fix this by using the same check for folio_test_anon() in
zap_nonpresent_ptes(). Also add a hmm-test case for this.

Link: https://lore.kernel.org/20260501065116.2057242-1-apopple@nvidia.com
Fixes: 999dad824c ("mm/shmem: persist uffd-wp bit across zapping for file-backed")
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reported-by: Arsen Arsenović <aarsenovic@baylibre.com>
Reviewed-by: Balbir Singh <balbirs@nvidia.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-05-13 17:40:03 -07:00
Luiz Capitulino 3432cbb291 selftests/mm: run_vmtests.sh: fix destructive tests invocation
Destructive tests should be invoked with the -d command-line option, but
this does not work today because 'd' is missing from the getopts option
string.  This commit fixes it.

Link: https://lore.kernel.org/214fd9e4-5398-4c26-859e-c982c2e277c3@redhat.com
Fixes: f16ff3b692 ("selftests/mm: run_vmtests.sh: add missing tests")
Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-05-13 17:40:01 -07:00
Linus Torvalds 59a62ea458 sched_ext: Fixes for v7.1-rc3
Bulk is hardening of the new sub-scheduler infrastructure.
 
 - UAFs and lifecycle bugs on the sub-sched attach/detach paths: parent
   sub_kset freed under a racing child, list_del_rcu on an uninitialized
   list head, ops->priv stomped by concurrent attach/detach, and a UAF in
   the init-failure error path.
 
 - Task state-machine reorg closing concurrent enable-vs-dead races: a
   task exiting during the unlocked init window could trip NULL ops
   derefs or skip exit_task() cleanup.
 
 - A scx_link_sched() self-deadlock on scx_sched_lock.
 
 - isolcpus: stop dereferencing the now-RCU-protected HK_TYPE_DOMAIN
   cpumask without RCU, and stop rejecting BPF schedulers when only
   cpuset isolated partitions are active.
 
 - PREEMPT_RT: disable irq_work runs in hardirq context so dumps show the
   failing task rather than the irq_work kthread.
 
 - Assorted !CONFIG_EXT_SUB_SCHED, randconfig, and selftest build fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCagTk1g4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGT6TAP0ZbRHz9ViligecZXIHjEvZQjEV4sn1NLpGi4og
 V0Ol2AD/RzqHQZo5+HpMz4hPrcZdkAWcr74cLrNTJ2WQjOk4RgE=
 =6Mbx
 -----END PGP SIGNATURE-----

Merge tag 'sched_ext-for-7.1-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

Pull sched_ext fixes from Tejun Heo:
 "The bulk of this is hardening of the new sub-scheduler infrastructure.

   - UAFs and lifecycle bugs on the sub-sched attach/detach paths:
     parent sub_kset freed under a racing child, list_del_rcu on an
     uninitialized list head, ops->priv stomped by concurrent
     attach/detach, and a UAF in the init-failure error path

   - Task state-machine reorg closing concurrent enable-vs-dead races: a
     task exiting during the unlocked init window could trip NULL ops
     derefs or skip exit_task() cleanup

   - A scx_link_sched() self-deadlock on scx_sched_lock

   - isolcpus: stop dereferencing the now-RCU-protected HK_TYPE_DOMAIN
     cpumask without RCU, and stop rejecting BPF schedulers when only
     cpuset isolated partitions are active

   - PREEMPT_RT: disable irq_work runs in hardirq context so dumps show
     the failing task rather than the irq_work kthread

   - Assorted !CONFIG_EXT_SUB_SCHED, randconfig, and selftest build
     fixes"

* tag 'sched_ext-for-7.1-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: Use HK_TYPE_DOMAIN_BOOT to detect isolcpus= domain isolation
  sched_ext: Defer sub_kset base put to scx_sched_free_rcu_work
  sched_ext: INIT_LIST_HEAD() &sch->all in scx_alloc_and_add_sched()
  sched_ext: Drop NONE early return in scx_disable_and_exit_task()
  sched_ext: Avoid UAF in scx_root_enable_workfn() init failure path
  sched_ext: Clear ops->priv on scx_alloc_and_add_sched() error paths
  sched_ext: Fix ops->priv clobber on concurrent attach/detach
  selftests/sched_ext: Fix build error in dequeue selftest
  sched_ext: Handle SCX_TASK_NONE in disable/switched_from paths
  sched_ext: Close sub-sched init race with post-init DEAD recheck
  sched_ext: Close root-enable vs sched_ext_dead() race with SCX_TASK_INIT_BEGIN
  sched_ext: Replace SCX_TASK_OFF_TASKS flag with SCX_TASK_DEAD state
  sched_ext: Inline scx_init_task() and move RESET_RUNNABLE_AT into scx_set_task_state()
  sched_ext: Cleanups in preparation for the SCX_TASK_INIT_BEGIN/DEAD work
  sched_ext: Use IRQ_WORK_INIT_HARD() to initialize sch->disable_irq_work
  sched_ext: Fix !CONFIG_EXT_SUB_SCHED build warnings
  sched_ext: Drop unused scx_find_sub_sched() stub
  sched_ext: Move scx_error() out of scx_link_sched()'s lock region
2026-05-13 15:00:40 -07:00
Linus Torvalds 0913b580f8 cgroup: Fixes for v7.1-rc3
- cpuset fixes:
   - Partition invalidation could return CPUs still in use by sibling
     partitions, producing overlapping effective_cpus.
   - cpuset_can_attach() over-reserved DL bandwidth on moves that stayed
     within the same root domain.
   - Pending DL migration state leaked into later attaches when a later
     can_attach() check failed.
   - Reorder PF_EXITING and __GFP_HARDWALL checks so dying tasks can
     allocate from any node and exit quickly.
 
 - dmem: propagate -ENOMEM instead of spinning forever when the fallback
   pool allocation also fails.
 
 - selftests/cgroup: percpu test error-path leak, bogus numeric
   comparison of cpuset strings, and a zero-length read() that silently
   passed OOM-kill tests.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCagTkzw4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGR+AAQCcYEGJ+yNAzzrTcY8xy7333rorMckSmZt18jzv
 1KSqEQD+KjindGNcWP/meQBPnEjcBjix6i961mgnQ99e/UD2HQ4=
 =4pT3
 -----END PGP SIGNATURE-----

Merge tag 'cgroup-for-7.1-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:

 - cpuset fixes:
     - Partition invalidation could return CPUs still in use by sibling
       partitions, producing overlapping effective_cpus
     - cpuset_can_attach() over-reserved DL bandwidth on moves that
       stayed within the same root domain
     - Pending DL migration state leaked into later attaches when a
       later can_attach() check failed
     - Reorder PF_EXITING and __GFP_HARDWALL checks so dying tasks can
       allocate from any node and exit quickly

 - dmem: propagate -ENOMEM instead of spinning forever when the fallback
   pool allocation also fails

 - selftests/cgroup: percpu test error-path leak, bogus numeric
   comparison of cpuset strings, and a zero-length read() that silently
   passed OOM-kill tests

* tag 'cgroup-for-7.1-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup/cpuset: Return only actually allocated CPUs during partition invalidation
  selftests/cgroup: Fix error path leaks in test_percpu_basic
  cgroup/cpuset: Reserve DL bandwidth only for root-domain moves
  cgroup/cpuset: Reset DL migration state on can_attach() failure
  selftests/cgroup: Fix string comparison in write_test
  selftests/cgroup: Fix cg_read_strcmp() empty string comparison
  cgroup/dmem: Return -ENOMEM on failed pool preallocation
  cgroup/cpuset: move PF_EXITING check before __GFP_HARDWALL in cpuset_current_node_allowed()
2026-05-13 14:56:31 -07:00
Linus Torvalds e1914add27 Arm:
* Add the pKVM side of the workaround for ARM's erratum 4193714, provided
   that the EL3 firmware does its part of the job. KVM will refuse to
   initialise otherwise.
 
 * Correctly handle 52bit VAs for guest EL2 stage-1 translations when
   running under NV with E2H==0.
 
 * Correctly deal with permission faults in guest_memfd memslots.
 
 * Fix the steal-time selftest after the infrastructure was reworked.
 
 * Make sure the host cannot pass a non-sensical clock update to the
   EL2 tracing infrastructure.
 
 * Appoint Steffen Eiden as a reviewer in anticipation of the KVM/s390
   ability to run arm64 guests, which will inevitably lead to arm64
   code being directly used on s390.
 
 * Make sure that EL2 is configured with both exception entry and exit
   being Context Synchronization Events.
 
 * Handle the current vcpu being NULL on EL2 panic.
 
 * Fix the selftest_vcpu memcache being empty at the point of donation or
   sharing.
 
 * Check that the memcache has enough capacity before engaging on the
   share/donate path.
 
 * Fix __deactivate_fgt() to use its parameter rather than a variable
   in the macro context.
 
 s390:
 
 * Fix array overrun with large amounts of PCI devices.
 
 x86:
 
 * Never use L0's PAUSE loop exiting while L2 is running, since it's
   unlikely that a nested guest will help solving the hypervisor's
   spinlock contention
 
 * Fix emulation of MOVNTDQA.
 
 * Fix typo in Xen hypercall tracepoint
 
 * Add back an optimization that was left behind when recently
   fixing a bug.
 
 * Add module parameter to disable CET, whose implementation seems
   to have issues.  For now it remains enabled by default.
 
 Generic:
 
 * Reject offset causing an unsigned overflow in kvm_reset_dirty_gfn()
 
 Documentation:
 
 * Update stale links
 
 Selftests:
 
 * Fix guest_memfd_test with host page size > guest page size.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmoEnNgUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPOeAgArZ60yQGH0TJipyNsaPt+m+IEGMZ/
 UC1tRd384EJnwpjFfZOvwluNNxeFlSXlku7iEXHHveK1qqFXnh+WBXJ91ftfDK/+
 OOqVBBziOyxI6Mbsm2S415kzOQ15atsrclrcGC4emSydgX+JASZ4nsGx6MDRPu/8
 p4TNy3vD5wxe3UGttYElMoFcgT0N/HepMyvUlXohjcjl/hkgf5GL4yPc/TGuvdtz
 EJfmDRhJEwyzf4/Ut8tzX+LhNxSY2iBr5XBvC8XQMSJBVbU/CRGxUk28fEzo7ykx
 EHVOlkxgUN1zO0xh/8aMgRIZNDMveWupR2sJe6StCqOlcbBMI2oYFNnLfQ==
 =f8oe
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "arm64:

   - Add the pKVM side of the workaround for ARM's erratum 4193714,
     provided that the EL3 firmware does its part of the job. KVM will
     refuse to initialise otherwise

   - Correctly handle 52bit VAs for guest EL2 stage-1 translations when
     running under NV with E2H==0

   - Correctly deal with permission faults in guest_memfd memslots

   - Fix the steal-time selftest after the infrastructure was reworked

   - Make sure the host cannot pass a non-sensical clock update to the
     EL2 tracing infrastructure

   - Appoint Steffen Eiden as a reviewer in anticipation of the KVM/s390
     ability to run arm64 guests, which will inevitably lead to arm64
     code being directly used on s390

   - Make sure that EL2 is configured with both exception entry and exit
     being Context Synchronization Events

   - Handle the current vcpu being NULL on EL2 panic

   - Fix the selftest_vcpu memcache being empty at the point of donation
     or sharing

   - Check that the memcache has enough capacity before engaging on the
     share/donate path

   - Fix __deactivate_fgt() to use its parameter rather than a variable
     in the macro context

  s390:

   - Fix array overrun with large amounts of PCI devices

  x86:

   - Never use L0's PAUSE loop exiting while L2 is running, since it's
     unlikely that a nested guest will help solving the hypervisor's
     spinlock contention

   - Fix emulation of MOVNTDQA

   - Fix typo in Xen hypercall tracepoint

   - Add back an optimization that was left behind when recently fixing
     a bug

   - Add module parameter to disable CET, whose implementation seems to
     have issues. For now it remains enabled by default

  Generic:

   - Reject offset causing an unsigned overflow in kvm_reset_dirty_gfn()

  Documentation:

   - Update stale links

  Selftests:

   - Fix guest_memfd_test with host page size > guest page size"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
  KVM: VMX: introduce module parameter to disable CET
  KVM: x86: Swap the dst and src operand for MOVNTDQA
  KVM: x86: use again the flush argument of __link_shadow_page()
  KVM: selftests: Ensure gmem file sizes are multiple of host page size
  Documentation: kvm: update links in the references section of AMD Memory Encryption
  KVM: nSVM: Never use L0's PAUSE loop exiting while L2 is running
  KVM: x86: Fix Xen hypercall tracepoint argument assignment
  KVM: Reject wrapped offset in kvm_reset_dirty_gfn()
  KVM: arm64: Pre-check vcpu memcache for host->guest donate
  KVM: arm64: Pre-check vcpu memcache for host->guest share
  KVM: arm64: Seed pkvm_ownership_selftest vcpu memcache
  KVM: arm64: Fix __deactivate_fgt macro parameter typo
  KVM: arm64: Guard against NULL vcpu on VHE hyp panic path
  KVM: arm64: Make EL2 exception entry and exit context-synchronization events
  MAINTAINERS: Add Steffen as reviewer for KVM/arm64
  KVM: arm64: Remove potential UB on nvhe tracing clock update
  KVM: selftests: arm64: Fix steal_time test after UAPI refactoring
  KVM: arm64: Handle permission faults with guest_memfd
  KVM: arm64: nv: Consider the DS bit when translating TCR_EL2
  KVM: arm64: Work around C1-Pro erratum 4193714 for protected guests
  ...
2026-05-13 11:53:51 -07:00
Yu Miao 7d8f3158a5 selftests/cgroup: Fix error path leaks in test_percpu_basic
When cg_name_indexed() returns NULL partway through the child creation
loop, the code returned -1 without running cleanup_children and cleanup.
That left the `parent` pathname allocation unreleased and did not remove
child cgroup directories already created under the parent. Fix by jumping
to cleanup_children instead of returning.

When cg_create() fails, `child` (the pathname from cg_name_indexed())
was not freed before cleanup_children. Fix by freeing `child` before
branching to cleanup_children.
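
The general shape of the fix can be sketched as below, assuming a simplified test with one parent allocation and a loop of child allocations.  All names are hypothetical, and the allocation counter exists only so the sketch can show that nothing is left behind on the error path.

```c
#include <stdio.h>
#include <stdlib.h>

static int live_allocs;	/* outstanding allocations, for illustration only */

/* Allocate a name; simulate a failure when i == fail_at. */
static char *make_name(int i, int fail_at)
{
	char *s;

	if (i == fail_at)
		return NULL;
	s = malloc(32);
	if (!s)
		return NULL;
	snprintf(s, 32, "child_%d", i);
	live_allocs++;
	return s;
}

static void drop_name(char *s)
{
	if (s) {
		free(s);
		live_allocs--;
	}
}

static int run_sketch(int nchildren, int fail_at)
{
	char *parent = make_name(1000, -1);
	int ret = -1;
	int i;

	if (!parent)
		return -1;

	for (i = 0; i < nchildren; i++) {
		char *child = make_name(i, fail_at);

		if (!child)
			goto cleanup;	/* a bare "return -1" would leak parent */
		drop_name(child);
	}
	ret = 0;
cleanup:
	drop_name(parent);
	return ret;
}
```

Routing every failure through the cleanup label keeps the release of the parent allocation in one place.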

Fixes: 90631e1dea ("kselftests: cgroup: add perpcu memory accounting test")
Signed-off-by: Yu Miao <yumiao@kylinos.cn>
Signed-off-by: Tejun Heo <tj@kernel.org>
2026-05-13 08:40:52 -10:00
Yi Lai 0bf1b4dda2 selftests/rdma: explicitly skip tests when required modules are missing
Currently, the rdma rxe selftests fail with an exit code of 1 when
required kernel modules are not present. This causes spurious failures
in environments where these modules might not be compiled or available.

Include the standard kselftest 'ktap_helpers.sh' and replace the
hardcoded error exits with '$KSFT_SKIP'. This ensures the tests are
properly marked as skipped rather than failed.

Fixes: e01027cab3 ("RDMA/rxe: Add testcase for net namespace rxe")
Signed-off-by: Yi Lai <yi1.lai@intel.com>
Link: https://patch.msgid.link/20260507125106.3114167-1-yi1.lai@intel.com
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-05-13 14:28:59 -04:00
Hisam Mehboob 34065a5f3c KVM: selftests: Guard execinfo.h inclusion for non-glibc builds
The backtrace() function and execinfo.h are GNU extensions available
in glibc but not in non-glibc C libraries such as musl. Building KVM
selftests with musl-gcc fails with:

  lib/assert.c:9:10: fatal error: execinfo.h: No such file or directory

Fix this by guarding the inclusion of execinfo.h and the stack dumping
logic under #ifdef __GLIBC__. For non-glibc builds, provide a local
stub for test_dump_stack().
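
A minimal sketch of that guard, assuming a hypothetical helper named dump_stack_sketch() rather than the selftests' own test_dump_stack():

```c
#include <stdio.h>	/* on glibc, any libc header makes __GLIBC__ visible */

#ifdef __GLIBC__
#include <execinfo.h>

/* glibc build: print the current call stack to stderr. */
static int dump_stack_sketch(void)
{
	void *frames[32];
	int n = backtrace(frames, 32);

	backtrace_symbols_fd(frames, n, 2);
	return n;
}
#else
/* Non-glibc build (e.g. musl): no backtrace(), so provide a stub. */
static int dump_stack_sketch(void)
{
	fputs("stack dump not available with this C library\n", stderr);
	return 0;
}
#endif
```

Callers use dump_stack_sketch() unconditionally; only the definition differs between C libraries.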

Suggested-by: Aqib Faruqui <aqibaf@amazon.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Hisam Mehboob <hisamshar@gmail.com>
Link: https://patch.msgid.link/20260409153846.1502656-2-hisamshar@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-05-13 10:38:02 -07:00
Sean Christopherson 6d3790bc68 KVM: selftests: Include sys/mman.h *and* linux/mman.h, via kvm_syscalls.h
Include both linux/mman.h (the kernel provided version) and sys/mman.h (the
libc provided version) throughout KVM selftests, by way of kvm_syscalls.h
(which should have been including sys/mman.h anyways).  Pulling in the
kernel's version fixes compilation errors with the guest_memfd test on
older versions of libc due to a recent commit adding MADV_COLLAPSE testing.

  In file included from include/kvm_util.h:8,
                   from guest_memfd_test.c:21:
  guest_memfd_test.c: In function ‘test_collapse’:
  guest_memfd_test.c:219:47: error: ‘MADV_COLLAPSE’ undeclared (first use in this function); did you mean ‘MADV_COLD’?
      219 |         TEST_ASSERT_EQ(madvise(mem, pmd_size, MADV_COLLAPSE), -1);
          |                                               ^~~~~~~~~~~~~
    include/test_util.h:62:16: note: in definition of macro ‘TEST_ASSERT_EQ’
       62 |         typeof(a) __a = (a);                                            \
          |                ^
    guest_memfd_test.c:219:47: note: each undeclared identifier is reported only once for each function it appears in
      219 |         TEST_ASSERT_EQ(madvise(mem, pmd_size, MADV_COLLAPSE), -1);
          |                                               ^~~~~~~~~~~~~
    include/test_util.h:62:16: note: in definition of macro ‘TEST_ASSERT_EQ’
       62 |         typeof(a) __a = (a);                                            \
          |                ^

Route the includes through kvm_syscalls.h to try and avoid a future game
of whack-a-mole, i.e. so that future expansion of test coverage doesn't run
into the same problem.

To discourage use of sys/mman.h, opportunistically include the kernel's
version of mman.h in test_util.h as it only needs MAP_SHARED, i.e. only
needs the full set of kernel defs, not the libc syscall wrappers.

Fixes: 9830209b4a ("KVM: selftests: Test MADV_COLLAPSE on guest_memfd")
Reported-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Closes: https://lore.kernel.org/all/20260427204313.50741-1-rick.p.edgecombe@intel.com
Link: https://patch.msgid.link/20260428012503.1213654-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-05-13 09:53:43 -07:00
Ming Lei 87d0740b7c selftests: ublk: cap nthreads to kernel's actual nr_hw_queues
dev->nthreads is derived from the user-requested queue count before the
ADD command, but the kernel may reduce nr_hw_queues (capped to
nr_cpu_ids). When the VM has fewer CPUs than requested queues, the
daemon creates more handler threads than there are kernel queues.

In non-batch mode, the extra threads access uninitialized queues
(q_depth=0), submit zero io_uring SQEs, and block forever in
io_cqring_wait. In batch mode, the extra threads cause similar hangs
during device removal.

In both cases, the stuck threads prevent the daemon from closing the
char device, holding the last ublk_device reference and causing
ublk_ctrl_del_dev() to hang in wait_event_interruptible().

Fix by capping dev->nthreads to the kernel-returned nr_hw_queues after
the ADD command completes. per_io_tasks mode is excluded because threads
interleave across all queues, so nthreads > nr_hw_queues is valid.
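
The capping rule can be sketched as follows; the function and parameter names are hypothetical and do not mirror the kublk sources.

```c
#include <stdbool.h>

/*
 * Pick the number of handler threads once the kernel has reported how
 * many hardware queues it actually created.
 */
static unsigned int cap_nthreads_sketch(unsigned int requested,
					unsigned int nr_hw_queues,
					bool per_io_tasks)
{
	/* per_io_tasks: threads interleave across queues, so no cap applies */
	if (per_io_tasks)
		return requested;

	return requested > nr_hw_queues ? nr_hw_queues : requested;
}
```

For example, 8 requested threads on a VM where the kernel created only 2 queues would be reduced to 2, unless per_io_tasks mode is in use.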

Fixes: abe54c1603 ("selftests: ublk: kublk: decouple ublk_queues from ublk server threads")
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Link: https://patch.msgid.link/20260513101941.1373998-1-tom.leiming@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-05-13 07:55:39 -06:00
Sean Christopherson 87c810160e KVM: selftests: Ensure gmem file sizes are multiple of host page size
When creating a guest_memfd file and associated memslot to validate shared
guest memory, size the file+memslot to the larger of the host and guest
page sizes.  Attempting to allocate a single guest page will fail if the
host page size is greater than the guest page size, as KVM requires that
the sizes of memslots and guest_memfd files be multiples of the host page
size.

For simplicity, verify the entire file can be shared between guest and host,
e.g. instead of trying to validate "partial" mappings.
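
The sizing rule amounts to the following sketch, assuming both page sizes are powers of two so that the larger one is always a multiple of the host page size.  The helper name is hypothetical.

```c
#include <stddef.h>

/* Size of the guest_memfd file and memslot: the larger of the two page sizes. */
static size_t gmem_size_sketch(size_t host_page_size, size_t guest_page_size)
{
	return host_page_size > guest_page_size ? host_page_size
						: guest_page_size;
}
```

With a 64 KiB host page and a 4 KiB guest page this yields 64 KiB, whereas a single guest page (4 KiB) would not be a multiple of the host page size.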

Fixes: 42188667be ("KVM: selftests: Add guest_memfd testcase to fault-in on !mmap()'d memory")
Reported-by: Zenghui Yu <zenghui.yu@linux.dev>
Closes: https://lore.kernel.org/all/0064952b-048c-455d-ad89-e27e5cb82591@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260512155634.772602-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-05-12 22:26:10 +02:00
Jakub Kicinski 6e8ae9d805 selftests: drv-net: add shaper test for duplicate leaves
Add test exercising duplicate leaves.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20260510192904.3987113-5-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-05-12 16:14:59 +02:00
Yazhou Tang 344a00712c selftests/bpf: Add test for large offset bpf-to-bpf call
Add a selftest to verify the verifier and JIT behavior when handling
bpf-to-bpf calls with relative jump offsets exceeding the s16 boundary.

The test utilizes an inline assembly block with ".rept 32765" to generate
a massive dummy subprogram. By placing this padding between the main
program and the target subprogram, it forces the verifier to process a
bpf-to-bpf call where the imm field exceeds the s16 range.

- When JIT is enabled, it asserts that the program is successfully loaded
  and executes correctly to return the expected value. Since the fix
  does not change the JIT behavior, the test passes whether the fix is
  applied or not.
- When JIT is disabled, it also asserts that the program is successfully
  loaded and executes correctly to return the expected value 3.
  - Before the fix, the verifier rewrites the call instruction with a
    truncated offset (here 32768 -> -32768) and lets it pass. When the
    program is executed, the call instruction will go to a wrong target
    (the landing pad) instead of the intended subprogram, then return -1
    and fail.
  - After the fix, the verifier correctly handles the large offset and
    allows it to pass. The program then executes correctly to return the
    expected value 3.
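
The truncation mentioned above (32768 -> -32768) is plain 16-bit wraparound, which the following sketch reproduces.  The helper name is hypothetical and the code models only the arithmetic, not the verifier's rewrite.

```c
#include <stdint.h>

/* Keep the low 16 bits of an offset and reinterpret them as signed. */
static int32_t truncate_to_s16(int32_t off)
{
	uint16_t low = (uint16_t)off;	/* well-defined reduction modulo 2^16 */

	return low >= 0x8000 ? (int32_t)low - 0x10000 : (int32_t)low;
}
```

An offset of 32767 survives unchanged, while 32768 becomes -32768 and therefore points backwards instead of at the intended subprogram.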

Co-developed-by: Tianci Cao <ziye@zju.edu.cn>
Signed-off-by: Tianci Cao <ziye@zju.edu.cn>
Co-developed-by: Shenghao Yuan <shenghaoyuan0928@163.com>
Signed-off-by: Shenghao Yuan <shenghaoyuan0928@163.com>
Signed-off-by: Yazhou Tang <tangyazhou518@outlook.com>
Acked-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/r/20260506094714.419842-4-tangyazhou@zju.edu.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-05-11 08:27:02 -07:00
Andrea Righi 3788e32516 selftests/sched_ext: Fix build error in dequeue selftest
Building the dequeue selftest with newer compilers (e.g., gcc 16)
triggers the following error:

 dequeue.c:28:22: error: variable 'sum' set but not used

The 'volatile' qualifier prevents the writes from being optimized away,
but does not silence the warning: the variable 'sum' is indeed only
written and never read.

Consume 'sum' via an empty asm() with a register input constraint. This
forces the compiler to keep the accumulated value (preserving the CPU
stress loop) and avoids the build error.
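
A minimal sketch of the technique, assuming GCC- or Clang-style extended asm; the function name and loop body are hypothetical and are not copied from dequeue.c.

```c
/* Burn CPU for the given number of iterations and return that count. */
static unsigned long spin_sketch(unsigned long iters)
{
	volatile unsigned long sum = 0;
	unsigned long i;

	for (i = 0; i < iters; i++)
		sum += i;

	/* Empty asm with a register input: counts as a read of 'sum'. */
	__asm__ volatile("" : : "r"(sum));

	return iters;
}
```

The asm statement emits no instructions of its own; it only makes the compiler treat the accumulated value as used.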

Fixes: 658ad2259b ("selftests/sched_ext: Add test to validate ops.dequeue() semantics")
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2026-05-10 16:03:05 -10:00
Hongfu Li 2a3d7256fa selftests/cgroup: Fix string comparison in write_test
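Use string comparison (!=) instead of numeric comparison (-ne) for
cpuset values like "0-1".  Inside [[ ]], -ne evaluates both operands as
arithmetic expressions, so "0-1" and "2-3" both evaluate to -1 and compare
as equal.
For example: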
$ [[ "0-1" != "2-3" ]] && echo "true" || echo "false"
true
$ [[ "0-1" -ne "2-3" ]] && echo "true" || echo "false"
false

Signed-off-by: Hongfu Li <lihongfu@kylinos.cn>
Signed-off-by: Tejun Heo <tj@kernel.org>
2026-05-10 15:54:12 -10:00