Commit Graph

84367 Commits (e7ae89a0c97ce2b68b0983cd01eda67cf373517d)

Author SHA1 Message Date
Linus Torvalds 68993ced0f Including fixes from Bluetooth, wireless and netfilter.
Current release - fix to a fix:
 
  - Bluetooth: btmtk: accept too short WMT FUNC_CTRL events
 
  - vsock/virtio: relax the recently added memory limit a little
 
 Current release - regressions:
 
  - IB/IPoIB: make sure IB drivers always use async set_rx_mode since
    some (mlx5) are now required to use it due to locking changes
 
 Previous releases - regressions:
 
  - udp: fix UDP length on last GSO_PARTIAL segment
 
  - af_unix: fix UAF read of tail->len in unix_stream_data_wait()
 
  - tcp: fix stale per-CPU tcp_tw_isn leak enabling ISN prediction
 
  - mlx5e: fix unlocked writing to ICOSQ, breaking AF_XDP
 
 Previous releases - always broken:
 
  - tap: fix stack info leak in tap_ioctl() SIOCGIFHWADDR
 
  - ipv4: raw: reject IP_HDRINCL packets with ihl < 5
 
  - Bluetooth: a lot of locking and concurrency fixes (as always)
 
  - batman-adv (mesh wireless networking): a lot of random fixes
    for issues reported by security researchers and Sashiko
 
  - netfilter: same thing, a lot of small security-ish fixes
    all over the place, nothing really stands out
 
 Misc:
 
  - bring back the old 3c509 driver, Maciej wants to maintain it
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmoPULUACgkQMUZtbf5S
 IrtAvA//bfjxxazZKkGqL8mp6uMYS5Su81Oh/pBcyEWC7q2xv3ftNp5pt8oCTWYP
 eryKi7XrxfNHrkFcmnH+aWQ431UekZLfAjrSd+5V0YvE1nQDnKrgbat5qx2SYSsr
 ZA7EYnJjvAtPMb0KqUJlYPMSfVdFA0H3gEOdnawkGRnizkKNO5NsNRkC4rHzpCil
 hzW5SCTZWQ0r1Cm3IxcTnSCJEOYRqH0BUBbiSRFCWNMZZpq0xKi3UiJFOdgRvqgc
 VoPz6sMRPxZyL8gW8i2jJVz6vj2yuWifJwbl8y3ZkqJJy4HvNXfcPIBH5+vBIWlB
 hWMuYlUv5F0w+h4+UKeDr789Tdpv12edUIDX+prbsJ8c4bXmBflt069HlFjG9Pto
 /k2e5owR0NYSaLt4WvAM6Tr5j1ralzQjHKVDg8JbPaAD+0dtb+e3dXE8J3MBPrw6
 EWtdg9jX+vqsbVoHwMQO9Xp2waNY9+97L07w+I0nVf7NLJvrvz0lkSjMKfNPNyV1
 C5W7McAbSOx3nJ+XzYwMoVK0wP9OunKA73EhAoEdvQSyOGLqQT+iZzDoTMnwKJFs
 2L3fbc8LQ10WBG2B2rCPB/gaGQ1ZZD8uSlZoS9N31dvUPFDaCnCYgKIze/pdcE/R
 KOQskME2xd61KzpYlJszkrjJIbnppkNt/mBvvfNUP+zJZPFRyuA=
 =ei7U
 -----END PGP SIGNATURE-----

Merge tag 'net-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from Bluetooth, wireless and netfilter.

  Craziness continues with no end in sight. Even discounting the driver
  revert this is a pretty huge PR for standards of the previous era. I'd
  speculate - we haven't seen the worst of it, yet. Good news, I guess,
  is that so far we haven't seen many (any?) cases of "AI reported a
  bug, we fixed it and a real user regressed".

  Current release - fix to a fix:

   - Bluetooth: btmtk: accept too short WMT FUNC_CTRL events

   - vsock/virtio: relax the recently added memory limit a little

  Current release - regressions:

   - IB/IPoIB: make sure IB drivers always use async set_rx_mode since
     some (mlx5) are now required to use it due to locking changes

  Previous releases - regressions:

   - udp: fix UDP length on last GSO_PARTIAL segment

   - af_unix: fix UAF read of tail->len in unix_stream_data_wait()

   - tcp: fix stale per-CPU tcp_tw_isn leak enabling ISN prediction

   - mlx5e: fix unlocked writing to ICOSQ, breaking AF_XDP

  Previous releases - always broken:

   - tap: fix stack info leak in tap_ioctl() SIOCGIFHWADDR

   - ipv4: raw: reject IP_HDRINCL packets with ihl < 5

   - Bluetooth: a lot of locking and concurrency fixes (as always)

   - batman-adv (mesh wireless networking): a lot of random fixes for
     issues reported by security researchers and Sashiko

   - netfilter: same thing, a lot of small security-ish fixes all over
     the place, nothing really stands out

  Misc:

   - bring back the old 3c509 driver, Maciej wants to maintain it"

* tag 'net-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (187 commits)
  net: enetc: avoid VF->PF mailbox timeout during SR-IOV teardown
  net: enetc: fix init and teardown order to prevent use of unsafe resources
  net: enetc: fix unbounded loop and interrupt handling in VF-to-PF messaging
  net: enetc: fix DMA write to freed memory in enetc_msg_free_mbx()
  net: enetc: fix race condition in VF MAC address configuration
  net: enetc: fix TOCTOU race and validate VF MAC address
  net: enetc: add ratelimiting to VF mailbox error messages
  net: enetc: fix missing error code when pf->vf_state allocation fails
  net: enetc: fix incorrect mailbox message status returned to VFs
  net: bridge: prevent too big nested attributes in br_fill_linkxstats()
  l2tp: use list_del_rcu in l2tp_session_unhash
  net: bcmgenet: keep RBUF EEE/PM disabled
  ethernet: 3c509: Fix most coding style issues
  ethernet: 3c509: Update documentation to match MAINTAINERS
  ethernet: 3c509: Add GPL 2.0 SPDX license identifier
  ethernet: 3c509: Fix AUI transceiver type selection
  Revert "drivers: net: 3com: 3c509: Remove this driver"
  tools: ynl: support listening on all nsids
  net: gro: don't merge zcopy skbs
  pds_core: ensure null-termination for firmware version strings
  ...
2026-05-21 14:39:12 -07:00
Jakub Kicinski 0e3c08f1b7 Quite a few more updates:
- cfg80211/mac80211:
    - various security(-ish) fixes
    - fix A-MSDU subframe handling
    - fix multi-link element parsing
  - ath10: avoid sending commands to dead device
  - ath11k:
    - fix WMI buffer leaks on error conditions
    - fix UAF in RX MSDU coalesce path
    - allow peer ID 0 on RX path (legal for mobile devices)
    - reinitialize shared SRNG pointers on restart
  - ath12k:
    - fix 20 MHz-only parsing of EHT-MCS map
  - iwlwifi:
    - fix TSO segmentation explosion
    - don't TX to dead device
    - fix warning in WoWLAN
    - fix TX rates on old devices
    - disconnect on beacon loss only if also no other traffic
    - fill NULL-ptr deref
    - fix STEP_URM hardware access
 -----BEGIN PGP SIGNATURE-----
 
 iQIyBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmoPJEkACgkQ10qiO8sP
 aACQrw/4vU+lbZNW19OyaJMd4h+44gUW+UGJixOzputQCBc6JGUlRsxgceWq5Ws5
 5x2LTOX7S1wcvKm0VuSvkIRP3e9YHcgB60iBtsJ3ozz4RCoCFiSu8Bb2RdkGtRTp
 7CKMK9NNuovOJncBzfyANq4ujsGs/58BmGbhXbaZ0ACfLUauesCCUtM7iQZE1k7t
 lBqtk8ezkz1L8006w5vR7VR8g4KCCofQTEAOASmx450ZeGAiHMlWVKdMFFHV3zWj
 ZDXopvLaMtduLNq9xqGYCRhAIZqOv1axgL7w9RRxsi2gWHv71kLqyz0IzgbFmh1m
 ZxUSQ45+MHVYCHxs7HHCcTR5gqQlx47j5Wi3tuLUH8yoSZ8dPeWjmQMvIEswfZql
 WNq18o6mcK+L3Yg87+oxiiJ7V/euaM//0+ZGtqhbiB+2FyHZhO42BqALTy7e4swS
 kmEl8gCj2lgCbD2AHJQ9VpOJwoNdNuLYoJqg9IiIu/CYqQF80FGO8e6HZXBXsJkL
 3KAPQIXkMMMkSjtpTg/GdHDiFqv/7lF8u3FgED7w7M1ZVQYNUc13KiDwPALFV0pu
 bBbRktB6lvF6ShW9XrTrmn9lT0iiWHxr5YctWoys4+Ofr5V7PUzxNVDxDuzSVCZ0
 apLYZ7uwSXO37q99p/azs47dzYp7tpnwKd4rpQuUTl/9bYErUA==
 =vzwG
 -----END PGP SIGNATURE-----

Merge tag 'wireless-2026-05-21' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
Quite a few more updates:
 - cfg80211/mac80211:
   - various security(-ish) fixes
   - fix A-MSDU subframe handling
   - fix multi-link element parsing
 - ath10: avoid sending commands to dead device
 - ath11k:
   - fix WMI buffer leaks on error conditions
   - fix UAF in RX MSDU coalesce path
   - allow peer ID 0 on RX path (legal for mobile devices)
   - reinitialize shared SRNG pointers on restart
 - ath12k:
   - fix 20 MHz-only parsing of EHT-MCS map
 - iwlwifi:
   - fix TSO segmentation explosion
   - don't TX to dead device
   - fix warning in WoWLAN
   - fix TX rates on old devices
   - disconnect on beacon loss only if also no other traffic
   - fill NULL-ptr deref
   - fix STEP_URM hardware access

* tag 'wireless-2026-05-21' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: (24 commits)
  wifi: cfg80211: wext: validate chandef in monitor mode
  wifi: mac80211: consume only present negotiated TTLM maps
  wifi: wilc1000: fix dma_buffer leak on bus acquire failure
  wifi: mac80211: capture fast-RX rate before mesh reuses skb->cb
  wifi: mac80211: fix multi-link element inheritance
  wifi: mac80211: fix MLE defragmentation
  wifi: mac80211: don't override max_amsdu_subframes
  wifi: mac80211: bounds-check link_id in ieee80211_ml_epcs
  wifi: ath12k: fix EHT TX MCS limitation due to wrong 20 MHz-only parsing
  wifi: ath11k: clear shared SRNG pointer state on restart
  wifi: ath11k: fix use after free in ath11k_dp_rx_msdu_coalesce()
  wifi: ath11k: fix peer resolution on rx path when peer_id=0
  wifi: iwlwifi: mld: disconnect only after 6 beacons without Rx
  wifi: iwlwifi: mld: don't WARN on WoWLAN suspend w/o BSS vif
  wifi: iwlwifi: use correct function to read STEP_URM register
  wifi: iwlwifi: mvm: fix driver-set TX rates on old devices
  wifi: iwlwifi: mld: don't dereference a pointer before NULL checking it
  wifi: iwlwifi: mld: stop TX during firmware restart
  wifi: iwlwifi: mld: fix TSO segmentation explosion when AMSDU is disabled
  wifi: ath10k: skip WMI and beacon transmission when device is wedged
  ...
====================

Link: https://patch.msgid.link/20260521152903.374070-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-21 11:03:58 -07:00
Eric Dumazet bdd39576bf net: bridge: prevent too big nested attributes in br_fill_linkxstats()
After commit ff205bf8c554 ("netlink: add one debug check in nla_nest_end()")
syzbot found that br_fill_linkxstats() can send corrupted netlink packets.

Make sure the nested attribute size is bounded.

Fixes: a60c090361 ("bridge: netlink: export per-vlan stats")
Reported-by: syzbot+a35f9259d08f907c06e6@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6a0b0da3.050a0220.175f0c.0000.GAE@google.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260520114207.1394241-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-21 08:47:36 -07:00
Michael Bommarito 979c017803 l2tp: use list_del_rcu in l2tp_session_unhash
An unprivileged local user can pin a host CPU indefinitely in
l2tp_session_get_by_ifname() by issuing L2TP_CMD_SESSION_GET on
L2TP_ATTR_IFNAME concurrently with L2TP_CMD_SESSION_CREATE and
L2TP_CMD_SESSION_DELETE on the same tunnel. All three commands take
GENL_UNS_ADMIN_PERM, so CAP_NET_ADMIN in the netns user namespace
suffices; on any host that has l2tp_core loaded the trigger is
reachable from a standard `unshare -Urn` sandbox.

l2tp_session_unhash() removes a session from tunnel->session_list
with list_del_init(), but that list is walked by
l2tp_session_get_by_ifname() with list_for_each_entry_rcu() under
rcu_read_lock_bh(). list_del_init() leaves the deleted entry's
next/prev self-pointing; a reader that has loaded the entry and
then advances pos->list.next reads &session->list, container_of()s
back to the same session, and list_for_each_entry_rcu() never
reaches the list head. The CPU stays in strcmp() inside the
walker, with BH and preemption disabled, so RCU grace periods on
the host stall behind it and the wedged thread cannot be killed
(SIGKILL is delivered on syscall return).

Use list_del_rcu() to match the existing list_add_rcu() in
l2tp_session_register(); the deleted session remains visible to
in-flight walkers with consistent next/prev pointers until
kfree_rcu() in l2tp_session_free() releases it. tunnel->session_list
has exactly one list_del_init() call site; the list_del_init
(&session->clist) at l2tp_core.c:533 operates on the per-collision
list, which is not walked under RCU. list_empty(&session->list) is
not used anywhere in net/l2tp/ after the unhash point, so dropping
the post-delete self-init is safe; the fix has no userspace-visible
behavior change.

Fixes: 89b768ec2d ("l2tp: use rcu list add/del when updating lists")
Cc: stable@vger.kernel.org # 6.11+
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Link: https://patch.msgid.link/20260518183447.64078-1-michael.bommarito@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-21 08:47:20 -07:00
Sabrina Dubroca 4db79a322d net: gro: don't merge zcopy skbs
skb_gro_receive() can currently copy frags between the source and GRO
skb, without checking the zerocopy status, and in particular the
SKBFL_MANAGED_FRAG_REFS flag.

When SKBFL_MANAGED_FRAG_REFS is set, the skb doesn't hold a reference
on the pages in shinfo->frags. Appending those frags to another skb's
frags without fixing up the page refcount can lead to UAF.

When either the last skb in the GRO chain (the one we would append
frags to) or the source skb is zerocopy, don't merge the skbs.

Fixes: 753f1ca4e1 ("net: introduce managed frags infrastructure")
Reported-by: Huzaifa Sidhpurwala <huzaifas@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/c3b7f906bbfcbdfd7b4fa9d6c18a438870df85be.1779307748.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-21 08:21:33 -07:00
Justin Iurman e46e6bc97f ipv6: ioam: refresh hdr pointer before ioam6_event()
Reported by Sashiko:

In ipv6_hop_ioam(), the hdr pointer is initialized to point into the
skb's linear data buffer. Later, the code calls skb_ensure_writable(),
which might reallocate the buffer:

	if (skb_ensure_writable(skb, optoff + 2 + hdr->opt_len))
		goto drop;

	/* Trace pointer may have changed */
	trace = (struct ioam6_trace_hdr *)(skb_network_header(skb)
					   + optoff + sizeof(*hdr));

	ioam6_fill_trace_data(skb, ns, trace, true);

	ioam6_event(IOAM6_EVENT_TRACE, dev_net(skb->dev),
		    GFP_ATOMIC, (void *)trace, hdr->opt_len - 2);

If the skb is cloned or lacks sufficient linear headroom,
skb_ensure_writable() will invoke pskb_expand_head(), which reallocates
the skb's data buffer and frees the old one, invalidating pointers to
it. While the code recalculates the trace pointer immediately after the
call to skb_ensure_writable(), it fails to recalculate the hdr pointer.

This patch fixes the above by recalculating the hdr pointer before
passing hdr->opt_len to ioam6_event(), so that we avoid any UaF.

Fixes: f655c78d62 ("net: exthdrs: ioam6: send trace event")
Cc: stable@vger.kernel.org
Signed-off-by: Justin Iurman <justin.iurman@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260520124242.32320-1-justin.iurman@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-21 08:19:25 -07:00
Zhang Cen c367b90821 netpoll: normalize skb->dev to the netpoll device
__netpoll_send_skb() always transmits through np->dev and queues busy
packets on np->dev->npinfo->txq, but it leaves skb->dev unchanged.
Stacked callers such as DSA and macvlan can reach netpoll with skb->dev
still naming the upper device while np->dev is the lower device that
owns the netpoll state.

If the skb has to be deferred, queue_process() later dequeues it from
the lower device's txq but retries it through skb->dev. That can
re-enter the upper ndo_start_xmit path on an already transformed skb,
and if the upper device disappears before the lower txq drains the
workqueue can dereference a stale skb->dev pointer.

The buggy scenario involves two paths, with each column showing the
order within that path:

path A label: netpoll enqueue path   path B label: upper-device teardown
1. Stacked xmit calls netpoll        1. Teardown unregisters the upper
   with lower np->dev and upper         net_device while lower npinfo
   skb->dev.                            stays alive.
2. __netpoll_send_skb() uses         2. netdev_release() runs for the
   np->dev->npinfo as the txq           upper net_device.
   owner.
3. Busy transmit queues the skb      3. The lower txq still owns the
   on that lower txq with upper         deferred skb.
   skb->dev.
4. queue_process() drains the        4. queue_process() dereferences
   lower txq and reads skb->dev.        that stale upper skb->dev.

Normalize skb->dev to np->dev after loading np->dev from the netpoll
instance, before either the direct transmit path or the fallback enqueue.
This keeps the queued skb in the same device and txq domain as the
netpoll state that owns it.

KASAN report as below:

KASAN slab-use-after-free in queue_process+0x7c/0x480
Workqueue: events queue_process
The buggy address belongs to the object at ffff88810906c000 which belongs
to the cache kmalloc-4k of size 4096
The buggy address is located 168 bytes inside of freed 4096-byte region
[ffff88810906c000, ffff88810906d000)
Read of size 8
Call trace:
  dump_stack_lvl+0x73/0xb0 (?:?)
  print_report+0xd1/0x620 (?:?)
  srso_alias_return_thunk+0x5/0xfbef5 (?:?)
  __virt_addr_valid+0x215/0x420 (?:?)
  kasan_complete_mode_report_info+0x64/0x200 (?:?)
  kasan_report+0xf7/0x130 (?:?)
  queue_process+0x7c/0x480 (net/core/netpoll.c:88)
  kasan_check_range+0x10c/0x1c0 (?:?)
  __kasan_check_read+0x15/0x20 (?:?)
  process_one_work+0x8b7/0x1af0 (kernel/workqueue.c:3200)
  assign_work+0x170/0x3f0 (?:?)
  worker_thread+0x574/0xf10 (?:?)
  _raw_spin_unlock_irqrestore+0x4b/0x60 (?:?)
  trace_hardirqs_on+0x2a/0x180 (?:?)
  kthread+0x2fc/0x3f0 (?:?)
  ret_from_fork+0x58b/0x830 (?:?)
  __switch_to+0x58e/0xe90 (?:?)
  __switch_to_asm+0x39/0x70 (?:?)
  ret_from_fork_asm+0x1a/0x30 (?:?)
Freed by task stack:
  kasan_save_stack+0x3d/0x60 (?:?)
  kasan_save_track+0x18/0x40 (?:?)
  kasan_save_free_info+0x3f/0x60 (?:?)
  __kasan_slab_free+0x48/0x70 (?:?)
  kfree+0x20e/0x4e0 (?:?)
  kvfree+0x31/0x40 (?:?)
  netdev_release+0x71/0x90 (net/core/net-sysfs.c:2227)
  device_release+0xd2/0x250 (?:?)
  kobject_put+0x181/0x4c0 (lib/kobject.c:730)
  netdev_run_todo+0x700/0x1000 (net/core/dev.c:11666)
  rtnl_dellink+0x396/0xc00 (net/core/rtnetlink.c:3558)
  rtnetlink_rcv_msg+0x740/0xc20 (net/core/rtnetlink.c:6897)
  netlink_rcv_skb+0x147/0x3a0 (?:?)
  rtnetlink_rcv+0x19/0x20 (net/core/rtnetlink.c:7021)
  netlink_unicast+0x4d1/0x830 (net/netlink/af_netlink.c:1327)
  netlink_sendmsg+0x840/0xe10 (net/netlink/af_netlink.c:1812)
  ____sys_sendmsg+0x8a7/0xb50 (?:?)
  ___sys_sendmsg+0x104/0x190 (?:?)
  __sys_sendmsg+0x135/0x1d0 (?:?)
  __x64_sys_sendmsg+0x7b/0xc0 (?:?)
  x64_sys_call+0x205c/0x2130 (?:?)
  do_syscall_64+0x115/0x6a0 (arch/x86/entry/syscall_64.c:87)
  entry_SYSCALL_64_after_hwframe+0x77/0x7f (?:?)

Fixes: 5de4a473bd ("netpoll queue cleanup")
Signed-off-by: Zhang Cen <rollkingzzc@gmail.com>
Link: https://patch.msgid.link/20260519104647.3517990-1-rollkingzzc@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-21 08:10:18 -07:00
Yuho Choi 1341db3224 ipv6: route: Unregister netdevice notifier on BPF init failure
ip6_route_init() registers ip6_route_dev_notifier before registering the
IPv6 route BPF iterator target. If bpf_iter_register() fails after the
notifier has been registered, the error path currently jumps to
out_register_late_subsys and unwinds the RTNL handlers and pernet route
state without removing the notifier from the netdevice notifier chain.

This leaves ip6_route_dev_notify() callable after the IPv6 route state it
uses has been torn down. Add a separate unwind label for the BPF iterator
failure path and unregister the netdevice notifier before continuing with
the existing cleanup.

Fixes: 138d0be35b ("net: bpf: Add netlink and ipv6_route bpf_iter targets")
Signed-off-by: Yuho Choi <dbgh9129@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260520030329.1061183-1-dbgh9129@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-21 07:43:15 -07:00
Zijing Yin dbc81608e3 phonet/pep: disable BH around forwarded sk_receive_skb()
The networking receive path is usually run from softirq context, but
protocols that take the socket lock may have packets stored in the
backlog and processed later from process context. In that case
release_sock() -> __release_sock() drops the slock with spin_unlock_bh()
and then calls sk->sk_backlog_rcv() with bottom halves enabled.

Typical sk_backlog_rcv handlers process the socket whose backlog is
being drained, so the BH state at entry is irrelevant for the slocks
they touch. pep_do_rcv() is different: when the inbound skb targets an
existing PEP pipe, it forwards the skb to a different *child* socket
via sk_receive_skb(). That helper takes the child slock with
bh_lock_sock_nested(), which is just spin_lock_nested() and assumes BH
is already off. The same child slock therefore ends up acquired with
BH on (process path) and with BH off (softirq path):

  process context                   softirq context
  ---------------                   ---------------
  release_sock(listener)            __netif_receive_skb()
   __release_sock()                  phonet_rcv()
    spin_unlock_bh()                  __sk_receive_skb(listener)
    [BH now ENABLED]                  [BH already disabled]
    sk_backlog_rcv:                   sk_backlog_rcv:
     pep_do_rcv()                      pep_do_rcv()
      sk_receive_skb(child)             sk_receive_skb(child)
       bh_lock_sock_nested(child)        bh_lock_sock_nested(child)
       => SOFTIRQ-ON-W                   => IN-SOFTIRQ-W

Lockdep flags this as inconsistent lock state, and it can become a real
self-deadlock if a softirq on the same CPU tries to receive to the same
child socket while its slock is held in the BH-enabled path:

  WARNING: inconsistent lock state
  inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
   (slock-AF_PHONET/1){+.?.}-{3:3}, at: __sk_receive_skb+0x1cf/0x900
    __sk_receive_skb              net/core/sock.c:563
    sk_receive_skb                include/net/sock.h:2022 [inline]
    pep_do_rcv                    net/phonet/pep.c:675
    sk_backlog_rcv                include/net/sock.h:1190
    __release_sock                net/core/sock.c:3216
    release_sock                  net/core/sock.c:3815
    pep_sock_accept               net/phonet/pep.c:879

Wrap the forwarded sk_receive_skb() in local_bh_disable() /
local_bh_enable() so the child slock is always acquired with BH off.
local_bh_disable() nests safely on the softirq path.

Discovered via in-house syzkaller fuzzing; the same root cause also
on the linux-6.1.y syzbot dashboard as extid 44f0626dd6284f02663c.
Reproduced under KASAN + LOCKDEP + PROVE_LOCKING, reproducer:
https://pastebin.com/A3t8xzCR

Fixes: 9641458d3e ("Phonet: Pipe End Point for Phonet Pipes protocol")
Link: https://syzkaller.appspot.com/bug?extid=44f0626dd6284f02663c
Cc: stable@vger.kernel.org
Signed-off-by: Zijing Yin <yzjaurora@gmail.com>
Acked-by: Rémi Denis-Courmont <remi@remlab.net>
Reported-by: syzbot+9f4a135646b66c509935@syzkaller.appspotmail.com
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260519172635.86304-1-yzjaurora@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-21 07:38:21 -07:00
Paolo Abeni 42734af663 Here are batman-adv bugfixes, all by by Sven Eckelmann.
- fix batadv_skb_is_frag() kernel-doc
 
  - BATMAN V: stop OGMv2 on disabled interface
 
  - BATMAN IV: abort OGM send on tvlv append failure
 
  - BATMAN IV: reject oversized TVLV packets
 
  - tp_meter: fix race condition in send error reporting
 
  - tp_meter: avoid role confusion in tp_list
 
  - mcast: fix use-after-free in orig_node RCU release
 
  - BATMAN IV: recover OGM scheduling after forward packet error
 
  - bla: fix report_work leak on backbone_gw purge
 
  - bla: avoid double decrement of bla.num_requests
 
  - bla: avoid NULL-ptr deref for claim via dropped interface
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAmoNoHkWHHN3QHNpbW9u
 d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoaFZD/4yN7deqhkVKj039EVNxoOOVOSi
 T0kQCOG6h20UiJX6j2l6QuoXF6vwBGDPFZfwCMKSMWXFhbuhzl8L0oEvZ+p6goho
 FyPgAWkWRcopPnSiJTQcFXVVtkku7EkPO2MjMLUEkU8YEKGfnBSWHUBr9+Mw0OUV
 rmrkRQcl7/BpWX++fM0U/qUFfKi8VuIPgQubPyC602/iInseNvx6Ju0abRJeTCF6
 YM4IugyxKoHCwHbnR2jNwl1PfaI7FyLF/ZCvi/2NpAfwnjcykBlMYoEj46Qh3dTD
 05ZIAHlgHAfMDHuW87rema6aYT0nCpSwtBaM4YE2/vim4hxRot6AN/ho/4JFHzHW
 GHCuhw3NxADeUJZBefGy4QEtGPIP/Odz4WcVTr/29TnOTKEAG2rsI4mwLoM4ogp/
 Wa0tlZWUZNHEsjPHSkEWqnR+X/+J8Q9W/RUP85gthVpMKvHyMf7YC2JBXlJ2QvmF
 HzmCjYLv83X4pftutf6R/EAy8MedJTSCoCzJGitIDd5qW3EG3yYBgz6ep4DD/4Hp
 4qWP3gdUbcPnL5mw3SOWYXhOKFQnOs68h5QsSRdEp4LYj0FExC6EZagRZUU7fGOu
 W5r7vsN6j83sBuZd5rQRIs1XNSm5RlUAfJAsWawKLI96l2xSuIrtmJq3zP/kc4cH
 q4Lc88OyebVE5t8/yA==
 =0Yhf
 -----END PGP SIGNATURE-----

Merge tag 'batadv-net-pullrequest-20260520' of https://git.open-mesh.org/batadv

Simon Wunderlich says:

====================
Here are batman-adv bugfixes, all by by Sven Eckelmann.

 - fix batadv_skb_is_frag() kernel-doc

 - BATMAN V: stop OGMv2 on disabled interface

 - BATMAN IV: abort OGM send on tvlv append failure

 - BATMAN IV: reject oversized TVLV packets

 - tp_meter: fix race condition in send error reporting

 - tp_meter: avoid role confusion in tp_list

 - mcast: fix use-after-free in orig_node RCU release

 - BATMAN IV: recover OGM scheduling after forward packet error

 - bla: fix report_work leak on backbone_gw purge

 - bla: avoid double decrement of bla.num_requests

 - bla: avoid NULL-ptr deref for claim via dropped interface

* tag 'batadv-net-pullrequest-20260520' of https://git.open-mesh.org/batadv:
  batman-adv: bla: avoid NULL-ptr deref for claim via dropped interface
  batman-adv: bla: avoid double decrement of bla.num_requests
  batman-adv: bla: fix report_work leak on backbone_gw purge
  batman-adv: iv: recover OGM scheduling after forward packet error
  batman-adv: mcast: fix use-after-free in orig_node RCU release
  batman-adv: tp_meter: avoid role confusion in tp_list
  batman-adv: tp_meter: fix race condition in send error reporting
  batman-adv: tvlv: reject oversized TVLV packets
  batman-adv: tvlv: abort OGM send on tvlv append failure
  batman-adv: v: stop OGMv2 on disabled interface
  batman-adv: fix batadv_skb_is_frag() kernel-doc
====================

Link: https://patch.msgid.link/20260520115422.53552-1-sw@simonwunderlich.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-05-21 15:59:11 +02:00
Stefano Garzarella c6087c5aaa vsock/virtio: fix skb overhead accounting to preserve full buf_alloc
After commit 059b7dbd20 ("vsock/virtio: fix potential unbounded skb
queue"), virtio_transport_inc_rx_pkt() subtracts per-skb overhead from
buf_alloc when checking whether a new packet fits. This reduces the
effective receive buffer below what the user configured via
SO_VM_SOCKETS_BUFFER_SIZE, causing legitimate data packets to be
silently dropped and applications that rely on the full buffer size
to deadlock.

Also, the reduced space is not communicated to the remote peer, so
its credit calculation accounts more credit than the receiver will
actually accept, causing data loss (there is no retransmission).

With this approach we currently have failures in
tools/testing/vsock/vsock_test.c. Test 18 sometimes fails, while
test 22 always fails in this way:
    18 - SOCK_STREAM MSG_ZEROCOPY...hash mismatch

    22 - SOCK_STREAM virtio credit update + SO_RCVLOWAT...send failed:
    Resource temporarily unavailable

Fix by allowing at most `buf_alloc * 2` as the total budget for payload
plus skb overhead in virtio_transport_inc_rx_pkt(), similar to how
SO_RCVBUF is doubled to reserve space for sk_buff metadata.
This preserves the full buf_alloc for payload under normal operation,
while still bounding the skb queue growth.

With this patch, all tests in tools/testing/vsock/vsock_test.c are
now passing again.

Fixes: 059b7dbd20 ("vsock/virtio: fix potential unbounded skb queue")
Cc: stable@vger.kernel.org
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20260518090656.134588-3-sgarzare@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-05-21 13:14:01 +02:00
Stefano Garzarella a4f0b00178 vsock/virtio: reset connection on receiving queue overflow
When there is no more space to queue an incoming packet, the packet is
silently dropped. This causes data loss without any notification to
either peer, since there is no retransmission.

Under normal circumstances, this should never happen. However, it could
happen if the other peer doesn't respect the credit, or if the skb
overhead, which we recently began to take into account with commit
059b7dbd20 ("vsock/virtio: fix potential unbounded skb queue"),
is too high.

Fix this by resetting the connection and setting the local socket error
to ENOBUFS when virtio_transport_recv_enqueue() can no longer queue a
packet, so both peers are explicitly notified of the failure rather than
silently losing data.

Fixes: ae6fcfbf5f ("vsock/virtio: discard packets if credit is not respected")
Cc: stable@vger.kernel.org
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20260518090656.134588-2-sgarzare@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-05-21 13:14:01 +02:00
Hyunwoo Kim 48f6a5356a net: skbuff: propagate shared-frag marker through frag-transfer helpers
Two frag-transfer helpers (__pskb_copy_fclone() and skb_shift()) fail
to propagate the SKBFL_SHARED_FRAG bit in skb_shinfo()->flags when
moving frags from source to destination.  __pskb_copy_fclone() defers
the rest of the shinfo metadata to skb_copy_header() after copying
frag descriptors, but that helper only carries over gso_{size,segs,
type} and never touches skb_shinfo()->flags; skb_shift() moves frag
descriptors directly and leaves flags untouched.  As a result, the
destination skb keeps a reference to the same externally-owned or
page-cache-backed pages while reporting skb_has_shared_frag() as
false.

The mismatch is harmful in any in-place writer that uses
skb_has_shared_frag() to decide whether shared pages must be detoured
through skb_cow_data().  ESP input is one such writer (esp4.c,
esp6.c), and a single nft 'dup to <local>' rule -- or any other
nf_dup_ipv4() / xt_TEE caller -- is enough to land a pskb_copy()'d
skb in esp_input() with the marker stripped, letting an unprivileged
user write into the page cache of a root-owned read-only file via
authencesn-ESN stray writes.

Set SKBFL_SHARED_FRAG on the destination whenever frag descriptors
were actually moved from the source.  skb_copy() and skb_copy_expand()
share skb_copy_header() too but linearize all paged data into freshly
allocated head storage and emerge with nr_frags == 0, so
skb_has_shared_frag() returns false on its own; they need no change.

The same omission exists in skb_gro_receive() and skb_gro_receive_list().
The former moves the incoming skb's frag descriptors into the
accumulator's last sub-skb via two paths (a direct frag-move loop and
the head_frag + memcpy path); the latter chains the incoming skb whole
onto p's frag_list.  Downstream skb_segment() reads only
skb_shinfo(p)->flags, and skb_segment_list() reuses each sub-skb's
shinfo as the nskb -- both p and lp must carry the marker.

The same omission also exists in tcp_clone_payload(), which builds an
MTU probe skb by moving frag descriptors from skbs on sk_write_queue
into a freshly allocated nskb.  The helper falls into the same family
and warrants the same fix for consistency; no TCP TX-side in-place
writer is currently known to reach a user page through this gap, but
a future consumer depending on the marker would regress silently.

The same omission exists in skb_segment(): the per-iteration flag
merge takes only head_skb's flag, and the inner switch that rebinds
frag_skb to list_skb on head_skb-frags exhaustion does not fold the
new frag_skb's flag into nskb.  Fold frag_skb's flag at both sites
so segments drawing frags from frag_list members carry the marker.

Fixes: cef401de7b ("net: fix possible wrong checksum generation")
Fixes: f4c50a4034 ("xfrm: esp: avoid in-place decrypt on shared skb frags")
Suggested-by: Sabrina Dubroca <sd@queasysnail.net>
Suggested-by: Sultan Alsawaf <sultan@kerneltoast.com>
Suggested-by: Ben Hutchings <ben@decadent.org.uk>
Suggested-by: Lin Ma <malin89@huawei.com>
Suggested-by: Jingguo Tan <tanjingguo@huawei.com>
Suggested-by: Aaron Esau <aaron1esau@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
Tested-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
Link: https://patch.msgid.link/ageeJfJHwgzmKXbh@v4bel
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-05-21 11:31:05 +02:00
Eric Dumazet 1bbf0ced1d tcp: fix stale per-CPU tcp_tw_isn leak enabling ISN prediction
Blamed commit moved the TIME_WAIT-derived ISN from the skb control
block to a per-CPU variable, assuming the value would always be consumed
by tcp_conn_request() for the same packet that wrote it. That assumption
is violated by multiple drop paths between the producer
(__this_cpu_write(tcp_tw_isn, isn) in tcp_v{4,6}_rcv()) and the consumer
(tcp_conn_request()):

 - min_ttl / min_hopcount check
 - xfrm policy check
 - tcp_inbound_hash() MD5/AO mismatch
 - tcp_filter() eBPF/SO_ATTACH_FILTER drop
 - th->syn && th->fin discard in tcp_rcv_state_process() TCP_LISTEN
 - psp_sk_rx_policy_check() in tcp_v{4,6}_do_rcv()
 - tcp_checksum_complete() in tcp_v{4,6}_do_rcv()
 - tcp_v{4,6}_cookie_check() returning NULL

When a packet is dropped on any of these paths, tcp_tw_isn is left set.

The next SYN processed on the same CPU then consumes the non zero value in
tcp_conn_request(), receiving a potentially predictable ISN.

This patch moves back tcp_tw_isn to skb->cb[], getting rid of the per-cpu
variable.

Note that tcp_v{4,6}_fill_cb() do not set it.

Very litle impact on overall code size/complexity:

$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/0 grow/shrink: 2/1 up/down: 8/-15 (-7)
Function                                     old     new   delta
tcp_v6_rcv                                  3038    3042      +4
tcp_v4_rcv                                  3035    3039      +4
tcp_conn_request                            2938    2923     -15
Total: Before=24436060, After=24436053, chg -0.00%

Fixes: 41eecbd712 ("tcp: replace TCP_SKB_CB(skb)->tcp_tw_isn with a per-cpu field")
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260519084611.2485277-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 19:14:06 -07:00
Minh Nguyen 99e22ddf4e vsock/vmci: fix UAF when peer resets connection during handshake
vmci_transport_recv_connecting_server() returned err = 0 for a peer
RST in its default switch arm:

	err = pkt->type == VMCI_TRANSPORT_PACKET_TYPE_RST ? 0 : -EINVAL;

That made vmci_transport_recv_listen() skip vsock_remove_pending(),
leaving the pending socket on the listener's pending_links with
sk_state = TCP_CLOSE while destroy: still dropped the explicit
reference taken before schedule_delayed_work().

One second later vsock_pending_work() observed is_pending=true and
performed full cleanup: vsock_remove_pending() then the two trailing
sock_put(sk) calls -- the first reached refcount 0 and __sk_freed
the socket, and the second wrote into the freed object:

  BUG: KASAN: slab-use-after-free in refcount_warn_saturate
  Write of size 4 at addr ffff88800b1cac80 by task kworker
  Workqueue: events vsock_pending_work

Treat peer RST like any other unexpected packet type (err = -EINVAL).
All destroy: arms now return err < 0, so vmci_transport_recv_listen()
removes pending from pending_links synchronously and
vsock_pending_work() takes the is_pending=false / !rejected branch,
dropping only its own work reference.  This also closes the
multi-packet race Sashiko reported on v2: pending is removed from
the list before any subsequent packet can find it.

The pre-existing sk_acceptq_removed() gap on the err < 0 path of
vmci_transport_recv_listen() that Sashiko also noted is not
introduced or changed by this patch.

Tested on lts-6.12.79 with KASAN: 52/100 unpatched -> 0/100 patched.

Fixes: d021c34405 ("VSOCK: Introduce VM Sockets")
Cc: stable@vger.kernel.org
Signed-off-by: Minh Nguyen <minhnguyen.080505@gmail.com>
Acked-by: Bryan Tan <bryan-bt.tan@broadcom.com>
Link: https://patch.msgid.link/20260519102310.237181-1-minhnguyen.080505@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 19:11:18 -07:00
Eric Dumazet e4bdef4d32 ipv4: use WARN_ON_ONCE() in ip_rt_bug()
It turns out ip_rt_bug() can be called more than expected.

syzbot will still panic (because of panic_on_warn=1), but non debug
kernels will no longer die while repeating stack traces on the console.

Fixes: c378a9c019 ("ipv4: Give backtrace in ip_rt_bug().")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Link: https://patch.msgid.link/20260519193248.4018872-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 19:00:36 -07:00
Eric Dumazet 7eb72c1e39 ipv4: icmp: reject broadcast/multicast routes
syzbot was able to trigger ip_rt_bug() in a loop, using an IPv4 packet
with a crafted IPOPT_SSRR option:

  options: ipv4_options {
    options: array[ipv4_option] {
      union ipv4_option {
        ssrr: ipv4_option_route[IPOPT_SSRR] {
         type: const = 0x89 (1 bytes)
         length: len = 0x7 (1 bytes)
         pointer: int8 = 0xa2 (1 bytes)
         data: array[ipv4_addr] {
           union ipv4_addr {
             broadcast: const = 0xffffffff (4 bytes)
           }
         }
       }
     }

Change __icmp_send() to not send ICMP to broadcast/multicast destinations.

Fixes: c378a9c019 ("ipv4: Give backtrace in ip_rt_bug().")
Reported-by: syzbot+c13a57c2639c2c0d03a6@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6a0cc169.170a0220.1f6c2d.0004.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260519200836.4141061-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 19:00:02 -07:00
David Carlier 4eb82ba543 net: devmem: reject dma-buf bind with non-page-aligned size or SG length
net_devmem_bind_dmabuf() trusts dmabuf->size and sg_dma_len() to be
PAGE_SIZE multiples without checking:

  - tx_vec is sized dmabuf->size / PAGE_SIZE, and
    net_devmem_get_niov_at() only bounds-checks virt_addr < dmabuf->size
    before indexing tx_vec[virt_addr / PAGE_SIZE]. With size =
    N*PAGE_SIZE + r (1 <= r < PAGE_SIZE), sendmsg() at iov_base =
    N*PAGE_SIZE passes the bound check and reads tx_vec[N] -- one past.

  - owner->area.num_niovs = len / PAGE_SIZE while gen_pool_add_owner()
    covers the full byte len, so a non-page-multiple non-final sg
    desyncs num_niovs from the gen_pool region for every later sg, on
    both RX and TX.

dma-buf does not require page-aligned sizes, so the bind path has to
enforce what its own indexing assumes. Reject both with -EINVAL.

The size check is TX-only (only tx_vec is sized off dmabuf->size); the
SG-length check covers both directions.

Fixes: bd61848900 ("net: devmem: Implement TX path")
Cc: stable@vger.kernel.org
Signed-off-by: David Carlier <devnexen@gmail.com>
Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20260519203530.66310-1-devnexen@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 18:59:01 -07:00
Jakub Kicinski 5027c886e2 bluetooth pull request for net:
- hci_sync: Fix not setting mask for HCI_EVT_LE_ALL_REMOTE_FEATURES_COMPLETE
  - L2CAP: fix UAF in l2cap_sock_cleanup_listen() vs l2cap_conn_del()
  - ISO: drop ISO_END frames received without prior ISO_START
  - MGMT: validate Add Extended Advertising Data length
  - bnep: Fix UAF read of dev->name
  - btmtk: fix urb->setup_packet leak in error paths
  - btintel_pcie: Fix incorrect MAC access programming
  - hci_uart: fix UAFs and race conditions in close and init paths
 -----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCgA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmoOHbkZHGx1aXoudm9u
 LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKUlGD/9k81ghY4sS0chhP0sklpWm
 V5hi01rhlmYkNJY+CWG24wRc+OjgTzRLdeEJCOkLSkXj0X8lWtN4LhaVthVWoq+5
 jCyNm5t95WMKYS+iBh1Qsd9jdPyruJhCmeQPJ8OEjrer5o/XZHphRfGi4hJEVNfO
 r8x5vUnAHkdzRIH5UeHaJ01TV+hDIacin9+Yls2EMdEQZ/bpGIZUuDwcpQkApKr2
 DMPigwF/Acy0mpXKtuJ5DAeF1c6JvGhskW7mhf0Zn2snuDdqolSY0r3llBx5WyB2
 KvnZ7o38uShTw2Qv2ZYpO4ETEDyi36UxAXXkVAVZlv5n8cpG17DSb7b7koFIDUhi
 KSFyes4QW+3fJDLFKapI7mwvpg4m9VfUOUh8WedqFzdYTNjqS1I77aT+n+6LpELg
 zqAhDAAYzvtp2EZ3x8KEHJmM1u2/2ZS/IPAGNP8LmYDlZsTYNF428JcVNwa7sZwS
 AOnp0JNI4+efO+bB3JZgS3ChEVUl8jJ6ccxTiSBTPCHZmHK2zFT48xodA2xi6Sd0
 OGb/IUzvVtSg0IB1Q8Z0cHq7GcrhvK1TrxzvMsdnzKoNRPR+yyHvvePBPks/TusK
 URRYKoEwhuIQywrIF7lbd2NcXpaUAXZz8lQ6IDrQ5jFrtF85TfsW2SphKs3/6xxz
 Ov/8zQ/z241bU2fp8d/gOA==
 =9q+B
 -----END PGP SIGNATURE-----

Merge tag 'for-net-2026-05-20' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - hci_sync: Fix not setting mask for HCI_EVT_LE_ALL_REMOTE_FEATURES_COMPLETE
 - L2CAP: fix UAF in l2cap_sock_cleanup_listen() vs l2cap_conn_del()
 - ISO: drop ISO_END frames received without prior ISO_START
 - MGMT: validate Add Extended Advertising Data length
 - bnep: Fix UAF read of dev->name
 - btmtk: fix urb->setup_packet leak in error paths
 - btintel_pcie: Fix incorrect MAC access programming
 - hci_uart: fix UAFs and race conditions in close and init paths

* tag 'for-net-2026-05-20' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: fix UAF in l2cap_sock_cleanup_listen() vs l2cap_conn_del()
  Bluetooth: hci_uart: fix UAFs and race conditions in close and init paths
  Bluetooth: MGMT: validate Add Extended Advertising Data length
  Bluetooth: btmtk: fix urb->setup_packet leak in error paths
  Bluetooth: ISO: drop ISO_END frames received without prior ISO_START
  Bluetooth: btintel_pcie: Fix incorrect MAC access programming
  Bluetooth: hci_sync: Fix not setting mask for HCI_EVT_LE_ALL_REMOTE_FEATURES_COMPLETE
  Bluetooth: bnep: Fix UAF read of dev->name
====================

Link: https://patch.msgid.link/20260520204959.2902497-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 17:26:56 -07:00
Xingwang Xiang ddf8029623 bpf, skmsg: fix verdict sk_data_ready racing with ktls rx
sk_psock_strp_data_ready() already checks tls_sw_has_ctx_rx() and
defers to psock->saved_data_ready when a TLS RX context is present,
avoiding a conflict with the TLS strparser's ownership of the receive
queue (commit e91de6afa8, "bpf: Fix running sk_skb program types
with ktls").

sk_psock_verdict_data_ready() has no equivalent guard.  When a socket
is inserted into a sockmap (BPF_SK_SKB_VERDICT) before TLS RX is
configured, tls_sw_strparser_arm() saves sk_psock_verdict_data_ready
as rx_ctx->saved_data_ready.  On data arrival:

  tls_data_ready -> tls_strp_data_ready -> tls_rx_msg_ready
    -> saved_data_ready() = sk_psock_verdict_data_ready()
      -> tcp_read_skb() drains sk_receive_queue via __skb_unlink()
         without calling tcp_eat_skb(), so copied_seq is not advanced.

tls_strp_msg_load() then finds tcp_inq() >= full_len (stale), calls
tcp_recv_skb() on the now-empty queue, hits WARN_ON_ONCE(!first), and
returns with rx_ctx->strp.anchor.frag_list pointing at a psock-owned
(potentially freed) skb.  tls_decrypt_sg() subsequently walks that
frag_list: use-after-free.

Apply the same fix as sk_psock_strp_data_ready(): if a TLS RX context
is present, call psock->saved_data_ready (sock_def_readable) to wake
recv() waiters and return immediately, leaving the receive queue
untouched.  TLS retains sole ownership of the queue and decrypts the
record normally through tls_sw_recvmsg().

Fixes: ef5659280e ("bpf, sockmap: Allow skipping sk_skb parser program")
Signed-off-by: Xingwang Xiang <v3rdant.xiang@gmail.com>
Link: https://patch.msgid.link/20260517145630.20521-2-v3rdant.xiang@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 17:21:21 -07:00
David Howells 8bfab4b6ff rxrpc: Fix RESPONSE packet verification to extract skb to a linear buffer
This improves the fix for CVE-2026-43500.

Fix the verification of RESPONSE packets to avoid the problem of
overwriting a RESPONSE packet sent via splice to a local address by
extracting the contents of the UDP packet into a kmalloc'd linear buffer
rather than decrypting the data in place in the sk_buff (which may corrupt
the original buffer).

Fixes: 24481a7f57 ("rxrpc: Fix conn-level packet handling to unshare RESPONSE packets")
Reported-by: Hyunwoo Kim <imv4bel@gmail.com>
Closes: https://lore.kernel.org/r/afKV2zGR6rrelPC7@v4bel/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: Jiayuan Chen <jiayuan.chen@linux.dev>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
Link: https://patch.msgid.link/20260515230516.2718212-4-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 16:36:45 -07:00
David Howells d2bc90cf6c rxrpc: Fix DATA decrypt vs splice() by copying data to buffer in recvmsg
This improves the fix for CVE-2026-43500.

Fix the pagecache corruption from in-place decryption of a DATA packet
transmitted locally by splice() by getting rid of the packet sharing in the
I/O thread and unconditionally extracting the packet content into a bounce
buffer in which the buffer is decrypted.  recvmsg() (or the kernel
equivalent) then copies the data from the bounce buffer to the destination
buffer.  The sk_buff then remains unmodified.

This has an additional advantage in that the packet is then arranged in the
buffer with the correct alignment required for the crypto algorithms to
process directly.  The performance of the crypto does seem to be a little
faster and, surprisingly, the unencrypted performance doesn't seem to
change much - possibly due to removing complexity from the I/O thread.

Yet another advantage is that the I/O thread doesn't have to copy packets
which would slow down packet distribution, ACK generation, etc..

The buffer belongs to the call and is allocated initially at 2K,
sufficiently large to hold a whole jumbo subpacket, but the buffer will be
increased in size if needed.  However, to take this work, MSG_PEEK may
cause a later packet to be decrypted into the buffer, in which case the
earlier one will need re-decrypting for a subsequent recvmsg().

Note that rx_pkt_offset may legitimately see 0 as a valid offset now, so
switch to using USHRT_MAX to indicate an invalid offset.

Note also that I would generally prefer to replace the buffers of the
current sk_buff with a new kmalloc'd buffer of the right size, ditching the
old data and frags as this makes the handling of MSG_PEEK easier and
removes the re-decryption issue, but this looks like quite a complicated
thing to achieve.  skb_morph() looks half way to what I want, but I don't
want to have to allocate a new sk_buff.

Fixes: d0d5c0cd1e ("rxrpc: Use skb_unshare() rather than skb_cow_data()")
Reported-by: Hyunwoo Kim <imv4bel@gmail.com>
Closes: https://lore.kernel.org/r/afKV2zGR6rrelPC7@v4bel/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: Jiayuan Chen <jiayuan.chen@linux.dev>
cc: linux-afs@lists.infradead.org
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
Link: https://patch.msgid.link/20260515230516.2718212-3-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 16:36:45 -07:00
David Howells 2b50aceafe crypto/krb5, rxrpc: Fix lack of pre-decrypt/pre-verify length checks
Change the krb5 crypto library to provide facilities to precheck the length
of the message about to be decrypted or verified.

Fix AF_RXRPC to make use of this to validate DATA packets secured with
RxGK.

Fixes: 9d1d2b5934 ("rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)")
Closes: https://sashiko.dev/#/patchset/20260511160753.607296-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: Simon Horman <horms@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: linux-afs@lists.infradead.org
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
Link: https://patch.msgid.link/20260515230516.2718212-2-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 16:36:45 -07:00
Jakub Kicinski b8d7519352 net: shaper: rework the VALID marking (again)
Recent commit changed the semantics from NOT_VALID to VALID.
I didn't realize that the flags are not stored atomically
with the entry in XArray. There's still a race of reader
observing a VALID mark for a slot, getting interrupted,
writer replacing the entry with a different one, reader
continuing, fetching the entry which is now a different
pointer than the pointer for which VALID was meant.

The biggest consequence of this is that we may see a UAF
since net_shaper_rollback() assumed that entries without
VALID can be freed without observing RCU.

Looks like the XArray marks are buying us nothing at this
point. Let's convert the code to an explicit valid field.
The smp_load_acquire() / smp_store_release() barriers are
marginally cleaner.

Reported-by: Sashiko <sashiko-bot@kernel.org>
Fixes: 93954b40f6 ("net-shapers: implement NL set and delete operations")
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260515221325.1685455-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 16:34:20 -07:00
Jakub Kicinski a3442936dd net: shaper: annotate the data races
As previously discussed we don't care about making the shaper
state fully RCU-compliant because the hierarchy itself can't
be dumped in one go over Netlink. Let's annotate the reads
and writes to make that clear.

The field-by-field assignments will also be useful for the
next commit which adds explicit "valid" field (which we don't
want to override with the current full struct assignment).

Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260515221325.1685455-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 16:34:20 -07:00
Gal Pressman 78effd896e udp: Fix UDP length on last GSO_PARTIAL segment
Following the cited commit, __udp_gso_segment() writes single MSS length
in the UDP header.
The cited patch doesn't account for the fact that the last segment could
be a GSO skb by itself. This could happen when the size of the packet is
a multiple of MSS, hence the first segment is also the last one (there
is no need for a remainder skb).

When the post-loop segment is a GSO skb, assign the single MSS length in
the UDP header.

Fixes: b10b446ce7 ("udp: gso: Use single MSS length in UDP header for GSO_PARTIAL")
Reported-by: Matthew Schwartz <matthew.schwartz@linux.dev>
Closes: https://lore.kernel.org/all/6c3fb15e-711d-4b8d-b152-e03d9b05293f@linux.dev/
Tested-by: Matthew Schwartz <matthew.schwartz@linux.dev>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20260518062250.3019914-3-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 15:03:47 -07:00
Alice Mikityanska 5f17ae0f59 udp: gso: Fix handling checksum in __udp_gso_segment
The cited commit started using msslen for uh->len, but still uses newlen
to adjust uh->check. Although the checksum is ignored in most cases due
to the hardware offload, __udp_gso_segment attempts to maintain the
correct one. Fix uh->check and adjust it by the right value.

Additionally, after the fix, newlen becomes assigned and unused before
the loop. The code can be simplified a bit if mss adjustment is dropped,
so that newlen becomes equal to msslen before the loop, and msslen can
be also dropped, saving a few lines of code.

This brings us back to one variable, drops an unneeded arithmetic for
mss, and fixes the UDP checksum.

Fixes: b10b446ce7 ("udp: gso: Use single MSS length in UDP header for GSO_PARTIAL")
Signed-off-by: Alice Mikityanska <alice@isovalent.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20260518062250.3019914-2-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20 15:02:47 -07:00
Safa Karakuş ab1513597c Bluetooth: fix UAF in l2cap_sock_cleanup_listen() vs l2cap_conn_del()
bt_accept_dequeue() unlinks a not-yet-accepted child from the parent
accept queue and release_sock()s it before returning, so the returned
sk has no caller reference and is unlocked.

l2cap_sock_cleanup_listen() walks these children on listening-socket
close.  A concurrent HCI disconnect drives hci_rx_work ->
l2cap_conn_del() which runs l2cap_chan_del() + l2cap_sock_kill() and
frees the child sk and its l2cap_chan; cleanup_listen() then uses both:

  BUG: KASAN: slab-use-after-free in l2cap_sock_kill
    l2cap_sock_kill / l2cap_sock_cleanup_listen / __x64_sys_close
  Freed by: l2cap_conn_del -> l2cap_sock_close_cb -> l2cap_sock_kill

This is distinct from the two fixes already in this area: commit
e83f5e24da ("Bluetooth: serialize accept_q access") serialises the
accept_q list/poll and takes temporary refs inside bt_accept_dequeue(),
and CVE-2025-39860 serialises the userspace close()/accept() race by
calling cleanup_listen() under lock_sock() in l2cap_sock_release().
Neither covers l2cap_conn_del() running from hci_rx_work, so this UAF
still reproduces on current bluetooth/master.

Take the reference at the source: bt_accept_dequeue() does sock_hold()
while sk is still locked, before release_sock(); callers sock_put().
cleanup_listen() pins the chan with l2cap_chan_hold_unless_zero() under
a brief child sk lock (serialising vs l2cap_sock_teardown_cb()), drops
it before l2cap_chan_lock(), and skips a duplicate l2cap_sock_kill() on
SOCK_DEAD.  conn->lock is not taken here: cleanup_listen() runs under
the parent sk lock and that would invert
conn->lock -> chan->lock -> sk_lock (lockdep).

KASAN/SMP: an unprivileged listen/close vs HCI-disconnect race produced
12 use-after-free reports per run before this change; 0, and no lockdep
report, over 1600+ raced iterations after it on bluetooth/master.

Fixes: 15f02b9105 ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode")
Cc: stable@vger.kernel.org
Reported-by: Siwei Zhang <oss@fourdim.xyz>
Reviewed-by: Siwei Zhang <oss@fourdim.xyz>
Signed-off-by: Safa Karakuş <safa.karakus@secunnix.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-05-20 16:35:47 -04:00
Michael Bommarito d3f7d17960 Bluetooth: MGMT: validate Add Extended Advertising Data length
MGMT_OP_ADD_EXT_ADV_DATA is registered as a variable-length command,
with MGMT_ADD_EXT_ADV_DATA_SIZE as the fixed header size.  The handler
then uses cp->adv_data_len and cp->scan_rsp_len to validate and copy
cp->data, but it never checks that those bytes are part of the mgmt
command payload.

A short command can therefore make add_ext_adv_data() pass an
out-of-bounds pointer into tlv_data_is_valid().  If the bytes beyond
the command buffer are addressable, they can also be copied into the
advertising instance as scan response data, where the caller can read
them back via MGMT_OP_GET_ADV_INSTANCE.  The trigger requires
CAP_NET_ADMIN in the initial user namespace; KASAN reports an 8-byte
slab-out-of-bounds read.

Reject commands whose length does not match the fixed header plus both
advertising data lengths before parsing cp->data.

Fixes: 1241057283 ("Bluetooth: Break add adv into two mgmt commands")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-05-20 16:35:47 -04:00
David Carlier 84c24fb151 Bluetooth: ISO: drop ISO_END frames received without prior ISO_START
ISO data PDUs carry a packet-boundary flag indicating START, CONT, END
or SINGLE. The ISO_CONT branch of iso_recv() guards against a missing
ISO_START by checking conn->rx_len before touching conn->rx_skb, but
ISO_END does not.

If a peer sends an ISO_END as the first packet on a fresh ISO
connection, conn->rx_skb is still NULL and conn->rx_len is zero, so
skb_put(conn->rx_skb, ...) dereferences NULL and oopses. For BIS,
where receivers sync to a broadcaster without pairing, any broadcaster
on the air can trigger this.

Mirror the ISO_CONT check at the top of ISO_END so a stray end fragment
is logged and dropped instead of crashing the host.

Fixes: ccf74f2390 ("Bluetooth: Add BTPROTO_ISO socket type")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-05-20 16:35:47 -04:00
Luiz Augusto von Dentz 23d528d817 Bluetooth: hci_sync: Fix not setting mask for HCI_EVT_LE_ALL_REMOTE_FEATURES_COMPLETE
This fixes not setting the bit for HCI_EVT_LE_ALL_REMOTE_FEATURES_COMPLETE
when extended features bit is set otherwise the controller may not
generate HCI_EVT_LE_ALL_REMOTE_FEATURES_COMPLETE causing
hci_le_read_all_remote_features_sync to timeout waiting for it.

Also remove dead code.

Fixes: a106e50be7 ("Bluetooth: HCI: Add support for LL Extended Feature Set")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-05-20 16:35:47 -04:00
Jann Horn 59e932ded9 Bluetooth: bnep: Fix UAF read of dev->name
bnep_add_connection() needs to keep holding the bnep_session_sem while
reading dev->name (just like bnep_get_connlist() does); otherwise the
bnep_session() thread can concurrently free the net_device, which can for
example be triggered by a concurrent bnep_del_connection().

(This UAF is fairly uninteresting from a security perspective;
calling bnep_add_connection() requires passing a capable(CAP_NET_ADMIN)
check. It also requires completely tearing down a netdev during a fairly
tight race window.)

Cc: stable@vger.kernel.org
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-05-20 16:35:47 -04:00
Kartik Nair dc14686f27 wifi: cfg80211: wext: validate chandef in monitor mode
cfg80211_wext_siwfreq() constructs a channel definition for monitor
mode but passes it to cfg80211_set_monitor_channel() without first
validating it with cfg80211_chandef_valid(). This causes a WARN_ON
in cfg80211_chandef_dfs_required() when it receives an invalid chandef.

Add the missing cfg80211_chandef_valid() check before calling
cfg80211_set_monitor_channel() to return -EINVAL early on invalid
channel definitions, consistent with how other callers handle this.

Reported-by: syzbot+02a1a03b8622d3c7d1c9@syzkaller.appspotmail.com
Signed-off-by: Kartik Nair <contact.kartikn@gmail.com>
Link: https://patch.msgid.link/20260510202437.7857-1-contact.kartikn@gmail.com
[clarify subject]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-05-20 11:44:19 +02:00
Michael Bommarito a6e6ccd5bd wifi: mac80211: consume only present negotiated TTLM maps
ieee80211_tid_to_link_map_size_ok() validates negotiated TTLM elements
against the number of link-map entries indicated by link_map_presence.
ieee80211_parse_neg_ttlm() must consume the same layout.

The parser advanced its cursor for every TID, including TIDs whose
presence bit is clear and therefore have no map bytes in the element.
A sparse map can then make a later present TID read past the validated
element.

The bad bytes land in neg_ttlm->{up,down}link[tid] but are gated by
valid_links before being applied to driver state, so a peer cannot
turn the read into a policy change.  Under KUnit + KASAN with an
exact-sized element allocation the OOB read is reported as a
slab-out-of-bounds; whether the same trigger fires under the
production RX path depends on surrounding allocator state.

Advance the cursor only when the current TID has a map present.

Fixes: 8f500fbc6c ("wifi: mac80211: process and save negotiated TID to Link mapping request")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Link: https://patch.msgid.link/20260515151719.1317659-2-michael.bommarito@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-05-20 11:20:37 +02:00
Zhao Li d71c841be5 wifi: mac80211: capture fast-RX rate before mesh reuses skb->cb
ieee80211_invoke_fast_rx() reads RX status through
IEEE80211_SKB_RXCB(skb), which aliases the same skb->cb storage
that ieee80211_rx_mesh_data() reuses as IEEE80211_TX_INFO.  In the
unicast forward path, mesh_data does:

	info = IEEE80211_SKB_CB(fwd_skb);
	memset(info, 0, sizeof(*info));

on the same skb the caller still names via rx->skb, then either
queues the skb for TX (success) or kfree_skb()'s it (no-route)
before returning RX_QUEUED.  The caller's RX_QUEUED arm then
calls sta_stats_encode_rate(status) on memory that is either
zeroed (success path) or freed (no-route path).  The latter is
KASAN slab-use-after-free in ieee80211_prepare_and_rx_handle.

Fix by encoding the rate from status before invoking
ieee80211_rx_mesh_data(), so the RX_QUEUED arm consumes a value
captured while status was still backed by valid memory.

Fixes: 3468e1e0c6 ("wifi: mac80211: add mesh fast-rx support")
Cc: stable@vger.kernel.org
Signed-off-by: Zhao Li <enderaoelyther@gmail.com>
Link: https://patch.msgid.link/20260509043427.60322-2-enderaoelyther@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-05-20 11:19:53 +02:00
Johannes Berg fe2d61a5d2 wifi: mac80211: fix multi-link element inheritance
When parsing a beacon, mac80211 erroneously inherits any
reconfiguration or EPCS multi-link elements from the outer
elements into the multi-BSSID profile that's requested, if
connected to a non-transmitted BSS, unless that profile
has a non-inheritance element.

This also happens if parsing a multi-BSSID profile that
doesn't have a non-inheritance element.

Fix this by having an empty non-inheritance element so
cfg80211_is_element_inherited() is invoked in these cases
and causes the parser to skip the elements that should
never be inherited.

Fixes: cf36cdef10 ("wifi: mac80211: Add support for parsing Reconfiguration Multi Link element")
Fixes: 24711d60f8 ("wifi: mac80211: Support parsing EPCS ML element")
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Reviewed-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20260508091032.92184c0a3f08.I3c43b0b63d2cef8a4ddddaef1c2faaeb1de711ad@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-05-20 11:19:53 +02:00
Johannes Berg a74e893f30 wifi: mac80211: fix MLE defragmentation
If either reconf or EPCS multi-link element (MLE) is contained in
a non-transmitted profile, the defragmentation routine is called
with a pointer to the defragmented copy, but the original elements.

This is incorrect for two reasons:
 - if the original defragmentation was needed, it will not find the
   correct data
 - if the original frame is at a higher address, the parsing will
   potentially overrun the heap data (though given the layout of
   the buffers, only into the new defragmentation buffer, and then
   it has to stop and fail once that's filled with copied data.

Fix it by tracking the container along with the pointer and in
doing so also unify the two almost identical defragmentation
routines.

Fixes: 4d70e9c548 ("wifi: mac80211: defragment reconfiguration MLE when parsing")
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Link: https://patch.msgid.link/20260508091031.8a6c34613178.I4de16ebbce2d27f2f8f98fc49949c7a376c2fe8d@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-05-20 11:19:52 +02:00
Emmanuel Grumbach e1e83feb8e wifi: mac80211: don't override max_amsdu_subframes
In client mode, the extended capabilities are handled by the kernel
looking at the association frame.  When the supplicant installs the keys
it calls sta_apply_parameters and it doesn't include the extended
capabilities since those can't change after association.
As a result, we overrode the max_amsdu_subframes that we set after
association.

Check that the ext_capa coming from the user space is valid before
looking at it. If the ext_capa is NULL, it really means that the
extended capabilities are not changed (as opposed to cleared).

The default value for max_amsdu_subframes is 0, which means there is no
limit. This value is valid and in case the association response frame
does not have extended capabilities, this is the value we should use.

Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221079
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260513170623.828dbb58c782.Ifd2bfc190c26140e919127adb02ffddd7b551499@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-05-20 11:14:41 +02:00
Alexandru Hossu f718506edd wifi: mac80211: bounds-check link_id in ieee80211_ml_epcs
IEEE80211_MLE_STA_EPCS_CONTROL_LINK_ID is 0x000f, so link_id extracted
from a PRIO_ACCESS ML element PER_STA_PROFILE subelement can be 0..15.
sdata->link[] has IEEE80211_MLD_MAX_NUM_LINKS (15) entries (indices 0..14),
making index 15 out-of-bounds.

A connected WiFi 7 AP can trigger this by sending an EPCS Enable Response
action frame with a PER_STA_PROFILE subelement where link_id = 15.  The
unsolicited-notification path (dialog_token = 0) is reachable any time
EPCS is already enabled, without any prior client request.

sdata->link[15] reads into the first word of sdata->activate_links_work
(a wiphy_work whose embedded list_head is non-NULL after INIT_LIST_HEAD),
so the NULL check on the result does not catch the invalid access.  The
garbage pointer is then passed to ieee80211_sta_wmm_params(), which
dereferences link->sdata and crashes the kernel.

The same class of bug was fixed for ieee80211_ml_reconfiguration() by
commit 162d331d83 ("wifi: mac80211: bounds-check link_id in
ieee80211_ml_reconfiguration").

Fixes: de86c5f608 ("wifi: mac80211: Add support for EPCS configuration")
Signed-off-by: Alexandru Hossu <hossu.alexandru@gmail.com>
Link: https://patch.msgid.link/20260515102908.1653088-1-hossu.alexandru@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-05-20 11:04:17 +02:00
Jann Horn be309f8eae af_unix: Fix UAF read of tail->len in unix_stream_data_wait()
unix_stream_data_wait() does skb_peek_tail(&sk->sk_receive_queue) without
holding any lock that prevents SKBs on that queue from being dequeued and
freed.
This has been the case since commit 79f632c71b ("unix/stream: fix
peeking with an offset larger than data in queue").
The first consequence of this is that the pointer comparison
`tail != last` can be false even if `last` semantically refers to an
already-freed SKB while `tail` is a new SKB allocated at the same address;
which can cause unix_stream_data_wait() to wrongly keep blocking after new
data has arrived, but only in a weird scenario where a peeking recv() and
a normal recv() on the same socket are racing, which is probably not a
real problem.

But since commit 2b514574f7 ("net: af_unix: implement splice for stream
af_unix sockets"), `tail` is actually dereferenced, which can cause UAF in
the following race scenario (where test_setup() runs single-threaded,
and afterwards, test_thread1() and test_thread2() run concurrently in
two threads:
```
static int socks[2];
void test_setup(void) {
  socketpair(AF_UNIX, SOCK_STREAM, 0, socks);
  send(socks[1], "A", 1, 0);
  int peekoff = 1;
  setsockopt(socks[0], SOL_SOCKET, SO_PEEK_OFF, &peekoff, sizeof(peekoff));
}
void test_thread1(void) {
  char dummy;
  recv(socks[0], &dummy, 1, MSG_PEEK);
}
void test_thread2(void) {
  char dummy;
  recv(socks[0], &dummy, 1, 0);
  shutdown(socks[1], SHUT_WR);
}
```

when racing like this:
```
thread1                       thread2
unix_stream_read_generic
  mutex_lock(&u->iolock)
  skb_peek(&sk->sk_receive_queue)
  skb_peek_next(skb, &sk->sk_receive_queue)
  mutex_unlock(&u->iolock)
                              unix_stream_read_generic
                                unix_state_lock(sk)
                                skb_peek(&sk->sk_receive_queue)
                                unix_state_unlock(sk)
  unix_stream_data_wait
    unix_state_lock(sk)
    tail = skb_peek_tail(&sk->sk_receive_queue)
                                spin_lock(&sk->sk_receive_queue.lock)
                                __skb_unlink(skb, &sk->sk_receive_queue)
                                spin_unlock(&sk->sk_receive_queue.lock)
                                consume_skb(skb) [frees the SKB]
    `tail != last`: false
    `tail`: true
    `tail->len != last_len` ***UAF***
```

Fix the UAF by removing the read of tail->len; checking tail->len would
only make sense if SKBs in the receive queue of a UNIX socket could grow,
which can no longer happen.

Kuniyuki explained:

> When commit 869e7c6248 ("net: af_unix: implement stream sendpage
> support") added sendpage() support, data could be appended to the last
> skb in the receiver's queue.
>
> That's why we needed to check if the length of the last skb was changed
> while waiting for new data in unix_stream_data_wait().
>
> However, commit a0dbf5f818 ("af_unix: Support MSG_SPLICE_PAGES") and
> commit 57d44a354a ("unix: Convert unix_stream_sendpage() to use
> MSG_SPLICE_PAGES") refactored sendmsg(), and now data is always added
> to a new skb.

That means this fix is not suitable for kernels before 6.5.

Fixes: 2b514574f7 ("net: af_unix: implement splice for stream af_unix sockets")
Cc: stable@vger.kernel.org # 6.5.x
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260518-b4-unix-recv-wait-hotfix-v2-1-83e29ce8ad31@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-19 18:53:56 -07:00
Justin Iurman d4ea0dfd75 ipv6: ioam: add NULL check for idev in ipv6_hop_ioam()
Reported by Sashiko:

The function ipv6_hop_ioam() accesses
__in6_dev_get(skb->dev)->cnf.ioam6_enabled without validating the returned
idev pointer. Because addrconf_ifdown() can concurrently clear dev->ip6_ptr
via RCU, __in6_dev_get() can return NULL during interface teardown, which
could cause a NULL pointer dereference when processing an IOAM Hop-by-Hop
option.

Let's add a check and use SKB_DROP_REASON_IPV6DISABLED accordingly.

Fixes: 9ee11f0fff ("ipv6: ioam: Data plane support for Pre-allocated Trace")
Cc: stable@vger.kernel.org
Signed-off-by: Justin Iurman <justin.iurman@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260517183059.29140-1-justin.iurman@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-19 18:46:44 -07:00
Ido Schimmel 4df78ff026 bridge: mcast: Fix a possible use-after-free when removing a bridge port
When per-VLAN multicast snooping is enabled, the bridge iterates over
all the bridge ports, disables the per-port multicast context on each
port and enables the per-{port, VLAN} multicast contexts instead. The
reverse happens when per-VLAN multicast snooping is disabled.

When global multicast snooping is enabled, the bridge iterates over all
the bridge ports and enables the per-port multicast context on each
port. The reverse happens when multicast snooping is disabled.

The above scheme can result in a situation where both types of contexts
(per-port and per-{port, VLAN}) are enabled on a single bridge port:

 # ip link add name br1 up type bridge mcast_snooping 1 mcast_querier 1 vlan_filtering 1
 # ip link add name dummy1 up master br1 type dummy
 # ip link set dev br1 type bridge mcast_vlan_snooping 1
 # ip link set dev br1 type bridge mcast_snooping 0
 # ip link set dev br1 type bridge mcast_snooping 1

This is not intended and it is a problem since the commit cited below.
Prior to this commit, when removing a bridge port,
br_multicast_disable_port() would disable the per-port multicast context
and the per-{port, VLAN} multicast contexts would get disabled when
flushing VLANs.

After this commit, br_multicast_disable_port() only disables the
per-port multicast context if per-VLAN multicast snooping is disabled.
If both types of contexts were enabled on the port when it was removed,
the per-port multicast context would remain enabled when freeing the
bridge port, leading to a use-after-free [1].

Fix by preventing the bridge from enabling / disabling the per-port
multicast contexts when toggling global multicast snooping if per-VLAN
multicast snooping is enabled.

[1]
ODEBUG: free active (active state 0) object: ffff88810f8bda78 object type: timer_list hint: br_ip6_multicast_port_query_expired (net/bridge/br_multicast.c:1927)
WARNING: lib/debugobjects.c:629 at debug_print_object+0x1b1/0x3e0, CPU#5: swapper/5/0
[...]
Call Trace:
<IRQ>
__debug_check_no_obj_freed (lib/debugobjects.c:1116)
kfree (mm/slub.c:2620 mm/slub.c:6250 mm/slub.c:6565)
kobject_cleanup (lib/kobject.c:689)
rcu_do_batch (kernel/rcu/tree.c:2617)
rcu_core (kernel/rcu/tree.c:2869)
handle_softirqs (kernel/softirq.c:622)
__irq_exit_rcu (kernel/softirq.c:656 kernel/softirq.c:496 kernel/softirq.c:735)
irq_exit_rcu (kernel/softirq.c:752)
sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1061 (discriminator 47) arch/x86/kernel/apic/apic.c:1061 (discriminator 47))
</IRQ>

Fixes: 4b30ae9adb ("net: bridge: mcast: re-implement br_multicast_{enable, disable}_port functions")
Reported-by: syzbot+ae231e0552fa77b26ea1@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/87qznowlfs.ffs@tglx/
Reported-by: Thomas Gleixner <tglx@kernel.org>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260517121122.188333-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-19 18:15:22 -07:00
Gang Yan 3a543ae0e2 mptcp: update window_clamp on subflows when SO_RCVBUF is set
Add __mptcp_subflow_set_rcvbuf() helper to write the subflow sk_rcvbuf,
but also to call the recently added tcp_set_rcvbuf() helper to update
window_clamp. This is needed because the window clap is updated when
scaling_ratio changes, in tcp_measure_rcv_mss(). Until scaling_ratio
changes, the subflow is stuck with the old window clamp which may be
based on a small initial buffer.

Use this new helper in both mptcp_sol_socket_sync_intval() (setsockopt
path) and sync_socket_options() (new subflow creation path).

Note that this patch depends on commit b025461303 ("tcp: update
window_clamp when SO_RCVBUF is set"): it fixes the issue on TCP side,
but the same fix is needed on MPTCP side as well.

Fixes: a2cbb16039 ("tcp: Update window clamping condition")
Cc: stable@vger.kernel.org
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/619
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260515-net-mptcp-misc-fixes-7-1-rc4-v2-5-701e96419f2f@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-05-19 15:36:35 +02:00
Paolo Abeni 0981f90e1a mptcp: reset rcv wnd on disconnect
If the MPTCP socket fallback to TCP before the MP handshake completion,
the IASN remain 0, and the rcv_wnd_sent field is not explicitly
initialized, just incremented over time with the data transfer.

At disconnect time such value is not cleared. If the next connection falls
back to TCP before the MP handshake completion, the data transfer will
keep incrementing the receive window end sequence starting from the last
value used in the previous connection: the announced window will be
unrelated from the actual receiver buffer size and likely too big.

Address the issue zeroing the field at disconnect time.

Fixes: b29fcfb54c ("mptcp: full disconnect implementation")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260515-net-mptcp-misc-fixes-7-1-rc4-v2-4-701e96419f2f@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-05-19 15:36:35 +02:00
Li Xiasong 51e398a3b8 mptcp: pm: fix ADD_ADDR timer infinite retry on option space insufficient
When TCP option space is insufficient (e.g., when sending ADD_ADDR with an
IPv6 address and port while tcp_timestamps is enabled), the original code
jumped to out_unlock without clearing the addr_signal flag. This caused
mptcp_pm_add_timer to keep rescheduling indefinitely, not sending ADD_ADDR,
preventing subsequent addresses in the endpoint list from being announced.

Handle this case by clearing the ADD_ADDR signal and skipping the matching
ADD_ADDR retransmission entry. The skip path cancels the matching timer
(with id check) and advances PM state progression, preserving forward
progress to subsequent PM work.

This cancellation is inherently best-effort. A concurrent add_timer
callback may already be running and may acquire pm.lock before the
cancel path updates entry state. In that case, one final ADD_ADDR
transmit attempt can still be executed.

Once the cancel path sets entry->retrans_times to ADD_ADDR_RETRANS_MAX,
the callback-side retrans_times check suppresses further ADD_ADDR
retransmissions.

Note that when an ADD_ADDR is being prepared, a pure-ACK is queued. On
the output side, it means that it is fine to skip non-pure-ACK packets,
when drop_other_suboptions is set: a pure-ACK will be processed soon
after.

Fixes: 00cfd77b90 ("mptcp: retransmit ADD_ADDR when timeout")
Cc: stable@vger.kernel.org
Signed-off-by: Li Xiasong <lixiasong1@huawei.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260515-net-mptcp-misc-fixes-7-1-rc4-v2-2-701e96419f2f@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-05-19 15:36:35 +02:00
Shardul Bankar 50c2d91c5d mptcp: do not drop partial packets
When a packet arrives with map_seq < ack_seq < end_seq, the beginning
of the packet has already been acknowledged but the end contains new
data. Currently the entire packet is dropped as "old data," forcing
the sender to retransmit.

Instead, skip the already-acked bytes by adjusting the skb offset and
enqueue only the new portion. Update bytes_received and ack_seq to
reflect the new data consumed.

A previous attempt at this fix has been sent by Paolo Abeni [1], but had
issues [2]: it also added a zero-window check and changed rcv_wnd_sent
initialization, which caused test regressions. This version addresses
only the partial packet handling without modifying receive window
accounting.

Fixes: ab174ad8ef ("mptcp: move ooo skbs into msk out of order queue.")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/c9b426a4e163aa3c4fe8b80c79f1a610f47ae7d8.1763075056.git.pabeni@redhat.com [1]
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/600 [2]
Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
[pabeni@redhat.com: update map]
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260515-net-mptcp-misc-fixes-7-1-rc4-v2-1-701e96419f2f@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-05-19 15:36:35 +02:00
Sven Eckelmann f80d3d98d2
batman-adv: bla: avoid NULL-ptr deref for claim via dropped interface
Without rtnl_lock held, a hardif might be retrieved as primary interface of
a meshif, but then (while operating on this interface) getting decoupled
from the mesh interface. In this case, the meshif still exists but the
pointer from the primary hardif to the meshif is set to NULL.

The mesh_iface must be checked first to be non-NULL before continuing to
send an ARP request using meshif.

Cc: stable@kernel.org
Fixes: 23721387c4 ("batman-adv: add basic bridge loop avoidance code")
Reported-by: Ido Schimmel <idosch@nvidia.com>
Reported-by: syzbot+9fdcc9f05a98a540b816@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=9fdcc9f05a98a540b816
Signed-off-by: Sven Eckelmann <sven@narfation.org>
2026-05-19 10:43:54 +02:00
Sven Eckelmann 83ab69bd12
batman-adv: bla: avoid double decrement of bla.num_requests
The bla.num_requests is increased when no request_sent was in progress. And
it is decremented in various places (announcement was received, backbone is
purged, periodic work). But the check if the request_sent is actually set
to a specific state and the atomic_dec/_inc are not safe because they are
not atomic (TOCTOU) and multiple such code portions can run concurrently.

At the same time, it is necessary to modify request_sent (state) and
bla.num_requests atomically. Otherwise batadv_bla_send_request() might set
request_sent to 1 and is interrupted.  batadv_handle_announce() can then
set request_sent back to 0 and decrement num_requests before
batadv_bla_send_request() incremented it.

The two operations must therefore be locked. And since state (request_sent)
and wait_periods are only accessed inside this lock, they can be converted
to simpler datatypes. And to avoid that the bla.num_requests is touched by
a parallel running context with a valid backbone_gw reference after
batadv_bla_purge_backbone_gw() ran, a third state "stopped" is required to
correctly signal that a backbone_gw is in the state of being cleaned up.

Cc: stable@kernel.org
Fixes: 23721387c4 ("batman-adv: add basic bridge loop avoidance code")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
2026-05-19 09:09:34 +02:00
Sven Eckelmann 0459430add
batman-adv: bla: fix report_work leak on backbone_gw purge
batadv_bla_purge_backbone_gw() removes stale backbone gateway entries,
but fails to properly handle their associated report_work:

- If report_work is running, the purge must wait for it to finish before
  freeing the backbone_gw, otherwise the worker may access freed memory
  (e.g. bat_priv).
- If report_work is pending, the purge must cancel it and release the
  reference held for that pending work item.

The previous implementation called hlist_for_each_entry_safe() inside a
spin_lock_bh() section, but cancel_work_sync() may sleep and therefore
cannot be called from within a spinlock-protected region.

Restructure the loop to handle one entry per spinlock critical section:
acquire the lock, find the next entry to purge, remove it from the hash
list, then release the lock before calling cancel_work_sync() and
dropping the hash_entry reference. Repeat until no more entries require
purging.

Cc: stable@kernel.org
Fixes: 23721387c4 ("batman-adv: add basic bridge loop avoidance code")
Reviewed-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
2026-05-19 09:09:34 +02:00
Sven Eckelmann aa3153bd13
batman-adv: iv: recover OGM scheduling after forward packet error
When batadv_iv_ogm_schedule_buff() fails to allocate and queue a forward
packet for OGM transmission, the work item that drives periodic OGM
scheduling is never re-armed. This silently halts transmission of the
node's own OGMs on the affected interface — only OGMs from other peers
continue to be aggregated and forwarded.

Fix this by tracking whether batadv_iv_ogm_queue_add() (and transitively
batadv_iv_ogm_aggregate_new()) successfully scheduled a forward packet.
When scheduling fails, batadv_iv_ogm_schedule_buff() falls back to queuing
a dedicated recovery work item (reschedule_work) that fires after one
originator interval and calls batadv_iv_ogm_schedule() again.

Cc: stable@kernel.org
Fixes: c6c8fea297 ("net: Add batman-adv meshing protocol")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
2026-05-19 09:09:29 +02:00