mirror-linux

Commit Graph

Author	SHA1	Message	Date
Linus Torvalds	68993ced0f	Including fixes from Bluetooth, wireless and netfilter. Current release - fix to a fix: - Bluetooth: btmtk: accept too short WMT FUNC_CTRL events - vsock/virtio: relax the recently added memory limit a little Current release - regressions: - IB/IPoIB: make sure IB drivers always use async set_rx_mode since some (mlx5) are now required to use it due to locking changes Previous releases - regressions: - udp: fix UDP length on last GSO_PARTIAL segment - af_unix: fix UAF read of tail->len in unix_stream_data_wait() - tcp: fix stale per-CPU tcp_tw_isn leak enabling ISN prediction - mlx5e: fix unlocked writing to ICOSQ, breaking AF_XDP Previous releases - always broken: - tap: fix stack info leak in tap_ioctl() SIOCGIFHWADDR - ipv4: raw: reject IP_HDRINCL packets with ihl < 5 - Bluetooth: a lot of locking and concurrency fixes (as always) - batman-adv (mesh wireless networking): a lot of random fixes for issues reported by security researchers and Sashiko - netfilter: same thing, a lot of small security-ish fixes all over the place, nothing really stands out Misc: - bring back the old 3c509 driver, Maciej wants to maintain it Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmoPULUACgkQMUZtbf5S IrtAvA//bfjxxazZKkGqL8mp6uMYS5Su81Oh/pBcyEWC7q2xv3ftNp5pt8oCTWYP eryKi7XrxfNHrkFcmnH+aWQ431UekZLfAjrSd+5V0YvE1nQDnKrgbat5qx2SYSsr ZA7EYnJjvAtPMb0KqUJlYPMSfVdFA0H3gEOdnawkGRnizkKNO5NsNRkC4rHzpCil hzW5SCTZWQ0r1Cm3IxcTnSCJEOYRqH0BUBbiSRFCWNMZZpq0xKi3UiJFOdgRvqgc VoPz6sMRPxZyL8gW8i2jJVz6vj2yuWifJwbl8y3ZkqJJy4HvNXfcPIBH5+vBIWlB hWMuYlUv5F0w+h4+UKeDr789Tdpv12edUIDX+prbsJ8c4bXmBflt069HlFjG9Pto /k2e5owR0NYSaLt4WvAM6Tr5j1ralzQjHKVDg8JbPaAD+0dtb+e3dXE8J3MBPrw6 EWtdg9jX+vqsbVoHwMQO9Xp2waNY9+97L07w+I0nVf7NLJvrvz0lkSjMKfNPNyV1 C5W7McAbSOx3nJ+XzYwMoVK0wP9OunKA73EhAoEdvQSyOGLqQT+iZzDoTMnwKJFs 2L3fbc8LQ10WBG2B2rCPB/gaGQ1ZZD8uSlZoS9N31dvUPFDaCnCYgKIze/pdcE/R KOQskME2xd61KzpYlJszkrjJIbnppkNt/mBvvfNUP+zJZPFRyuA= =ei7U -----END PGP SIGNATURE----- Merge tag 'net-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from Bluetooth, wireless and netfilter. Craziness continues with no end in sight. Even discounting the driver revert this is a pretty huge PR for standards of the previous era. I'd speculate - we haven't seen the worst of it, yet. Good news, I guess, is that so far we haven't seen many (any?) cases of "AI reported a bug, we fixed it and a real user regressed". Current release - fix to a fix: - Bluetooth: btmtk: accept too short WMT FUNC_CTRL events - vsock/virtio: relax the recently added memory limit a little Current release - regressions: - IB/IPoIB: make sure IB drivers always use async set_rx_mode since some (mlx5) are now required to use it due to locking changes Previous releases - regressions: - udp: fix UDP length on last GSO_PARTIAL segment - af_unix: fix UAF read of tail->len in unix_stream_data_wait() - tcp: fix stale per-CPU tcp_tw_isn leak enabling ISN prediction - mlx5e: fix unlocked writing to ICOSQ, breaking AF_XDP Previous releases - always broken: - tap: fix stack info leak in tap_ioctl() SIOCGIFHWADDR - ipv4: raw: reject IP_HDRINCL packets with ihl < 5 - Bluetooth: a lot of locking and concurrency fixes (as always) - batman-adv (mesh wireless networking): a lot of random fixes for issues reported by security researchers and Sashiko - netfilter: same thing, a lot of small security-ish fixes all over the place, nothing really stands out Misc: - bring back the old 3c509 driver, Maciej wants to maintain it" * tag 'net-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (187 commits) net: enetc: avoid VF->PF mailbox timeout during SR-IOV teardown net: enetc: fix init and teardown order to prevent use of unsafe resources net: enetc: fix unbounded loop and interrupt handling in VF-to-PF messaging net: enetc: fix DMA write to freed memory in enetc_msg_free_mbx() net: enetc: fix race condition in VF MAC address configuration net: enetc: fix TOCTOU race and validate VF MAC address net: enetc: add ratelimiting to VF mailbox error messages net: enetc: fix missing error code when pf->vf_state allocation fails net: enetc: fix incorrect mailbox message status returned to VFs net: bridge: prevent too big nested attributes in br_fill_linkxstats() l2tp: use list_del_rcu in l2tp_session_unhash net: bcmgenet: keep RBUF EEE/PM disabled ethernet: 3c509: Fix most coding style issues ethernet: 3c509: Update documentation to match MAINTAINERS ethernet: 3c509: Add GPL 2.0 SPDX license identifier ethernet: 3c509: Fix AUI transceiver type selection Revert "drivers: net: 3com: 3c509: Remove this driver" tools: ynl: support listening on all nsids net: gro: don't merge zcopy skbs pds_core: ensure null-termination for firmware version strings ...	2026-05-21 14:39:12 -07:00
Jakub Kicinski	0e3c08f1b7	Quite a few more updates: - cfg80211/mac80211: - various security(-ish) fixes - fix A-MSDU subframe handling - fix multi-link element parsing - ath10: avoid sending commands to dead device - ath11k: - fix WMI buffer leaks on error conditions - fix UAF in RX MSDU coalesce path - allow peer ID 0 on RX path (legal for mobile devices) - reinitialize shared SRNG pointers on restart - ath12k: - fix 20 MHz-only parsing of EHT-MCS map - iwlwifi: - fix TSO segmentation explosion - don't TX to dead device - fix warning in WoWLAN - fix TX rates on old devices - disconnect on beacon loss only if also no other traffic - fill NULL-ptr deref - fix STEP_URM hardware access -----BEGIN PGP SIGNATURE----- iQIyBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmoPJEkACgkQ10qiO8sP aACQrw/4vU+lbZNW19OyaJMd4h+44gUW+UGJixOzputQCBc6JGUlRsxgceWq5Ws5 5x2LTOX7S1wcvKm0VuSvkIRP3e9YHcgB60iBtsJ3ozz4RCoCFiSu8Bb2RdkGtRTp 7CKMK9NNuovOJncBzfyANq4ujsGs/58BmGbhXbaZ0ACfLUauesCCUtM7iQZE1k7t lBqtk8ezkz1L8006w5vR7VR8g4KCCofQTEAOASmx450ZeGAiHMlWVKdMFFHV3zWj ZDXopvLaMtduLNq9xqGYCRhAIZqOv1axgL7w9RRxsi2gWHv71kLqyz0IzgbFmh1m ZxUSQ45+MHVYCHxs7HHCcTR5gqQlx47j5Wi3tuLUH8yoSZ8dPeWjmQMvIEswfZql WNq18o6mcK+L3Yg87+oxiiJ7V/euaM//0+ZGtqhbiB+2FyHZhO42BqALTy7e4swS kmEl8gCj2lgCbD2AHJQ9VpOJwoNdNuLYoJqg9IiIu/CYqQF80FGO8e6HZXBXsJkL 3KAPQIXkMMMkSjtpTg/GdHDiFqv/7lF8u3FgED7w7M1ZVQYNUc13KiDwPALFV0pu bBbRktB6lvF6ShW9XrTrmn9lT0iiWHxr5YctWoys4+Ofr5V7PUzxNVDxDuzSVCZ0 apLYZ7uwSXO37q99p/azs47dzYp7tpnwKd4rpQuUTl/9bYErUA== =vzwG -----END PGP SIGNATURE----- Merge tag 'wireless-2026-05-21' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Quite a few more updates: - cfg80211/mac80211: - various security(-ish) fixes - fix A-MSDU subframe handling - fix multi-link element parsing - ath10: avoid sending commands to dead device - ath11k: - fix WMI buffer leaks on error conditions - fix UAF in RX MSDU coalesce path - allow peer ID 0 on RX path (legal for mobile devices) - reinitialize shared SRNG pointers on restart - ath12k: - fix 20 MHz-only parsing of EHT-MCS map - iwlwifi: - fix TSO segmentation explosion - don't TX to dead device - fix warning in WoWLAN - fix TX rates on old devices - disconnect on beacon loss only if also no other traffic - fill NULL-ptr deref - fix STEP_URM hardware access * tag 'wireless-2026-05-21' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: (24 commits) wifi: cfg80211: wext: validate chandef in monitor mode wifi: mac80211: consume only present negotiated TTLM maps wifi: wilc1000: fix dma_buffer leak on bus acquire failure wifi: mac80211: capture fast-RX rate before mesh reuses skb->cb wifi: mac80211: fix multi-link element inheritance wifi: mac80211: fix MLE defragmentation wifi: mac80211: don't override max_amsdu_subframes wifi: mac80211: bounds-check link_id in ieee80211_ml_epcs wifi: ath12k: fix EHT TX MCS limitation due to wrong 20 MHz-only parsing wifi: ath11k: clear shared SRNG pointer state on restart wifi: ath11k: fix use after free in ath11k_dp_rx_msdu_coalesce() wifi: ath11k: fix peer resolution on rx path when peer_id=0 wifi: iwlwifi: mld: disconnect only after 6 beacons without Rx wifi: iwlwifi: mld: don't WARN on WoWLAN suspend w/o BSS vif wifi: iwlwifi: use correct function to read STEP_URM register wifi: iwlwifi: mvm: fix driver-set TX rates on old devices wifi: iwlwifi: mld: don't dereference a pointer before NULL checking it wifi: iwlwifi: mld: stop TX during firmware restart wifi: iwlwifi: mld: fix TSO segmentation explosion when AMSDU is disabled wifi: ath10k: skip WMI and beacon transmission when device is wedged ... ==================== Link: https://patch.msgid.link/20260521152903.374070-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-21 11:03:58 -07:00
Eric Dumazet	bdd39576bf	net: bridge: prevent too big nested attributes in br_fill_linkxstats() After commit ff205bf8c554 ("netlink: add one debug check in nla_nest_end()") syzbot found that br_fill_linkxstats() can send corrupted netlink packets. Make sure the nested attribute size is bounded. Fixes: `a60c090361` ("bridge: netlink: export per-vlan stats") Reported-by: syzbot+a35f9259d08f907c06e6@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/6a0b0da3.050a0220.175f0c.0000.GAE@google.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260520114207.1394241-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-21 08:47:36 -07:00
Michael Bommarito	979c017803	l2tp: use list_del_rcu in l2tp_session_unhash An unprivileged local user can pin a host CPU indefinitely in l2tp_session_get_by_ifname() by issuing L2TP_CMD_SESSION_GET on L2TP_ATTR_IFNAME concurrently with L2TP_CMD_SESSION_CREATE and L2TP_CMD_SESSION_DELETE on the same tunnel. All three commands take GENL_UNS_ADMIN_PERM, so CAP_NET_ADMIN in the netns user namespace suffices; on any host that has l2tp_core loaded the trigger is reachable from a standard `unshare -Urn` sandbox. l2tp_session_unhash() removes a session from tunnel->session_list with list_del_init(), but that list is walked by l2tp_session_get_by_ifname() with list_for_each_entry_rcu() under rcu_read_lock_bh(). list_del_init() leaves the deleted entry's next/prev self-pointing; a reader that has loaded the entry and then advances pos->list.next reads &session->list, container_of()s back to the same session, and list_for_each_entry_rcu() never reaches the list head. The CPU stays in strcmp() inside the walker, with BH and preemption disabled, so RCU grace periods on the host stall behind it and the wedged thread cannot be killed (SIGKILL is delivered on syscall return). Use list_del_rcu() to match the existing list_add_rcu() in l2tp_session_register(); the deleted session remains visible to in-flight walkers with consistent next/prev pointers until kfree_rcu() in l2tp_session_free() releases it. tunnel->session_list has exactly one list_del_init() call site; the list_del_init (&session->clist) at l2tp_core.c:533 operates on the per-collision list, which is not walked under RCU. list_empty(&session->list) is not used anywhere in net/l2tp/ after the unhash point, so dropping the post-delete self-init is safe; the fix has no userspace-visible behavior change. Fixes: `89b768ec2d` ("l2tp: use rcu list add/del when updating lists") Cc: stable@vger.kernel.org # 6.11+ Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Link: https://patch.msgid.link/20260518183447.64078-1-michael.bommarito@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-21 08:47:20 -07:00
Sabrina Dubroca	4db79a322d	net: gro: don't merge zcopy skbs skb_gro_receive() can currently copy frags between the source and GRO skb, without checking the zerocopy status, and in particular the SKBFL_MANAGED_FRAG_REFS flag. When SKBFL_MANAGED_FRAG_REFS is set, the skb doesn't hold a reference on the pages in shinfo->frags. Appending those frags to another skb's frags without fixing up the page refcount can lead to UAF. When either the last skb in the GRO chain (the one we would append frags to) or the source skb is zerocopy, don't merge the skbs. Fixes: `753f1ca4e1` ("net: introduce managed frags infrastructure") Reported-by: Huzaifa Sidhpurwala <huzaifas@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/c3b7f906bbfcbdfd7b4fa9d6c18a438870df85be.1779307748.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-21 08:21:33 -07:00
Justin Iurman	e46e6bc97f	ipv6: ioam: refresh hdr pointer before ioam6_event() Reported by Sashiko: In ipv6_hop_ioam(), the hdr pointer is initialized to point into the skb's linear data buffer. Later, the code calls skb_ensure_writable(), which might reallocate the buffer: if (skb_ensure_writable(skb, optoff + 2 + hdr->opt_len)) goto drop; /* Trace pointer may have changed / trace = (struct ioam6_trace_hdr )(skb_network_header(skb) + optoff + sizeof(hdr)); ioam6_fill_trace_data(skb, ns, trace, true); ioam6_event(IOAM6_EVENT_TRACE, dev_net(skb->dev), GFP_ATOMIC, (void )trace, hdr->opt_len - 2); If the skb is cloned or lacks sufficient linear headroom, skb_ensure_writable() will invoke pskb_expand_head(), which reallocates the skb's data buffer and frees the old one, invalidating pointers to it. While the code recalculates the trace pointer immediately after the call to skb_ensure_writable(), it fails to recalculate the hdr pointer. This patch fixes the above by recalculating the hdr pointer before passing hdr->opt_len to ioam6_event(), so that we avoid any UaF. Fixes: `f655c78d62` ("net: exthdrs: ioam6: send trace event") Cc: stable@vger.kernel.org Signed-off-by: Justin Iurman <justin.iurman@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260520124242.32320-1-justin.iurman@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-21 08:19:25 -07:00
Zhang Cen	c367b90821	netpoll: normalize skb->dev to the netpoll device __netpoll_send_skb() always transmits through np->dev and queues busy packets on np->dev->npinfo->txq, but it leaves skb->dev unchanged. Stacked callers such as DSA and macvlan can reach netpoll with skb->dev still naming the upper device while np->dev is the lower device that owns the netpoll state. If the skb has to be deferred, queue_process() later dequeues it from the lower device's txq but retries it through skb->dev. That can re-enter the upper ndo_start_xmit path on an already transformed skb, and if the upper device disappears before the lower txq drains the workqueue can dereference a stale skb->dev pointer. The buggy scenario involves two paths, with each column showing the order within that path: path A label: netpoll enqueue path path B label: upper-device teardown 1. Stacked xmit calls netpoll 1. Teardown unregisters the upper with lower np->dev and upper net_device while lower npinfo skb->dev. stays alive. 2. __netpoll_send_skb() uses 2. netdev_release() runs for the np->dev->npinfo as the txq upper net_device. owner. 3. Busy transmit queues the skb 3. The lower txq still owns the on that lower txq with upper deferred skb. skb->dev. 4. queue_process() drains the 4. queue_process() dereferences lower txq and reads skb->dev. that stale upper skb->dev. Normalize skb->dev to np->dev after loading np->dev from the netpoll instance, before either the direct transmit path or the fallback enqueue. This keeps the queued skb in the same device and txq domain as the netpoll state that owns it. KASAN report as below: KASAN slab-use-after-free in queue_process+0x7c/0x480 Workqueue: events queue_process The buggy address belongs to the object at ffff88810906c000 which belongs to the cache kmalloc-4k of size 4096 The buggy address is located 168 bytes inside of freed 4096-byte region [ffff88810906c000, ffff88810906d000) Read of size 8 Call trace: dump_stack_lvl+0x73/0xb0 (?:?) print_report+0xd1/0x620 (?:?) srso_alias_return_thunk+0x5/0xfbef5 (?:?) __virt_addr_valid+0x215/0x420 (?:?) kasan_complete_mode_report_info+0x64/0x200 (?:?) kasan_report+0xf7/0x130 (?:?) queue_process+0x7c/0x480 (net/core/netpoll.c:88) kasan_check_range+0x10c/0x1c0 (?:?) __kasan_check_read+0x15/0x20 (?:?) process_one_work+0x8b7/0x1af0 (kernel/workqueue.c:3200) assign_work+0x170/0x3f0 (?:?) worker_thread+0x574/0xf10 (?:?) _raw_spin_unlock_irqrestore+0x4b/0x60 (?:?) trace_hardirqs_on+0x2a/0x180 (?:?) kthread+0x2fc/0x3f0 (?:?) ret_from_fork+0x58b/0x830 (?:?) __switch_to+0x58e/0xe90 (?:?) __switch_to_asm+0x39/0x70 (?:?) ret_from_fork_asm+0x1a/0x30 (?:?) Freed by task stack: kasan_save_stack+0x3d/0x60 (?:?) kasan_save_track+0x18/0x40 (?:?) kasan_save_free_info+0x3f/0x60 (?:?) __kasan_slab_free+0x48/0x70 (?:?) kfree+0x20e/0x4e0 (?:?) kvfree+0x31/0x40 (?:?) netdev_release+0x71/0x90 (net/core/net-sysfs.c:2227) device_release+0xd2/0x250 (?:?) kobject_put+0x181/0x4c0 (lib/kobject.c:730) netdev_run_todo+0x700/0x1000 (net/core/dev.c:11666) rtnl_dellink+0x396/0xc00 (net/core/rtnetlink.c:3558) rtnetlink_rcv_msg+0x740/0xc20 (net/core/rtnetlink.c:6897) netlink_rcv_skb+0x147/0x3a0 (?:?) rtnetlink_rcv+0x19/0x20 (net/core/rtnetlink.c:7021) netlink_unicast+0x4d1/0x830 (net/netlink/af_netlink.c:1327) netlink_sendmsg+0x840/0xe10 (net/netlink/af_netlink.c:1812) ____sys_sendmsg+0x8a7/0xb50 (?:?) ___sys_sendmsg+0x104/0x190 (?:?) __sys_sendmsg+0x135/0x1d0 (?:?) __x64_sys_sendmsg+0x7b/0xc0 (?:?) x64_sys_call+0x205c/0x2130 (?:?) do_syscall_64+0x115/0x6a0 (arch/x86/entry/syscall_64.c:87) entry_SYSCALL_64_after_hwframe+0x77/0x7f (?:?) Fixes: `5de4a473bd` ("netpoll queue cleanup") Signed-off-by: Zhang Cen <rollkingzzc@gmail.com> Link: https://patch.msgid.link/20260519104647.3517990-1-rollkingzzc@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-21 08:10:18 -07:00
Yuho Choi	1341db3224	ipv6: route: Unregister netdevice notifier on BPF init failure ip6_route_init() registers ip6_route_dev_notifier before registering the IPv6 route BPF iterator target. If bpf_iter_register() fails after the notifier has been registered, the error path currently jumps to out_register_late_subsys and unwinds the RTNL handlers and pernet route state without removing the notifier from the netdevice notifier chain. This leaves ip6_route_dev_notify() callable after the IPv6 route state it uses has been torn down. Add a separate unwind label for the BPF iterator failure path and unregister the netdevice notifier before continuing with the existing cleanup. Fixes: `138d0be35b` ("net: bpf: Add netlink and ipv6_route bpf_iter targets") Signed-off-by: Yuho Choi <dbgh9129@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260520030329.1061183-1-dbgh9129@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-21 07:43:15 -07:00
Zijing Yin	dbc81608e3	phonet/pep: disable BH around forwarded sk_receive_skb() The networking receive path is usually run from softirq context, but protocols that take the socket lock may have packets stored in the backlog and processed later from process context. In that case release_sock() -> __release_sock() drops the slock with spin_unlock_bh() and then calls sk->sk_backlog_rcv() with bottom halves enabled. Typical sk_backlog_rcv handlers process the socket whose backlog is being drained, so the BH state at entry is irrelevant for the slocks they touch. pep_do_rcv() is different: when the inbound skb targets an existing PEP pipe, it forwards the skb to a different child socket via sk_receive_skb(). That helper takes the child slock with bh_lock_sock_nested(), which is just spin_lock_nested() and assumes BH is already off. The same child slock therefore ends up acquired with BH on (process path) and with BH off (softirq path): process context softirq context --------------- --------------- release_sock(listener) __netif_receive_skb() __release_sock() phonet_rcv() spin_unlock_bh() __sk_receive_skb(listener) [BH now ENABLED] [BH already disabled] sk_backlog_rcv: sk_backlog_rcv: pep_do_rcv() pep_do_rcv() sk_receive_skb(child) sk_receive_skb(child) bh_lock_sock_nested(child) bh_lock_sock_nested(child) => SOFTIRQ-ON-W => IN-SOFTIRQ-W Lockdep flags this as inconsistent lock state, and it can become a real self-deadlock if a softirq on the same CPU tries to receive to the same child socket while its slock is held in the BH-enabled path: WARNING: inconsistent lock state inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. (slock-AF_PHONET/1){+.?.}-{3:3}, at: __sk_receive_skb+0x1cf/0x900 __sk_receive_skb net/core/sock.c:563 sk_receive_skb include/net/sock.h:2022 [inline] pep_do_rcv net/phonet/pep.c:675 sk_backlog_rcv include/net/sock.h:1190 __release_sock net/core/sock.c:3216 release_sock net/core/sock.c:3815 pep_sock_accept net/phonet/pep.c:879 Wrap the forwarded sk_receive_skb() in local_bh_disable() / local_bh_enable() so the child slock is always acquired with BH off. local_bh_disable() nests safely on the softirq path. Discovered via in-house syzkaller fuzzing; the same root cause also on the linux-6.1.y syzbot dashboard as extid 44f0626dd6284f02663c. Reproduced under KASAN + LOCKDEP + PROVE_LOCKING, reproducer: https://pastebin.com/A3t8xzCR Fixes: `9641458d3e` ("Phonet: Pipe End Point for Phonet Pipes protocol") Link: https://syzkaller.appspot.com/bug?extid=44f0626dd6284f02663c Cc: stable@vger.kernel.org Signed-off-by: Zijing Yin <yzjaurora@gmail.com> Acked-by: Rémi Denis-Courmont <remi@remlab.net> Reported-by: syzbot+9f4a135646b66c509935@syzkaller.appspotmail.com Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260519172635.86304-1-yzjaurora@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-21 07:38:21 -07:00
Paolo Abeni	42734af663	Here are batman-adv bugfixes, all by by Sven Eckelmann. - fix batadv_skb_is_frag() kernel-doc - BATMAN V: stop OGMv2 on disabled interface - BATMAN IV: abort OGM send on tvlv append failure - BATMAN IV: reject oversized TVLV packets - tp_meter: fix race condition in send error reporting - tp_meter: avoid role confusion in tp_list - mcast: fix use-after-free in orig_node RCU release - BATMAN IV: recover OGM scheduling after forward packet error - bla: fix report_work leak on backbone_gw purge - bla: avoid double decrement of bla.num_requests - bla: avoid NULL-ptr deref for claim via dropped interface -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAmoNoHkWHHN3QHNpbW9u d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoaFZD/4yN7deqhkVKj039EVNxoOOVOSi T0kQCOG6h20UiJX6j2l6QuoXF6vwBGDPFZfwCMKSMWXFhbuhzl8L0oEvZ+p6goho FyPgAWkWRcopPnSiJTQcFXVVtkku7EkPO2MjMLUEkU8YEKGfnBSWHUBr9+Mw0OUV rmrkRQcl7/BpWX++fM0U/qUFfKi8VuIPgQubPyC602/iInseNvx6Ju0abRJeTCF6 YM4IugyxKoHCwHbnR2jNwl1PfaI7FyLF/ZCvi/2NpAfwnjcykBlMYoEj46Qh3dTD 05ZIAHlgHAfMDHuW87rema6aYT0nCpSwtBaM4YE2/vim4hxRot6AN/ho/4JFHzHW GHCuhw3NxADeUJZBefGy4QEtGPIP/Odz4WcVTr/29TnOTKEAG2rsI4mwLoM4ogp/ Wa0tlZWUZNHEsjPHSkEWqnR+X/+J8Q9W/RUP85gthVpMKvHyMf7YC2JBXlJ2QvmF HzmCjYLv83X4pftutf6R/EAy8MedJTSCoCzJGitIDd5qW3EG3yYBgz6ep4DD/4Hp 4qWP3gdUbcPnL5mw3SOWYXhOKFQnOs68h5QsSRdEp4LYj0FExC6EZagRZUU7fGOu W5r7vsN6j83sBuZd5rQRIs1XNSm5RlUAfJAsWawKLI96l2xSuIrtmJq3zP/kc4cH q4Lc88OyebVE5t8/yA== =0Yhf -----END PGP SIGNATURE----- Merge tag 'batadv-net-pullrequest-20260520' of https://git.open-mesh.org/batadv Simon Wunderlich says: ==================== Here are batman-adv bugfixes, all by by Sven Eckelmann. - fix batadv_skb_is_frag() kernel-doc - BATMAN V: stop OGMv2 on disabled interface - BATMAN IV: abort OGM send on tvlv append failure - BATMAN IV: reject oversized TVLV packets - tp_meter: fix race condition in send error reporting - tp_meter: avoid role confusion in tp_list - mcast: fix use-after-free in orig_node RCU release - BATMAN IV: recover OGM scheduling after forward packet error - bla: fix report_work leak on backbone_gw purge - bla: avoid double decrement of bla.num_requests - bla: avoid NULL-ptr deref for claim via dropped interface * tag 'batadv-net-pullrequest-20260520' of https://git.open-mesh.org/batadv: batman-adv: bla: avoid NULL-ptr deref for claim via dropped interface batman-adv: bla: avoid double decrement of bla.num_requests batman-adv: bla: fix report_work leak on backbone_gw purge batman-adv: iv: recover OGM scheduling after forward packet error batman-adv: mcast: fix use-after-free in orig_node RCU release batman-adv: tp_meter: avoid role confusion in tp_list batman-adv: tp_meter: fix race condition in send error reporting batman-adv: tvlv: reject oversized TVLV packets batman-adv: tvlv: abort OGM send on tvlv append failure batman-adv: v: stop OGMv2 on disabled interface batman-adv: fix batadv_skb_is_frag() kernel-doc ==================== Link: https://patch.msgid.link/20260520115422.53552-1-sw@simonwunderlich.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-21 15:59:11 +02:00
Stefano Garzarella	c6087c5aaa	vsock/virtio: fix skb overhead accounting to preserve full buf_alloc After commit `059b7dbd20` ("vsock/virtio: fix potential unbounded skb queue"), virtio_transport_inc_rx_pkt() subtracts per-skb overhead from buf_alloc when checking whether a new packet fits. This reduces the effective receive buffer below what the user configured via SO_VM_SOCKETS_BUFFER_SIZE, causing legitimate data packets to be silently dropped and applications that rely on the full buffer size to deadlock. Also, the reduced space is not communicated to the remote peer, so its credit calculation accounts more credit than the receiver will actually accept, causing data loss (there is no retransmission). With this approach we currently have failures in tools/testing/vsock/vsock_test.c. Test 18 sometimes fails, while test 22 always fails in this way: 18 - SOCK_STREAM MSG_ZEROCOPY...hash mismatch 22 - SOCK_STREAM virtio credit update + SO_RCVLOWAT...send failed: Resource temporarily unavailable Fix by allowing at most `buf_alloc * 2` as the total budget for payload plus skb overhead in virtio_transport_inc_rx_pkt(), similar to how SO_RCVBUF is doubled to reserve space for sk_buff metadata. This preserves the full buf_alloc for payload under normal operation, while still bounding the skb queue growth. With this patch, all tests in tools/testing/vsock/vsock_test.c are now passing again. Fixes: `059b7dbd20` ("vsock/virtio: fix potential unbounded skb queue") Cc: stable@vger.kernel.org Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20260518090656.134588-3-sgarzare@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-21 13:14:01 +02:00
Stefano Garzarella	a4f0b00178	vsock/virtio: reset connection on receiving queue overflow When there is no more space to queue an incoming packet, the packet is silently dropped. This causes data loss without any notification to either peer, since there is no retransmission. Under normal circumstances, this should never happen. However, it could happen if the other peer doesn't respect the credit, or if the skb overhead, which we recently began to take into account with commit `059b7dbd20` ("vsock/virtio: fix potential unbounded skb queue"), is too high. Fix this by resetting the connection and setting the local socket error to ENOBUFS when virtio_transport_recv_enqueue() can no longer queue a packet, so both peers are explicitly notified of the failure rather than silently losing data. Fixes: `ae6fcfbf5f` ("vsock/virtio: discard packets if credit is not respected") Cc: stable@vger.kernel.org Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20260518090656.134588-2-sgarzare@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-21 13:14:01 +02:00
Hyunwoo Kim	48f6a5356a	net: skbuff: propagate shared-frag marker through frag-transfer helpers Two frag-transfer helpers (__pskb_copy_fclone() and skb_shift()) fail to propagate the SKBFL_SHARED_FRAG bit in skb_shinfo()->flags when moving frags from source to destination. __pskb_copy_fclone() defers the rest of the shinfo metadata to skb_copy_header() after copying frag descriptors, but that helper only carries over gso_{size,segs, type} and never touches skb_shinfo()->flags; skb_shift() moves frag descriptors directly and leaves flags untouched. As a result, the destination skb keeps a reference to the same externally-owned or page-cache-backed pages while reporting skb_has_shared_frag() as false. The mismatch is harmful in any in-place writer that uses skb_has_shared_frag() to decide whether shared pages must be detoured through skb_cow_data(). ESP input is one such writer (esp4.c, esp6.c), and a single nft 'dup to <local>' rule -- or any other nf_dup_ipv4() / xt_TEE caller -- is enough to land a pskb_copy()'d skb in esp_input() with the marker stripped, letting an unprivileged user write into the page cache of a root-owned read-only file via authencesn-ESN stray writes. Set SKBFL_SHARED_FRAG on the destination whenever frag descriptors were actually moved from the source. skb_copy() and skb_copy_expand() share skb_copy_header() too but linearize all paged data into freshly allocated head storage and emerge with nr_frags == 0, so skb_has_shared_frag() returns false on its own; they need no change. The same omission exists in skb_gro_receive() and skb_gro_receive_list(). The former moves the incoming skb's frag descriptors into the accumulator's last sub-skb via two paths (a direct frag-move loop and the head_frag + memcpy path); the latter chains the incoming skb whole onto p's frag_list. Downstream skb_segment() reads only skb_shinfo(p)->flags, and skb_segment_list() reuses each sub-skb's shinfo as the nskb -- both p and lp must carry the marker. The same omission also exists in tcp_clone_payload(), which builds an MTU probe skb by moving frag descriptors from skbs on sk_write_queue into a freshly allocated nskb. The helper falls into the same family and warrants the same fix for consistency; no TCP TX-side in-place writer is currently known to reach a user page through this gap, but a future consumer depending on the marker would regress silently. The same omission exists in skb_segment(): the per-iteration flag merge takes only head_skb's flag, and the inner switch that rebinds frag_skb to list_skb on head_skb-frags exhaustion does not fold the new frag_skb's flag into nskb. Fold frag_skb's flag at both sites so segments drawing frags from frag_list members carry the marker. Fixes: `cef401de7b` ("net: fix possible wrong checksum generation") Fixes: `f4c50a4034` ("xfrm: esp: avoid in-place decrypt on shared skb frags") Suggested-by: Sabrina Dubroca <sd@queasysnail.net> Suggested-by: Sultan Alsawaf <sultan@kerneltoast.com> Suggested-by: Ben Hutchings <ben@decadent.org.uk> Suggested-by: Lin Ma <malin89@huawei.com> Suggested-by: Jingguo Tan <tanjingguo@huawei.com> Suggested-by: Aaron Esau <aaron1esau@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Tested-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com> Link: https://patch.msgid.link/ageeJfJHwgzmKXbh@v4bel Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-21 11:31:05 +02:00
Eric Dumazet	1bbf0ced1d	tcp: fix stale per-CPU tcp_tw_isn leak enabling ISN prediction Blamed commit moved the TIME_WAIT-derived ISN from the skb control block to a per-CPU variable, assuming the value would always be consumed by tcp_conn_request() for the same packet that wrote it. That assumption is violated by multiple drop paths between the producer (__this_cpu_write(tcp_tw_isn, isn) in tcp_v{4,6}_rcv()) and the consumer (tcp_conn_request()): - min_ttl / min_hopcount check - xfrm policy check - tcp_inbound_hash() MD5/AO mismatch - tcp_filter() eBPF/SO_ATTACH_FILTER drop - th->syn && th->fin discard in tcp_rcv_state_process() TCP_LISTEN - psp_sk_rx_policy_check() in tcp_v{4,6}_do_rcv() - tcp_checksum_complete() in tcp_v{4,6}_do_rcv() - tcp_v{4,6}_cookie_check() returning NULL When a packet is dropped on any of these paths, tcp_tw_isn is left set. The next SYN processed on the same CPU then consumes the non zero value in tcp_conn_request(), receiving a potentially predictable ISN. This patch moves back tcp_tw_isn to skb->cb[], getting rid of the per-cpu variable. Note that tcp_v{4,6}_fill_cb() do not set it. Very litle impact on overall code size/complexity: $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new add/remove: 0/0 grow/shrink: 2/1 up/down: 8/-15 (-7) Function old new delta tcp_v6_rcv 3038 3042 +4 tcp_v4_rcv 3035 3039 +4 tcp_conn_request 2938 2923 -15 Total: Before=24436060, After=24436053, chg -0.00% Fixes: `41eecbd712` ("tcp: replace TCP_SKB_CB(skb)->tcp_tw_isn with a per-cpu field") Reported-by: Chris Mason <clm@meta.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260519084611.2485277-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-20 19:14:06 -07:00
Minh Nguyen	99e22ddf4e	vsock/vmci: fix UAF when peer resets connection during handshake vmci_transport_recv_connecting_server() returned err = 0 for a peer RST in its default switch arm: err = pkt->type == VMCI_TRANSPORT_PACKET_TYPE_RST ? 0 : -EINVAL; That made vmci_transport_recv_listen() skip vsock_remove_pending(), leaving the pending socket on the listener's pending_links with sk_state = TCP_CLOSE while destroy: still dropped the explicit reference taken before schedule_delayed_work(). One second later vsock_pending_work() observed is_pending=true and performed full cleanup: vsock_remove_pending() then the two trailing sock_put(sk) calls -- the first reached refcount 0 and __sk_freed the socket, and the second wrote into the freed object: BUG: KASAN: slab-use-after-free in refcount_warn_saturate Write of size 4 at addr ffff88800b1cac80 by task kworker Workqueue: events vsock_pending_work Treat peer RST like any other unexpected packet type (err = -EINVAL). All destroy: arms now return err < 0, so vmci_transport_recv_listen() removes pending from pending_links synchronously and vsock_pending_work() takes the is_pending=false / !rejected branch, dropping only its own work reference. This also closes the multi-packet race Sashiko reported on v2: pending is removed from the list before any subsequent packet can find it. The pre-existing sk_acceptq_removed() gap on the err < 0 path of vmci_transport_recv_listen() that Sashiko also noted is not introduced or changed by this patch. Tested on lts-6.12.79 with KASAN: 52/100 unpatched -> 0/100 patched. Fixes: `d021c34405` ("VSOCK: Introduce VM Sockets") Cc: stable@vger.kernel.org Signed-off-by: Minh Nguyen <minhnguyen.080505@gmail.com> Acked-by: Bryan Tan <bryan-bt.tan@broadcom.com> Link: https://patch.msgid.link/20260519102310.237181-1-minhnguyen.080505@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-20 19:11:18 -07:00
Eric Dumazet	e4bdef4d32	ipv4: use WARN_ON_ONCE() in ip_rt_bug() It turns out ip_rt_bug() can be called more than expected. syzbot will still panic (because of panic_on_warn=1), but non debug kernels will no longer die while repeating stack traces on the console. Fixes: `c378a9c019` ("ipv4: Give backtrace in ip_rt_bug().") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev> Link: https://patch.msgid.link/20260519193248.4018872-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-20 19:00:36 -07:00
Eric Dumazet	7eb72c1e39	ipv4: icmp: reject broadcast/multicast routes syzbot was able to trigger ip_rt_bug() in a loop, using an IPv4 packet with a crafted IPOPT_SSRR option: options: ipv4_options { options: array[ipv4_option] { union ipv4_option { ssrr: ipv4_option_route[IPOPT_SSRR] { type: const = 0x89 (1 bytes) length: len = 0x7 (1 bytes) pointer: int8 = 0xa2 (1 bytes) data: array[ipv4_addr] { union ipv4_addr { broadcast: const = 0xffffffff (4 bytes) } } } } Change __icmp_send() to not send ICMP to broadcast/multicast destinations. Fixes: `c378a9c019` ("ipv4: Give backtrace in ip_rt_bug().") Reported-by: syzbot+c13a57c2639c2c0d03a6@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/6a0cc169.170a0220.1f6c2d.0004.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20260519200836.4141061-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-20 19:00:02 -07:00
David Carlier	4eb82ba543	net: devmem: reject dma-buf bind with non-page-aligned size or SG length net_devmem_bind_dmabuf() trusts dmabuf->size and sg_dma_len() to be PAGE_SIZE multiples without checking: - tx_vec is sized dmabuf->size / PAGE_SIZE, and net_devmem_get_niov_at() only bounds-checks virt_addr < dmabuf->size before indexing tx_vec[virt_addr / PAGE_SIZE]. With size = NPAGE_SIZE + r (1 <= r < PAGE_SIZE), sendmsg() at iov_base = NPAGE_SIZE passes the bound check and reads tx_vec[N] -- one past. - owner->area.num_niovs = len / PAGE_SIZE while gen_pool_add_owner() covers the full byte len, so a non-page-multiple non-final sg desyncs num_niovs from the gen_pool region for every later sg, on both RX and TX. dma-buf does not require page-aligned sizes, so the bind path has to enforce what its own indexing assumes. Reject both with -EINVAL. The size check is TX-only (only tx_vec is sized off dmabuf->size); the SG-length check covers both directions. Fixes: `bd61848900` ("net: devmem: Implement TX path") Cc: stable@vger.kernel.org Signed-off-by: David Carlier <devnexen@gmail.com> Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20260519203530.66310-1-devnexen@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-20 18:59:01 -07:00
Jakub Kicinski	5027c886e2	bluetooth pull request for net: - hci_sync: Fix not setting mask for HCI_EVT_LE_ALL_REMOTE_FEATURES_COMPLETE - L2CAP: fix UAF in l2cap_sock_cleanup_listen() vs l2cap_conn_del() - ISO: drop ISO_END frames received without prior ISO_START - MGMT: validate Add Extended Advertising Data length - bnep: Fix UAF read of dev->name - btmtk: fix urb->setup_packet leak in error paths - btintel_pcie: Fix incorrect MAC access programming - hci_uart: fix UAFs and race conditions in close and init paths -----BEGIN PGP SIGNATURE----- iQJNBAABCgA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmoOHbkZHGx1aXoudm9u LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKUlGD/9k81ghY4sS0chhP0sklpWm V5hi01rhlmYkNJY+CWG24wRc+OjgTzRLdeEJCOkLSkXj0X8lWtN4LhaVthVWoq+5 jCyNm5t95WMKYS+iBh1Qsd9jdPyruJhCmeQPJ8OEjrer5o/XZHphRfGi4hJEVNfO r8x5vUnAHkdzRIH5UeHaJ01TV+hDIacin9+Yls2EMdEQZ/bpGIZUuDwcpQkApKr2 DMPigwF/Acy0mpXKtuJ5DAeF1c6JvGhskW7mhf0Zn2snuDdqolSY0r3llBx5WyB2 KvnZ7o38uShTw2Qv2ZYpO4ETEDyi36UxAXXkVAVZlv5n8cpG17DSb7b7koFIDUhi KSFyes4QW+3fJDLFKapI7mwvpg4m9VfUOUh8WedqFzdYTNjqS1I77aT+n+6LpELg zqAhDAAYzvtp2EZ3x8KEHJmM1u2/2ZS/IPAGNP8LmYDlZsTYNF428JcVNwa7sZwS AOnp0JNI4+efO+bB3JZgS3ChEVUl8jJ6ccxTiSBTPCHZmHK2zFT48xodA2xi6Sd0 OGb/IUzvVtSg0IB1Q8Z0cHq7GcrhvK1TrxzvMsdnzKoNRPR+yyHvvePBPks/TusK URRYKoEwhuIQywrIF7lbd2NcXpaUAXZz8lQ6IDrQ5jFrtF85TfsW2SphKs3/6xxz Ov/8zQ/z241bU2fp8d/gOA== =9q+B -----END PGP SIGNATURE----- Merge tag 'for-net-2026-05-20' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - hci_sync: Fix not setting mask for HCI_EVT_LE_ALL_REMOTE_FEATURES_COMPLETE - L2CAP: fix UAF in l2cap_sock_cleanup_listen() vs l2cap_conn_del() - ISO: drop ISO_END frames received without prior ISO_START - MGMT: validate Add Extended Advertising Data length - bnep: Fix UAF read of dev->name - btmtk: fix urb->setup_packet leak in error paths - btintel_pcie: Fix incorrect MAC access programming - hci_uart: fix UAFs and race conditions in close and init paths * tag 'for-net-2026-05-20' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: fix UAF in l2cap_sock_cleanup_listen() vs l2cap_conn_del() Bluetooth: hci_uart: fix UAFs and race conditions in close and init paths Bluetooth: MGMT: validate Add Extended Advertising Data length Bluetooth: btmtk: fix urb->setup_packet leak in error paths Bluetooth: ISO: drop ISO_END frames received without prior ISO_START Bluetooth: btintel_pcie: Fix incorrect MAC access programming Bluetooth: hci_sync: Fix not setting mask for HCI_EVT_LE_ALL_REMOTE_FEATURES_COMPLETE Bluetooth: bnep: Fix UAF read of dev->name ==================== Link: https://patch.msgid.link/20260520204959.2902497-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-20 17:26:56 -07:00
Xingwang Xiang	ddf8029623	bpf, skmsg: fix verdict sk_data_ready racing with ktls rx sk_psock_strp_data_ready() already checks tls_sw_has_ctx_rx() and defers to psock->saved_data_ready when a TLS RX context is present, avoiding a conflict with the TLS strparser's ownership of the receive queue (commit `e91de6afa8`, "bpf: Fix running sk_skb program types with ktls"). sk_psock_verdict_data_ready() has no equivalent guard. When a socket is inserted into a sockmap (BPF_SK_SKB_VERDICT) before TLS RX is configured, tls_sw_strparser_arm() saves sk_psock_verdict_data_ready as rx_ctx->saved_data_ready. On data arrival: tls_data_ready -> tls_strp_data_ready -> tls_rx_msg_ready -> saved_data_ready() = sk_psock_verdict_data_ready() -> tcp_read_skb() drains sk_receive_queue via __skb_unlink() without calling tcp_eat_skb(), so copied_seq is not advanced. tls_strp_msg_load() then finds tcp_inq() >= full_len (stale), calls tcp_recv_skb() on the now-empty queue, hits WARN_ON_ONCE(!first), and returns with rx_ctx->strp.anchor.frag_list pointing at a psock-owned (potentially freed) skb. tls_decrypt_sg() subsequently walks that frag_list: use-after-free. Apply the same fix as sk_psock_strp_data_ready(): if a TLS RX context is present, call psock->saved_data_ready (sock_def_readable) to wake recv() waiters and return immediately, leaving the receive queue untouched. TLS retains sole ownership of the queue and decrypts the record normally through tls_sw_recvmsg(). Fixes: `ef5659280e` ("bpf, sockmap: Allow skipping sk_skb parser program") Signed-off-by: Xingwang Xiang <v3rdant.xiang@gmail.com> Link: https://patch.msgid.link/20260517145630.20521-2-v3rdant.xiang@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-20 17:21:21 -07:00
David Howells	8bfab4b6ff	rxrpc: Fix RESPONSE packet verification to extract skb to a linear buffer This improves the fix for CVE-2026-43500. Fix the verification of RESPONSE packets to avoid the problem of overwriting a RESPONSE packet sent via splice to a local address by extracting the contents of the UDP packet into a kmalloc'd linear buffer rather than decrypting the data in place in the sk_buff (which may corrupt the original buffer). Fixes: `24481a7f57` ("rxrpc: Fix conn-level packet handling to unshare RESPONSE packets") Reported-by: Hyunwoo Kim <imv4bel@gmail.com> Closes: https://lore.kernel.org/r/afKV2zGR6rrelPC7@v4bel/ Signed-off-by: David Howells <dhowells@redhat.com> cc: Simon Horman <horms@kernel.org> cc: Jiayuan Chen <jiayuan.chen@linux.dev> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Reviewed-by: Jeffrey Altman <jaltman@auristor.com> Tested-by: Marc Dionne <marc.dionne@auristor.com> Link: https://patch.msgid.link/20260515230516.2718212-4-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-20 16:36:45 -07:00
David Howells	d2bc90cf6c	rxrpc: Fix DATA decrypt vs splice() by copying data to buffer in recvmsg This improves the fix for CVE-2026-43500. Fix the pagecache corruption from in-place decryption of a DATA packet transmitted locally by splice() by getting rid of the packet sharing in the I/O thread and unconditionally extracting the packet content into a bounce buffer in which the buffer is decrypted. recvmsg() (or the kernel equivalent) then copies the data from the bounce buffer to the destination buffer. The sk_buff then remains unmodified. This has an additional advantage in that the packet is then arranged in the buffer with the correct alignment required for the crypto algorithms to process directly. The performance of the crypto does seem to be a little faster and, surprisingly, the unencrypted performance doesn't seem to change much - possibly due to removing complexity from the I/O thread. Yet another advantage is that the I/O thread doesn't have to copy packets which would slow down packet distribution, ACK generation, etc.. The buffer belongs to the call and is allocated initially at 2K, sufficiently large to hold a whole jumbo subpacket, but the buffer will be increased in size if needed. However, to take this work, MSG_PEEK may cause a later packet to be decrypted into the buffer, in which case the earlier one will need re-decrypting for a subsequent recvmsg(). Note that rx_pkt_offset may legitimately see 0 as a valid offset now, so switch to using USHRT_MAX to indicate an invalid offset. Note also that I would generally prefer to replace the buffers of the current sk_buff with a new kmalloc'd buffer of the right size, ditching the old data and frags as this makes the handling of MSG_PEEK easier and removes the re-decryption issue, but this looks like quite a complicated thing to achieve. skb_morph() looks half way to what I want, but I don't want to have to allocate a new sk_buff. Fixes: `d0d5c0cd1e` ("rxrpc: Use skb_unshare() rather than skb_cow_data()") Reported-by: Hyunwoo Kim <imv4bel@gmail.com> Closes: https://lore.kernel.org/r/afKV2zGR6rrelPC7@v4bel/ Signed-off-by: David Howells <dhowells@redhat.com> cc: Simon Horman <horms@kernel.org> cc: Jiayuan Chen <jiayuan.chen@linux.dev> cc: linux-afs@lists.infradead.org Reviewed-by: Jeffrey Altman <jaltman@auristor.com> Tested-by: Marc Dionne <marc.dionne@auristor.com> Link: https://patch.msgid.link/20260515230516.2718212-3-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-20 16:36:45 -07:00
David Howells	2b50aceafe	crypto/krb5, rxrpc: Fix lack of pre-decrypt/pre-verify length checks Change the krb5 crypto library to provide facilities to precheck the length of the message about to be decrypted or verified. Fix AF_RXRPC to make use of this to validate DATA packets secured with RxGK. Fixes: `9d1d2b5934` ("rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)") Closes: https://sashiko.dev/#/patchset/20260511160753.607296-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Herbert Xu <herbert@gondor.apana.org.au> cc: Simon Horman <horms@kernel.org> cc: Chuck Lever <chuck.lever@oracle.com> cc: linux-afs@lists.infradead.org Reviewed-by: Jeffrey Altman <jaltman@auristor.com> Tested-by: Marc Dionne <marc.dionne@auristor.com> Link: https://patch.msgid.link/20260515230516.2718212-2-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-20 16:36:45 -07:00
Jakub Kicinski	b8d7519352	net: shaper: rework the VALID marking (again) Recent commit changed the semantics from NOT_VALID to VALID. I didn't realize that the flags are not stored atomically with the entry in XArray. There's still a race of reader observing a VALID mark for a slot, getting interrupted, writer replacing the entry with a different one, reader continuing, fetching the entry which is now a different pointer than the pointer for which VALID was meant. The biggest consequence of this is that we may see a UAF since net_shaper_rollback() assumed that entries without VALID can be freed without observing RCU. Looks like the XArray marks are buying us nothing at this point. Let's convert the code to an explicit valid field. The smp_load_acquire() / smp_store_release() barriers are marginally cleaner. Reported-by: Sashiko <sashiko-bot@kernel.org> Fixes: `93954b40f6` ("net-shapers: implement NL set and delete operations") Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260515221325.1685455-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-20 16:34:20 -07:00
Jakub Kicinski	a3442936dd	net: shaper: annotate the data races As previously discussed we don't care about making the shaper state fully RCU-compliant because the hierarchy itself can't be dumped in one go over Netlink. Let's annotate the reads and writes to make that clear. The field-by-field assignments will also be useful for the next commit which adds explicit "valid" field (which we don't want to override with the current full struct assignment). Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260515221325.1685455-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-20 16:34:20 -07:00
Gal Pressman	78effd896e	udp: Fix UDP length on last GSO_PARTIAL segment Following the cited commit, __udp_gso_segment() writes single MSS length in the UDP header. The cited patch doesn't account for the fact that the last segment could be a GSO skb by itself. This could happen when the size of the packet is a multiple of MSS, hence the first segment is also the last one (there is no need for a remainder skb). When the post-loop segment is a GSO skb, assign the single MSS length in the UDP header. Fixes: `b10b446ce7` ("udp: gso: Use single MSS length in UDP header for GSO_PARTIAL") Reported-by: Matthew Schwartz <matthew.schwartz@linux.dev> Closes: https://lore.kernel.org/all/6c3fb15e-711d-4b8d-b152-e03d9b05293f@linux.dev/ Tested-by: Matthew Schwartz <matthew.schwartz@linux.dev> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20260518062250.3019914-3-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-20 15:03:47 -07:00
Alice Mikityanska	5f17ae0f59	udp: gso: Fix handling checksum in __udp_gso_segment The cited commit started using msslen for uh->len, but still uses newlen to adjust uh->check. Although the checksum is ignored in most cases due to the hardware offload, __udp_gso_segment attempts to maintain the correct one. Fix uh->check and adjust it by the right value. Additionally, after the fix, newlen becomes assigned and unused before the loop. The code can be simplified a bit if mss adjustment is dropped, so that newlen becomes equal to msslen before the loop, and msslen can be also dropped, saving a few lines of code. This brings us back to one variable, drops an unneeded arithmetic for mss, and fixes the UDP checksum. Fixes: `b10b446ce7` ("udp: gso: Use single MSS length in UDP header for GSO_PARTIAL") Signed-off-by: Alice Mikityanska <alice@isovalent.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20260518062250.3019914-2-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-20 15:02:47 -07:00
Safa Karakuş	ab1513597c	Bluetooth: fix UAF in l2cap_sock_cleanup_listen() vs l2cap_conn_del() bt_accept_dequeue() unlinks a not-yet-accepted child from the parent accept queue and release_sock()s it before returning, so the returned sk has no caller reference and is unlocked. l2cap_sock_cleanup_listen() walks these children on listening-socket close. A concurrent HCI disconnect drives hci_rx_work -> l2cap_conn_del() which runs l2cap_chan_del() + l2cap_sock_kill() and frees the child sk and its l2cap_chan; cleanup_listen() then uses both: BUG: KASAN: slab-use-after-free in l2cap_sock_kill l2cap_sock_kill / l2cap_sock_cleanup_listen / __x64_sys_close Freed by: l2cap_conn_del -> l2cap_sock_close_cb -> l2cap_sock_kill This is distinct from the two fixes already in this area: commit `e83f5e24da` ("Bluetooth: serialize accept_q access") serialises the accept_q list/poll and takes temporary refs inside bt_accept_dequeue(), and CVE-2025-39860 serialises the userspace close()/accept() race by calling cleanup_listen() under lock_sock() in l2cap_sock_release(). Neither covers l2cap_conn_del() running from hci_rx_work, so this UAF still reproduces on current bluetooth/master. Take the reference at the source: bt_accept_dequeue() does sock_hold() while sk is still locked, before release_sock(); callers sock_put(). cleanup_listen() pins the chan with l2cap_chan_hold_unless_zero() under a brief child sk lock (serialising vs l2cap_sock_teardown_cb()), drops it before l2cap_chan_lock(), and skips a duplicate l2cap_sock_kill() on SOCK_DEAD. conn->lock is not taken here: cleanup_listen() runs under the parent sk lock and that would invert conn->lock -> chan->lock -> sk_lock (lockdep). KASAN/SMP: an unprivileged listen/close vs HCI-disconnect race produced 12 use-after-free reports per run before this change; 0, and no lockdep report, over 1600+ raced iterations after it on bluetooth/master. Fixes: `15f02b9105` ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode") Cc: stable@vger.kernel.org Reported-by: Siwei Zhang <oss@fourdim.xyz> Reviewed-by: Siwei Zhang <oss@fourdim.xyz> Signed-off-by: Safa Karakuş <safa.karakus@secunnix.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-05-20 16:35:47 -04:00
Michael Bommarito	d3f7d17960	Bluetooth: MGMT: validate Add Extended Advertising Data length MGMT_OP_ADD_EXT_ADV_DATA is registered as a variable-length command, with MGMT_ADD_EXT_ADV_DATA_SIZE as the fixed header size. The handler then uses cp->adv_data_len and cp->scan_rsp_len to validate and copy cp->data, but it never checks that those bytes are part of the mgmt command payload. A short command can therefore make add_ext_adv_data() pass an out-of-bounds pointer into tlv_data_is_valid(). If the bytes beyond the command buffer are addressable, they can also be copied into the advertising instance as scan response data, where the caller can read them back via MGMT_OP_GET_ADV_INSTANCE. The trigger requires CAP_NET_ADMIN in the initial user namespace; KASAN reports an 8-byte slab-out-of-bounds read. Reject commands whose length does not match the fixed header plus both advertising data lengths before parsing cp->data. Fixes: `1241057283` ("Bluetooth: Break add adv into two mgmt commands") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-05-20 16:35:47 -04:00
David Carlier	84c24fb151	Bluetooth: ISO: drop ISO_END frames received without prior ISO_START ISO data PDUs carry a packet-boundary flag indicating START, CONT, END or SINGLE. The ISO_CONT branch of iso_recv() guards against a missing ISO_START by checking conn->rx_len before touching conn->rx_skb, but ISO_END does not. If a peer sends an ISO_END as the first packet on a fresh ISO connection, conn->rx_skb is still NULL and conn->rx_len is zero, so skb_put(conn->rx_skb, ...) dereferences NULL and oopses. For BIS, where receivers sync to a broadcaster without pairing, any broadcaster on the air can trigger this. Mirror the ISO_CONT check at the top of ISO_END so a stray end fragment is logged and dropped instead of crashing the host. Fixes: `ccf74f2390` ("Bluetooth: Add BTPROTO_ISO socket type") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-7 Signed-off-by: David Carlier <devnexen@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-05-20 16:35:47 -04:00
Luiz Augusto von Dentz	23d528d817	Bluetooth: hci_sync: Fix not setting mask for HCI_EVT_LE_ALL_REMOTE_FEATURES_COMPLETE This fixes not setting the bit for HCI_EVT_LE_ALL_REMOTE_FEATURES_COMPLETE when extended features bit is set otherwise the controller may not generate HCI_EVT_LE_ALL_REMOTE_FEATURES_COMPLETE causing hci_le_read_all_remote_features_sync to timeout waiting for it. Also remove dead code. Fixes: `a106e50be7` ("Bluetooth: HCI: Add support for LL Extended Feature Set") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-05-20 16:35:47 -04:00
Jann Horn	59e932ded9	Bluetooth: bnep: Fix UAF read of dev->name bnep_add_connection() needs to keep holding the bnep_session_sem while reading dev->name (just like bnep_get_connlist() does); otherwise the bnep_session() thread can concurrently free the net_device, which can for example be triggered by a concurrent bnep_del_connection(). (This UAF is fairly uninteresting from a security perspective; calling bnep_add_connection() requires passing a capable(CAP_NET_ADMIN) check. It also requires completely tearing down a netdev during a fairly tight race window.) Cc: stable@vger.kernel.org Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2026-05-20 16:35:47 -04:00
Kartik Nair	dc14686f27	wifi: cfg80211: wext: validate chandef in monitor mode cfg80211_wext_siwfreq() constructs a channel definition for monitor mode but passes it to cfg80211_set_monitor_channel() without first validating it with cfg80211_chandef_valid(). This causes a WARN_ON in cfg80211_chandef_dfs_required() when it receives an invalid chandef. Add the missing cfg80211_chandef_valid() check before calling cfg80211_set_monitor_channel() to return -EINVAL early on invalid channel definitions, consistent with how other callers handle this. Reported-by: syzbot+02a1a03b8622d3c7d1c9@syzkaller.appspotmail.com Signed-off-by: Kartik Nair <contact.kartikn@gmail.com> Link: https://patch.msgid.link/20260510202437.7857-1-contact.kartikn@gmail.com [clarify subject] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-05-20 11:44:19 +02:00
Michael Bommarito	a6e6ccd5bd	wifi: mac80211: consume only present negotiated TTLM maps ieee80211_tid_to_link_map_size_ok() validates negotiated TTLM elements against the number of link-map entries indicated by link_map_presence. ieee80211_parse_neg_ttlm() must consume the same layout. The parser advanced its cursor for every TID, including TIDs whose presence bit is clear and therefore have no map bytes in the element. A sparse map can then make a later present TID read past the validated element. The bad bytes land in neg_ttlm->{up,down}link[tid] but are gated by valid_links before being applied to driver state, so a peer cannot turn the read into a policy change. Under KUnit + KASAN with an exact-sized element allocation the OOB read is reported as a slab-out-of-bounds; whether the same trigger fires under the production RX path depends on surrounding allocator state. Advance the cursor only when the current TID has a map present. Fixes: `8f500fbc6c` ("wifi: mac80211: process and save negotiated TID to Link mapping request") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Link: https://patch.msgid.link/20260515151719.1317659-2-michael.bommarito@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-05-20 11:20:37 +02:00
Zhao Li	d71c841be5	wifi: mac80211: capture fast-RX rate before mesh reuses skb->cb ieee80211_invoke_fast_rx() reads RX status through IEEE80211_SKB_RXCB(skb), which aliases the same skb->cb storage that ieee80211_rx_mesh_data() reuses as IEEE80211_TX_INFO. In the unicast forward path, mesh_data does: info = IEEE80211_SKB_CB(fwd_skb); memset(info, 0, sizeof(*info)); on the same skb the caller still names via rx->skb, then either queues the skb for TX (success) or kfree_skb()'s it (no-route) before returning RX_QUEUED. The caller's RX_QUEUED arm then calls sta_stats_encode_rate(status) on memory that is either zeroed (success path) or freed (no-route path). The latter is KASAN slab-use-after-free in ieee80211_prepare_and_rx_handle. Fix by encoding the rate from status before invoking ieee80211_rx_mesh_data(), so the RX_QUEUED arm consumes a value captured while status was still backed by valid memory. Fixes: `3468e1e0c6` ("wifi: mac80211: add mesh fast-rx support") Cc: stable@vger.kernel.org Signed-off-by: Zhao Li <enderaoelyther@gmail.com> Link: https://patch.msgid.link/20260509043427.60322-2-enderaoelyther@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-05-20 11:19:53 +02:00
Johannes Berg	fe2d61a5d2	wifi: mac80211: fix multi-link element inheritance When parsing a beacon, mac80211 erroneously inherits any reconfiguration or EPCS multi-link elements from the outer elements into the multi-BSSID profile that's requested, if connected to a non-transmitted BSS, unless that profile has a non-inheritance element. This also happens if parsing a multi-BSSID profile that doesn't have a non-inheritance element. Fix this by having an empty non-inheritance element so cfg80211_is_element_inherited() is invoked in these cases and causes the parser to skip the elements that should never be inherited. Fixes: `cf36cdef10` ("wifi: mac80211: Add support for parsing Reconfiguration Multi Link element") Fixes: `24711d60f8` ("wifi: mac80211: Support parsing EPCS ML element") Reviewed-by: Ilan Peer <ilan.peer@intel.com> Reviewed-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20260508091032.92184c0a3f08.I3c43b0b63d2cef8a4ddddaef1c2faaeb1de711ad@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-05-20 11:19:53 +02:00
Johannes Berg	a74e893f30	wifi: mac80211: fix MLE defragmentation If either reconf or EPCS multi-link element (MLE) is contained in a non-transmitted profile, the defragmentation routine is called with a pointer to the defragmented copy, but the original elements. This is incorrect for two reasons: - if the original defragmentation was needed, it will not find the correct data - if the original frame is at a higher address, the parsing will potentially overrun the heap data (though given the layout of the buffers, only into the new defragmentation buffer, and then it has to stop and fail once that's filled with copied data. Fix it by tracking the container along with the pointer and in doing so also unify the two almost identical defragmentation routines. Fixes: `4d70e9c548` ("wifi: mac80211: defragment reconfiguration MLE when parsing") Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Reviewed-by: Ilan Peer <ilan.peer@intel.com> Link: https://patch.msgid.link/20260508091031.8a6c34613178.I4de16ebbce2d27f2f8f98fc49949c7a376c2fe8d@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-05-20 11:19:52 +02:00
Emmanuel Grumbach	e1e83feb8e	wifi: mac80211: don't override max_amsdu_subframes In client mode, the extended capabilities are handled by the kernel looking at the association frame. When the supplicant installs the keys it calls sta_apply_parameters and it doesn't include the extended capabilities since those can't change after association. As a result, we overrode the max_amsdu_subframes that we set after association. Check that the ext_capa coming from the user space is valid before looking at it. If the ext_capa is NULL, it really means that the extended capabilities are not changed (as opposed to cleared). The default value for max_amsdu_subframes is 0, which means there is no limit. This value is valid and in case the association response frame does not have extended capabilities, this is the value we should use. Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221079 Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260513170623.828dbb58c782.Ifd2bfc190c26140e919127adb02ffddd7b551499@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-05-20 11:14:41 +02:00
Alexandru Hossu	f718506edd	wifi: mac80211: bounds-check link_id in ieee80211_ml_epcs IEEE80211_MLE_STA_EPCS_CONTROL_LINK_ID is 0x000f, so link_id extracted from a PRIO_ACCESS ML element PER_STA_PROFILE subelement can be 0..15. sdata->link[] has IEEE80211_MLD_MAX_NUM_LINKS (15) entries (indices 0..14), making index 15 out-of-bounds. A connected WiFi 7 AP can trigger this by sending an EPCS Enable Response action frame with a PER_STA_PROFILE subelement where link_id = 15. The unsolicited-notification path (dialog_token = 0) is reachable any time EPCS is already enabled, without any prior client request. sdata->link[15] reads into the first word of sdata->activate_links_work (a wiphy_work whose embedded list_head is non-NULL after INIT_LIST_HEAD), so the NULL check on the result does not catch the invalid access. The garbage pointer is then passed to ieee80211_sta_wmm_params(), which dereferences link->sdata and crashes the kernel. The same class of bug was fixed for ieee80211_ml_reconfiguration() by commit `162d331d83` ("wifi: mac80211: bounds-check link_id in ieee80211_ml_reconfiguration"). Fixes: `de86c5f608` ("wifi: mac80211: Add support for EPCS configuration") Signed-off-by: Alexandru Hossu <hossu.alexandru@gmail.com> Link: https://patch.msgid.link/20260515102908.1653088-1-hossu.alexandru@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-05-20 11:04:17 +02:00
Jann Horn	be309f8eae	af_unix: Fix UAF read of tail->len in unix_stream_data_wait() unix_stream_data_wait() does skb_peek_tail(&sk->sk_receive_queue) without holding any lock that prevents SKBs on that queue from being dequeued and freed. This has been the case since commit `79f632c71b` ("unix/stream: fix peeking with an offset larger than data in queue"). The first consequence of this is that the pointer comparison `tail != last` can be false even if `last` semantically refers to an already-freed SKB while `tail` is a new SKB allocated at the same address; which can cause unix_stream_data_wait() to wrongly keep blocking after new data has arrived, but only in a weird scenario where a peeking recv() and a normal recv() on the same socket are racing, which is probably not a real problem. But since commit `2b514574f7` ("net: af_unix: implement splice for stream af_unix sockets"), `tail` is actually dereferenced, which can cause UAF in the following race scenario (where test_setup() runs single-threaded, and afterwards, test_thread1() and test_thread2() run concurrently in two threads: ``` static int socks[2]; void test_setup(void) { socketpair(AF_UNIX, SOCK_STREAM, 0, socks); send(socks[1], "A", 1, 0); int peekoff = 1; setsockopt(socks[0], SOL_SOCKET, SO_PEEK_OFF, &peekoff, sizeof(peekoff)); } void test_thread1(void) { char dummy; recv(socks[0], &dummy, 1, MSG_PEEK); } void test_thread2(void) { char dummy; recv(socks[0], &dummy, 1, 0); shutdown(socks[1], SHUT_WR); } ``` when racing like this: ``` thread1 thread2 unix_stream_read_generic mutex_lock(&u->iolock) skb_peek(&sk->sk_receive_queue) skb_peek_next(skb, &sk->sk_receive_queue) mutex_unlock(&u->iolock) unix_stream_read_generic unix_state_lock(sk) skb_peek(&sk->sk_receive_queue) unix_state_unlock(sk) unix_stream_data_wait unix_state_lock(sk) tail = skb_peek_tail(&sk->sk_receive_queue) spin_lock(&sk->sk_receive_queue.lock) __skb_unlink(skb, &sk->sk_receive_queue) spin_unlock(&sk->sk_receive_queue.lock) consume_skb(skb) [frees the SKB] `tail != last`: false `tail`: true `tail->len != last_len` *UAF* ``` Fix the UAF by removing the read of tail->len; checking tail->len would only make sense if SKBs in the receive queue of a UNIX socket could grow, which can no longer happen. Kuniyuki explained: > When commit `869e7c6248` ("net: af_unix: implement stream sendpage > support") added sendpage() support, data could be appended to the last > skb in the receiver's queue. > > That's why we needed to check if the length of the last skb was changed > while waiting for new data in unix_stream_data_wait(). > > However, commit `a0dbf5f818` ("af_unix: Support MSG_SPLICE_PAGES") and > commit `57d44a354a` ("unix: Convert unix_stream_sendpage() to use > MSG_SPLICE_PAGES") refactored sendmsg(), and now data is always added > to a new skb. That means this fix is not suitable for kernels before 6.5. Fixes: `2b514574f7` ("net: af_unix: implement splice for stream af_unix sockets") Cc: stable@vger.kernel.org # 6.5.x Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260518-b4-unix-recv-wait-hotfix-v2-1-83e29ce8ad31@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-19 18:53:56 -07:00
Justin Iurman	d4ea0dfd75	ipv6: ioam: add NULL check for idev in ipv6_hop_ioam() Reported by Sashiko: The function ipv6_hop_ioam() accesses __in6_dev_get(skb->dev)->cnf.ioam6_enabled without validating the returned idev pointer. Because addrconf_ifdown() can concurrently clear dev->ip6_ptr via RCU, __in6_dev_get() can return NULL during interface teardown, which could cause a NULL pointer dereference when processing an IOAM Hop-by-Hop option. Let's add a check and use SKB_DROP_REASON_IPV6DISABLED accordingly. Fixes: `9ee11f0fff` ("ipv6: ioam: Data plane support for Pre-allocated Trace") Cc: stable@vger.kernel.org Signed-off-by: Justin Iurman <justin.iurman@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260517183059.29140-1-justin.iurman@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-19 18:46:44 -07:00
Ido Schimmel	4df78ff026	bridge: mcast: Fix a possible use-after-free when removing a bridge port When per-VLAN multicast snooping is enabled, the bridge iterates over all the bridge ports, disables the per-port multicast context on each port and enables the per-{port, VLAN} multicast contexts instead. The reverse happens when per-VLAN multicast snooping is disabled. When global multicast snooping is enabled, the bridge iterates over all the bridge ports and enables the per-port multicast context on each port. The reverse happens when multicast snooping is disabled. The above scheme can result in a situation where both types of contexts (per-port and per-{port, VLAN}) are enabled on a single bridge port: # ip link add name br1 up type bridge mcast_snooping 1 mcast_querier 1 vlan_filtering 1 # ip link add name dummy1 up master br1 type dummy # ip link set dev br1 type bridge mcast_vlan_snooping 1 # ip link set dev br1 type bridge mcast_snooping 0 # ip link set dev br1 type bridge mcast_snooping 1 This is not intended and it is a problem since the commit cited below. Prior to this commit, when removing a bridge port, br_multicast_disable_port() would disable the per-port multicast context and the per-{port, VLAN} multicast contexts would get disabled when flushing VLANs. After this commit, br_multicast_disable_port() only disables the per-port multicast context if per-VLAN multicast snooping is disabled. If both types of contexts were enabled on the port when it was removed, the per-port multicast context would remain enabled when freeing the bridge port, leading to a use-after-free [1]. Fix by preventing the bridge from enabling / disabling the per-port multicast contexts when toggling global multicast snooping if per-VLAN multicast snooping is enabled. [1] ODEBUG: free active (active state 0) object: ffff88810f8bda78 object type: timer_list hint: br_ip6_multicast_port_query_expired (net/bridge/br_multicast.c:1927) WARNING: lib/debugobjects.c:629 at debug_print_object+0x1b1/0x3e0, CPU#5: swapper/5/0 [...] Call Trace: <IRQ> __debug_check_no_obj_freed (lib/debugobjects.c:1116) kfree (mm/slub.c:2620 mm/slub.c:6250 mm/slub.c:6565) kobject_cleanup (lib/kobject.c:689) rcu_do_batch (kernel/rcu/tree.c:2617) rcu_core (kernel/rcu/tree.c:2869) handle_softirqs (kernel/softirq.c:622) __irq_exit_rcu (kernel/softirq.c:656 kernel/softirq.c:496 kernel/softirq.c:735) irq_exit_rcu (kernel/softirq.c:752) sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1061 (discriminator 47) arch/x86/kernel/apic/apic.c:1061 (discriminator 47)) </IRQ> Fixes: `4b30ae9adb` ("net: bridge: mcast: re-implement br_multicast_{enable, disable}_port functions") Reported-by: syzbot+ae231e0552fa77b26ea1@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/87qznowlfs.ffs@tglx/ Reported-by: Thomas Gleixner <tglx@kernel.org> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260517121122.188333-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-19 18:15:22 -07:00
Gang Yan	3a543ae0e2	mptcp: update window_clamp on subflows when SO_RCVBUF is set Add __mptcp_subflow_set_rcvbuf() helper to write the subflow sk_rcvbuf, but also to call the recently added tcp_set_rcvbuf() helper to update window_clamp. This is needed because the window clap is updated when scaling_ratio changes, in tcp_measure_rcv_mss(). Until scaling_ratio changes, the subflow is stuck with the old window clamp which may be based on a small initial buffer. Use this new helper in both mptcp_sol_socket_sync_intval() (setsockopt path) and sync_socket_options() (new subflow creation path). Note that this patch depends on commit `b025461303` ("tcp: update window_clamp when SO_RCVBUF is set"): it fixes the issue on TCP side, but the same fix is needed on MPTCP side as well. Fixes: `a2cbb16039` ("tcp: Update window clamping condition") Cc: stable@vger.kernel.org Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/619 Signed-off-by: Gang Yan <yangang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260515-net-mptcp-misc-fixes-7-1-rc4-v2-5-701e96419f2f@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-19 15:36:35 +02:00
Paolo Abeni	0981f90e1a	mptcp: reset rcv wnd on disconnect If the MPTCP socket fallback to TCP before the MP handshake completion, the IASN remain 0, and the rcv_wnd_sent field is not explicitly initialized, just incremented over time with the data transfer. At disconnect time such value is not cleared. If the next connection falls back to TCP before the MP handshake completion, the data transfer will keep incrementing the receive window end sequence starting from the last value used in the previous connection: the announced window will be unrelated from the actual receiver buffer size and likely too big. Address the issue zeroing the field at disconnect time. Fixes: `b29fcfb54c` ("mptcp: full disconnect implementation") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260515-net-mptcp-misc-fixes-7-1-rc4-v2-4-701e96419f2f@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-19 15:36:35 +02:00
Li Xiasong	51e398a3b8	mptcp: pm: fix ADD_ADDR timer infinite retry on option space insufficient When TCP option space is insufficient (e.g., when sending ADD_ADDR with an IPv6 address and port while tcp_timestamps is enabled), the original code jumped to out_unlock without clearing the addr_signal flag. This caused mptcp_pm_add_timer to keep rescheduling indefinitely, not sending ADD_ADDR, preventing subsequent addresses in the endpoint list from being announced. Handle this case by clearing the ADD_ADDR signal and skipping the matching ADD_ADDR retransmission entry. The skip path cancels the matching timer (with id check) and advances PM state progression, preserving forward progress to subsequent PM work. This cancellation is inherently best-effort. A concurrent add_timer callback may already be running and may acquire pm.lock before the cancel path updates entry state. In that case, one final ADD_ADDR transmit attempt can still be executed. Once the cancel path sets entry->retrans_times to ADD_ADDR_RETRANS_MAX, the callback-side retrans_times check suppresses further ADD_ADDR retransmissions. Note that when an ADD_ADDR is being prepared, a pure-ACK is queued. On the output side, it means that it is fine to skip non-pure-ACK packets, when drop_other_suboptions is set: a pure-ACK will be processed soon after. Fixes: `00cfd77b90` ("mptcp: retransmit ADD_ADDR when timeout") Cc: stable@vger.kernel.org Signed-off-by: Li Xiasong <lixiasong1@huawei.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260515-net-mptcp-misc-fixes-7-1-rc4-v2-2-701e96419f2f@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-19 15:36:35 +02:00
Shardul Bankar	50c2d91c5d	mptcp: do not drop partial packets When a packet arrives with map_seq < ack_seq < end_seq, the beginning of the packet has already been acknowledged but the end contains new data. Currently the entire packet is dropped as "old data," forcing the sender to retransmit. Instead, skip the already-acked bytes by adjusting the skb offset and enqueue only the new portion. Update bytes_received and ack_seq to reflect the new data consumed. A previous attempt at this fix has been sent by Paolo Abeni [1], but had issues [2]: it also added a zero-window check and changed rcv_wnd_sent initialization, which caused test regressions. This version addresses only the partial packet handling without modifying receive window accounting. Fixes: `ab174ad8ef` ("mptcp: move ooo skbs into msk out of order queue.") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/c9b426a4e163aa3c4fe8b80c79f1a610f47ae7d8.1763075056.git.pabeni@redhat.com [1] Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/600 [2] Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com> [pabeni@redhat.com: update map] Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260515-net-mptcp-misc-fixes-7-1-rc4-v2-1-701e96419f2f@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-19 15:36:35 +02:00
Sven Eckelmann	f80d3d98d2	batman-adv: bla: avoid NULL-ptr deref for claim via dropped interface Without rtnl_lock held, a hardif might be retrieved as primary interface of a meshif, but then (while operating on this interface) getting decoupled from the mesh interface. In this case, the meshif still exists but the pointer from the primary hardif to the meshif is set to NULL. The mesh_iface must be checked first to be non-NULL before continuing to send an ARP request using meshif. Cc: stable@kernel.org Fixes: `23721387c4` ("batman-adv: add basic bridge loop avoidance code") Reported-by: Ido Schimmel <idosch@nvidia.com> Reported-by: syzbot+9fdcc9f05a98a540b816@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=9fdcc9f05a98a540b816 Signed-off-by: Sven Eckelmann <sven@narfation.org>	2026-05-19 10:43:54 +02:00
Sven Eckelmann	83ab69bd12	batman-adv: bla: avoid double decrement of bla.num_requests The bla.num_requests is increased when no request_sent was in progress. And it is decremented in various places (announcement was received, backbone is purged, periodic work). But the check if the request_sent is actually set to a specific state and the atomic_dec/_inc are not safe because they are not atomic (TOCTOU) and multiple such code portions can run concurrently. At the same time, it is necessary to modify request_sent (state) and bla.num_requests atomically. Otherwise batadv_bla_send_request() might set request_sent to 1 and is interrupted. batadv_handle_announce() can then set request_sent back to 0 and decrement num_requests before batadv_bla_send_request() incremented it. The two operations must therefore be locked. And since state (request_sent) and wait_periods are only accessed inside this lock, they can be converted to simpler datatypes. And to avoid that the bla.num_requests is touched by a parallel running context with a valid backbone_gw reference after batadv_bla_purge_backbone_gw() ran, a third state "stopped" is required to correctly signal that a backbone_gw is in the state of being cleaned up. Cc: stable@kernel.org Fixes: `23721387c4` ("batman-adv: add basic bridge loop avoidance code") Signed-off-by: Sven Eckelmann <sven@narfation.org>	2026-05-19 09:09:34 +02:00
Sven Eckelmann	0459430add	batman-adv: bla: fix report_work leak on backbone_gw purge batadv_bla_purge_backbone_gw() removes stale backbone gateway entries, but fails to properly handle their associated report_work: - If report_work is running, the purge must wait for it to finish before freeing the backbone_gw, otherwise the worker may access freed memory (e.g. bat_priv). - If report_work is pending, the purge must cancel it and release the reference held for that pending work item. The previous implementation called hlist_for_each_entry_safe() inside a spin_lock_bh() section, but cancel_work_sync() may sleep and therefore cannot be called from within a spinlock-protected region. Restructure the loop to handle one entry per spinlock critical section: acquire the lock, find the next entry to purge, remove it from the hash list, then release the lock before calling cancel_work_sync() and dropping the hash_entry reference. Repeat until no more entries require purging. Cc: stable@kernel.org Fixes: `23721387c4` ("batman-adv: add basic bridge loop avoidance code") Reviewed-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Sven Eckelmann <sven@narfation.org>	2026-05-19 09:09:34 +02:00
Sven Eckelmann	aa3153bd13	batman-adv: iv: recover OGM scheduling after forward packet error When batadv_iv_ogm_schedule_buff() fails to allocate and queue a forward packet for OGM transmission, the work item that drives periodic OGM scheduling is never re-armed. This silently halts transmission of the node's own OGMs on the affected interface — only OGMs from other peers continue to be aggregated and forwarded. Fix this by tracking whether batadv_iv_ogm_queue_add() (and transitively batadv_iv_ogm_aggregate_new()) successfully scheduled a forward packet. When scheduling fails, batadv_iv_ogm_schedule_buff() falls back to queuing a dedicated recovery work item (reschedule_work) that fires after one originator interval and calls batadv_iv_ogm_schedule() again. Cc: stable@kernel.org Fixes: `c6c8fea297` ("net: Add batman-adv meshing protocol") Signed-off-by: Sven Eckelmann <sven@narfation.org>	2026-05-19 09:09:29 +02:00

1 2 3 4 5 ...

84367 Commits (e7ae89a0c97ce2b68b0983cd01eda67cf373517d)