mirror-linux

Commit Graph

Author	SHA1	Message	Date
Pablo Neira Ayuso	a67fd55f6a	netfilter: nf_tables: remove redundant chain validation on register store This validation predates the introduction of the state machine that determines when to enter slow path validation for error reporting. Currently, table validation is perform when: - new rule contains expressions that need validation. - new set element with jump/goto verdict. Validation on register store skips most checks with no basechains, still this walks the graph searching for loops and ensuring expressions are called from the right hook. Remove this. Fixes: `a654de8fdc` ("netfilter: nf_tables: fix chain dependency validation") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>	2025-12-11 13:08:43 +01:00
Florian Westphal	5ec8ca26fe	netfilter: nf_nat: remove bogus direction check Jakub reports spurious failures of the 'conntrack_reverse_clash.sh' selftest. A bogus test makes nat core resort to port rewrite even though there is no need for this. When the test is made, nf_nat_used_tuple() would already have caused us to return if no other CPU had added a colliding entry. Moreover, nf_nat_used_tuple() would have ignored the colliding entry if their origin tuples had been the same. All that is left to check is if the colliding entry in the hash table is subject to NAT, and, if its not, if our entry matches in the reverse direction, e.g. hash table has addr1:1234 -> addr2:80, and we want to commit addr2:80 -> addr1:1234. Because we already checked that neither the new nor the committed entry is subject to NAT we only have to check origin vs. reply tuple: for non-nat entries, the reply tuple is always the inverted original. Just in case there are more problems extend the error reporting in the selftest while at it and dump conntrack table/stats on error. Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20251206175135.4a56591b@kernel.org/ Fixes: `d8f84a9bc7` ("netfilter: nf_nat: don't try nat source port reallocation for reverse dir clash") Signed-off-by: Florian Westphal <fw@strlen.de>	2025-12-11 13:08:37 +01:00
Jozsef Kadlecsik	99c6931fe1	MAINTAINERS: Remove Jozsef Kadlecsik from MAINTAINERS file I'm retiring from maintaining netfilter. I'll still keep an eye on ipset and respond to anything related to it. Thank you! Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>	2025-12-11 12:40:46 +01:00
Dan Carpenter	885bebac99	nfc: pn533: Fix error code in pn533_acr122_poweron_rdr() Set the error code if "transferred != sizeof(cmd)" instead of returning success. Fixes: `dbafc28955` ("NFC: pn533: don't send USB data off of the stack") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://patch.msgid.link/aTfIJ9tZPmeUF4W1@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-11 01:40:00 -08:00
Victor Nogueira	5914428e0e	selftests/tc-testing: Create tests to exercise ets classes active list misplacements Add a test case for a bug fixed by Jamal [1] and for scenario where an ets drr class is inserted into the active list twice. - Try to delete ets drr class' qdisc while still keeping it in the active list - Try to add ets class to the active list twice [1] https://lore.kernel.org/netdev/20251128151919.576920-1-jhs@mojatatu.com/ Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20251208190125.1868423-2-victor@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-11 01:36:40 -08:00
Victor Nogueira	b1e125ae42	net/sched: ets: Remove drr class from the active list if it changes to strict Whenever a user issues an ets qdisc change command, transforming a drr class into a strict one, the ets code isn't checking whether that class was in the active list and removing it. This means that, if a user changes a strict class (which was in the active list) back to a drr one, that class will be added twice to the active list [1]. Doing so with the following commands: tc qdisc add dev lo root handle 1: ets bands 2 strict 1 tc qdisc add dev lo parent 1:2 handle 20: \ tbf rate 8bit burst 100b latency 1s tc filter add dev lo parent 1: basic classid 1:2 ping -c1 -W0.01 -s 56 127.0.0.1 tc qdisc change dev lo root handle 1: ets bands 2 strict 2 tc qdisc change dev lo root handle 1: ets bands 2 strict 1 ping -c1 -W0.01 -s 56 127.0.0.1 Will trigger the following splat with list debug turned on: [ 59.279014][ T365] ------------[ cut here ]------------ [ 59.279452][ T365] list_add double add: new=ffff88801d60e350, prev=ffff88801d60e350, next=ffff88801d60e2c0. [ 59.280153][ T365] WARNING: CPU: 3 PID: 365 at lib/list_debug.c:35 __list_add_valid_or_report+0x17f/0x220 [ 59.280860][ T365] Modules linked in: [ 59.281165][ T365] CPU: 3 UID: 0 PID: 365 Comm: tc Not tainted 6.18.0-rc7-00105-g7e9f13163c13-dirty #239 PREEMPT(voluntary) [ 59.281977][ T365] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 59.282391][ T365] RIP: 0010:__list_add_valid_or_report+0x17f/0x220 [ 59.282842][ T365] Code: 89 c6 e8 d4 b7 0d ff 90 0f 0b 90 90 31 c0 e9 31 ff ff ff 90 48 c7 c7 e0 a0 22 9f 48 89 f2 48 89 c1 4c 89 c6 e8 b2 b7 0d ff 90 <0f> 0b 90 90 31 c0 e9 0f ff ff ff 48 89 f7 48 89 44 24 10 4c 89 44 ... [ 59.288812][ T365] Call Trace: [ 59.289056][ T365] <TASK> [ 59.289224][ T365] ? srso_alias_return_thunk+0x5/0xfbef5 [ 59.289546][ T365] ets_qdisc_change+0xd2b/0x1e80 [ 59.289891][ T365] ? __lock_acquire+0x7e7/0x1be0 [ 59.290223][ T365] ? __pfx_ets_qdisc_change+0x10/0x10 [ 59.290546][ T365] ? srso_alias_return_thunk+0x5/0xfbef5 [ 59.290898][ T365] ? __mutex_trylock_common+0xda/0x240 [ 59.291228][ T365] ? __pfx___mutex_trylock_common+0x10/0x10 [ 59.291655][ T365] ? srso_alias_return_thunk+0x5/0xfbef5 [ 59.291993][ T365] ? srso_alias_return_thunk+0x5/0xfbef5 [ 59.292313][ T365] ? trace_contention_end+0xc8/0x110 [ 59.292656][ T365] ? srso_alias_return_thunk+0x5/0xfbef5 [ 59.293022][ T365] ? srso_alias_return_thunk+0x5/0xfbef5 [ 59.293351][ T365] tc_modify_qdisc+0x63a/0x1cf0 Fix this by always checking and removing an ets class from the active list when changing it to strict. [1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/tree/net/sched/sch_ets.c?id=ce052b9402e461a9aded599f5b47e76bc727f7de#n663 Fixes: `cd9b50adc6` ("net/sched: ets: fix crash when flipping from 'strict' to 'quantum'") Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20251208190125.1868423-1-victor@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-11 01:36:40 -08:00
Junrui Luo	8a11ff0948	caif: fix integer underflow in cffrml_receive() The cffrml_receive() function extracts a length field from the packet header and, when FCS is disabled, subtracts 2 from this length without validating that len >= 2. If an attacker sends a malicious packet with a length field of 0 or 1 to an interface with FCS disabled, the subtraction causes an integer underflow. This can lead to memory exhaustion and kernel instability, potential information disclosure if padding contains uninitialized kernel memory. Fix this by validating that len >= 2 before performing the subtraction. Reported-by: Yuhao Jiang <danisjiang@gmail.com> Reported-by: Junrui Luo <moonafterrain@outlook.com> Fixes: `b482cd2053` ("net-caif: add CAIF core protocol stack") Signed-off-by: Junrui Luo <moonafterrain@outlook.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/SYBPR01MB7881511122BAFEA8212A1608AFA6A@SYBPR01MB7881.ausprd01.prod.outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-11 01:35:41 -08:00
Marcus Hughes	71cfa7c893	net: sfp: extend Potron XGSPON quirk to cover additional EEPROM variant Some Potron SFP+ XGSPON ONU sticks are shipped with different EEPROM vendor ID and vendor name strings, but are otherwise functionally identical to the existing "Potron SFP+ XGSPON ONU Stick" handled by sfp_quirk_potron(). These modules, including units distributed under the "Better Internet" branding, use the same UART pin assignment and require the same TX_FAULT/LOS behaviour and boot delay. Re-use the existing Potron quirk for this EEPROM variant. Signed-off-by: Marcus Hughes <marcus.hughes@betterinternet.ltd> Link: https://patch.msgid.link/20251207210355.333451-1-marcus.hughes@betterinternet.ltd Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-11 01:25:14 -08:00
Jakub Kicinski	95f3013e88	linux-can-fixes-for-6.19-20251210 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEn/sM2K9nqF/8FWzzDHRl3/mQkZwFAmk5L74THG1rbEBwZW5n dXRyb25peC5kZQAKCRAMdGXf+ZCRnOSNCACjywpibubStsI/mEHXfsP8Iavu7U/w YZXnfsoTaR/TTYIKmuGo0KVEGrErzV4zBI/kaSWlXweZCepAASJ31AUykVWEoaYx x0hYagsrICSMEE+/amWMEnKNU8FGNHB6lxlkAdyKZWyhBgT55C35SAmavBEVjw/i stQyDAunBKfT2AEtL1CedxM7c1rTMxoK81t1igJCeB4i5khDjC4iQuOTLunbAVES XJcKa+1VOWt+/VLpm+6UhYHzb8nM9NuifbdMmb2d9rnDoTG40YSw0/5/X48TCty9 hNwwkH3xDhP052Qq0aFlXde4U7rM1sftTvGrP6EeZRuu/creETOCn/Y7 =69I5 -----END PGP SIGNATURE----- Merge tag 'linux-can-fixes-for-6.19-20251210' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2025-12-10 Arnd Bergmann's patch fixes a build dependency with the CAN protocols and drivers introduced in the current development cycle. The last patch is by me and fixes the error handling cleanup in the gs_usb driver. * tag 'linux-can-fixes-for-6.19-20251210' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: gs_usb: gs_can_open(): fix error handling can: fix build dependency ==================== Link: https://patch.msgid.link/20251210083448.2116869-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-11 01:02:30 -08:00
Jakub Kicinski	898ae76a70	netfilter pull request nf-25-12-10 -----BEGIN PGP SIGNATURE----- iQJBBAABCAArFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmk5UkYNHGZ3QHN0cmxl bi5kZQAKCRBwkajZrV/2ABGKEAChqKDT9NB2rbp1uxyj26TE8KtWB90FioA/yZcV 6H9lmTp5CXYRsYvkmRDM+xihYHSJpwNptxMVQJ/QonL+LPe8TwiP+THBBwknqrmY JY5rgcEbhAbYBnmHJ9T1soK2FHsisUoyNjrA/6rvQQpkoi9+2j/WPrDT765RdXmM I1fiF3qvKP1rYwpibOqXRozrlPVmBDB/ffibM7rNGR+K2N8B3vZPRbhQrc6B60ao gj6LLdTWZJ4wduPOVo03CzDqzu8hpL1HGwGPYqP1ol3TFlSy8+ByEM+92Nj4BEfm qX+3WDqxYc0U01pjz+TiXfI8nC1o5hzZiDWZ39eBp0zvxBoAyNCjc5ZKLE++EjvI 6hdDcJCcaOipFDv5GI/UmPadYjXf33YWsPK/9mXdChagXvaXtEB/b2KwbHUO4wMT eJEtA9NmuvRAk6tGC9L2T7hurre/Dt8Jx35uFz0nJowqNfLxvFmAKeEB6v3yQKGv Jb+LD3cEY3a4q7o+Vq3AVUZTqQtifW5JYwIailWHWXtxbnEtADQdWFa7NZD1Nm12 U2HFPN0z697WaDByNuK33Svc4CetE5oMXl+0WwE6Qa9hNCSnAkpyZKL3bTwJqEtQ Ij+4IXoQndgTLbr84koJYcwlTGBDu9li7ii59xh1B1K+BLxJmu/RULxlIR4x010v ygvezw== =LRTM -----END PGP SIGNATURE----- Merge tag 'nf-25-12-10' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Florian Westphal says: ==================== netfilter: updates for net 1) Fix refcount leaks in nf_conncount, from Fernando Fernandez Mancera. This addresses a recent regression that came in the last -next pull request. 2) Fix a null dereference in route error handling in IPVS, from Slavin Liu. This is an ancient issue dating back to 5.1 days. 3) Always set ifindex in route tuple in the flowtable output path, from Lorenzo Bianconi. This bug came in with the recent output path refactoring. 4) Prefer 'exit $ksft_xfail' over 'exit $ksft_skip' when we fail to trigger a nat race condition to exercise the clash resolution path in selftest infra, $ksft_skip should be reserved for missing tooling, From myself. * tag 'nf-25-12-10' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: selftests: netfilter: prefer xfail in case race wasn't triggered netfilter: always set route tuple out ifindex ipvs: fix ipv4 null-ptr-deref in route error path netfilter: nf_conncount: fix leaked ct in error paths ==================== Link: https://patch.msgid.link/20251210110754.22620-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-11 00:58:38 -08:00
Jakub Kicinski	237c1e152b	Merge branch 'selftests-forwarding-vxlan_bridge_1q_mc_ul-fix-flakiness' Petr Machata says: ==================== selftests: forwarding: vxlan_bridge_1q_mc_ul: Fix flakiness The net/forwarding/vxlan_bridge_1q_mc_ul selftest runs an overlay traffic, forwarded over a multicast-routed VXLAN underlay. In order to determine whether packets reach their intended destination, it uses a TC match. For convenience, it uses a flower match, which however does not allow matching on the encapsulated packet. So various service traffic ends up being indistinguishable from the test packets, and ends up confusing the test. To alleviate the problem, the test uses sleep to allow the necessary service traffic to run and clear the channel, before running the test traffic. This worked for a while, but lately we have nevertheless seen flakiness of the test in the CI. In this patchset, first generalize tc_rule_stats_get() to support u32 in patch #1, then in patch #2 convert the test to use u32 to allow parsing deeper into the packet, and in #3 drop the now-unnecessary sleep. ==================== Link: https://patch.msgid.link/cover.1765289566.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-11 00:53:19 -08:00
Petr Machata	514520b34b	selftests: forwarding: vxlan_bridge_1q_mc_ul: Drop useless sleeping After fixing traffic matching in the previous patch, the test does not need to use the sleep anymore. So drop vx_wait() altogether, migrate all callers of vx{10,20}_create_wait() to the corresponding _create(), and drop the now unused _create_wait() helpers. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/eabfe4fa12ae788cf3b8c5c876a989de81dfc3d3.1765289566.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-11 00:53:16 -08:00
Petr Machata	0c8b9a68b3	selftests: forwarding: vxlan_bridge_1q_mc_ul: Fix flakiness This test runs an overlay traffic, forwarded over a multicast-routed VXLAN underlay. In order to determine whether packets reach their intended destination, it uses a TC match. For convenience, it uses a flower match, which however does not allow matching on the encapsulated packet. So various service traffic ends up being indistinguishable from the test packets, and ends up confusing the test. To alleviate the problem, the test uses sleep to allow the necessary service traffic to run and clear the channel, before running the test traffic. This worked for a while, but lately we have nevertheless seen flakiness of the test in the CI. Fix the issue by using u32 to match the encapsulated packet as well. The confusing packets seem to always be IPv6 multicast listener reports. Realistically they could be ARP or other ICMP6 traffic as well. Therefore look for ethertype IPv4 in the IPv4 traffic test, and for IPv6 / UDP combination in the IPv6 traffic test. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/6438cb1613a2a667d3ff64089eb5994778f247af.1765289566.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-11 00:53:15 -08:00
Petr Machata	0842e34849	selftests: net: lib: tc_rule_stats_get(): Don't hard-code array index Flower is commonly used to match on packets in many bash-based selftests. A dump of a flower filter including statistics looks something like this: [ { "protocol": "all", "pref": 49152, "kind": "flower", "chain": 0 }, { ... "options": { ... "actions": [ { ... "stats": { "bytes": 0, "packets": 0, "drops": 0, "overlimits": 0, "requeues": 0, "backlog": 0, "qlen": 0 } } ] } } ] The JQ query in the helper function tc_rule_stats_get() assumes this form and looks for the second element of the array. However, a dump of a u32 filter looks like this: [ { "protocol": "all", "pref": 49151, "kind": "u32", "chain": 0 }, { "protocol": "all", "pref": 49151, "kind": "u32", "chain": 0, "options": { "fh": "800:", "ht_divisor": 1 } }, { ... "options": { ... "actions": [ { ... "stats": { "bytes": 0, "packets": 0, "drops": 0, "overlimits": 0, "requeues": 0, "backlog": 0, "qlen": 0 } } ] } }, ] There's an extra element which the JQ query ends up choosing. Instead of hard-coding a particular index, look for the entry on which a selector .options.actions yields anything. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/12982a44471c834511a0ee6c1e8f57e3a5307105.1765289566.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-11 00:53:15 -08:00
Florian Westphal	b8a81b0ce5	selftests: netfilter: prefer xfail in case race wasn't triggered Jakub says: "We try to reserve SKIP for tests skipped because tool is missing in env, something isn't built into the kernel etc." use xfail, we can't force the race condition to appear at will so its expected that the test 'fails' occasionally. Fixes: `78a5883635` ("selftests: netfilter: add conntrack clash resolution test case") Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20251206175647.5c32f419@kernel.org/ Signed-off-by: Florian Westphal <fw@strlen.de>	2025-12-10 11:55:59 +01:00
Lorenzo Bianconi	2bdc536c9d	netfilter: always set route tuple out ifindex Always set nf_flow_route tuple out ifindex even if the indev is not one of the flowtable configured devices since otherwise the outdev lookup in nf_flow_offload_ip_hook() or nf_flow_offload_ipv6_hook() for FLOW_OFFLOAD_XMIT_NEIGH flowtable entries will fail. The above issue occurs in the following configuration since IP6IP6 tunnel does not support flowtable acceleration yet: $ip addr show 5: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 00:11:22:33:22:55 brd ff:ff:ff:ff:ff:ff link-netns ns1 inet6 2001:db8:1::2/64 scope global nodad valid_lft forever preferred_lft forever inet6 fe80::211:22ff:fe33:2255/64 scope link tentative proto kernel_ll valid_lft forever preferred_lft forever 6: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 00:22:22:33:22:55 brd ff:ff:ff:ff:ff:ff link-netns ns3 inet6 2001:db8:2::1/64 scope global nodad valid_lft forever preferred_lft forever inet6 fe80::222:22ff:fe33:2255/64 scope link tentative proto kernel_ll valid_lft forever preferred_lft forever 7: tun0@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1452 qdisc noqueue state UNKNOWN group default qlen 1000 link/tunnel6 2001:db8:2::1 peer 2001:db8:2::2 permaddr a85:e732:2c37:: inet6 2002:db8:1::1/64 scope global nodad valid_lft forever preferred_lft forever inet6 fe80::885:e7ff:fe32:2c37/64 scope link proto kernel_ll valid_lft forever preferred_lft forever $ip -6 route show 2001:db8:1::/64 dev eth0 proto kernel metric 256 pref medium 2001:db8:2::/64 dev eth1 proto kernel metric 256 pref medium 2002:db8:1::/64 dev tun0 proto kernel metric 256 pref medium default via 2002:db8:1::2 dev tun0 metric 1024 pref medium $nft list ruleset table inet filter { flowtable ft { hook ingress priority filter devices = { eth0, eth1 } } chain forward { type filter hook forward priority filter; policy accept; meta l4proto { tcp, udp } flow add @ft } } Fixes: `b5964aac51` ("netfilter: flowtable: consolidate xmit path") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Florian Westphal <fw@strlen.de>	2025-12-10 11:55:58 +01:00
Slavin Liu	ad891bb3d0	ipvs: fix ipv4 null-ptr-deref in route error path The IPv4 code path in __ip_vs_get_out_rt() calls dst_link_failure() without ensuring skb->dev is set, leading to a NULL pointer dereference in fib_compute_spec_dst() when ipv4_link_failure() attempts to send ICMP destination unreachable messages. The issue emerged after commit `ed0de45a10` ("ipv4: recompile ip options in ipv4_link_failure") started calling __ip_options_compile() from ipv4_link_failure(). This code path eventually calls fib_compute_spec_dst() which dereferences skb->dev. An attempt was made to fix the NULL skb->dev dereference in commit `0113d9c9d1` ("ipv4: fix null-deref in ipv4_link_failure"), but it only addressed the immediate dev_net(skb->dev) dereference by using a fallback device. The fix was incomplete because fib_compute_spec_dst() later in the call chain still accesses skb->dev directly, which remains NULL when IPVS calls dst_link_failure(). The crash occurs when: 1. IPVS processes a packet in NAT mode with a misconfigured destination 2. Route lookup fails in __ip_vs_get_out_rt() before establishing a route 3. The error path calls dst_link_failure(skb) with skb->dev == NULL 4. ipv4_link_failure() → ipv4_send_dest_unreach() → __ip_options_compile() → fib_compute_spec_dst() 5. fib_compute_spec_dst() dereferences NULL skb->dev Apply the same fix used for IPv6 in commit `326bf17ea5` ("ipvs: fix ipv6 route unreach panic"): set skb->dev from skb_dst(skb)->dev before calling dst_link_failure(). KASAN: null-ptr-deref in range [0x0000000000000328-0x000000000000032f] CPU: 1 PID: 12732 Comm: syz.1.3469 Not tainted 6.6.114 #2 RIP: 0010:__in_dev_get_rcu include/linux/inetdevice.h:233 RIP: 0010:fib_compute_spec_dst+0x17a/0x9f0 net/ipv4/fib_frontend.c:285 Call Trace: <TASK> spec_dst_fill net/ipv4/ip_options.c:232 spec_dst_fill net/ipv4/ip_options.c:229 __ip_options_compile+0x13a1/0x17d0 net/ipv4/ip_options.c:330 ipv4_send_dest_unreach net/ipv4/route.c:1252 ipv4_link_failure+0x702/0xb80 net/ipv4/route.c:1265 dst_link_failure include/net/dst.h:437 __ip_vs_get_out_rt+0x15fd/0x19e0 net/netfilter/ipvs/ip_vs_xmit.c:412 ip_vs_nat_xmit+0x1d8/0xc80 net/netfilter/ipvs/ip_vs_xmit.c:764 Fixes: `ed0de45a10` ("ipv4: recompile ip options in ipv4_link_failure") Signed-off-by: Slavin Liu <slavin452@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de>	2025-12-10 11:55:58 +01:00
Fernando Fernandez Mancera	2e2a720766	netfilter: nf_conncount: fix leaked ct in error paths There are some situations where ct might be leaked as error paths are skipping the refcounted check and return immediately. In order to solve it make sure that the check is always called. Fixes: `be102eb6a0` ("netfilter: nf_conncount: rework API to use sk_buff directly") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Florian Westphal <fw@strlen.de>	2025-12-10 11:55:58 +01:00
Jakub Kicinski	6bcb7727d9	Merge branch 'inet-frags-flush-pending-skbs-in-fqdir_pre_exit' Jakub Kicinski says: ==================== inet: frags: flush pending skbs in fqdir_pre_exit() Fix the issue reported by NIPA starting on Sep 18th [1], where pernet_ops_rwsem is constantly held by a reader, preventing writers from grabbing it (specifically driver modules from loading). The fact that reports started around that time seems coincidental. The issue seems to be skbs queued for defrag preventing conntrack from exiting. First patch fixes another theoretical issue, it's mostly a leftover from an attempt to get rid of the inet_frag_queue refcnt, which I gave up on (still think it's doable but a bit of a time sink). Second patch is a minor refactor. The real fix is in the third patch. It's the simplest fix I can think of which is to flush the frag queues. Perhaps someone has a better suggestion? Last patch adds an explicit warning for conntrack getting stuck, as this seems like something that can easily happen if bugs sneak in. The warning will hopefully save us the first 20% of the investigation effort. Link: https://lore.kernel.org/20251001082036.0fc51440@kernel.org # [1] ==================== Link: https://patch.msgid.link/20251207010942.1672972-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-10 01:15:33 -08:00
Jakub Kicinski	92df4c56cf	netfilter: conntrack: warn when cleanup is stuck nf_conntrack_cleanup_net_list() calls schedule() so it does not show up as a hung task. Add an explicit check to make debugging leaked skbs/conntack references more obvious. Acked-by: Florian Westphal <fw@strlen.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251207010942.1672972-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-10 01:15:27 -08:00
Jakub Kicinski	006a5035b4	inet: frags: flush pending skbs in fqdir_pre_exit() We have been seeing occasional deadlocks on pernet_ops_rwsem since September in NIPA. The stuck task was usually modprobe (often loading a driver like ipvlan), trying to take the lock as a Writer. lockdep does not track readers for rwsems so the read wasn't obvious from the reports. On closer inspection the Reader holding the lock was conntrack looping forever in nf_conntrack_cleanup_net_list(). Based on past experience with occasional NIPA crashes I looked thru the tests which run before the crash and noticed that the crash follows ip_defrag.sh. An immediate red flag. Scouring thru (de)fragmentation queues reveals skbs sitting around, holding conntrack references. The problem is that since conntrack depends on nf_defrag_ipv6, nf_defrag_ipv6 will load first. Since nf_defrag_ipv6 loads first its netns exit hooks run _after_ conntrack's netns exit hook. Flush all fragment queue SKBs during fqdir_pre_exit() to release conntrack references before conntrack cleanup runs. Also flush the queues in timer expiry handlers when they discover fqdir->dead is set, in case packet sneaks in while we're running the pre_exit flush. The commit under Fixes is not exactly the culprit, but I think previously the timer firing would eventually unblock the spinning conntrack. Fixes: `d5dd88794a` ("inet: fix various use-after-free in defrags units") Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251207010942.1672972-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-10 01:15:27 -08:00
Jakub Kicinski	1231eec699	inet: frags: add inet_frag_queue_flush() Instead of exporting inet_frag_rbtree_purge() which requires that caller takes care of memory accounting, add a new helper. We will need to call it from a few places in the next patch. Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251207010942.1672972-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-10 01:15:27 -08:00
Jakub Kicinski	8ef522c8a5	inet: frags: avoid theoretical race in ip_frag_reinit() In ip_frag_reinit() we want to move the frag timeout timer into the future. If the timer fires in the meantime we inadvertently scheduled it again, and since the timer assumes a ref on frag_queue we need to acquire one to balance things out. This is technically racy, we should have acquired the reference _before_ we touch the timer, it may fire again before we take the ref. Avoid this entire dance by using mod_timer_pending() which only modifies the timer if its pending (and which exists since Linux v2.6.30) Note that this was the only place we ever took a ref on frag_queue since Eric's conversion to RCU. So we could potentially replace the whole refcnt field with an atomic flag and a bit more RCU. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251207010942.1672972-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-10 01:15:27 -08:00
Jakub Kicinski	2f6e056e95	Merge branch 'selftests-fix-build-warnings-and-errors' (part) Guenter Roeck says: ==================== selftests: Fix build warnings and errors This series fixes build warnings and errors observed when trying to build selftests. ==================== Link: https://patch.msgid.link/20251205171010.515236-1-linux@roeck-us.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-10 01:11:12 -08:00
Guenter Roeck	91dc09a609	selftests: net: tfo: Fix build warning Fix tfo.c: In function ‘run_server’: tfo.c:84:9: warning: ignoring return value of ‘read’ declared with attribute ‘warn_unused_result’ by evaluating the return value from read() and displaying an error message if it reports an error. Fixes: `c65b5bb232` ("selftests: net: add passive TFO test binary") Cc: David Wei <dw@davidwei.uk> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Link: https://patch.msgid.link/20251205171010.515236-14-linux@roeck-us.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-10 01:11:12 -08:00
Guenter Roeck	59546e8744	selftests: net: Fix build warnings Fix ksft.h: In function ‘ksft_ready’: ksft.h:27:9: warning: ignoring return value of ‘write’ declared with attribute ‘warn_unused_result’ ksft.h: In function ‘ksft_wait’: ksft.h:51:9: warning: ignoring return value of ‘read’ declared with attribute ‘warn_unused_result’ by checking the return value of the affected functions and displaying an error message if an error is seen. Fixes: `2b6d490b82` ("selftests: drv-net: Factor out ksft C helpers") Cc: Joe Damato <jdamato@fastly.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Link: https://patch.msgid.link/20251205171010.515236-11-linux@roeck-us.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-10 01:11:12 -08:00
Guenter Roeck	06f7cae92f	selftest: af_unix: Support compilers without flex-array-member-not-at-end support Fix: gcc: error: unrecognized command-line option ‘-Wflex-array-member-not-at-end’ by making the compiler option dependent on its support. Fixes: `1838731f10` ("selftest: af_unix: Add -Wall and -Wflex-array-member-not-at-end to CFLAGS.") Cc: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Link: https://patch.msgid.link/20251205171010.515236-7-linux@roeck-us.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-10 01:08:51 -08:00
Ankit Khushwaha	9580f6d47d	selftests: tls: fix warning of uninitialized variable In 'poll_partial_rec_async' a uninitialized char variable 'token' with is used for write/read instruction to synchronize between threads via a pipe. tls.c:2833:26: warning: variable 'token' is uninitialized when passed as a const pointer argument Initialize 'token' to '\0' to silence compiler warning. Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux@gmail.com> Link: https://patch.msgid.link/20251205163242.14615-1-ankitkhushwaha.linux@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-10 01:03:02 -08:00
Alexey Simakov	50b3db3e11	broadcom: b44: prevent uninitialized value usage On execution path with raised B44_FLAG_EXTERNAL_PHY, b44_readphy() leaves bmcr value uninitialized and it is used later in the code. Add check of this flag at the beginning of the b44_nway_reset() and exit early of the function with restarting autonegotiation if an external PHY is used. Fixes: `753f492093` ("[B44]: port to native ssb support") Reviewed-by: Jonas Gorski <jonas.gorski@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Alexey Simakov <bigalex934@gmail.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20251205155815.4348-1-bigalex934@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-10 01:02:20 -08:00
caoping	6af2a01d65	net/handshake: restore destructor on submit failure handshake_req_submit() replaces sk->sk_destruct but never restores it when submission fails before the request is hashed. handshake_sk_destruct() then returns early and the original destructor never runs, leaking the socket. Restore sk_destruct on the error path. Fixes: `3b3009ea8a` ("net/handshake: Create a NETLINK service for handling handshake requests") Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: caoping <caoping@cmss.chinamobile.com> Link: https://patch.msgid.link/20251204091058.1545151-1-caoping@cmss.chinamobile.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-10 00:51:49 -08:00
Arnd Bergmann	9e7477a427	net: ti: icssg-prueth: add PTP_1588_CLOCK_OPTIONAL dependency The new icssg-prueth driver needs the same dependency as the other parts that use the ptp-1588: WARNING: unmet direct dependencies detected for TI_ICSS_IEP Depends on [m]: NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_TI [=y] && PTP_1588_CLOCK_OPTIONAL [=m] && TI_PRUSS [=y] Selected by [y]: - TI_PRUETH [=y] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_TI [=y] && PRU_REMOTEPROC [=y] && NET_SWITCHDEV [=y] Add the correct dependency on the two drivers missing it, and remove the pointless 'imply' in the process. Fixes: `e654b85a69` ("net: ti: icssg-prueth: Add ICSSG Ethernet driver for AM65x SR1.0 platforms") Fixes: `511f6c1ae0` ("net: ti: icssm-prueth: Adds ICSSM Ethernet driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20251204100138.1034175-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-10 00:49:56 -08:00
Ilya Maximets	5ace7ef87f	net: openvswitch: fix middle attribute validation in push_nsh() action The push_nsh() action structure looks like this: OVS_ACTION_ATTR_PUSH_NSH(OVS_KEY_ATTR_NSH(OVS_NSH_KEY_ATTR_BASE,...)) The outermost OVS_ACTION_ATTR_PUSH_NSH attribute is OK'ed by the nla_for_each_nested() inside __ovs_nla_copy_actions(). The innermost OVS_NSH_KEY_ATTR_BASE/MD1/MD2 are OK'ed by the nla_for_each_nested() inside nsh_key_put_from_nlattr(). But nothing checks if the attribute in the middle is OK. We don't even check that this attribute is the OVS_KEY_ATTR_NSH. We just do a double unwrap with a pair of nla_data() calls - first time directly while calling validate_push_nsh() and the second time as part of the nla_for_each_nested() macro, which isn't safe, potentially causing invalid memory access if the size of this attribute is incorrect. The failure may not be noticed during validation due to larger netlink buffer, but cause trouble later during action execution where the buffer is allocated exactly to the size: BUG: KASAN: slab-out-of-bounds in nsh_hdr_from_nlattr+0x1dd/0x6a0 [openvswitch] Read of size 184 at addr ffff88816459a634 by task a.out/22624 CPU: 8 UID: 0 PID: 22624 6.18.0-rc7+ #115 PREEMPT(voluntary) Call Trace: <TASK> dump_stack_lvl+0x51/0x70 print_address_description.constprop.0+0x2c/0x390 kasan_report+0xdd/0x110 kasan_check_range+0x35/0x1b0 __asan_memcpy+0x20/0x60 nsh_hdr_from_nlattr+0x1dd/0x6a0 [openvswitch] push_nsh+0x82/0x120 [openvswitch] do_execute_actions+0x1405/0x2840 [openvswitch] ovs_execute_actions+0xd5/0x3b0 [openvswitch] ovs_packet_cmd_execute+0x949/0xdb0 [openvswitch] genl_family_rcv_msg_doit+0x1d6/0x2b0 genl_family_rcv_msg+0x336/0x580 genl_rcv_msg+0x9f/0x130 netlink_rcv_skb+0x11f/0x370 genl_rcv+0x24/0x40 netlink_unicast+0x73e/0xaa0 netlink_sendmsg+0x744/0xbf0 __sys_sendto+0x3d6/0x450 do_syscall_64+0x79/0x2c0 entry_SYSCALL_64_after_hwframe+0x76/0x7e </TASK> Let's add some checks that the attribute is properly sized and it's the only one attribute inside the action. Technically, there is no real reason for OVS_KEY_ATTR_NSH to be there, as we know that we're pushing an NSH header already, it just creates extra nesting, but that's how uAPI works today. So, keeping as it is. Fixes: `b2d0f5d5dc` ("openvswitch: enable NSH support") Reported-by: Junvy Yang <zhuque@tencent.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Eelco Chaudron echaudro@redhat.com Reviewed-by: Aaron Conole <aconole@redhat.com> Link: https://patch.msgid.link/20251204105334.900379-1-i.maximets@ovn.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-10 00:46:55 -08:00
Marc Kleine-Budde	3e54d3b4a8	can: gs_usb: gs_can_open(): fix error handling Commit `2603be9e81` ("can: gs_usb: gs_can_open(): improve error handling") added missing error handling to the gs_can_open() function. The driver uses 2 USB anchors to track the allocated URBs: the TX URBs in struct gs_can::tx_submitted for each netdev and the RX URBs in struct gs_usb::rx_submitted for the USB device. gs_can_open() allocates the RX URBs, while TX URBs are allocated during gs_can_start_xmit(). The cleanup in gs_can_open() kills all anchored dev->tx_submitted URBs (which is not necessary since the netdev is not yet registered), but misses the parent->rx_submitted URBs. Fix the problem by killing the rx_submitted instead of the tx_submitted. Fixes: `2603be9e81` ("can: gs_usb: gs_can_open(): improve error handling") Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251210-gs_usb-fix-error-handling-v1-1-d6a5a03f10bb@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2025-12-10 09:30:31 +01:00
Arnd Bergmann	6abd4577bc	can: fix build dependency A recent bugfix introduced a new problem with Kconfig dependencies: WARNING: unmet direct dependencies detected for CAN_DEV Depends on [n]: NETDEVICES [=n] && CAN [=m] Selected by [m]: - CAN [=m] && NET [=y] Since the CAN core code now links into the CAN device code, that particular function needs to be available, though the rest of it does not. Revert the incomplete fix and instead use Makefile logic to avoid the link failure. Fixes: `cb2dc6d286` ("can: Kconfig: select CAN driver infrastructure by default") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202512091523.zty3CLmc-lkp@intel.com/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Oliver Hartkopp <socketcan@hartkopp.net> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20251204100015.1033688-1-arnd@kernel.org [mkl: removed module option from CAN_DEV help text (thanks Vincent)] [mkl: removed '&& CAN' from Kconfig dependency (thanks Vincent)] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2025-12-10 09:19:34 +01:00
Jakub Kicinski	186468c67f	Merge branch 'mptcp-misc-fixes-for-v6-19-rc1' Matthieu Baerts says: ==================== mptcp: misc fixes for v6.19-rc1 Here are various unrelated fixes: - Patches 1-2: ignore unknown in-kernel PM endpoint flags instead of pretending they are supported. A fix for v5.7. - Patch 3: avoid potential unnecessary rtx timer expiration. A fix for v5.15. - Patch 4: avoid a deadlock on fallback in case of SKB creation failure while re-injecting. ==================== Link: https://patch.msgid.link/20251205-net-mptcp-misc-fixes-6-19-rc1-v1-0-9e4781a6c1b8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-08 23:54:06 -08:00
Paolo Abeni	ffb8c27b05	mptcp: avoid deadlock on fallback while reinjecting Jakub reported an MPTCP deadlock at fallback time: WARNING: possible recursive locking detected 6.18.0-rc7-virtme #1 Not tainted -------------------------------------------- mptcp_connect/20858 is trying to acquire lock: ff1100001da18b60 (&msk->fallback_lock){+.-.}-{3:3}, at: __mptcp_try_fallback+0xd8/0x280 but task is already holding lock: ff1100001da18b60 (&msk->fallback_lock){+.-.}-{3:3}, at: __mptcp_retrans+0x352/0xaa0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&msk->fallback_lock); lock(&msk->fallback_lock); * DEADLOCK * May be due to missing lock nesting notation 3 locks held by mptcp_connect/20858: #0: ff1100001da18290 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg+0x114/0x1bc0 #1: ff1100001db40fd0 (k-sk_lock-AF_INET#2){+.+.}-{0:0}, at: __mptcp_retrans+0x2cb/0xaa0 #2: ff1100001da18b60 (&msk->fallback_lock){+.-.}-{3:3}, at: __mptcp_retrans+0x352/0xaa0 stack backtrace: CPU: 0 UID: 0 PID: 20858 Comm: mptcp_connect Not tainted 6.18.0-rc7-virtme #1 PREEMPT(full) Hardware name: Bochs, BIOS Bochs 01/01/2011 Call Trace: <TASK> dump_stack_lvl+0x6f/0xa0 print_deadlock_bug.cold+0xc0/0xcd validate_chain+0x2ff/0x5f0 __lock_acquire+0x34c/0x740 lock_acquire.part.0+0xbc/0x260 _raw_spin_lock_bh+0x38/0x50 __mptcp_try_fallback+0xd8/0x280 mptcp_sendmsg_frag+0x16c2/0x3050 __mptcp_retrans+0x421/0xaa0 mptcp_release_cb+0x5aa/0xa70 release_sock+0xab/0x1d0 mptcp_sendmsg+0xd5b/0x1bc0 sock_write_iter+0x281/0x4d0 new_sync_write+0x3c5/0x6f0 vfs_write+0x65e/0xbb0 ksys_write+0x17e/0x200 do_syscall_64+0xbb/0xfd0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7fa5627cbc5e Code: 4d 89 d8 e8 14 bd 00 00 4c 8b 5d f8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 11 c9 c3 0f 1f 80 00 00 00 00 48 8b 45 10 0f 05 <c9> c3 83 e2 39 83 fa 08 75 e7 e8 13 ff ff ff 0f 1f 00 f3 0f 1e fa RSP: 002b:00007fff1fe14700 EFLAGS: 00000202 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fa5627cbc5e RDX: 0000000000001f9c RSI: 00007fff1fe16984 RDI: 0000000000000005 RBP: 00007fff1fe14710 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000202 R12: 00007fff1fe16920 R13: 0000000000002000 R14: 0000000000001f9c R15: 0000000000001f9c The packet scheduler could attempt a reinjection after receiving an MP_FAIL and before the infinite map has been transmitted, causing a deadlock since MPTCP needs to do the reinjection atomically from WRT fallback. Address the issue explicitly avoiding the reinjection in the critical scenario. Note that this is the only fallback critical section that could potentially send packets and hit the double-lock. Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://netdev-ctrl.bots.linux.dev/logs/vmksft/mptcp-dbg/results/412720/1-mptcp-join-sh/stderr Fixes: `f8a1d9b18c` ("mptcp: make fallback action and fallback decision atomic") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251205-net-mptcp-misc-fixes-6-19-rc1-v1-4-9e4781a6c1b8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-08 23:54:03 -08:00
Paolo Abeni	2ea6190f42	mptcp: schedule rtx timer only after pushing data The MPTCP protocol usually schedule the retransmission timer only when there is some chances for such retransmissions to happen. With a notable exception: __mptcp_push_pending() currently schedule such timer unconditionally, potentially leading to unnecessary rtx timer expiration. The issue is present since the blamed commit below but become easily reproducible after commit `27b0e701d3` ("mptcp: drop bogus optimization in __mptcp_check_push()") Fixes: `33d41c9cd7` ("mptcp: more accurate timeout") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251205-net-mptcp-misc-fixes-6-19-rc1-v1-3-9e4781a6c1b8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-08 23:54:03 -08:00
Matthieu Baerts (NGI0)	29f4801e9c	selftests: mptcp: pm: ensure unknown flags are ignored This validates the previous commit: the userspace can set unknown flags -- the 7th bit is currently unused -- without errors, but only the supported ones are printed in the endpoints dumps. The 'Fixes' tag here below is the same as the one from the previous commit: this patch here is not fixing anything wrong in the selftests, but it validates the previous fix for an issue introduced by this commit ID. Fixes: `01cacb00b3` ("mptcp: add netlink-based PM") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251205-net-mptcp-misc-fixes-6-19-rc1-v1-2-9e4781a6c1b8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-08 23:54:02 -08:00
Matthieu Baerts (NGI0)	0ace3297a7	mptcp: pm: ignore unknown endpoint flags Before this patch, the kernel was saving any flags set by the userspace, even unknown ones. This doesn't cause critical issues because the kernel is only looking at specific ones. But on the other hand, endpoints dumps could tell the userspace some recent flags seem to be supported on older kernel versions. Instead, ignore all unknown flags when parsing them. By doing that, the userspace can continue to set unsupported flags, but it has a way to verify what is supported by the kernel. Note that it sounds better to continue accepting unsupported flags not to change the behaviour, but also that eases things on the userspace side by adding "optional" endpoint types only supported by newer kernel versions without having to deal with the different kernel versions. A note for the backports: there will be conflicts in mptcp.h on older versions not having the mentioned flags, the new line should still be added last, and the '5' needs to be adapted to have the same value as the last entry. Fixes: `01cacb00b3` ("mptcp: add netlink-based PM") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251205-net-mptcp-misc-fixes-6-19-rc1-v1-1-9e4781a6c1b8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-08 23:54:02 -08:00
Jakub Kicinski	db6b35cffe	tools: ynl: fix build on systems with old kernel headers The wireguard YNL conversion was missing the customary .deps entry. NIPA doesn't catch this but my CentOS 9 system complains: wireguard-user.c:72:10: error: ‘WGALLOWEDIP_A_FLAGS’ undeclared here wireguard-user.c:58:67: error: parameter 1 (‘value’) has incomplete type 58 \| const char *wireguard_wgallowedip_flags_str(enum wgallowedip_flag value) \| ~~~~~~~~~~~~~~~~~~~~~~^~~~~ And similarly does Ubuntu 22.04. One extra complication here is that we renamed the header guard, so we need to compat with both old and new guard define. Reviewed-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Link: https://patch.msgid.link/20251207013848.1692990-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-08 23:53:17 -08:00
Jakub Kicinski	e56cadaa27	ynl: add regen hint to new headers Recent commit `68e83f3472` ("tools: ynl-gen: add regeneration comment") added a hint how to regenerate the code to the headers. Update the new headers from this release cycle to also include it. Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251207004740.1657799-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-08 23:52:43 -08:00
Eric Biggers	e9e5047df9	mptcp: select CRYPTO_LIB_UTILS instead of CRYPTO Since the only crypto functions used by the mptcp code are the SHA-256 library functions and crypto_memneq(), select only the options needed for those: CRYPTO_LIB_SHA256 and CRYPTO_LIB_UTILS. Previously, CRYPTO was selected instead of CRYPTO_LIB_UTILS. That does pull in CRYPTO_LIB_UTILS as well, but it's unnecessarily broad. Years ago, the CRYPTO_LIB_* options were visible only when CRYPTO. That may be another reason why CRYPTO is selected here. However, that was fixed years ago, and the libraries can now be selected directly. Signed-off-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Mat Martineau <martineau@kernel.org> Link: https://patch.msgid.link/20251204054417.491439-1-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-08 23:44:16 -08:00
Mateusz Guzik	2183a5c8a0	af_unix: annotate unix_gc_lock with __cacheline_aligned_in_smp Otherwise the lock is susceptible to ever-changing false-sharing due to unrelated changes. This in particular popped up here where an unrelated change improved performance: https://lore.kernel.org/oe-lkp/202511281306.51105b46-lkp@intel.com/ Stabilize it with an explicit annotation which also has a side effect of furher improving scalability: > in our oiginal report, `284922f4c5` has a 6.1% performance improvement comparing > to parent `17d85f33a8`. > we applied your patch directly upon `284922f4c5`. as below, now by > "284922f4c5 + your patch" > we observe a 12.8% performance improvements (still comparing to `17d85f33a8`). Note nothing was done for the other fields, so some fluctuation is still possible. Tested-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251203100122.291550-1-mjguzik@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-08 23:43:24 -08:00
Michael Chan	0373d5c387	bnxt_en: Fix XDP_TX path For XDP_TX action in bnxt_rx_xdp(), clearing of the event flags is not correct. __bnxt_poll_work() -> bnxt_rx_pkt() -> bnxt_rx_xdp() may be looping within NAPI and some event flags may be set in earlier iterations. In particular, if BNXT_TX_EVENT is set earlier indicating some XDP_TX packets are ready and pending, it will be cleared if it is XDP_TX action again. Normally, we will set BNXT_TX_EVENT again when we successfully call __bnxt_xmit_xdp(). But if the TX ring has no more room, the flag will not be set. This will cause the TX producer to be ahead but the driver will not hit the TX doorbell. For multi-buf XDP_TX, there is no need to clear the event flags and set BNXT_AGG_EVENT. The BNXT_AGG_EVENT flag should have been set earlier in bnxt_rx_pkt(). The visible symptom of this is that the RX ring associated with the TX XDP ring will eventually become empty and all packets will be dropped. Because this condition will cause the driver to not refill the RX ring seeing that the TX ring has forever pending XDP_TX packets. The fix is to only clear BNXT_RX_EVENT when we have successfully called __bnxt_xmit_xdp(). Fixes: `7f0a168b04` ("bnxt_en: Add completion ring pointer in TX and RX ring structures") Reported-by: Pavel Dubovitsky <pdubovitsky@meta.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20251203003024.2246699-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-04 17:54:25 -08:00
Tim Hostetler	a479a27f4d	gve: Move gve_init_clock to after AQ CONFIGURE_DEVICE_RESOURCES call commit `46e7860ef9` ("gve: Move ptp_schedule_worker to gve_init_clock") moved the first invocation of the AQ command REPORT_NIC_TIMESTAMP to gve_probe(). However, gve_init_clock() invoking REPORT_NIC_TIMESTAMP is not valid until after gve_probe() invokes the AQ command CONFIGURE_DEVICE_RESOURCES. Failure to do so results in the following error: gve 0000:00:07.0: failed to read NIC clock -11 This was missed earlier because the driver under test was loaded at runtime instead of boot-time. The boot-time driver had already initialized the device, causing the runtime driver to successfully call gve_init_clock() incorrectly. Fixes: `46e7860ef9` ("gve: Move ptp_schedule_worker to gve_init_clock") Reviewed-by: Ankit Garg <nktgrg@google.com> Signed-off-by: Tim Hostetler <thostet@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251202200207.1434749-1-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-04 17:54:16 -08:00
René Rebe	dd75c723ef	r8169: fix RTL8117 Wake-on-Lan in DASH mode Wake-on-Lan does currently not work for r8169 in DASH mode, e.g. the ASUS Pro WS X570-ACE with RTL8168fp/RTL8117. Fix by not returning early in rtl_prepare_power_down when dash_enabled. While this fixes WoL, it still kills the OOB RTL8117 remote management BMC connection. Fix by not calling rtl8168_driver_stop if WoL is enabled. Fixes: `065c27c184` ("r8169: phy power ops") Signed-off-by: René Rebe <rene@exactco.de> Cc: stable@vger.kernel.org Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/20251202.194137.1647877804487085954.rene@exactco.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-04 17:54:07 -08:00
Jakub Kicinski	e7a9530d12	Merge branch 'mlxsw-three-m-router-fixes' Petr Machata says: ==================== mlxsw: Three (m)router fixes This patchset contains two fixes in mlxsw Spectrum router code, and one for the Spectrum multicast router code. Please see the individual patches for more details. ==================== Link: https://patch.msgid.link/cover.1764695650.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-04 17:53:53 -08:00
Ido Schimmel	8ac1dacec4	mlxsw: spectrum_mr: Fix use-after-free when updating multicast route stats Cited commit added a dedicated mutex (instead of RTNL) to protect the multicast route list, so that it will not change while the driver periodically traverses it in order to update the kernel about multicast route stats that were queried from the device. One instance of list entry deletion (during route replace) was missed and it can result in a use-after-free [1]. Fix by acquiring the mutex before deleting the entry from the list and releasing it afterwards. [1] BUG: KASAN: slab-use-after-free in mlxsw_sp_mr_stats_update+0x4a5/0x540 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c:1006 [mlxsw_spectrum] Read of size 8 at addr ffff8881523c2fa8 by task kworker/2:5/22043 CPU: 2 UID: 0 PID: 22043 Comm: kworker/2:5 Not tainted 6.18.0-rc1-custom-g1a3d6d7cd014 #1 PREEMPT(full) Hardware name: Mellanox Technologies Ltd. MSN2010/SA002610, BIOS 5.6.5 08/24/2017 Workqueue: mlxsw_core mlxsw_sp_mr_stats_update [mlxsw_spectrum] Call Trace: <TASK> dump_stack_lvl+0xba/0x110 print_report+0x174/0x4f5 kasan_report+0xdf/0x110 mlxsw_sp_mr_stats_update+0x4a5/0x540 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c:1006 [mlxsw_spectrum] process_one_work+0x9cc/0x18e0 worker_thread+0x5df/0xe40 kthread+0x3b8/0x730 ret_from_fork+0x3e9/0x560 ret_from_fork_asm+0x1a/0x30 </TASK> Allocated by task 29933: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 __kasan_kmalloc+0x8f/0xa0 mlxsw_sp_mr_route_add+0xd8/0x4770 [mlxsw_spectrum] mlxsw_sp_router_fibmr_event_work+0x371/0xad0 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:7965 [mlxsw_spectrum] process_one_work+0x9cc/0x18e0 worker_thread+0x5df/0xe40 kthread+0x3b8/0x730 ret_from_fork+0x3e9/0x560 ret_from_fork_asm+0x1a/0x30 Freed by task 29933: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 __kasan_save_free_info+0x3b/0x70 __kasan_slab_free+0x43/0x70 kfree+0x14e/0x700 mlxsw_sp_mr_route_add+0x2dea/0x4770 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c:444 [mlxsw_spectrum] mlxsw_sp_router_fibmr_event_work+0x371/0xad0 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:7965 [mlxsw_spectrum] process_one_work+0x9cc/0x18e0 worker_thread+0x5df/0xe40 kthread+0x3b8/0x730 ret_from_fork+0x3e9/0x560 ret_from_fork_asm+0x1a/0x30 Fixes: `f38656d067` ("mlxsw: spectrum_mr: Protect multicast route list with a lock") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/f996feecfd59fde297964bfc85040b6d83ec6089.1764695650.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-04 17:53:49 -08:00
Ido Schimmel	8b0e69763e	mlxsw: spectrum_router: Fix neighbour use-after-free We sometimes observe use-after-free when dereferencing a neighbour [1]. The problem seems to be that the driver stores a pointer to the neighbour, but without holding a reference on it. A reference is only taken when the neighbour is used by a nexthop. Fix by simplifying the reference counting scheme. Always take a reference when storing a neighbour pointer in a neighbour entry. Avoid taking a referencing when the neighbour is used by a nexthop as the neighbour entry associated with the nexthop already holds a reference. Tested by running the test that uncovered the problem over 300 times. Without this patch the problem was reproduced after a handful of iterations. [1] BUG: KASAN: slab-use-after-free in mlxsw_sp_neigh_entry_update+0x2d4/0x310 Read of size 8 at addr ffff88817f8e3420 by task ip/3929 CPU: 3 UID: 0 PID: 3929 Comm: ip Not tainted 6.18.0-rc4-virtme-g36b21a067510 #3 PREEMPT(full) Hardware name: Nvidia SN5600/VMOD0013, BIOS 5.13 05/31/2023 Call Trace: <TASK> dump_stack_lvl+0x6f/0xa0 print_address_description.constprop.0+0x6e/0x300 print_report+0xfc/0x1fb kasan_report+0xe4/0x110 mlxsw_sp_neigh_entry_update+0x2d4/0x310 mlxsw_sp_router_rif_gone_sync+0x35f/0x510 mlxsw_sp_rif_destroy+0x1ea/0x730 mlxsw_sp_inetaddr_port_vlan_event+0xa1/0x1b0 __mlxsw_sp_inetaddr_lag_event+0xcc/0x130 __mlxsw_sp_inetaddr_event+0xf5/0x3c0 mlxsw_sp_router_netdevice_event+0x1015/0x1580 notifier_call_chain+0xcc/0x150 call_netdevice_notifiers_info+0x7e/0x100 __netdev_upper_dev_unlink+0x10b/0x210 netdev_upper_dev_unlink+0x79/0xa0 vrf_del_slave+0x18/0x50 do_set_master+0x146/0x7d0 do_setlink.isra.0+0x9a0/0x2880 rtnl_newlink+0x637/0xb20 rtnetlink_rcv_msg+0x6fe/0xb90 netlink_rcv_skb+0x123/0x380 netlink_unicast+0x4a3/0x770 netlink_sendmsg+0x75b/0xc90 __sock_sendmsg+0xbe/0x160 ____sys_sendmsg+0x5b2/0x7d0 ___sys_sendmsg+0xfd/0x180 __sys_sendmsg+0x124/0x1c0 do_syscall_64+0xbb/0xfd0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 [...] Allocated by task 109: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 __kasan_kmalloc+0x7b/0x90 __kmalloc_noprof+0x2c1/0x790 neigh_alloc+0x6af/0x8f0 ___neigh_create+0x63/0xe90 mlxsw_sp_nexthop_neigh_init+0x430/0x7e0 mlxsw_sp_nexthop_type_init+0x212/0x960 mlxsw_sp_nexthop6_group_info_init.constprop.0+0x81f/0x1280 mlxsw_sp_nexthop6_group_get+0x392/0x6a0 mlxsw_sp_fib6_entry_create+0x46a/0xfd0 mlxsw_sp_router_fib6_replace+0x1ed/0x5f0 mlxsw_sp_router_fib6_event_work+0x10a/0x2a0 process_one_work+0xd57/0x1390 worker_thread+0x4d6/0xd40 kthread+0x355/0x5b0 ret_from_fork+0x1d4/0x270 ret_from_fork_asm+0x11/0x20 Freed by task 154: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 __kasan_save_free_info+0x3b/0x60 __kasan_slab_free+0x43/0x70 kmem_cache_free_bulk.part.0+0x1eb/0x5e0 kvfree_rcu_bulk+0x1f2/0x260 kfree_rcu_work+0x130/0x1b0 process_one_work+0xd57/0x1390 worker_thread+0x4d6/0xd40 kthread+0x355/0x5b0 ret_from_fork+0x1d4/0x270 ret_from_fork_asm+0x11/0x20 Last potentially related work creation: kasan_save_stack+0x30/0x50 kasan_record_aux_stack+0x8c/0xa0 kvfree_call_rcu+0x93/0x5b0 mlxsw_sp_router_neigh_event_work+0x67d/0x860 process_one_work+0xd57/0x1390 worker_thread+0x4d6/0xd40 kthread+0x355/0x5b0 ret_from_fork+0x1d4/0x270 ret_from_fork_asm+0x11/0x20 Fixes: `6cf3c971dc` ("mlxsw: spectrum_router: Add private neigh table") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/92d75e21d95d163a41b5cea67a15cd33f547cba6.1764695650.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-04 17:53:49 -08:00
Ido Schimmel	b6b638bda2	mlxsw: spectrum_router: Fix possible neighbour reference count leak mlxsw_sp_router_schedule_work() takes a reference on a neighbour, expecting a work item to release it later on. However, we might fail to schedule the work item, in which case the neighbour reference count will be leaked. Fix by taking the reference just before scheduling the work item. Note that mlxsw_sp_router_schedule_work() can receive a NULL neighbour pointer, but neigh_clone() handles that correctly. Spotted during code review, did not actually observe the reference count leak. Fixes: `151b89f602` ("mlxsw: spectrum_router: Reuse work neighbor initialization in work scheduler") Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/ec2934ae4aca187a8d8c9329a08ce93cca411378.1764695650.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-12-04 17:53:48 -08:00

1 2 3 4 5 ...

1402760 Commits (a67fd55f6a09f4119b7232c19e0f348fe31ab0db) All Branches Search

1402760 Commits (a67fd55f6a09f4119b7232c19e0f348fe31ab0db)

All Branches