mirror-linux

Commit Graph

Author	SHA1	Message	Date
Linus Torvalds	8804d970fa	Summary of significant series in this pull request: - The 3 patch series "mm, swap: improve cluster scan strategy" from Kairui Song improves performance and reduces the failure rate of swap cluster allocation. - The 4 patch series "support large align and nid in Rust allocators" from Vitaly Wool permits Rust allocators to set NUMA node and large alignment when perforning slub and vmalloc reallocs. - The 2 patch series "mm/damon/vaddr: support stat-purpose DAMOS" from Yueyang Pan extend DAMOS_STAT's handling of the DAMON operations sets for virtual address spaces for ops-level DAMOS filters. - The 3 patch series "execute PROCMAP_QUERY ioctl under per-vma lock" from Suren Baghdasaryan reduces mmap_lock contention during reads of /proc/pid/maps. - The 2 patch series "mm/mincore: minor clean up for swap cache checking" from Kairui Song performs some cleanup in the swap code. - The 11 patch series "mm: vm_normal_page() improvements" from David Hildenbrand provides code cleanup in the pagemap code. - The 5 patch series "add persistent huge zero folio support" from Pankaj Raghav provides a block layer speedup by optionalls making the huge_zero_pagepersistent, instead of releasing it when its refcount falls to zero. - The 3 patch series "kho: fixes and cleanups" from Mike Rapoport adds a few touchups to the recently added Kexec Handover feature. - The 10 patch series "mm: make mm->flags a bitmap and 64-bit on all arches" from Lorenzo Stoakes turns mm_struct.flags into a bitmap. To end the constant struggle with space shortage on 32-bit conflicting with 64-bit's needs. - The 2 patch series "mm/swapfile.c and swap.h cleanup" from Chris Li cleans up some swap code. - The 7 patch series "selftests/mm: Fix false positives and skip unsupported tests" from Donet Tom fixes a few things in our selftests code. - The 7 patch series "prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised" from David Hildenbrand "allows individual processes to opt-out of THP=always into THP=madvise, without affecting other workloads on the system". It's a long story - the [1/N] changelog spells out the considerations. - The 11 patch series "Add and use memdesc_flags_t" from Matthew Wilcox gets us started on the memdesc project. Please see https://kernelnewbies.org/MatthewWilcox/Memdescs and https://blogs.oracle.com/linux/post/introducing-memdesc. - The 3 patch series "Tiny optimization for large read operations" from Chi Zhiling improves the efficiency of the pagecache read path. - The 5 patch series "Better split_huge_page_test result check" from Zi Yan improves our folio splitting selftest code. - The 2 patch series "test that rmap behaves as expected" from Wei Yang adds some rmap selftests. - The 3 patch series "remove write_cache_pages()" from Christoph Hellwig removes that function and converts its two remaining callers. - The 2 patch series "selftests/mm: uffd-stress fixes" from Dev Jain fixes some UFFD selftests issues. - The 3 patch series "introduce kernel file mapped folios" from Boris Burkov introduces the concept of "kernel file pages". Using these permits btrfs to account its metadata pages to the root cgroup, rather than to the cgroups of random inappropriate tasks. - The 2 patch series "mm/pageblock: improve readability of some pageblock handling" from Wei Yang provides some readability improvements to the page allocator code. - The 11 patch series "mm/damon: support ARM32 with LPAE" from SeongJae Park teaches DAMON to understand arm32 highmem. - The 4 patch series "tools: testing: Use existing atomic.h for vma/maple tests" from Brendan Jackman performs some code cleanups and deduplication under tools/testing/. - The 2 patch series "maple_tree: Fix testing for 32bit compiles" from Liam Howlett fixes a couple of 32-bit issues in tools/testing/radix-tree.c. - The 2 patch series "kasan: unify kasan_enabled() and remove arch-specific implementations" from Sabyrzhan Tasbolatov moves KASAN arch-specific initialization code into a common arch-neutral implementation. - The 3 patch series "mm: remove zpool" from Johannes Weiner removes zspool - an indirection layer which now only redirects to a single thing (zsmalloc). - The 2 patch series "mm: task_stack: Stack handling cleanups" from Pasha Tatashin makes a couple of cleanups in the fork code. - The 37 patch series "mm: remove nth_page()" from David Hildenbrand makes rather a lot of adjustments at various nth_page() callsites, eventually permitting the removal of that undesirable helper function. - The 2 patch series "introduce kasan.write_only option in hw-tags" from Yeoreum Yun creates a KASAN read-only mode for ARM, using that architecture's memory tagging feature. It is felt that a read-only mode KASAN is suitable for use in production systems rather than debug-only. - The 3 patch series "mm: hugetlb: cleanup hugetlb folio allocation" from Kefeng Wang does some tidying in the hugetlb folio allocation code. - The 12 patch series "mm: establish const-correctness for pointer parameters" from Max Kellermann makes quite a number of the MM API functions more accurate about the constness of their arguments. This was getting in the way of subsystems (in this case CEPH) when they attempt to improving their own const/non-const accuracy. - The 7 patch series "Cleanup free_pages() misuse" from Vishal Moola fixes a number of code sites which were confused over when to use free_pages() vs __free_pages(). - The 3 patch series "Add Rust abstraction for Maple Trees" from Alice Ryhl makes the mapletree code accessible to Rust. Required by nouveau and by its forthcoming successor: the new Rust Nova driver. - The 2 patch series "selftests/mm: split_huge_page_test: split_pte_mapped_thp improvements" from David Hildenbrand adds a fix and some cleanups to the thp selftesting code. - The 14 patch series "mm, swap: introduce swap table as swap cache (phase I)" from Chris Li and Kairui Song is the first step along the path to implementing "swap tables" - a new approach to swap allocation and state tracking which is expected to yield speed and space improvements. This patchset itself yields a 5-20% performance benefit in some situations. - The 3 patch series "Some ptdesc cleanups" from Matthew Wilcox utilizes the new memdesc layer to clean up the ptdesc code a little. - The 3 patch series "Fix va_high_addr_switch.sh test failure" from Chunyu Hu fixes some issues in our 5-level pagetable selftesting code. - The 2 patch series "Minor fixes for memory allocation profiling" from Suren Baghdasaryan addresses a couple of minor issues in relatively new memory allocation profiling feature. - The 3 patch series "Small cleanups" from Matthew Wilcox has a few cleanups in preparation for more memdesc work. - The 2 patch series "mm/damon: add addr_unit for DAMON_LRU_SORT and DAMON_RECLAIM" from Quanmin Yan makes some changes to DAMON in furtherance of supporting arm highmem. - The 2 patch series "selftests/mm: Add -Wunreachable-code and fix warnings" from Muhammad Anjum adds that compiler check to selftests code and fixes the fallout, by removing dead code. - The 10 patch series "Improvements to Victim Process Thawing and OOM Reaper Traversal Order" from zhongjinji makes a number of improvements in the OOM killer: mainly thawing a more appropriate group of victim threads so they can release resources. - The 5 patch series "mm/damon: misc fixups and improvements for 6.18" from SeongJae Park is a bunch of small and unrelated fixups for DAMON. - The 7 patch series "mm/damon: define and use DAMON initialization check function" from SeongJae Park implement reliability and maintainability improvements to a recently-added bug fix. - The 2 patch series "mm/damon/stat: expose auto-tuned intervals and non-idle ages" from SeongJae Park provides additional transparency to userspace clients of the DAMON_STAT information. - The 2 patch series "Expand scope of khugepaged anonymous collapse" from Dev Jain removes some constraints on khubepaged's collapsing of anon VMAs. It also increases the success rate of MADV_COLLAPSE against an anon vma. - The 2 patch series "mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()" from Lorenzo Stoakes moves us further towards removal of file_operations.mmap(). This patchset concentrates upon clearing up the treatment of stacked filesystems. - The 6 patch series "mm: Improve mlock tracking for large folios" from Kiryl Shutsemau provides some fixes and improvements to mlock's tracking of large folios. /proc/meminfo's "Mlocked" field became more accurate. - The 2 patch series "mm/ksm: Fix incorrect accounting of KSM counters during fork" from Donet Tom fixes several user-visible KSM stats inaccuracies across forks and adds selftest code to verify these counters. - The 2 patch series "mm_slot: fix the usage of mm_slot_entry" from Wei Yang addresses some potential but presently benign issues in KSM's mm_slot handling. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaN3cywAKCRDdBJ7gKXxA jtaPAQDmIuIu7+XnVUK5V11hsQ/5QtsUeLHV3OsAn4yW5/3dEQD/UddRU08ePN+1 2VRB0EwkLAdfMWW7TfiNZ+yhuoiL/AA= =4mhY -----END PGP SIGNATURE----- Merge tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "mm, swap: improve cluster scan strategy" from Kairui Song improves performance and reduces the failure rate of swap cluster allocation - "support large align and nid in Rust allocators" from Vitaly Wool permits Rust allocators to set NUMA node and large alignment when perforning slub and vmalloc reallocs - "mm/damon/vaddr: support stat-purpose DAMOS" from Yueyang Pan extend DAMOS_STAT's handling of the DAMON operations sets for virtual address spaces for ops-level DAMOS filters - "execute PROCMAP_QUERY ioctl under per-vma lock" from Suren Baghdasaryan reduces mmap_lock contention during reads of /proc/pid/maps - "mm/mincore: minor clean up for swap cache checking" from Kairui Song performs some cleanup in the swap code - "mm: vm_normal_page() improvements" from David Hildenbrand provides code cleanup in the pagemap code - "add persistent huge zero folio support" from Pankaj Raghav provides a block layer speedup by optionalls making the huge_zero_pagepersistent, instead of releasing it when its refcount falls to zero - "kho: fixes and cleanups" from Mike Rapoport adds a few touchups to the recently added Kexec Handover feature - "mm: make mm->flags a bitmap and 64-bit on all arches" from Lorenzo Stoakes turns mm_struct.flags into a bitmap. To end the constant struggle with space shortage on 32-bit conflicting with 64-bit's needs - "mm/swapfile.c and swap.h cleanup" from Chris Li cleans up some swap code - "selftests/mm: Fix false positives and skip unsupported tests" from Donet Tom fixes a few things in our selftests code - "prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised" from David Hildenbrand "allows individual processes to opt-out of THP=always into THP=madvise, without affecting other workloads on the system". It's a long story - the [1/N] changelog spells out the considerations - "Add and use memdesc_flags_t" from Matthew Wilcox gets us started on the memdesc project. Please see https://kernelnewbies.org/MatthewWilcox/Memdescs and https://blogs.oracle.com/linux/post/introducing-memdesc - "Tiny optimization for large read operations" from Chi Zhiling improves the efficiency of the pagecache read path - "Better split_huge_page_test result check" from Zi Yan improves our folio splitting selftest code - "test that rmap behaves as expected" from Wei Yang adds some rmap selftests - "remove write_cache_pages()" from Christoph Hellwig removes that function and converts its two remaining callers - "selftests/mm: uffd-stress fixes" from Dev Jain fixes some UFFD selftests issues - "introduce kernel file mapped folios" from Boris Burkov introduces the concept of "kernel file pages". Using these permits btrfs to account its metadata pages to the root cgroup, rather than to the cgroups of random inappropriate tasks - "mm/pageblock: improve readability of some pageblock handling" from Wei Yang provides some readability improvements to the page allocator code - "mm/damon: support ARM32 with LPAE" from SeongJae Park teaches DAMON to understand arm32 highmem - "tools: testing: Use existing atomic.h for vma/maple tests" from Brendan Jackman performs some code cleanups and deduplication under tools/testing/ - "maple_tree: Fix testing for 32bit compiles" from Liam Howlett fixes a couple of 32-bit issues in tools/testing/radix-tree.c - "kasan: unify kasan_enabled() and remove arch-specific implementations" from Sabyrzhan Tasbolatov moves KASAN arch-specific initialization code into a common arch-neutral implementation - "mm: remove zpool" from Johannes Weiner removes zspool - an indirection layer which now only redirects to a single thing (zsmalloc) - "mm: task_stack: Stack handling cleanups" from Pasha Tatashin makes a couple of cleanups in the fork code - "mm: remove nth_page()" from David Hildenbrand makes rather a lot of adjustments at various nth_page() callsites, eventually permitting the removal of that undesirable helper function - "introduce kasan.write_only option in hw-tags" from Yeoreum Yun creates a KASAN read-only mode for ARM, using that architecture's memory tagging feature. It is felt that a read-only mode KASAN is suitable for use in production systems rather than debug-only - "mm: hugetlb: cleanup hugetlb folio allocation" from Kefeng Wang does some tidying in the hugetlb folio allocation code - "mm: establish const-correctness for pointer parameters" from Max Kellermann makes quite a number of the MM API functions more accurate about the constness of their arguments. This was getting in the way of subsystems (in this case CEPH) when they attempt to improving their own const/non-const accuracy - "Cleanup free_pages() misuse" from Vishal Moola fixes a number of code sites which were confused over when to use free_pages() vs __free_pages() - "Add Rust abstraction for Maple Trees" from Alice Ryhl makes the mapletree code accessible to Rust. Required by nouveau and by its forthcoming successor: the new Rust Nova driver - "selftests/mm: split_huge_page_test: split_pte_mapped_thp improvements" from David Hildenbrand adds a fix and some cleanups to the thp selftesting code - "mm, swap: introduce swap table as swap cache (phase I)" from Chris Li and Kairui Song is the first step along the path to implementing "swap tables" - a new approach to swap allocation and state tracking which is expected to yield speed and space improvements. This patchset itself yields a 5-20% performance benefit in some situations - "Some ptdesc cleanups" from Matthew Wilcox utilizes the new memdesc layer to clean up the ptdesc code a little - "Fix va_high_addr_switch.sh test failure" from Chunyu Hu fixes some issues in our 5-level pagetable selftesting code - "Minor fixes for memory allocation profiling" from Suren Baghdasaryan addresses a couple of minor issues in relatively new memory allocation profiling feature - "Small cleanups" from Matthew Wilcox has a few cleanups in preparation for more memdesc work - "mm/damon: add addr_unit for DAMON_LRU_SORT and DAMON_RECLAIM" from Quanmin Yan makes some changes to DAMON in furtherance of supporting arm highmem - "selftests/mm: Add -Wunreachable-code and fix warnings" from Muhammad Anjum adds that compiler check to selftests code and fixes the fallout, by removing dead code - "Improvements to Victim Process Thawing and OOM Reaper Traversal Order" from zhongjinji makes a number of improvements in the OOM killer: mainly thawing a more appropriate group of victim threads so they can release resources - "mm/damon: misc fixups and improvements for 6.18" from SeongJae Park is a bunch of small and unrelated fixups for DAMON - "mm/damon: define and use DAMON initialization check function" from SeongJae Park implement reliability and maintainability improvements to a recently-added bug fix - "mm/damon/stat: expose auto-tuned intervals and non-idle ages" from SeongJae Park provides additional transparency to userspace clients of the DAMON_STAT information - "Expand scope of khugepaged anonymous collapse" from Dev Jain removes some constraints on khubepaged's collapsing of anon VMAs. It also increases the success rate of MADV_COLLAPSE against an anon vma - "mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()" from Lorenzo Stoakes moves us further towards removal of file_operations.mmap(). This patchset concentrates upon clearing up the treatment of stacked filesystems - "mm: Improve mlock tracking for large folios" from Kiryl Shutsemau provides some fixes and improvements to mlock's tracking of large folios. /proc/meminfo's "Mlocked" field became more accurate - "mm/ksm: Fix incorrect accounting of KSM counters during fork" from Donet Tom fixes several user-visible KSM stats inaccuracies across forks and adds selftest code to verify these counters - "mm_slot: fix the usage of mm_slot_entry" from Wei Yang addresses some potential but presently benign issues in KSM's mm_slot handling * tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (372 commits) mm: swap: check for stable address space before operating on the VMA mm: convert folio_page() back to a macro mm/khugepaged: use start_addr/addr for improved readability hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list alloc_tag: fix boot failure due to NULL pointer dereference mm: silence data-race in update_hiwater_rss mm/memory-failure: don't select MEMORY_ISOLATION mm/khugepaged: remove definition of struct khugepaged_mm_slot mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL hugetlb: increase number of reserving hugepages via cmdline selftests/mm: add fork inheritance test for ksm_merging_pages counter mm/ksm: fix incorrect KSM counter handling in mm_struct during fork drivers/base/node: fix double free in register_one_node() mm: remove PMD alignment constraint in execmem_vmalloc() mm/memory_hotplug: fix typo 'esecially' -> 'especially' mm/rmap: improve mlock tracking for large folios mm/filemap: map entire large folio faultaround mm/fault: try to map the entire file folio in finish_fault() mm/rmap: mlock large folios in try_to_unmap_one() mm/rmap: fix a mlock race condition in folio_referenced_one() ...	2025-10-02 18:18:33 -07:00
Gopi Krishna Menon	03faea8466	selftests/net: add tcp_port_share to .gitignore Add the tcp_port_share test binary to .gitignore to avoid accidentally staging the build artifact. Signed-off-by: Gopi Krishna Menon <krishnagopi487@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250929163140.122383-1-krishnagopi487@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-30 08:30:26 -07:00
Jakub Kicinski	b3820e0e6c	selftests: drv-net: psp: add tests for destroying devices Add tests for making sure device can disappear while associations exist. This is netdevsim-only since destroying real devices is more tricky. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20250927225420.1443468-9-kuba@kernel.org Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-09-30 15:17:22 +02:00
Jakub Kicinski	81b8908531	selftests: drv-net: psp: add association tests Add tests for exercising PSP associations for TCP sockets. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20250927225420.1443468-6-kuba@kernel.org Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-09-30 15:17:22 +02:00
Jakub Kicinski	8a5f956a9f	selftests: drv-net: base device access API test Simple PSP test to getting info about PSP devices. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20250927225420.1443468-3-kuba@kernel.org Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-09-30 15:17:21 +02:00
Kuniyuki Iwashima	9b62d53cc8	selftest: packetdrill: Import client-ack-dropped-then-recovery-ms-timestamps.pkt This also does not have the non-experimental version, so converted to FO. The comment in .pkt explains the detailed scenario. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-14-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:41:39 -07:00
Kuniyuki Iwashima	05b9f505fb	selftest: packetdrill: Import sockopt-fastopen-key.pkt sockopt-fastopen-key.pkt does not have the non-experimental version, so the Experimental version is converted, FOEXP -> FO. The test sets net.ipv4.tcp_fastopen_key=0-0-0-0 and instead sets another key via setsockopt(TCP_FASTOPEN_KEY). The first listener generates a valid cookie in response to TFO option without cookie, and the second listner creates a TFO socket using the valid cookie. TCP_FASTOPEN_KEY is adjusted to use the common key in default.sh so that we can use TFO_COOKIE and support dualstack. Similarly, TFO_COOKIE_ZERO for the 0-0-0-0 key is defined. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-13-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:41:39 -07:00
Kuniyuki Iwashima	be90c7b3d5	selftest: packetdrill: Refine tcp_fastopen_server_reset-after-disconnect.pkt. These changes are applied to follow the imported packetdrill tests. * Call setsockopt(TCP_FASTOPEN) * Remove unnecessary accept() delay * Add assertion for TCP states * Rename to tcp_fastopen_server_trigger-rst-reconnect.pkt. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-12-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:41:39 -07:00
Kuniyuki Iwashima	21f7fb31ae	selftest: packetdrill: Import opt34/-trigger-rst.pkt. This imports the non-experimental version of opt34/-trigger-rst.pkt. \| accept() \| SYN data \| -----------------------------------+----------+----------+ listener-closed-trigger-rst.pkt \| no \| unread \| unread-data-closed-trigger-rst.pkt \| yes \| unread \| Both files test that close()ing a SYN_RECV socket with unread SYN data triggers RST. The files are renamed to have the common prefix, trigger-rst. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-11-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:41:39 -07:00
Kuniyuki Iwashima	5920f154e1	selftest: packetdrill: Import opt34/reset-* tests. This imports the non-experimental version of opt34/reset-*.pkt. \| Child \| RST \| sk_err \| ---------------------------------+---------+-------------------------------+---------+ reset-after-accept.pkt \| TFO \| after accept(), SYN_RECV \| read() \| reset-close-with-unread-data.pkt \| TFO \| after accept(), SYN_RECV \| write() \| reset-before-accept.pkt \| TFO \| before accept(), SYN_RECV \| read() \| reset-non-tfo-socket.pkt \| non-TFO \| before accept(), ESTABLISHED \| write() \| The first 3 files test scenarios where a SYN_RECV socket receives RST before/after accept() and data in SYN must be read() without error, but the following read() or fist write() will return ECONNRESET. The last test is similar but with non-TFO socket. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-10-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:41:39 -07:00
Kuniyuki Iwashima	a8b1750e68	selftest: packetdrill: Import opt34/icmp-before-accept.pkt. This imports the non-experimental version of icmp-before-accept.pkt. This file tests the scenario where an ICMP unreachable packet for a not-yet-accept()ed socket changes its state to TCP_CLOSE, but the SYN data must be read without error, and the following read() returns EHOSTUNREACH. Note that this test support only IPv4 as icmp is used. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-9-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:41:39 -07:00
Kuniyuki Iwashima	5ed080f85a	selftest: packetdrill: Import opt34/fin-close-socket.pkt. This imports the non-experimental version of fin-close-socket.pkt. This file tests the scenario where a TFO child socket's state transitions from SYN_RECV to CLOSE_WAIT before accept()ed. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-8-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:41:39 -07:00
Kuniyuki Iwashima	e57b3933ab	selftest: packetdrill: Add test for experimental option. The only difference between non-experimental vs experimental TFO option handling is SYN+ACK generation. When tcp_parse_fastopen_option() parses a TFO option, it sets tcp_fastopen_cookie.exp to false if the option number is 34, and true if 255. The value is carried to tcp_options_write() to generate a TFO option with the same option number. Other than that, all the TFO handling is the same and the kernel must generate the same cookie regardless of the option number. Let's add a test for the handling so that we can consolidate fastopen/server/ tests and fastopen/server/opt34 tests. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-7-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:41:39 -07:00
Kuniyuki Iwashima	399e0a7ed9	selftest: packetdrill: Add test for TFO_SERVER_WO_SOCKOPT1. TFO_SERVER_WO_SOCKOPT1 is no longer enabled by default, and each server test requires setsockopt(TCP_FASTOPEN). Let's add a basic test for TFO_SERVER_WO_SOCKOPT1. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-6-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:41:38 -07:00
Kuniyuki Iwashima	0b8f164eb2	selftest: packetdrill: Import TFO server basic tests. This imports basic TFO server tests from google/packetdrill. The repository has two versions of tests for most scenarios; one uses the non-experimental option (34), and the other uses the experimental option (255) with 0xF989. This only imports the following tests of the non-experimental version placed in [0]. I will add a specific test for the experimental option handling later. \| TFO \| Cookie \| Payload \| ---------------------------+-----+--------+---------+ basic-rw.pkt \| yes \| yes \| yes \| basic-zero-payload.pkt \| yes \| yes \| no \| basic-cookie-not-reqd.pkt \| yes \| no \| yes \| basic-non-tfo-listener.pkt \| no \| yes \| yes \| pure-syn-data.pkt \| yes \| no \| yes \| The original pure-syn-data.pkt missed setsockopt(TCP_FASTOPEN) and did not test TFO server in some scenarios unintentionally, so setsockopt() is added where needed. In addition, non-TFO scenario is stripped as it is covered by basic-non-tfo-listener.pkt. Also, I added basic- prefix. Link: https://github.com/google/packetdrill/tree/bfc96251310f/gtests/net/tcp/fastopen/server/opt34 #[0] Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-5-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:41:38 -07:00
Kuniyuki Iwashima	97b3b8306f	selftest: packetdrill: Define common TCP Fast Open cookie. TCP Fast Open cookie is generated in __tcp_fastopen_cookie_gen_cipher(). The cookie value is generated from src/dst IPs and a key configured by setsockopt(TCP_FASTOPEN_KEY) or net.ipv4.tcp_fastopen_key. The default.sh sets net.ipv4.tcp_fastopen_key, and the original packetdrill defines the corresponding cookie as TFO_COOKIE in run_all.py. [0] Then, each test does not need to care about the value, and we can easily update TFO_COOKIE in case __tcp_fastopen_cookie_gen_cipher() changes the algorithm. However, some tests use the bare hex value for specific IPv4 addresses and do not support IPv6. Let's define the same TFO_COOKIE in ksft_runner.sh. We will replace such bare hex values with TFO_COOKIE except for a single test for setsockopt(TCP_FASTOPEN_KEY). Link: https://github.com/google/packetdrill/blob/7230b3990f94/gtests/net/packetdrill/run_all.py#L65 #[0] Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-4-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:41:19 -07:00
Kuniyuki Iwashima	261cb8b123	selftest: packetdrill: Require explicit setsockopt(TCP_FASTOPEN). To enable TCP Fast Open on a server, net.ipv4.tcp_fastopen must have 0x2 (TFO_SERVER_ENABLE), and we need to do either 1. Call setsockopt(TCP_FASTOPEN) for the socket 2. Set 0x400 (TFO_SERVER_WO_SOCKOPT1) additionally to net.ipv4.tcp_fastopen The default.sh sets 0x70403 so that each test does not need setsockopt(). (0x1 is TFO_CLIENT_ENABLE, and 0x70000 is ...???) However, some tests overwrite net.ipv4.tcp_fastopen without TFO_SERVER_WO_SOCKOPT1 and forgot setsockopt(TCP_FASTOPEN). For example, pure-syn-data.pkt [0] tests non-TFO servers unintentionally, except in the first scenario. To prevent such an accident, let's require explicit setsockopt(). TFO_CLIENT_ENABLE is necessary for tcp_syscall_bad_arg_fastopen-invalid-buf-ptr.pkt. Link: https://github.com/google/packetdrill/blob/bfc96251310f/gtests/net/tcp/fastopen/server/opt34/pure-syn-data.pkt #[0] Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-3-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:41:07 -07:00
Kuniyuki Iwashima	70dd4775db	selftest: packetdrill: Set ktap_set_plan properly for single protocol test. The cited commit forgot to update the ktap_set_plan call. ktap_set_plan sets the number of tests (KSFT_NUM_TESTS), which must match the number of executed tests (KTAP_CNT_PASS + KTAP_CNT_SKIP + KTAP_CNT_XFAIL) in ktap_finished. Otherwise, the selftest exit()s with 1. Let's adjust KSFT_NUM_TESTS based on supported protocols. While at it, misalignment is fixed up. Fixes: `a5c10aa3d1` ("selftests/net: packetdrill: Support single protocol test.") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-2-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:41:06 -07:00
Matthieu Baerts (NGI0)	c912f935a5	selftests: mptcp: join: validate new laminar endp Here are a few sub-tests for mptcp_join.sh, validating the new 'laminar' endpoint type. In a setup where subflows created using the routing rules would be rejected by the listener, and where the latter announces one IP address, some cases are verified: - Without any 'laminar' endpoints: no new subflows are created. - With one 'laminar' endpoint: a second subflow is created. - With multiple 'laminar' endpoints: 2 IPv4 subflows are created. - With one 'laminar' endpoint, but the server announcing a second IP address, only one subflow is created. - With one 'laminar' + 'subflow' endpoint, the same endpoint is only used once. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-8-5da266aa9c1a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:23:36 -07:00
Petr Machata	fca6ff9191	selftests: forwarding: README: Mention defer, adf_ Mention how it would be nice if new code used defer. Also if it does that in dirtying helpers, how it would be nice if these were named adf_*. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/0764bdb9266cd516da23ddeec110e01118cf981e.1758821127.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-26 17:48:41 -07:00
Petr Machata	040a6cbead	selftests: forwarding: lib: Add an autodefer variant of forwarding_enable() Most forwarding tests invoke forwarding_enable() to enable the router and forwarding_restore() to restore the original configuration. Add a helper, adf_forwarding_enable(), which is like forwarding_enable(), but takes care of scheduling the cleanup automatically. Convert the tests that currently use defer to schedule the cleanup. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/78b752c40069cde21c44dcf4c7b966a76a0eef2c.1758821127.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-26 17:48:40 -07:00
Petr Machata	f53748d56d	selftests: forwarding: lib: Add an autodefer variant of simple_if_init() Most forwarding tests invoke simple_if_init() to set up a VRF-based "host" and simple_if_fini() to tear it down again. Add a helper, adf_simple_if_init(), which is like simple_if_fini(), but takes care of scheduling the cleanup automatically. Convert the tests that currently use defer to schedule the cleanup. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/6b9ee1a7946a36fd32a47fdb1aa9325198ffc695.1758821127.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-26 17:48:40 -07:00
Petr Machata	02aabe00b2	selftests: forwarding: lib: Add an autodefer variant of vrf_prepare() Most forwarding tests invoke vrf_prepare() to set up VRF forwarding and vrf_cleanup() to restore the original configuration. Add a helper, adf_vrf_prepare(), which is like vrf_prepare(), but takes care of scheduling the cleanup automatically. Convert a number of tests that currently use defer to schedule the cleanup. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/2f2000e54ae700d560a8d6128322dade3bd2207e.1758821127.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-26 17:48:40 -07:00
Petr Machata	14b72996ae	selftests: net: vlan_bridge_binding: Rename dfr_set_binding_() to adf_ This test contains two autodefer-like helpers, but namespaces them as dfr_* instead of adf_* like this patchset. Rename them. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/5f0c81b39e9e1f56f706cc4b53f82238a1d1e2f9.1758821127.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-26 17:48:39 -07:00
Petr Machata	b628dfcd54	selftests: net: lib: Rename bridge_vlan_add() to adf_* Rename this function to mark it as autodefer. For details, see the discussion in the cover letter. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/93526ce79e635a3ec34753c796edf0c96711547d.1758821127.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-26 17:48:39 -07:00
Petr Machata	d85bcf6505	selftests: net: lib: Rename ip_route_add() to adf_* Rename this function to mark it as autodefer. For details, see the discussion in the cover letter. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/403143183373419e4a31df4665d6bfaa273eb761.1758821127.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-26 17:48:39 -07:00
Petr Machata	773603d6db	selftests: net: lib: Rename ip_addr_add() to adf_* Rename this function to mark it as autodefer. For details, see the discussion in the cover letter. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/706327a5db660c7f18ba9fbfba7ce913da065e3e.1758821127.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-26 17:48:38 -07:00
Petr Machata	a55f9fb343	selftests: net: lib: Rename ip_link_set_down() to adf_* Rename this function to mark it as autodefer. For details, see the discussion in the cover letter. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/e5bf4cb3405fb50fe6e217a04268952e97410dc2.1758821127.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-26 17:48:38 -07:00
Petr Machata	34d3f8b75e	selftests: net: lib: Rename ip_link_set_up() to adf_* Rename this function to mark it as autodefer. For details, see the discussion in the cover letter. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/475716ef792f5bd42e5c8ef1c3e287b1294f1630.1758821127.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-26 17:48:37 -07:00
Petr Machata	beb98a3477	selftests: net: lib: Rename ip_link_set_addr() to adf_* Rename this function to mark it as autodefer. For details, see the discussion in the cover letter. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/5318e90f7f491f9f397ac221a8b47fdbedd0d3b2.1758821127.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-26 17:48:37 -07:00
Petr Machata	c3cbd21fe1	selftests: net: lib: Rename ip_link_set_master() to adf_* Rename this function to mark it as autodefer. For details, see the discussion in the cover letter. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/53ce64231faa1396a968b2869af5f1c0aebec2c9.1758821127.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-26 17:48:37 -07:00
Petr Machata	191c4912f9	selftests: net: lib: Rename ip_link_add() to adf_* Rename this function to mark it as autodefer. For details, see the discussion in the cover letter. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/0b163cca1bf2ec44270e0fc89108f488d99d9c9d.1758821127.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-26 17:48:36 -07:00
Matthieu Baerts (NGI0)	008385efd0	selftests: mptcp: join: validate C-flag + def limit The previous commit adds an exception for the C-flag case. The 'mptcp_join.sh' selftest is extended to validate this case. In this subtest, there is a typical CDN deployment with a client where MPTCP endpoints have been 'automatically' configured: - the server set net.mptcp.allow_join_initial_addr_port=0 - the client has multiple 'subflow' endpoints, and the default limits: not accepting ADD_ADDRs. Without the parent patch, the client is not able to establish new subflows using its 'subflow' endpoints. The parent commit fixes that. The 'Fixes' tag here below is the same as the one from the previous commit: this patch here is not fixing anything wrong in the selftests, but it validates the previous fix for an issue introduced by this commit ID. Fixes: `df377be387` ("mptcp: add deny_join_id0 in mptcp_options_received") Cc: stable@vger.kernel.org Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250925-net-next-mptcp-c-flag-laminar-v1-2-ad126cc47c6b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-26 17:44:03 -07:00
Alessandro Zanni	81dcfdd21d	selftest: net: Fix error message if empty variable Fix to avoid cases where the `res` shell variable is empty in script comparisons. The comparison has been modified into string comparison to handle other possible values the variable could assume. The issue can be reproduced with the command: make kselftest TARGETS=net It solves the error: ./tfo_passive.sh: line 98: [: -eq: unary operator expected Signed-off-by: Alessandro Zanni <alessandro.zanni87@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250925132832.9828-1-alessandro.zanni87@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-26 15:23:33 -07:00
Amery Hung	11ae737efe	selftests: drv-net: Reload pkt pointer after calling filter_udphdr Fix a verification failure. filter_udphdr() calls bpf_xdp_pull_data(), which will invalidate all pkt pointers. Therefore, all ctx->data loaded before filter_udphdr() cannot be used. Reload it to prevent verification errors. The error may not appear on some compiler versions if they decide to load ctx->data after filter_udphdr() when it is first used. Fixes: `efec2e55bd` ("selftests: drv-net: Pull data before parsing headers") Signed-off-by: Amery Hung <ameryhung@gmail.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250925161452.1290694-1-ameryhung@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-26 13:54:46 -07:00
Jakub Kicinski	203e3beb73	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.17-rc8). Conflicts: drivers/net/can/spi/hi311x.c `6b69680847` ("can: hi311x: fix null pointer dereference when resuming from sleep before interface was enabled") `27ce71e1ce` ("net: WQ_PERCPU added to alloc_workqueue users") https://lore.kernel.org/72ce7599-1b5b-464a-a5de-228ff9724701@kernel.org net/smc/smc_loopback.c drivers/dibs/dibs_loopback.c `a35c04de25` ("net/smc: fix warning in smc_rx_splice() when calling get_page()") `cc21191b58` ("dibs: Move data path to dibs layer") https://lore.kernel.org/74368a5c-48ac-4f8e-a198-40ec1ed3cf5f@kernel.org Adjacent changes: drivers/net/dsa/lantiq/lantiq_gswip.c `c0054b25e2` ("net: dsa: lantiq_gswip: move gswip_add_single_port_br() call to port_setup()") `7a1eaef0a7` ("net: dsa: lantiq_gswip: support model-specific mac_select_pcs()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-25 11:00:59 -07:00
Richard Gobert	5e9ff9378a	selftests/net: test ipip packets in gro.sh Add IPIP test-cases to the GRO selftest. This selftest already contains IP ID test-cases. They are now also tested for encapsulated packets. This commit also fixes ipip packet generation in the test. Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250923085908.4687-6-richardbgobert@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-09-25 12:42:49 +02:00
Richard Gobert	f095a358fa	net: gro: remove unnecessary df checks Currently, packets with fixed IDs will be merged only if their don't-fragment bit is set. This restriction is unnecessary since packets without the don't-fragment bit will be forwarded as-is even if they were merged together. The merged packets will be segmented into their original forms before being forwarded, either by GSO or by TSO. The IDs will also remain identical unless NETIF_F_TSO_MANGLEID is set, in which case the IDs can become incrementing, which is also fine. Clean up the code by removing the unnecessary don't-fragment checks. Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250923085908.4687-5-richardbgobert@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-09-25 12:42:49 +02:00
Jakub Kicinski	c7ab8024ca	netfilter pull request nf-next-25-09-24 -----BEGIN PGP SIGNATURE----- iQJBBAABCAArFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmjT71ENHGZ3QHN0cmxl bi5kZQAKCRBwkajZrV/2AN7OEAClGm5eABrjTlHq8REn7S4xUBo9/jOESTvg22Bg A4qu14OV7rGMT/mLyV0kuNzHs18NT9bhpTwANL0bjz7YOK+v8QRlsHzNaJ7dzJ00 WUPPYFnAhCA3qTCWLLvCnzpZgAopnRpFAgpXP6ddZ2kpsT+fo5B67kcasOQoVJzz 9Kk98r0kgo6YKghmaIhBgchTt+rxZbo4R0Jb9BoBPXeQ32AghacOlngXPiAJA2d/ Yeqet5p0gg8oI62KhhBC+uXyR29DG4c3iD2pUUhPnxV1Gt3S6AG1PTkt0KYzqVKm XEIxjrmtn3+2NYcaaH340mQwBbBV1EdZroHeBPUN9gYr5MpkHjppBJyrQ76cXXcr xKw2C357H4O8tT2nxz4LvJagRKhnZa0P6vWL6HeFIlFHXOqEJXYG6X2jbDzomUXd NPQrwKJnf2iYiRcc4U5xyABZE98/kVmqcclJDs0Zxz0eX8N8e26d/wjFKAZxfJyY pCUjF7lo7Doq6sOiB25OZjbPN3Fz/BcBTYTCTK+8MuQM1EaKXm33RcWjtO2gDUoe vyGfQResdwFELj9niPf27Nymezm1nX27x6f/jqCmioLxqQDGAGPsgJIMHy9XO8wz 3YU/92dbwBdOQ6laKT7f5ClfPGDQlpqQTXXyxVHLFUd3T6JHJGUv0AplwVNZk6l8 KCXEWg== =uK7h -----END PGP SIGNATURE----- Merge tag 'nf-next-25-09-24' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Florian Westphal says: ==================== netfilter: fixes for net-next These fixes target next because the bug is either not severe or has existed for so long that there is no reason to cram them in at the last minute. 1) Fix IPVS ftp unregistering during netns cleanup, broken since netns support was introduced in 2011 in the 2.6.39 kernel. From Slavin Liu. 2) nfnetlink must reset the 'nlh' pointer back to the original address when a batch is replayed, else we emit bogus ACK messages and conceal real errno from userspace. From Fernando Fernandez Mancera. This was broken since 6.10. 3) Recent fix for nftables 'pipapo' set type was incomplete, it only made things work for the AVX2 version of the algorithm. 4) Testing revealed another problem with avx2 version that results in out-of-bounds read access, this bug always existed since feature was added in 5.7 kernel. This also comes with a selftest update. Last fix resolves a long-standing bug (since 4.9) in conntrack /proc interface: Decrease skip count when we reap an expired entry during dump. As-is we erronously elide one conntrack entry from dump for every expired entry seen. From Eric Dumazet. * tag 'nf-next-25-09-24' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nf_conntrack: do not skip entries in /proc/net/nf_conntrack selftests: netfilter: nft_concat_range.sh: add check for double-create bug netfilter: nft_set_pipapo_avx2: fix skip of expired entries netfilter: nft_set_pipapo: use 0 genmask for packetpath lookups netfilter: nfnetlink: reset nlh pointer during batch replay ipvs: Defer ip_vs_ftp unregister during netns cleanup ==================== Link: https://patch.msgid.link/20250924140654.10210-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-24 17:45:15 -07:00
Jakub Kicinski	5e3fee34f6	bpf-next-for-netdev -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ6NaUOruQGUkvPdG4raS+Z+3y5EwUCaNNwBQAKCRAraS+Z+3y5 E8heAQDdJTR9rwAL7gD79cldlHP5PTmjyidLIoFG/efaGSbN1AD9EdvrykDU4xOG aGaO8TooGUZf7vAL8tIFuMeydYvi/gM= =Qu4T -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Martin KaFai Lau says: ==================== pull-request: bpf-next 2025-09-23 We've added 9 non-merge commits during the last 33 day(s) which contain a total of 10 files changed, 480 insertions(+), 53 deletions(-). The main changes are: 1) A new bpf_xdp_pull_data kfunc that supports pulling data from a frag into the linear area of a xdp_buff, from Amery Hung. This includes changes in the xdp_native.bpf.c selftest, which Nimrod's future work depends on. It is a merge from a stable branch 'xdp_pull_data' which has also been merged to bpf-next. There is a conflict with recent changes in 'include/net/xdp.h' in the net-next tree that will need to be resolved. 2) A compiler warning fix when CONFIG_NET=n in the recent dynptr skb_meta support, from Jakub Sitnicki. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: selftests: drv-net: Pull data before parsing headers selftests/bpf: Test bpf_xdp_pull_data bpf: Support specifying linear xdp packet data size for BPF_PROG_TEST_RUN bpf: Make variables in bpf_prog_test_run_xdp less confusing bpf: Clear packet pointers after changing packet data in kfuncs bpf: Support pulling non-linear xdp data bpf: Allow bpf_xdp_shrink_data to shrink a frag from head and tail bpf: Clear pfmemalloc flag when freeing all fragments bpf: Return an error pointer for skb metadata when CONFIG_NET=n ==================== Link: https://patch.msgid.link/20250924050303.2466356-1-martin.lau@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-24 10:22:37 -07:00
Florian Westphal	94bd247bc2	selftests: netfilter: nft_concat_range.sh: add check for double-create bug Add a test case for bug resolved with: 'netfilter: nft_set_pipapo_avx2: fix skip of expired entries'. It passes on nf.git (it uses the generic/C version for insertion duplicate check) but fails on unpatched nf-next if AVX2 is supported: cannot create same element twice 0s [FAIL] Could create element twice in same transaction table inet filter { # handle 8 [..] elements = { 1.2.3.4 . 1.2.4.1 counter packets 0 bytes 0, 1.2.4.1 . 1.2.3.4 counter packets 0 bytes 0, 1.2.3.4 . 1.2.4.1 counter packets 0 bytes 0, 1.2.4.1 . 1.2.3.4 counter packets 0 bytes 0 } Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de>	2025-09-24 11:50:28 +02:00
Petr Machata	f67e9ae72d	selftests: bridge_fdb_local_vlan_0: Test FDB vs. NET_ADDR_SET behavior The previous patch fixed an issue whereby no FDB entry would be created for the bridge itself on VLAN 0 under some circumstances. This could break forwarding. Add a test for the fix. Signed-off-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/137cc25396f5a4f407267af895a14bc45552ba5f.1758550408.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-23 17:10:49 -07:00
Ido Schimmel	00af023d90	selftests: fib_nexthops: Add test cases for FDB status change Add the following test cases for both IPv4 and IPv6: * Can change from FDB nexthop to non-FDB nexthop and vice versa. * Can change FDB nexthop address while in a group. * Cannot change from FDB nexthop to non-FDB nexthop and vice versa while in a group. Output without "nexthop: Forbid FDB status change while nexthop is in a group": # ./fib_nexthops.sh -t "ipv6_fdb_grp_fcnal ipv4_fdb_grp_fcnal" IPv6 fdb groups functional -------------------------- [...] TEST: Replace FDB nexthop to non-FDB nexthop [ OK ] TEST: Replace non-FDB nexthop to FDB nexthop [ OK ] TEST: Replace FDB nexthop address while in a group [ OK ] TEST: Replace FDB nexthop to non-FDB nexthop while in a group [FAIL] TEST: Replace non-FDB nexthop to FDB nexthop while in a group [FAIL] [...] IPv4 fdb groups functional -------------------------- [...] TEST: Replace FDB nexthop to non-FDB nexthop [ OK ] TEST: Replace non-FDB nexthop to FDB nexthop [ OK ] TEST: Replace FDB nexthop address while in a group [ OK ] TEST: Replace FDB nexthop to non-FDB nexthop while in a group [FAIL] TEST: Replace non-FDB nexthop to FDB nexthop while in a group [FAIL] [...] Tests passed: 36 Tests failed: 4 Tests skipped: 0 Output with "nexthop: Forbid FDB status change while nexthop is in a group": # ./fib_nexthops.sh -t "ipv6_fdb_grp_fcnal ipv4_fdb_grp_fcnal" IPv6 fdb groups functional -------------------------- [...] TEST: Replace FDB nexthop to non-FDB nexthop [ OK ] TEST: Replace non-FDB nexthop to FDB nexthop [ OK ] TEST: Replace FDB nexthop address while in a group [ OK ] TEST: Replace FDB nexthop to non-FDB nexthop while in a group [ OK ] TEST: Replace non-FDB nexthop to FDB nexthop while in a group [ OK ] [...] IPv4 fdb groups functional -------------------------- [...] TEST: Replace FDB nexthop to non-FDB nexthop [ OK ] TEST: Replace non-FDB nexthop to FDB nexthop [ OK ] TEST: Replace FDB nexthop address while in a group [ OK ] TEST: Replace FDB nexthop to non-FDB nexthop while in a group [ OK ] TEST: Replace non-FDB nexthop to FDB nexthop while in a group [ OK ] [...] Tests passed: 40 Tests failed: 0 Tests skipped: 0 Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250921150824.149157-4-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-23 17:01:05 -07:00
Ido Schimmel	c29913109c	selftests: fib_nexthops: Fix creation of non-FDB nexthops The test creates non-FDB nexthops without a nexthop device which leads to the expected failure, but for the wrong reason: # ./fib_nexthops.sh -t "ipv6_fdb_grp_fcnal ipv4_fdb_grp_fcnal" -v IPv6 fdb groups functional -------------------------- [...] COMMAND: ip -netns me-nRsN3E nexthop add id 63 via 2001:db8:91::4 Error: Device attribute required for non-blackhole and non-fdb nexthops. COMMAND: ip -netns me-nRsN3E nexthop add id 64 via 2001:db8:91::5 Error: Device attribute required for non-blackhole and non-fdb nexthops. COMMAND: ip -netns me-nRsN3E nexthop add id 103 group 63/64 fdb Error: Invalid nexthop id. TEST: Fdb Nexthop group with non-fdb nexthops [ OK ] [...] IPv4 fdb groups functional -------------------------- [...] COMMAND: ip -netns me-nRsN3E nexthop add id 14 via 172.16.1.2 Error: Device attribute required for non-blackhole and non-fdb nexthops. COMMAND: ip -netns me-nRsN3E nexthop add id 15 via 172.16.1.3 Error: Device attribute required for non-blackhole and non-fdb nexthops. COMMAND: ip -netns me-nRsN3E nexthop add id 103 group 14/15 fdb Error: Invalid nexthop id. TEST: Fdb Nexthop group with non-fdb nexthops [ OK ] COMMAND: ip -netns me-nRsN3E nexthop add id 16 via 172.16.1.2 fdb COMMAND: ip -netns me-nRsN3E nexthop add id 17 via 172.16.1.3 fdb COMMAND: ip -netns me-nRsN3E nexthop add id 104 group 14/15 Error: Invalid nexthop id. TEST: Non-Fdb Nexthop group with fdb nexthops [ OK ] [...] COMMAND: ip -netns me-0dlhyd ro add 172.16.0.0/22 nhid 15 Error: Nexthop id does not exist. TEST: Route add with fdb nexthop [ OK ] In addition, as can be seen in the above output, a couple of IPv4 test cases used the non-FDB nexthops (14 and 15) when they intended to use the FDB nexthops (16 and 17). These test cases only passed because failure was expected, but they failed for the wrong reason. Fix the test to create the non-FDB nexthops with a nexthop device and adjust the IPv4 test cases to use the FDB nexthops instead of the non-FDB nexthops. Output after the fix: # ./fib_nexthops.sh -t "ipv6_fdb_grp_fcnal ipv4_fdb_grp_fcnal" -v IPv6 fdb groups functional -------------------------- [...] COMMAND: ip -netns me-lNzfHP nexthop add id 63 via 2001:db8:91::4 dev veth1 COMMAND: ip -netns me-lNzfHP nexthop add id 64 via 2001:db8:91::5 dev veth1 COMMAND: ip -netns me-lNzfHP nexthop add id 103 group 63/64 fdb Error: FDB nexthop group can only have fdb nexthops. TEST: Fdb Nexthop group with non-fdb nexthops [ OK ] [...] IPv4 fdb groups functional -------------------------- [...] COMMAND: ip -netns me-lNzfHP nexthop add id 14 via 172.16.1.2 dev veth1 COMMAND: ip -netns me-lNzfHP nexthop add id 15 via 172.16.1.3 dev veth1 COMMAND: ip -netns me-lNzfHP nexthop add id 103 group 14/15 fdb Error: FDB nexthop group can only have fdb nexthops. TEST: Fdb Nexthop group with non-fdb nexthops [ OK ] COMMAND: ip -netns me-lNzfHP nexthop add id 16 via 172.16.1.2 fdb COMMAND: ip -netns me-lNzfHP nexthop add id 17 via 172.16.1.3 fdb COMMAND: ip -netns me-lNzfHP nexthop add id 104 group 16/17 Error: Non FDB nexthop group cannot have fdb nexthops. TEST: Non-Fdb Nexthop group with fdb nexthops [ OK ] [...] COMMAND: ip -netns me-lNzfHP ro add 172.16.0.0/22 nhid 16 Error: Route cannot point to a fdb nexthop. TEST: Route add with fdb nexthop [ OK ] [...] Tests passed: 30 Tests failed: 0 Tests skipped: 0 Fixes: `0534c5489c` ("selftests: net: add fdb nexthop tests") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250921150824.149157-3-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-23 17:01:05 -07:00
Alok Tiwari	f770645860	selftests: rtnetlink: correct error message in rtnetlink.sh fou test The rtnetlink FOU selftest prints an incorrect string: "FAIL: fou"s. Change it to the intended "FAIL: fou" by removing a stray character in the end_test string of the test. Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250921192111.1567498-1-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-23 16:55:05 -07:00
Martin KaFai Lau	55d5a5154d	Merge branch 'bpf-next/xdp_pull_data' into 'bpf-next/net' Merge the xdp_pull_data stable branch into the net branch. No conflict. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2025-09-23 15:46:52 -07:00
Amery Hung	efec2e55bd	selftests: drv-net: Pull data before parsing headers It is possible for drivers to generate xdp packets with data residing entirely in fragments. To keep parsing headers using direct packet access, call bpf_xdp_pull_data() to pull headers into the linear data area. Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250922233356.3356453-9-ameryhung@gmail.com	2025-09-23 15:21:26 -07:00
Jakub Sitnicki	8a8241cdaa	selftests/net: Test tcp port reuse after unbinding a socket Exercise the scenario described in detail in the cover letter: 1) socket A: connect() from ephemeral port X 2) socket B: explicitly bind() to port X 3) check that port X is now excluded from ephemeral ports 4) close socket B to release the port bind 5) socket C: connect() from ephemeral port X As well as a corner case to test that the connect-bind flag is cleared: 1) connect() from ephemeral port X 2) disconnect the socket with connect(AF_UNSPEC) 3) bind() it explicitly to port X 4) check that port X is now excluded from ephemeral ports Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://patch.msgid.link/20250917-update-bind-bucket-state-on-unhash-v5-2-57168b661b47@cloudflare.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-09-23 10:12:15 +02:00
Matthieu Baerts (NGI0)	e6c3552945	selftests: mptcp: pm: get server-side flag server-side info linked to the MPTCP connect/established events can now come from the flags, in addition to the dedicated attribute. The attribute is now deprecated -- in favour of the new flag, and will be removed later on. Print this info only once. Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250919-net-next-mptcp-server-side-flag-v1-4-a97a5d561a8b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-22 11:51:25 -07:00
Matthieu Baerts (NGI0)	c9809f03c1	mptcp: pm: netlink: only add server-side attr when true This attribute is a boolean. No need to add it to set it to 'false'. Indeed, the default value when this attribute is not set is naturally 'false'. A few bytes can then be saved by not adding this attribute if the connection is not on the server side. This prepares the future deprecation of its attribute, in favour of a new flag. Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250919-net-next-mptcp-server-side-flag-v1-1-a97a5d561a8b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-22 11:51:24 -07:00

1 2 3 4 5 ...

2743 Commits (09cfd3c52ea76f43b3cb15e570aeddf633d65e80)