Commit Graph

2743 Commits (09cfd3c52ea76f43b3cb15e570aeddf633d65e80)

Author SHA1 Message Date
Linus Torvalds 8804d970fa Summary of significant series in this pull request:
- The 3 patch series "mm, swap: improve cluster scan strategy" from
   Kairui Song improves performance and reduces the failure rate of swap
   cluster allocation.
 
 - The 4 patch series "support large align and nid in Rust allocators"
   from Vitaly Wool permits Rust allocators to set NUMA node and large
   alignment when perforning slub and vmalloc reallocs.
 
 - The 2 patch series "mm/damon/vaddr: support stat-purpose DAMOS" from
   Yueyang Pan extend DAMOS_STAT's handling of the DAMON operations sets
   for virtual address spaces for ops-level DAMOS filters.
 
 - The 3 patch series "execute PROCMAP_QUERY ioctl under per-vma lock"
   from Suren Baghdasaryan reduces mmap_lock contention during reads of
   /proc/pid/maps.
 
 - The 2 patch series "mm/mincore: minor clean up for swap cache
   checking" from Kairui Song performs some cleanup in the swap code.
 
 - The 11 patch series "mm: vm_normal_page*() improvements" from David
   Hildenbrand provides code cleanup in the pagemap code.
 
 - The 5 patch series "add persistent huge zero folio support" from
   Pankaj Raghav provides a block layer speedup by optionalls making the
   huge_zero_pagepersistent, instead of releasing it when its refcount
   falls to zero.
 
 - The 3 patch series "kho: fixes and cleanups" from Mike Rapoport adds a
   few touchups to the recently added Kexec Handover feature.
 
 - The 10 patch series "mm: make mm->flags a bitmap and 64-bit on all
   arches" from Lorenzo Stoakes turns mm_struct.flags into a bitmap.  To
   end the constant struggle with space shortage on 32-bit conflicting with
   64-bit's needs.
 
 - The 2 patch series "mm/swapfile.c and swap.h cleanup" from Chris Li
   cleans up some swap code.
 
 - The 7 patch series "selftests/mm: Fix false positives and skip
   unsupported tests" from Donet Tom fixes a few things in our selftests
   code.
 
 - The 7 patch series "prctl: extend PR_SET_THP_DISABLE to only provide
   THPs when advised" from David Hildenbrand "allows individual processes
   to opt-out of THP=always into THP=madvise, without affecting other
   workloads on the system".
 
   It's a long story - the [1/N] changelog spells out the considerations.
 
 - The 11 patch series "Add and use memdesc_flags_t" from Matthew Wilcox
   gets us started on the memdesc project.  Please see
   https://kernelnewbies.org/MatthewWilcox/Memdescs and
   https://blogs.oracle.com/linux/post/introducing-memdesc.
 
 - The 3 patch series "Tiny optimization for large read operations" from
   Chi Zhiling improves the efficiency of the pagecache read path.
 
 - The 5 patch series "Better split_huge_page_test result check" from Zi
   Yan improves our folio splitting selftest code.
 
 - The 2 patch series "test that rmap behaves as expected" from Wei Yang
   adds some rmap selftests.
 
 - The 3 patch series "remove write_cache_pages()" from Christoph Hellwig
   removes that function and converts its two remaining callers.
 
 - The 2 patch series "selftests/mm: uffd-stress fixes" from Dev Jain
   fixes some UFFD selftests issues.
 
 - The 3 patch series "introduce kernel file mapped folios" from Boris
   Burkov introduces the concept of "kernel file pages".  Using these
   permits btrfs to account its metadata pages to the root cgroup, rather
   than to the cgroups of random inappropriate tasks.
 
 - The 2 patch series "mm/pageblock: improve readability of some
   pageblock handling" from Wei Yang provides some readability improvements
   to the page allocator code.
 
 - The 11 patch series "mm/damon: support ARM32 with LPAE" from SeongJae
   Park teaches DAMON to understand arm32 highmem.
 
 - The 4 patch series "tools: testing: Use existing atomic.h for
   vma/maple tests" from Brendan Jackman performs some code cleanups and
   deduplication under tools/testing/.
 
 - The 2 patch series "maple_tree: Fix testing for 32bit compiles" from
   Liam Howlett fixes a couple of 32-bit issues in
   tools/testing/radix-tree.c.
 
 - The 2 patch series "kasan: unify kasan_enabled() and remove
   arch-specific implementations" from Sabyrzhan Tasbolatov moves KASAN
   arch-specific initialization code into a common arch-neutral
   implementation.
 
 - The 3 patch series "mm: remove zpool" from Johannes Weiner removes
   zspool - an indirection layer which now only redirects to a single thing
   (zsmalloc).
 
 - The 2 patch series "mm: task_stack: Stack handling cleanups" from
   Pasha Tatashin makes a couple of cleanups in the fork code.
 
 - The 37 patch series "mm: remove nth_page()" from David Hildenbrand
   makes rather a lot of adjustments at various nth_page() callsites,
   eventually permitting the removal of that undesirable helper function.
 
 - The 2 patch series "introduce kasan.write_only option in hw-tags" from
   Yeoreum Yun creates a KASAN read-only mode for ARM, using that
   architecture's memory tagging feature.  It is felt that a read-only mode
   KASAN is suitable for use in production systems rather than debug-only.
 
 - The 3 patch series "mm: hugetlb: cleanup hugetlb folio allocation"
   from Kefeng Wang does some tidying in the hugetlb folio allocation code.
 
 - The 12 patch series "mm: establish const-correctness for pointer
   parameters" from Max Kellermann makes quite a number of the MM API
   functions more accurate about the constness of their arguments.  This
   was getting in the way of subsystems (in this case CEPH) when they
   attempt to improving their own const/non-const accuracy.
 
 - The 7 patch series "Cleanup free_pages() misuse" from Vishal Moola
   fixes a number of code sites which were confused over when to use
   free_pages() vs __free_pages().
 
 - The 3 patch series "Add Rust abstraction for Maple Trees" from Alice
   Ryhl makes the mapletree code accessible to Rust.  Required by nouveau
   and by its forthcoming successor: the new Rust Nova driver.
 
 - The 2 patch series "selftests/mm: split_huge_page_test:
   split_pte_mapped_thp improvements" from David Hildenbrand adds a fix and
   some cleanups to the thp selftesting code.
 
 - The 14 patch series "mm, swap: introduce swap table as swap cache
   (phase I)" from Chris Li and Kairui Song is the first step along the
   path to implementing "swap tables" - a new approach to swap allocation
   and state tracking which is expected to yield speed and space
   improvements.  This patchset itself yields a 5-20% performance benefit
   in some situations.
 
 - The 3 patch series "Some ptdesc cleanups" from Matthew Wilcox utilizes
   the new memdesc layer to clean up the ptdesc code a little.
 
 - The 3 patch series "Fix va_high_addr_switch.sh test failure" from
   Chunyu Hu fixes some issues in our 5-level pagetable selftesting code.
 
 - The 2 patch series "Minor fixes for memory allocation profiling" from
   Suren Baghdasaryan addresses a couple of minor issues in relatively new
   memory allocation profiling feature.
 
 - The 3 patch series "Small cleanups" from Matthew Wilcox has a few
   cleanups in preparation for more memdesc work.
 
 - The 2 patch series "mm/damon: add addr_unit for DAMON_LRU_SORT and
   DAMON_RECLAIM" from Quanmin Yan makes some changes to DAMON in
   furtherance of supporting arm highmem.
 
 - The 2 patch series "selftests/mm: Add -Wunreachable-code and fix
   warnings" from Muhammad Anjum adds that compiler check to selftests code
   and fixes the fallout, by removing dead code.
 
 - The 10 patch series "Improvements to Victim Process Thawing and OOM
   Reaper Traversal Order" from zhongjinji makes a number of improvements
   in the OOM killer: mainly thawing a more appropriate group of victim
   threads so they can release resources.
 
 - The 5 patch series "mm/damon: misc fixups and improvements for 6.18"
   from SeongJae Park is a bunch of small and unrelated fixups for DAMON.
 
 - The 7 patch series "mm/damon: define and use DAMON initialization
   check function" from SeongJae Park implement reliability and
   maintainability improvements to a recently-added bug fix.
 
 - The 2 patch series "mm/damon/stat: expose auto-tuned intervals and
   non-idle ages" from SeongJae Park provides additional transparency to
   userspace clients of the DAMON_STAT information.
 
 - The 2 patch series "Expand scope of khugepaged anonymous collapse"
   from Dev Jain removes some constraints on khubepaged's collapsing of
   anon VMAs.  It also increases the success rate of MADV_COLLAPSE against
   an anon vma.
 
 - The 2 patch series "mm: do not assume file == vma->vm_file in
   compat_vma_mmap_prepare()" from Lorenzo Stoakes moves us further towards
   removal of file_operations.mmap().  This patchset concentrates upon
   clearing up the treatment of stacked filesystems.
 
 - The 6 patch series "mm: Improve mlock tracking for large folios" from
   Kiryl Shutsemau provides some fixes and improvements to mlock's tracking
   of large folios.  /proc/meminfo's "Mlocked" field became more accurate.
 
 - The 2 patch series "mm/ksm: Fix incorrect accounting of KSM counters
   during fork" from Donet Tom fixes several user-visible KSM stats
   inaccuracies across forks and adds selftest code to verify these
   counters.
 
 - The 2 patch series "mm_slot: fix the usage of mm_slot_entry" from Wei
   Yang addresses some potential but presently benign issues in KSM's
   mm_slot handling.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaN3cywAKCRDdBJ7gKXxA
 jtaPAQDmIuIu7+XnVUK5V11hsQ/5QtsUeLHV3OsAn4yW5/3dEQD/UddRU08ePN+1
 2VRB0EwkLAdfMWW7TfiNZ+yhuoiL/AA=
 =4mhY
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - "mm, swap: improve cluster scan strategy" from Kairui Song improves
   performance and reduces the failure rate of swap cluster allocation

 - "support large align and nid in Rust allocators" from Vitaly Wool
   permits Rust allocators to set NUMA node and large alignment when
   perforning slub and vmalloc reallocs

 - "mm/damon/vaddr: support stat-purpose DAMOS" from Yueyang Pan extend
   DAMOS_STAT's handling of the DAMON operations sets for virtual
   address spaces for ops-level DAMOS filters

 - "execute PROCMAP_QUERY ioctl under per-vma lock" from Suren
   Baghdasaryan reduces mmap_lock contention during reads of
   /proc/pid/maps

 - "mm/mincore: minor clean up for swap cache checking" from Kairui Song
   performs some cleanup in the swap code

 - "mm: vm_normal_page*() improvements" from David Hildenbrand provides
   code cleanup in the pagemap code

 - "add persistent huge zero folio support" from Pankaj Raghav provides
   a block layer speedup by optionalls making the
   huge_zero_pagepersistent, instead of releasing it when its refcount
   falls to zero

 - "kho: fixes and cleanups" from Mike Rapoport adds a few touchups to
   the recently added Kexec Handover feature

 - "mm: make mm->flags a bitmap and 64-bit on all arches" from Lorenzo
   Stoakes turns mm_struct.flags into a bitmap. To end the constant
   struggle with space shortage on 32-bit conflicting with 64-bit's
   needs

 - "mm/swapfile.c and swap.h cleanup" from Chris Li cleans up some swap
   code

 - "selftests/mm: Fix false positives and skip unsupported tests" from
   Donet Tom fixes a few things in our selftests code

 - "prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised"
   from David Hildenbrand "allows individual processes to opt-out of
   THP=always into THP=madvise, without affecting other workloads on the
   system".

   It's a long story - the [1/N] changelog spells out the considerations

 - "Add and use memdesc_flags_t" from Matthew Wilcox gets us started on
   the memdesc project. Please see

      https://kernelnewbies.org/MatthewWilcox/Memdescs and
      https://blogs.oracle.com/linux/post/introducing-memdesc

 - "Tiny optimization for large read operations" from Chi Zhiling
   improves the efficiency of the pagecache read path

 - "Better split_huge_page_test result check" from Zi Yan improves our
   folio splitting selftest code

 - "test that rmap behaves as expected" from Wei Yang adds some rmap
   selftests

 - "remove write_cache_pages()" from Christoph Hellwig removes that
   function and converts its two remaining callers

 - "selftests/mm: uffd-stress fixes" from Dev Jain fixes some UFFD
   selftests issues

 - "introduce kernel file mapped folios" from Boris Burkov introduces
   the concept of "kernel file pages". Using these permits btrfs to
   account its metadata pages to the root cgroup, rather than to the
   cgroups of random inappropriate tasks

 - "mm/pageblock: improve readability of some pageblock handling" from
   Wei Yang provides some readability improvements to the page allocator
   code

 - "mm/damon: support ARM32 with LPAE" from SeongJae Park teaches DAMON
   to understand arm32 highmem

 - "tools: testing: Use existing atomic.h for vma/maple tests" from
   Brendan Jackman performs some code cleanups and deduplication under
   tools/testing/

 - "maple_tree: Fix testing for 32bit compiles" from Liam Howlett fixes
   a couple of 32-bit issues in tools/testing/radix-tree.c

 - "kasan: unify kasan_enabled() and remove arch-specific
   implementations" from Sabyrzhan Tasbolatov moves KASAN arch-specific
   initialization code into a common arch-neutral implementation

 - "mm: remove zpool" from Johannes Weiner removes zspool - an
   indirection layer which now only redirects to a single thing
   (zsmalloc)

 - "mm: task_stack: Stack handling cleanups" from Pasha Tatashin makes a
   couple of cleanups in the fork code

 - "mm: remove nth_page()" from David Hildenbrand makes rather a lot of
   adjustments at various nth_page() callsites, eventually permitting
   the removal of that undesirable helper function

 - "introduce kasan.write_only option in hw-tags" from Yeoreum Yun
   creates a KASAN read-only mode for ARM, using that architecture's
   memory tagging feature. It is felt that a read-only mode KASAN is
   suitable for use in production systems rather than debug-only

 - "mm: hugetlb: cleanup hugetlb folio allocation" from Kefeng Wang does
   some tidying in the hugetlb folio allocation code

 - "mm: establish const-correctness for pointer parameters" from Max
   Kellermann makes quite a number of the MM API functions more accurate
   about the constness of their arguments. This was getting in the way
   of subsystems (in this case CEPH) when they attempt to improving
   their own const/non-const accuracy

 - "Cleanup free_pages() misuse" from Vishal Moola fixes a number of
   code sites which were confused over when to use free_pages() vs
   __free_pages()

 - "Add Rust abstraction for Maple Trees" from Alice Ryhl makes the
   mapletree code accessible to Rust. Required by nouveau and by its
   forthcoming successor: the new Rust Nova driver

 - "selftests/mm: split_huge_page_test: split_pte_mapped_thp
   improvements" from David Hildenbrand adds a fix and some cleanups to
   the thp selftesting code

 - "mm, swap: introduce swap table as swap cache (phase I)" from Chris
   Li and Kairui Song is the first step along the path to implementing
   "swap tables" - a new approach to swap allocation and state tracking
   which is expected to yield speed and space improvements. This
   patchset itself yields a 5-20% performance benefit in some situations

 - "Some ptdesc cleanups" from Matthew Wilcox utilizes the new memdesc
   layer to clean up the ptdesc code a little

 - "Fix va_high_addr_switch.sh test failure" from Chunyu Hu fixes some
   issues in our 5-level pagetable selftesting code

 - "Minor fixes for memory allocation profiling" from Suren Baghdasaryan
   addresses a couple of minor issues in relatively new memory
   allocation profiling feature

 - "Small cleanups" from Matthew Wilcox has a few cleanups in
   preparation for more memdesc work

 - "mm/damon: add addr_unit for DAMON_LRU_SORT and DAMON_RECLAIM" from
   Quanmin Yan makes some changes to DAMON in furtherance of supporting
   arm highmem

 - "selftests/mm: Add -Wunreachable-code and fix warnings" from Muhammad
   Anjum adds that compiler check to selftests code and fixes the
   fallout, by removing dead code

 - "Improvements to Victim Process Thawing and OOM Reaper Traversal
   Order" from zhongjinji makes a number of improvements in the OOM
   killer: mainly thawing a more appropriate group of victim threads so
   they can release resources

 - "mm/damon: misc fixups and improvements for 6.18" from SeongJae Park
   is a bunch of small and unrelated fixups for DAMON

 - "mm/damon: define and use DAMON initialization check function" from
   SeongJae Park implement reliability and maintainability improvements
   to a recently-added bug fix

 - "mm/damon/stat: expose auto-tuned intervals and non-idle ages" from
   SeongJae Park provides additional transparency to userspace clients
   of the DAMON_STAT information

 - "Expand scope of khugepaged anonymous collapse" from Dev Jain removes
   some constraints on khubepaged's collapsing of anon VMAs. It also
   increases the success rate of MADV_COLLAPSE against an anon vma

 - "mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()"
   from Lorenzo Stoakes moves us further towards removal of
   file_operations.mmap(). This patchset concentrates upon clearing up
   the treatment of stacked filesystems

 - "mm: Improve mlock tracking for large folios" from Kiryl Shutsemau
   provides some fixes and improvements to mlock's tracking of large
   folios. /proc/meminfo's "Mlocked" field became more accurate

 - "mm/ksm: Fix incorrect accounting of KSM counters during fork" from
   Donet Tom fixes several user-visible KSM stats inaccuracies across
   forks and adds selftest code to verify these counters

 - "mm_slot: fix the usage of mm_slot_entry" from Wei Yang addresses
   some potential but presently benign issues in KSM's mm_slot handling

* tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (372 commits)
  mm: swap: check for stable address space before operating on the VMA
  mm: convert folio_page() back to a macro
  mm/khugepaged: use start_addr/addr for improved readability
  hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list
  alloc_tag: fix boot failure due to NULL pointer dereference
  mm: silence data-race in update_hiwater_rss
  mm/memory-failure: don't select MEMORY_ISOLATION
  mm/khugepaged: remove definition of struct khugepaged_mm_slot
  mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL
  hugetlb: increase number of reserving hugepages via cmdline
  selftests/mm: add fork inheritance test for ksm_merging_pages counter
  mm/ksm: fix incorrect KSM counter handling in mm_struct during fork
  drivers/base/node: fix double free in register_one_node()
  mm: remove PMD alignment constraint in execmem_vmalloc()
  mm/memory_hotplug: fix typo 'esecially' -> 'especially'
  mm/rmap: improve mlock tracking for large folios
  mm/filemap: map entire large folio faultaround
  mm/fault: try to map the entire file folio in finish_fault()
  mm/rmap: mlock large folios in try_to_unmap_one()
  mm/rmap: fix a mlock race condition in folio_referenced_one()
  ...
2025-10-02 18:18:33 -07:00
Gopi Krishna Menon 03faea8466 selftests/net: add tcp_port_share to .gitignore
Add the tcp_port_share test binary to .gitignore to avoid
accidentally staging the build artifact.

Signed-off-by: Gopi Krishna Menon <krishnagopi487@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250929163140.122383-1-krishnagopi487@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-30 08:30:26 -07:00
Jakub Kicinski b3820e0e6c selftests: drv-net: psp: add tests for destroying devices
Add tests for making sure device can disappear while associations
exist. This is netdevsim-only since destroying real devices is
more tricky.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20250927225420.1443468-9-kuba@kernel.org
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-30 15:17:22 +02:00
Jakub Kicinski 81b8908531 selftests: drv-net: psp: add association tests
Add tests for exercising PSP associations for TCP sockets.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20250927225420.1443468-6-kuba@kernel.org
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-30 15:17:22 +02:00
Jakub Kicinski 8a5f956a9f selftests: drv-net: base device access API test
Simple PSP test to getting info about PSP devices.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20250927225420.1443468-3-kuba@kernel.org
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-30 15:17:21 +02:00
Kuniyuki Iwashima 9b62d53cc8 selftest: packetdrill: Import client-ack-dropped-then-recovery-ms-timestamps.pkt
This also does not have the non-experimental version, so converted to FO.

The comment in .pkt explains the detailed scenario.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-14-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29 18:41:39 -07:00
Kuniyuki Iwashima 05b9f505fb selftest: packetdrill: Import sockopt-fastopen-key.pkt
sockopt-fastopen-key.pkt does not have the non-experimental
version, so the Experimental version is converted, FOEXP -> FO.

The test sets net.ipv4.tcp_fastopen_key=0-0-0-0 and instead
sets another key via setsockopt(TCP_FASTOPEN_KEY).

The first listener generates a valid cookie in response to TFO
option without cookie, and the second listner creates a TFO socket
using the valid cookie.

TCP_FASTOPEN_KEY is adjusted to use the common key in default.sh
so that we can use TFO_COOKIE and support dualstack.  Similarly,
TFO_COOKIE_ZERO for the 0-0-0-0 key is defined.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-13-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29 18:41:39 -07:00
Kuniyuki Iwashima be90c7b3d5 selftest: packetdrill: Refine tcp_fastopen_server_reset-after-disconnect.pkt.
These changes are applied to follow the imported packetdrill tests.

  * Call setsockopt(TCP_FASTOPEN)
  * Remove unnecessary accept() delay
  * Add assertion for TCP states
  * Rename to tcp_fastopen_server_trigger-rst-reconnect.pkt.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-12-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29 18:41:39 -07:00
Kuniyuki Iwashima 21f7fb31ae selftest: packetdrill: Import opt34/*-trigger-rst.pkt.
This imports the non-experimental version of opt34/*-trigger-rst.pkt.

                                     | accept() | SYN data |
  -----------------------------------+----------+----------+
  listener-closed-trigger-rst.pkt    |    no    |  unread  |
  unread-data-closed-trigger-rst.pkt |   yes    |  unread  |

Both files test that close()ing a SYN_RECV socket with unread SYN data
triggers RST.

The files are renamed to have the common prefix, trigger-rst.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-11-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29 18:41:39 -07:00
Kuniyuki Iwashima 5920f154e1 selftest: packetdrill: Import opt34/reset-* tests.
This imports the non-experimental version of opt34/reset-*.pkt.

                                   |  Child  |              RST              | sk_err  |
  ---------------------------------+---------+-------------------------------+---------+
  reset-after-accept.pkt           |   TFO   |   after accept(), SYN_RECV    |  read() |
  reset-close-with-unread-data.pkt |   TFO   |   after accept(), SYN_RECV    | write() |
  reset-before-accept.pkt          |   TFO   |  before accept(), SYN_RECV    |  read() |
  reset-non-tfo-socket.pkt         | non-TFO |  before accept(), ESTABLISHED | write() |

The first 3 files test scenarios where a SYN_RECV socket receives RST
before/after accept() and data in SYN must be read() without error,
but the following read() or fist write() will return ECONNRESET.

The last test is similar but with non-TFO socket.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-10-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29 18:41:39 -07:00
Kuniyuki Iwashima a8b1750e68 selftest: packetdrill: Import opt34/icmp-before-accept.pkt.
This imports the non-experimental version of icmp-before-accept.pkt.

This file tests the scenario where an ICMP unreachable packet for a
not-yet-accept()ed socket changes its state to TCP_CLOSE, but the
SYN data must be read without error, and the following read() returns
EHOSTUNREACH.

Note that this test support only IPv4 as icmp is used.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-9-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29 18:41:39 -07:00
Kuniyuki Iwashima 5ed080f85a selftest: packetdrill: Import opt34/fin-close-socket.pkt.
This imports the non-experimental version of fin-close-socket.pkt.

This file tests the scenario where a TFO child socket's state
transitions from SYN_RECV to CLOSE_WAIT before accept()ed.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-8-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29 18:41:39 -07:00
Kuniyuki Iwashima e57b3933ab selftest: packetdrill: Add test for experimental option.
The only difference between non-experimental vs experimental TFO
option handling is SYN+ACK generation.

When tcp_parse_fastopen_option() parses a TFO option, it sets
tcp_fastopen_cookie.exp to false if the option number is 34,
and true if 255.

The value is carried to tcp_options_write() to generate a TFO option
with the same option number.

Other than that, all the TFO handling is the same and the kernel must
generate the same cookie regardless of the option number.

Let's add a test for the handling so that we can consolidate
fastopen/server/ tests and fastopen/server/opt34 tests.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-7-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29 18:41:39 -07:00
Kuniyuki Iwashima 399e0a7ed9 selftest: packetdrill: Add test for TFO_SERVER_WO_SOCKOPT1.
TFO_SERVER_WO_SOCKOPT1 is no longer enabled by default, and
each server test requires setsockopt(TCP_FASTOPEN).

Let's add a basic test for TFO_SERVER_WO_SOCKOPT1.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-6-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29 18:41:38 -07:00
Kuniyuki Iwashima 0b8f164eb2 selftest: packetdrill: Import TFO server basic tests.
This imports basic TFO server tests from google/packetdrill.

The repository has two versions of tests for most scenarios; one uses
the non-experimental option (34), and the other uses the experimental
option (255) with 0xF989.

This only imports the following tests of the non-experimental version
placed in [0].  I will add a specific test for the experimental option
handling later.

                             | TFO | Cookie | Payload |
  ---------------------------+-----+--------+---------+
  basic-rw.pkt               | yes |  yes   |   yes   |
  basic-zero-payload.pkt     | yes |  yes   |    no   |
  basic-cookie-not-reqd.pkt  | yes |   no   |   yes   |
  basic-non-tfo-listener.pkt |  no |  yes   |   yes   |
  pure-syn-data.pkt          | yes |   no   |   yes   |

The original pure-syn-data.pkt missed setsockopt(TCP_FASTOPEN) and did
not test TFO server in some scenarios unintentionally, so setsockopt()
is added where needed.  In addition, non-TFO scenario is stripped as
it is covered by basic-non-tfo-listener.pkt.  Also, I added basic- prefix.

Link: https://github.com/google/packetdrill/tree/bfc96251310f/gtests/net/tcp/fastopen/server/opt34 #[0]
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-5-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29 18:41:38 -07:00
Kuniyuki Iwashima 97b3b8306f selftest: packetdrill: Define common TCP Fast Open cookie.
TCP Fast Open cookie is generated in __tcp_fastopen_cookie_gen_cipher().

The cookie value is generated from src/dst IPs and a key configured by
setsockopt(TCP_FASTOPEN_KEY) or net.ipv4.tcp_fastopen_key.

The default.sh sets net.ipv4.tcp_fastopen_key, and the original packetdrill
defines the corresponding cookie as TFO_COOKIE in run_all.py. [0]

Then, each test does not need to care about the value, and we can easily
update TFO_COOKIE in case __tcp_fastopen_cookie_gen_cipher() changes the
algorithm.

However, some tests use the bare hex value for specific IPv4 addresses
and do not support IPv6.

Let's define the same TFO_COOKIE in ksft_runner.sh.

We will replace such bare hex values with TFO_COOKIE except for a single
test for setsockopt(TCP_FASTOPEN_KEY).

Link: https://github.com/google/packetdrill/blob/7230b3990f94/gtests/net/packetdrill/run_all.py#L65 #[0]
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-4-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29 18:41:19 -07:00
Kuniyuki Iwashima 261cb8b123 selftest: packetdrill: Require explicit setsockopt(TCP_FASTOPEN).
To enable TCP Fast Open on a server, net.ipv4.tcp_fastopen must
have 0x2 (TFO_SERVER_ENABLE), and we need to do either

  1. Call setsockopt(TCP_FASTOPEN) for the socket
  2. Set 0x400 (TFO_SERVER_WO_SOCKOPT1) additionally to net.ipv4.tcp_fastopen

The default.sh sets 0x70403 so that each test does not need setsockopt().
(0x1 is TFO_CLIENT_ENABLE, and 0x70000 is ...???)

However, some tests overwrite net.ipv4.tcp_fastopen without
TFO_SERVER_WO_SOCKOPT1 and forgot setsockopt(TCP_FASTOPEN).

For example, pure-syn-data.pkt [0] tests non-TFO servers unintentionally,
except in the first scenario.

To prevent such an accident, let's require explicit setsockopt().

TFO_CLIENT_ENABLE is necessary for
tcp_syscall_bad_arg_fastopen-invalid-buf-ptr.pkt.

Link: https://github.com/google/packetdrill/blob/bfc96251310f/gtests/net/tcp/fastopen/server/opt34/pure-syn-data.pkt #[0]
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29 18:41:07 -07:00
Kuniyuki Iwashima 70dd4775db selftest: packetdrill: Set ktap_set_plan properly for single protocol test.
The cited commit forgot to update the ktap_set_plan call.

ktap_set_plan sets the number of tests (KSFT_NUM_TESTS), which must
match the number of executed tests (KTAP_CNT_PASS + KTAP_CNT_SKIP +
KTAP_CNT_XFAIL) in ktap_finished.

Otherwise, the selftest exit()s with 1.

Let's adjust KSFT_NUM_TESTS based on supported protocols.

While at it, misalignment is fixed up.

Fixes: a5c10aa3d1 ("selftests/net: packetdrill: Support single protocol test.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-2-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29 18:41:06 -07:00
Matthieu Baerts (NGI0) c912f935a5 selftests: mptcp: join: validate new laminar endp
Here are a few sub-tests for mptcp_join.sh, validating the new 'laminar'
endpoint type.

In a setup where subflows created using the routing rules would be
rejected by the listener, and where the latter announces one IP address,
some cases are verified:

- Without any 'laminar' endpoints: no new subflows are created.

- With one 'laminar' endpoint: a second subflow is created.

- With multiple 'laminar' endpoints: 2 IPv4 subflows are created.

- With one 'laminar' endpoint, but the server announcing a second IP
  address, only one subflow is created.

- With one 'laminar' + 'subflow' endpoint, the same endpoint is only
  used once.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-8-5da266aa9c1a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29 18:23:36 -07:00
Petr Machata fca6ff9191 selftests: forwarding: README: Mention defer, adf_
Mention how it would be nice if new code used defer. Also if it does that
in dirtying helpers, how it would be nice if these were named adf_*.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/0764bdb9266cd516da23ddeec110e01118cf981e.1758821127.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 17:48:41 -07:00
Petr Machata 040a6cbead selftests: forwarding: lib: Add an autodefer variant of forwarding_enable()
Most forwarding tests invoke forwarding_enable() to enable the router and
forwarding_restore() to restore the original configuration. Add a helper,
adf_forwarding_enable(), which is like forwarding_enable(), but takes care
of scheduling the cleanup automatically.

Convert the tests that currently use defer to schedule the cleanup.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/78b752c40069cde21c44dcf4c7b966a76a0eef2c.1758821127.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 17:48:40 -07:00
Petr Machata f53748d56d selftests: forwarding: lib: Add an autodefer variant of simple_if_init()
Most forwarding tests invoke simple_if_init() to set up a VRF-based "host"
and simple_if_fini() to tear it down again. Add a helper,
adf_simple_if_init(), which is like simple_if_fini(), but takes care of
scheduling the cleanup automatically.

Convert the tests that currently use defer to schedule the cleanup.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/6b9ee1a7946a36fd32a47fdb1aa9325198ffc695.1758821127.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 17:48:40 -07:00
Petr Machata 02aabe00b2 selftests: forwarding: lib: Add an autodefer variant of vrf_prepare()
Most forwarding tests invoke vrf_prepare() to set up VRF forwarding and
vrf_cleanup() to restore the original configuration. Add a helper,
adf_vrf_prepare(), which is like vrf_prepare(), but takes care of
scheduling the cleanup automatically.

Convert a number of tests that currently use defer to schedule the cleanup.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/2f2000e54ae700d560a8d6128322dade3bd2207e.1758821127.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 17:48:40 -07:00
Petr Machata 14b72996ae selftests: net: vlan_bridge_binding: Rename dfr_set_binding_*() to adf_*
This test contains two autodefer-like helpers, but namespaces them as dfr_*
instead of adf_* like this patchset. Rename them.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/5f0c81b39e9e1f56f706cc4b53f82238a1d1e2f9.1758821127.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 17:48:39 -07:00
Petr Machata b628dfcd54 selftests: net: lib: Rename bridge_vlan_add() to adf_*
Rename this function to mark it as autodefer.
For details, see the discussion in the cover letter.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/93526ce79e635a3ec34753c796edf0c96711547d.1758821127.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 17:48:39 -07:00
Petr Machata d85bcf6505 selftests: net: lib: Rename ip_route_add() to adf_*
Rename this function to mark it as autodefer.
For details, see the discussion in the cover letter.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/403143183373419e4a31df4665d6bfaa273eb761.1758821127.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 17:48:39 -07:00
Petr Machata 773603d6db selftests: net: lib: Rename ip_addr_add() to adf_*
Rename this function to mark it as autodefer.
For details, see the discussion in the cover letter.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/706327a5db660c7f18ba9fbfba7ce913da065e3e.1758821127.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 17:48:38 -07:00
Petr Machata a55f9fb343 selftests: net: lib: Rename ip_link_set_down() to adf_*
Rename this function to mark it as autodefer.
For details, see the discussion in the cover letter.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/e5bf4cb3405fb50fe6e217a04268952e97410dc2.1758821127.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 17:48:38 -07:00
Petr Machata 34d3f8b75e selftests: net: lib: Rename ip_link_set_up() to adf_*
Rename this function to mark it as autodefer.
For details, see the discussion in the cover letter.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/475716ef792f5bd42e5c8ef1c3e287b1294f1630.1758821127.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 17:48:37 -07:00
Petr Machata beb98a3477 selftests: net: lib: Rename ip_link_set_addr() to adf_*
Rename this function to mark it as autodefer.
For details, see the discussion in the cover letter.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/5318e90f7f491f9f397ac221a8b47fdbedd0d3b2.1758821127.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 17:48:37 -07:00
Petr Machata c3cbd21fe1 selftests: net: lib: Rename ip_link_set_master() to adf_*
Rename this function to mark it as autodefer.
For details, see the discussion in the cover letter.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/53ce64231faa1396a968b2869af5f1c0aebec2c9.1758821127.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 17:48:37 -07:00
Petr Machata 191c4912f9 selftests: net: lib: Rename ip_link_add() to adf_*
Rename this function to mark it as autodefer.
For details, see the discussion in the cover letter.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/0b163cca1bf2ec44270e0fc89108f488d99d9c9d.1758821127.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 17:48:36 -07:00
Matthieu Baerts (NGI0) 008385efd0 selftests: mptcp: join: validate C-flag + def limit
The previous commit adds an exception for the C-flag case. The
'mptcp_join.sh' selftest is extended to validate this case.

In this subtest, there is a typical CDN deployment with a client where
MPTCP endpoints have been 'automatically' configured:

- the server set net.mptcp.allow_join_initial_addr_port=0

- the client has multiple 'subflow' endpoints, and the default limits:
  not accepting ADD_ADDRs.

Without the parent patch, the client is not able to establish new
subflows using its 'subflow' endpoints. The parent commit fixes that.

The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.

Fixes: df377be387 ("mptcp: add deny_join_id0 in mptcp_options_received")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250925-net-next-mptcp-c-flag-laminar-v1-2-ad126cc47c6b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 17:44:03 -07:00
Alessandro Zanni 81dcfdd21d selftest: net: Fix error message if empty variable
Fix to avoid cases where the `res` shell variable is
empty in script comparisons.
The comparison has been modified into string comparison to
handle other possible values the variable could assume.

The issue can be reproduced with the command:
make kselftest TARGETS=net

It solves the error:
./tfo_passive.sh: line 98: [: -eq: unary operator expected

Signed-off-by: Alessandro Zanni <alessandro.zanni87@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250925132832.9828-1-alessandro.zanni87@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 15:23:33 -07:00
Amery Hung 11ae737efe selftests: drv-net: Reload pkt pointer after calling filter_udphdr
Fix a verification failure. filter_udphdr() calls bpf_xdp_pull_data(),
which will invalidate all pkt pointers. Therefore, all ctx->data loaded
before filter_udphdr() cannot be used. Reload it to prevent verification
errors.

The error may not appear on some compiler versions if they decide to
load ctx->data after filter_udphdr() when it is first used.

Fixes: efec2e55bd ("selftests: drv-net: Pull data before parsing headers")
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250925161452.1290694-1-ameryhung@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 13:54:46 -07:00
Jakub Kicinski 203e3beb73 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.17-rc8).

Conflicts:

drivers/net/can/spi/hi311x.c
  6b69680847 ("can: hi311x: fix null pointer dereference when resuming from sleep before interface was enabled")
  27ce71e1ce ("net: WQ_PERCPU added to alloc_workqueue users")
https://lore.kernel.org/72ce7599-1b5b-464a-a5de-228ff9724701@kernel.org

net/smc/smc_loopback.c
drivers/dibs/dibs_loopback.c
  a35c04de25 ("net/smc: fix warning in smc_rx_splice() when calling get_page()")
  cc21191b58 ("dibs: Move data path to dibs layer")
https://lore.kernel.org/74368a5c-48ac-4f8e-a198-40ec1ed3cf5f@kernel.org

Adjacent changes:

drivers/net/dsa/lantiq/lantiq_gswip.c
  c0054b25e2 ("net: dsa: lantiq_gswip: move gswip_add_single_port_br() call to port_setup()")
  7a1eaef0a7 ("net: dsa: lantiq_gswip: support model-specific mac_select_pcs()")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-25 11:00:59 -07:00
Richard Gobert 5e9ff9378a selftests/net: test ipip packets in gro.sh
Add IPIP test-cases to the GRO selftest.

This selftest already contains IP ID test-cases. They are now
also tested for encapsulated packets.

This commit also fixes ipip packet generation in the test.

Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250923085908.4687-6-richardbgobert@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-25 12:42:49 +02:00
Richard Gobert f095a358fa net: gro: remove unnecessary df checks
Currently, packets with fixed IDs will be merged only if their
don't-fragment bit is set. This restriction is unnecessary since
packets without the don't-fragment bit will be forwarded as-is even
if they were merged together. The merged packets will be segmented
into their original forms before being forwarded, either by GSO or
by TSO. The IDs will also remain identical unless NETIF_F_TSO_MANGLEID
is set, in which case the IDs can become incrementing, which is also fine.

Clean up the code by removing the unnecessary don't-fragment checks.

Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250923085908.4687-5-richardbgobert@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-25 12:42:49 +02:00
Jakub Kicinski c7ab8024ca netfilter pull request nf-next-25-09-24
-----BEGIN PGP SIGNATURE-----
 
 iQJBBAABCAArFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmjT71ENHGZ3QHN0cmxl
 bi5kZQAKCRBwkajZrV/2AN7OEAClGm5eABrjTlHq8REn7S4xUBo9/jOESTvg22Bg
 A4qu14OV7rGMT/mLyV0kuNzHs18NT9bhpTwANL0bjz7YOK+v8QRlsHzNaJ7dzJ00
 WUPPYFnAhCA3qTCWLLvCnzpZgAopnRpFAgpXP6ddZ2kpsT+fo5B67kcasOQoVJzz
 9Kk98r0kgo6YKghmaIhBgchTt+rxZbo4R0Jb9BoBPXeQ32AghacOlngXPiAJA2d/
 Yeqet5p0gg8oI62KhhBC+uXyR29DG4c3iD2pUUhPnxV1Gt3S6AG1PTkt0KYzqVKm
 XEIxjrmtn3+2NYcaaH340mQwBbBV1EdZroHeBPUN9gYr5MpkHjppBJyrQ76cXXcr
 xKw2C357H4O8tT2nxz4LvJagRKhnZa0P6vWL6HeFIlFHXOqEJXYG6X2jbDzomUXd
 NPQrwKJnf2iYiRcc4U5xyABZE98/kVmqcclJDs0Zxz0eX8N8e26d/wjFKAZxfJyY
 pCUjF7lo7Doq6sOiB25OZjbPN3Fz/BcBTYTCTK+8MuQM1EaKXm33RcWjtO2gDUoe
 vyGfQResdwFELj9niPf27Nymezm1nX27x6f/jqCmioLxqQDGAGPsgJIMHy9XO8wz
 3YU/92dbwBdOQ6laKT7f5ClfPGDQlpqQTXXyxVHLFUd3T6JHJGUv0AplwVNZk6l8
 KCXEWg==
 =uK7h
 -----END PGP SIGNATURE-----

Merge tag 'nf-next-25-09-24' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Florian Westphal says:

====================
netfilter: fixes for net-next

These fixes target next because the bug is either not severe or has
existed for so long that there is no reason to cram them in at the last
minute.

1) Fix IPVS ftp unregistering during netns cleanup, broken since netns
   support was introduced in 2011 in the 2.6.39 kernel.
   From Slavin Liu.

2) nfnetlink must reset the 'nlh' pointer back to the original
   address when a batch is replayed, else we emit bogus ACK messages
   and conceal real errno from userspace.
   From Fernando Fernandez Mancera.  This was broken since 6.10.

3) Recent fix for nftables 'pipapo' set type was incomplete, it only
   made things work for the AVX2 version of the algorithm.

4) Testing revealed another problem with avx2 version that results in
   out-of-bounds read access, this bug always existed since feature was
   added in 5.7 kernel.  This also comes with a selftest update.

Last fix resolves a long-standing bug (since 4.9) in conntrack /proc
interface:
Decrease skip count when we reap an expired entry during dump.
As-is we erronously elide one conntrack entry from dump for every expired
entry seen.  From Eric Dumazet.

* tag 'nf-next-25-09-24' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nf_conntrack: do not skip entries in /proc/net/nf_conntrack
  selftests: netfilter: nft_concat_range.sh: add check for double-create bug
  netfilter: nft_set_pipapo_avx2: fix skip of expired entries
  netfilter: nft_set_pipapo: use 0 genmask for packetpath lookups
  netfilter: nfnetlink: reset nlh pointer during batch replay
  ipvs: Defer ip_vs_ftp unregister during netns cleanup
====================

Link: https://patch.msgid.link/20250924140654.10210-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-24 17:45:15 -07:00
Jakub Kicinski 5e3fee34f6 bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQ6NaUOruQGUkvPdG4raS+Z+3y5EwUCaNNwBQAKCRAraS+Z+3y5
 E8heAQDdJTR9rwAL7gD79cldlHP5PTmjyidLIoFG/efaGSbN1AD9EdvrykDU4xOG
 aGaO8TooGUZf7vAL8tIFuMeydYvi/gM=
 =Qu4T
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Martin KaFai Lau says:

====================
pull-request: bpf-next 2025-09-23

We've added 9 non-merge commits during the last 33 day(s) which contain
a total of 10 files changed, 480 insertions(+), 53 deletions(-).

The main changes are:

1) A new bpf_xdp_pull_data kfunc that supports pulling data from
   a frag into the linear area of a xdp_buff, from Amery Hung.

   This includes changes in the xdp_native.bpf.c selftest, which
   Nimrod's future work depends on.

   It is a merge from a stable branch 'xdp_pull_data' which has
   also been merged to bpf-next.

   There is a conflict with recent changes in 'include/net/xdp.h'
   in the net-next tree that will need to be resolved.

2) A compiler warning fix when CONFIG_NET=n in the recent dynptr
   skb_meta support, from Jakub Sitnicki.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  selftests: drv-net: Pull data before parsing headers
  selftests/bpf: Test bpf_xdp_pull_data
  bpf: Support specifying linear xdp packet data size for BPF_PROG_TEST_RUN
  bpf: Make variables in bpf_prog_test_run_xdp less confusing
  bpf: Clear packet pointers after changing packet data in kfuncs
  bpf: Support pulling non-linear xdp data
  bpf: Allow bpf_xdp_shrink_data to shrink a frag from head and tail
  bpf: Clear pfmemalloc flag when freeing all fragments
  bpf: Return an error pointer for skb metadata when CONFIG_NET=n
====================

Link: https://patch.msgid.link/20250924050303.2466356-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-24 10:22:37 -07:00
Florian Westphal 94bd247bc2 selftests: netfilter: nft_concat_range.sh: add check for double-create bug
Add a test case for bug resolved with:
'netfilter: nft_set_pipapo_avx2: fix skip of expired entries'.

It passes on nf.git (it uses the generic/C version for insertion
duplicate check) but fails on unpatched nf-next if AVX2 is supported:

  cannot create same element twice      0s                        [FAIL]
Could create element twice in same transaction
table inet filter { # handle 8
[..]
  elements = { 1.2.3.4 . 1.2.4.1 counter packets 0 bytes 0,
               1.2.4.1 . 1.2.3.4 counter packets 0 bytes 0,
               1.2.3.4 . 1.2.4.1 counter packets 0 bytes 0,
               1.2.4.1 . 1.2.3.4 counter packets 0 bytes 0 }

Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2025-09-24 11:50:28 +02:00
Petr Machata f67e9ae72d selftests: bridge_fdb_local_vlan_0: Test FDB vs. NET_ADDR_SET behavior
The previous patch fixed an issue whereby no FDB entry would be created for
the bridge itself on VLAN 0 under some circumstances. This could break
forwarding. Add a test for the fix.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/137cc25396f5a4f407267af895a14bc45552ba5f.1758550408.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-23 17:10:49 -07:00
Ido Schimmel 00af023d90 selftests: fib_nexthops: Add test cases for FDB status change
Add the following test cases for both IPv4 and IPv6:

* Can change from FDB nexthop to non-FDB nexthop and vice versa.
* Can change FDB nexthop address while in a group.
* Cannot change from FDB nexthop to non-FDB nexthop and vice versa while
  in a group.

Output without "nexthop: Forbid FDB status change while nexthop is in a
group":

 # ./fib_nexthops.sh -t "ipv6_fdb_grp_fcnal ipv4_fdb_grp_fcnal"

 IPv6 fdb groups functional
 --------------------------
 [...]
 TEST: Replace FDB nexthop to non-FDB nexthop                        [ OK ]
 TEST: Replace non-FDB nexthop to FDB nexthop                        [ OK ]
 TEST: Replace FDB nexthop address while in a group                  [ OK ]
 TEST: Replace FDB nexthop to non-FDB nexthop while in a group       [FAIL]
 TEST: Replace non-FDB nexthop to FDB nexthop while in a group       [FAIL]
 [...]

 IPv4 fdb groups functional
 --------------------------
 [...]
 TEST: Replace FDB nexthop to non-FDB nexthop                        [ OK ]
 TEST: Replace non-FDB nexthop to FDB nexthop                        [ OK ]
 TEST: Replace FDB nexthop address while in a group                  [ OK ]
 TEST: Replace FDB nexthop to non-FDB nexthop while in a group       [FAIL]
 TEST: Replace non-FDB nexthop to FDB nexthop while in a group       [FAIL]
 [...]

 Tests passed:  36
 Tests failed:   4
 Tests skipped:  0

Output with "nexthop: Forbid FDB status change while nexthop is in a
group":

 # ./fib_nexthops.sh -t "ipv6_fdb_grp_fcnal ipv4_fdb_grp_fcnal"

 IPv6 fdb groups functional
 --------------------------
 [...]
 TEST: Replace FDB nexthop to non-FDB nexthop                        [ OK ]
 TEST: Replace non-FDB nexthop to FDB nexthop                        [ OK ]
 TEST: Replace FDB nexthop address while in a group                  [ OK ]
 TEST: Replace FDB nexthop to non-FDB nexthop while in a group       [ OK ]
 TEST: Replace non-FDB nexthop to FDB nexthop while in a group       [ OK ]
 [...]

 IPv4 fdb groups functional
 --------------------------
 [...]
 TEST: Replace FDB nexthop to non-FDB nexthop                        [ OK ]
 TEST: Replace non-FDB nexthop to FDB nexthop                        [ OK ]
 TEST: Replace FDB nexthop address while in a group                  [ OK ]
 TEST: Replace FDB nexthop to non-FDB nexthop while in a group       [ OK ]
 TEST: Replace non-FDB nexthop to FDB nexthop while in a group       [ OK ]
 [...]

 Tests passed:  40
 Tests failed:   0
 Tests skipped:  0

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250921150824.149157-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-23 17:01:05 -07:00
Ido Schimmel c29913109c selftests: fib_nexthops: Fix creation of non-FDB nexthops
The test creates non-FDB nexthops without a nexthop device which leads
to the expected failure, but for the wrong reason:

 # ./fib_nexthops.sh -t "ipv6_fdb_grp_fcnal ipv4_fdb_grp_fcnal" -v

 IPv6 fdb groups functional
 --------------------------
 [...]
 COMMAND: ip -netns me-nRsN3E nexthop add id 63 via 2001:db8:91::4
 Error: Device attribute required for non-blackhole and non-fdb nexthops.
 COMMAND: ip -netns me-nRsN3E nexthop add id 64 via 2001:db8:91::5
 Error: Device attribute required for non-blackhole and non-fdb nexthops.
 COMMAND: ip -netns me-nRsN3E nexthop add id 103 group 63/64 fdb
 Error: Invalid nexthop id.
 TEST: Fdb Nexthop group with non-fdb nexthops                       [ OK ]
 [...]

 IPv4 fdb groups functional
 --------------------------
 [...]
 COMMAND: ip -netns me-nRsN3E nexthop add id 14 via 172.16.1.2
 Error: Device attribute required for non-blackhole and non-fdb nexthops.
 COMMAND: ip -netns me-nRsN3E nexthop add id 15 via 172.16.1.3
 Error: Device attribute required for non-blackhole and non-fdb nexthops.
 COMMAND: ip -netns me-nRsN3E nexthop add id 103 group 14/15 fdb
 Error: Invalid nexthop id.
 TEST: Fdb Nexthop group with non-fdb nexthops                       [ OK ]

 COMMAND: ip -netns me-nRsN3E nexthop add id 16 via 172.16.1.2 fdb
 COMMAND: ip -netns me-nRsN3E nexthop add id 17 via 172.16.1.3 fdb
 COMMAND: ip -netns me-nRsN3E nexthop add id 104 group 14/15
 Error: Invalid nexthop id.
 TEST: Non-Fdb Nexthop group with fdb nexthops                       [ OK ]
 [...]
 COMMAND: ip -netns me-0dlhyd ro add 172.16.0.0/22 nhid 15
 Error: Nexthop id does not exist.
 TEST: Route add with fdb nexthop                                    [ OK ]

In addition, as can be seen in the above output, a couple of IPv4 test
cases used the non-FDB nexthops (14 and 15) when they intended to use
the FDB nexthops (16 and 17). These test cases only passed because
failure was expected, but they failed for the wrong reason.

Fix the test to create the non-FDB nexthops with a nexthop device and
adjust the IPv4 test cases to use the FDB nexthops instead of the
non-FDB nexthops.

Output after the fix:

 # ./fib_nexthops.sh -t "ipv6_fdb_grp_fcnal ipv4_fdb_grp_fcnal" -v

 IPv6 fdb groups functional
 --------------------------
 [...]
 COMMAND: ip -netns me-lNzfHP nexthop add id 63 via 2001:db8:91::4 dev veth1
 COMMAND: ip -netns me-lNzfHP nexthop add id 64 via 2001:db8:91::5 dev veth1
 COMMAND: ip -netns me-lNzfHP nexthop add id 103 group 63/64 fdb
 Error: FDB nexthop group can only have fdb nexthops.
 TEST: Fdb Nexthop group with non-fdb nexthops                       [ OK ]
 [...]

 IPv4 fdb groups functional
 --------------------------
 [...]
 COMMAND: ip -netns me-lNzfHP nexthop add id 14 via 172.16.1.2 dev veth1
 COMMAND: ip -netns me-lNzfHP nexthop add id 15 via 172.16.1.3 dev veth1
 COMMAND: ip -netns me-lNzfHP nexthop add id 103 group 14/15 fdb
 Error: FDB nexthop group can only have fdb nexthops.
 TEST: Fdb Nexthop group with non-fdb nexthops                       [ OK ]

 COMMAND: ip -netns me-lNzfHP nexthop add id 16 via 172.16.1.2 fdb
 COMMAND: ip -netns me-lNzfHP nexthop add id 17 via 172.16.1.3 fdb
 COMMAND: ip -netns me-lNzfHP nexthop add id 104 group 16/17
 Error: Non FDB nexthop group cannot have fdb nexthops.
 TEST: Non-Fdb Nexthop group with fdb nexthops                       [ OK ]
 [...]
 COMMAND: ip -netns me-lNzfHP ro add 172.16.0.0/22 nhid 16
 Error: Route cannot point to a fdb nexthop.
 TEST: Route add with fdb nexthop                                    [ OK ]
 [...]
 Tests passed:  30
 Tests failed:   0
 Tests skipped:  0

Fixes: 0534c5489c ("selftests: net: add fdb nexthop tests")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250921150824.149157-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-23 17:01:05 -07:00
Alok Tiwari f770645860 selftests: rtnetlink: correct error message in rtnetlink.sh fou test
The rtnetlink FOU selftest prints an incorrect string:
"FAIL: fou"s. Change it to the intended "FAIL: fou" by
removing a stray character in the end_test string of the test.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250921192111.1567498-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-23 16:55:05 -07:00
Martin KaFai Lau 55d5a5154d Merge branch 'bpf-next/xdp_pull_data' into 'bpf-next/net'
Merge the xdp_pull_data stable branch into the net branch. No conflict.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2025-09-23 15:46:52 -07:00
Amery Hung efec2e55bd selftests: drv-net: Pull data before parsing headers
It is possible for drivers to generate xdp packets with data residing
entirely in fragments. To keep parsing headers using direct packet
access, call bpf_xdp_pull_data() to pull headers into the linear data
area.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250922233356.3356453-9-ameryhung@gmail.com
2025-09-23 15:21:26 -07:00
Jakub Sitnicki 8a8241cdaa selftests/net: Test tcp port reuse after unbinding a socket
Exercise the scenario described in detail in the cover letter:

  1) socket A: connect() from ephemeral port X
  2) socket B: explicitly bind() to port X
  3) check that port X is now excluded from ephemeral ports
  4) close socket B to release the port bind
  5) socket C: connect() from ephemeral port X

As well as a corner case to test that the connect-bind flag is cleared:

  1) connect() from ephemeral port X
  2) disconnect the socket with connect(AF_UNSPEC)
  3) bind() it explicitly to port X
  4) check that port X is now excluded from ephemeral ports

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://patch.msgid.link/20250917-update-bind-bucket-state-on-unhash-v5-2-57168b661b47@cloudflare.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-23 10:12:15 +02:00
Matthieu Baerts (NGI0) e6c3552945 selftests: mptcp: pm: get server-side flag
server-side info linked to the MPTCP connect/established events can now
come from the flags, in addition to the dedicated attribute.

The attribute is now deprecated -- in favour of the new flag, and will
be removed later on.

Print this info only once.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250919-net-next-mptcp-server-side-flag-v1-4-a97a5d561a8b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-22 11:51:25 -07:00
Matthieu Baerts (NGI0) c9809f03c1 mptcp: pm: netlink: only add server-side attr when true
This attribute is a boolean. No need to add it to set it to 'false'.

Indeed, the default value when this attribute is not set is naturally
'false'. A few bytes can then be saved by not adding this attribute if
the connection is not on the server side.

This prepares the future deprecation of its attribute, in favour of a
new flag.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250919-net-next-mptcp-server-side-flag-v1-1-a97a5d561a8b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-22 11:51:24 -07:00