Commit Graph

19279 Commits (f9db1fc56281b96fe8748632b3894de970a8a850)

Author SHA1 Message Date
Thomas Weißschuh 9a99129fd6 kunit: qemu_configs: Add PowerPC 32-bit BE and 64-bit LE
Add basic configs to run kunit tests on some more PowerPC variants.

Link: https://lore.kernel.org/r/20250415-kunit-ppc-v1-2-f5a170264147@linutronix.de
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-04-15 10:55:04 -06:00
Thomas Weißschuh 09ea90e598 kunit: qemu_configs: powerpc: Explicitly enable CONFIG_CPU_BIG_ENDIAN=y
The configuration generated by kunit ends up with big endian.
A new kunit configuration for little endian is to be added.
To make the difference clearer spell out the endianness in the kunit
reference config.

Link: https://lore.kernel.org/r/20250415-kunit-ppc-v1-1-f5a170264147@linutronix.de
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-04-15 10:54:57 -06:00
Thomas Weißschuh 6cf6b0a6f2 kunit: tool: Implement listing of available architectures
To implement custom scripting around kunit.py it is useful to get a list of
available architectures. While it is possible to manually inspect
tools/testing/kunit/qemu_configs/, this is annoying to implement and
introduces a dependency on a kunit.py implementation detail.

Introduce 'kunit.py run --arch help' which lists all known architectures
in an easy to parse list. This is equivalent on how QEMU implements
listing of possible argument values.

Link: https://lore.kernel.org/r/20250415-kunit-list-v2-1-aa452cd317ae@linutronix.de
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Rae Moar <rmoar@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-04-15 10:54:50 -06:00
zhenwei pi a862771d1a selftests: mptcp: use IPPROTO_MPTCP for getaddrinfo
mptcp_connect.c is a startup tutorial of MPTCP programming, however
there is a lack of ai_protocol(IPPROTO_MPTCP) usage. Add comment for
getaddrinfo MPTCP support.

This patch first uses IPPROTO_MPTCP to get addrinfo, and if glibc
version is too old, it falls back to using IPPROTO_TCP.

Co-developed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250413-net-next-mptcp-sched-mib-sft-misc-v2-8-0f83a4350150@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15 08:21:47 -07:00
Geliang Tang f9c7504d30 selftests: mptcp: diag: drop nlh parameter of recv_nlmsg
It's strange that 'nlh' variable is set to NULL in get_mptcpinfo() and then
this NULL pointer is passed to recv_nlmsg(). In fact, this variable should
be defined in recv_nlmsg(), not get_mptcpinfo().

So this patch drops this useless 'nlh' parameter of recv_nlmsg() and define
'nlh' variable in recv_nlmsg().

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250413-net-next-mptcp-sched-mib-sft-misc-v2-7-0f83a4350150@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15 08:21:47 -07:00
Matthieu Baerts (NGI0) 98dea4fd63 selftests: mptcp: validate MPJoinRejected counter
The parent commit adds this new counter, incremented when receiving a
connection request, if the PM didn't allow the creation of new subflows.

Most of the time, it is then kept at 0, except when the PM limits cause
the receiver side to reject new MPJoin connections. This is the case in
the following tests:

 - single subflow, limited by server
 - multiple subflows, limited by server
 - subflows limited by server w cookies
 - userspace pm type rejects join
 - userspace pm type prevents mp_prio

Simply set join_syn_rej=1 when checking the MPJoin counters for these
tests.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250413-net-next-mptcp-sched-mib-sft-misc-v2-6-0f83a4350150@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15 08:21:47 -07:00
Koichiro Den 290ffcfe30 selftests: gpio: gpio-aggregator: add a test case for _sysfs prefix reservation
The kernel doc for gpio-aggregator configfs interface, which was recently
added, states that users should not be able to create an aggregator with a
name prefixed by "_sysfs" via configfs. However, it was found that this
guard does not function as expected (thanks to Dan Carpenter for
identifying and fixing the issue).

Add a test case to verify the guard.

Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Link: https://lore.kernel.org/r/20250412150119.1461023-1-koichiro.den@canonical.com
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2025-04-14 22:30:01 +02:00
Thomas Weißschuh b26c1a85f3 kunit: qemu_configs: SH: Respect kunit cmdline
The default SH kunit configuration sets CONFIG_CMDLINE_OVERWRITE which
completely disregards the cmdline passed from the bootloader/QEMU in favor
of the builtin CONFIG_CMDLINE.
However the kunit tool needs to pass arguments to the in-kernel kunit core,
for filters and other runtime parameters.

Enable CONFIG_CMDLINE_EXTEND instead, so kunit arguments are respected.

Link: https://lore.kernel.org/r/20250407-kunit-sh-v1-1-f5432a54cf2f@linutronix.de
Fixes: 8110a3cab0 ("kunit: tool: Add support for SH under QEMU")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-04-14 10:08:01 -06:00
Thomas Weißschuh 9aa08e761b kunit: qemu_configs: Add riscv32 config
Add a basic config to run kunit tests on riscv32.

Link: https://lore.kernel.org/r/20250407-kunit-qemu-riscv32-v1-1-7b9800034a35@linutronix.de
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-04-14 10:02:37 -06:00
Richard Fitzgerald a571a9a1b1 kunit: configs: Enable CONFIG_INIT_STACK_ALL_PATTERN in all_tests
Enable CONFIG_INIT_STACK_ALL_PATTERN in all_tests.config. This helps
to detect use of uninitialized local variables.

This option found an uninitialized data bug in the cs_dsp test.

Link: https://lore.kernel.org/r/20250411095904.1593224-1-rf@opensource.cirrus.com
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-04-14 10:02:37 -06:00
Joel Granados 8e4acabdc8 sysctl: Add 0012 to test the u8 range check
Add a sysctl test that uses the new u8 test ctl files in a created by
the sysctl test module. Check that the u8 proc file that is valid is
created and that there are two messages in dmesg for the files that were
out of range.

Reviewed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-04-14 14:13:41 +02:00
Richard Fitzgerald 1aa495a657
kunit: configs: Add some Cirrus Logic modules to all_tests
Add CONFIG_I2C and CONFIG_SND_SOC_CS35L56_I2C to all_tests.config
so that Cirrus Logic modules with KUnit tests will be built.

The CS35L56 driver doesn't currently have any KUnit tests itself,
but it enables two other libraries that have KUnit tests:
cs_dsp and cs-amp-lib.

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://patch.msgid.link/20250411123608.1676462-2-rf@opensource.cirrus.com
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
2025-04-13 20:20:28 +01:00
Linus Torvalds 051ea726ee memblock: fix build of memblock test
Add missing stubs for mutex and free_reserved_area() to memblock tests
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEeOVYVaWZL5900a/pOQOGJssO/ZEFAmf7kxgQHHJwcHRAa2Vy
 bmVsLm9yZwAKCRA5A4Ymyw79kbv4B/0XhZGFhmZt7yXlOU6ywPB94nLgH7lWkbHe
 Kxca0M7/77Vd19nRaHq9zIO2SQkkz3mKQNuSTsBLZLUrRi/oivWN9jFlaXMytLcT
 v2XeX7HCX4FqSAKbA0fu130/qIyHJAyy+cUIuAhy/043eE7mtj99yCHkwza7S4US
 gL1kFNBH9hLuLhm7Pnq7yBK6Wx2kuFhMwG6Deb6AWrSJ1pfv/wMfzwXfBiZ2oy/8
 CfDdeSXiIKoj9CDMnwrbVADZxVFvCC9YooatQoJ/Y+35MwN5XQyJFtS0rXvEHkvf
 jS+HQCUXKxoVeZKVhKnauZ8h1NgLzyhSvUB9iPBo/fdyGDXpU8OO
 =DpcO
 -----END PGP SIGNATURE-----

Merge tag 'fixes-2025-04-13' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock

Pull memblock fix from Mike Rapoport:
 "Fix build of memblock test.

  Add missing stubs for mutex and free_reserved_area() to memblock
  tests"

* tag 'fixes-2025-04-13' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  memblock tests: Fix mutex related build error
2025-04-13 07:11:33 -07:00
Linus Torvalds 7cdabafc00 tracing fixes for v6.15
- Hide get_vm_area() from MMUless builds
 
   The function get_vm_area() is not defined when CONFIG_MMU is not defined.
   Hide that function within #ifdef CONFIG_MMU.
 
 - Fix output of synthetic events when they have dynamic strings
 
   The print fmt of the synthetic event's format file use to have "%.*s" for
   dynamic size strings even though the user space exported arguments had
   only __get_str() macro that provided just a nul terminated string. This
   was fixed so that user space could parse this properly. But the reason
   that it had "%.*s" was because internally it provided the maximum size of
   the string as one of the arguments. The fix that replaced "%.*s" with "%s"
   caused the trace output (when the kernel reads the event) to write
   "(efault)" as it would now read the length of the string as "%s".
 
   As the string provided is always nul terminated, there's no reason for the
   internal code to use "%.*s" anyway. Just remove the length argument to
   match the "%s" that is now in the format.
 
 - Fix the ftrace subops hash logic of the manager ops hash
 
   The function_graph uses the ftrace subops code. The subops code is a way
   to have a single ftrace_ops registered with ftrace to determine what
   functions will call the ftrace_ops callback. More than one user of
   function graph can register a ftrace_ops with it. The function graph
   infrastructure will then add this ftrace_ops as a subops with the main
   ftrace_ops it registers with ftrace. This is because the functions will
   always call the function graph callback which in turn calls the subops
   ftrace_ops callbacks.
 
   The main ftrace_ops must add a callback to all the functions that the
   subops want a callback from. When a subops is registered, it will update
   the main ftrace_ops hash to include the functions it wants. This is the
   logic that was broken.
 
   The ftrace_ops hash has a "filter_hash" and a "notrace_hash" were all the
   functions in the filter_hash but not in the notrace_hash are attached by
   ftrace. The original logic would have the main ftrace_ops filter_hash be a
   union of all the subops filter_hashes and the main notrace_hash would be a
   intersect of all the subops filter hashes. But this was incorrect because
   the notrace hash depends on the filter_hash it is associated to and not
   the union of all filter_hashes.
 
   Instead, when a subops is added, just include all the functions of the
   subops hash that are in its filter_hash but not in its notrace_hash. The
   main subops hash should not use its notrace hash, unless all of its subops
   hashes have an empty filter_hash (which means to attach to all functions),
   and then, and only then, the main ftrace_ops notrace hash can be the
   intersect of all the subops hashes.
 
   This not only fixes the bug, but also simplifies the code.
 
 - Add a selftest to better test the subops filtering
 
   Add a selftest that would catch the bug fixed by the above change.
 
 - Fix extra newline printed in function tracing with retval
 
   The function parameter code changed the output logic slightly and called
   print_graph_retval() and also printed a newline. The print_graph_retval()
   also prints a newline which caused blank lines to be printed in the
   function graph tracer when retval was added. This caused one of the
   selftests to fail if retvals were enabled. Instead remove the new line
   output from print_graph_retval() and have the callers always print the
   new line so that it doesn't have to do special logic if it calls
   print_graph_retval() or not.
 
 - Fix out-of-bound memory access in the runtime verifier
 
   When rv_is_container_monitor() is called on the last entry on the link
   list it references the next entry, which is the list head and causes an
   out-of-bound memory access.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ/rXQxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qoj7AQC0C2awpJSUIRj91qjPtMYuNUE3AVpB
 EEZEkt19LfE//gEA1fOx3Cors/LrY9dthn/3LMKL23vo9c4i0ffhs2X+1gE=
 =XJL5
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Hide get_vm_area() from MMUless builds

   The function get_vm_area() is not defined when CONFIG_MMU is not
   defined. Hide that function within #ifdef CONFIG_MMU.

 - Fix output of synthetic events when they have dynamic strings

   The print fmt of the synthetic event's format file use to have "%.*s"
   for dynamic size strings even though the user space exported
   arguments had only __get_str() macro that provided just a nul
   terminated string. This was fixed so that user space could parse this
   properly.

   But the reason that it had "%.*s" was because internally it provided
   the maximum size of the string as one of the arguments. The fix that
   replaced "%.*s" with "%s" caused the trace output (when the kernel
   reads the event) to write "(efault)" as it would now read the length
   of the string as "%s".

   As the string provided is always nul terminated, there's no reason
   for the internal code to use "%.*s" anyway. Just remove the length
   argument to match the "%s" that is now in the format.

 - Fix the ftrace subops hash logic of the manager ops hash

   The function_graph uses the ftrace subops code. The subops code is a
   way to have a single ftrace_ops registered with ftrace to determine
   what functions will call the ftrace_ops callback. More than one user
   of function graph can register a ftrace_ops with it. The function
   graph infrastructure will then add this ftrace_ops as a subops with
   the main ftrace_ops it registers with ftrace. This is because the
   functions will always call the function graph callback which in turn
   calls the subops ftrace_ops callbacks.

   The main ftrace_ops must add a callback to all the functions that the
   subops want a callback from. When a subops is registered, it will
   update the main ftrace_ops hash to include the functions it wants.
   This is the logic that was broken.

   The ftrace_ops hash has a "filter_hash" and a "notrace_hash" where
   all the functions in the filter_hash but not in the notrace_hash are
   attached by ftrace. The original logic would have the main ftrace_ops
   filter_hash be a union of all the subops filter_hashes and the main
   notrace_hash would be a intersect of all the subops filter hashes.
   But this was incorrect because the notrace hash depends on the
   filter_hash it is associated to and not the union of all
   filter_hashes.

   Instead, when a subops is added, just include all the functions of
   the subops hash that are in its filter_hash but not in its
   notrace_hash. The main subops hash should not use its notrace hash,
   unless all of its subops hashes have an empty filter_hash (which
   means to attach to all functions), and then, and only then, the main
   ftrace_ops notrace hash can be the intersect of all the subops
   hashes.

   This not only fixes the bug, but also simplifies the code.

 - Add a selftest to better test the subops filtering

   Add a selftest that would catch the bug fixed by the above change.

 - Fix extra newline printed in function tracing with retval

   The function parameter code changed the output logic slightly and
   called print_graph_retval() and also printed a newline. The
   print_graph_retval() also prints a newline which caused blank lines
   to be printed in the function graph tracer when retval was added.
   This caused one of the selftests to fail if retvals were enabled.
   Instead remove the new line output from print_graph_retval() and have
   the callers always print the new line so that it doesn't have to do
   special logic if it calls print_graph_retval() or not.

 - Fix out-of-bound memory access in the runtime verifier

   When rv_is_container_monitor() is called on the last entry on the
   link list it references the next entry, which is the list head and
   causes an out-of-bound memory access.

* tag 'trace-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rv: Fix out-of-bound memory access in rv_is_container_monitor()
  ftrace: Do not have print_graph_retval() add a newline
  tracing/selftest: Add test to better test subops filtering of function graph
  ftrace: Fix accounting of subop hashes
  ftrace: Properly merge notrace hashes
  tracing: Do not add length to print format in synthetic events
  tracing: Hide get_vm_area() from MMUless builds
2025-04-12 15:37:40 -07:00
Linus Torvalds b676ac484f bpf-fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmf6sD8ACgkQ6rmadz2v
 bTq86w//bbg2S1ZhSXXQvgRSbxfecvJ0r6XGDOaMsKxPXcqpbaMoSCYx2D8puO+b
 xm0vc+5qXlzuTHq9I8flDKrWdA+/sHxLQhXjcBA796vaY6IgJEnapf3kENyzZ3Vp
 agpNPlZe9FLaANDRivTFPVgzVjr07/3eL7VKItASksb/3yjBSa+vrIJVfGF1krQT
 slxTMzVMzB+p0MdKVjmeGn5EodWXp8TdVzQBPb8vnCn7U1h1HULSh4j1+nZ/Z1yr
 zC4/pVPmdDJe1H8ghBGm4f0nY+EwXPtZiVbXnYS2FhgjvthRKFYIyxN9F6kg7AD7
 NG0T6xw/QYNfPTR40PSiV/WHhH5qa2zRVtlepVU7tqqmsyRXi+0Eq/MfJyiuNzgN
 WWmJec0O/Ax4r2Xs/QgX3mFlRnLNi5gmc7fuOARmayAlqElZ9QdB2x6ebW5Fk4Qx
 9oyQACpcu6/oUKgeMSo52MDa82wUPPxpC6qdsefmQYaAcOKM5MD4SNd+eEnfX03E
 RAaItTW9az57a2BL9C/ejJO/SwY4Er+O8B3PO7GaKiURMSZa5nVlY+2QB2fJy6TA
 7IvSYjFD5E4risMbZgPFCqWkQ0yHbY7zEn/tbcNC5AFZoKv70jELPQTLPXq7UPLe
 BuKoL9VJyeXF7E1MQqQH33q3tfcwlIL++piCNHvTQoPadEba2dM=
 =Mezb
 -----END PGP SIGNATURE-----

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

 - Followup fixes for resilient spinlock (Kumar Kartikeya Dwivedi):
     - Make res_spin_lock test less verbose, since it was spamming BPF
       CI on failure, and make the check for AA deadlock stronger
     - Fix rebasing mistake and use architecture provided
       res_smp_cond_load_acquire
     - Convert BPF maps (queue_stack and ringbuf) to resilient spinlock
       to address long standing syzbot reports

 - Make sure that classic BPF load instruction from SKF_[NET|LL]_OFF
   offsets works when skb is fragmeneted (Willem de Bruijn)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Convert ringbuf map to rqspinlock
  bpf: Convert queue_stack map to rqspinlock
  bpf: Use architecture provided res_smp_cond_load_acquire
  selftests/bpf: Make res_spin_lock AA test condition stronger
  selftests/net: test sk_filter support for SKF_NET_OFF on frags
  bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags
  selftests/bpf: Make res_spin_lock test less verbose
2025-04-12 12:48:10 -07:00
Kuniyuki Iwashima b2bdce7adc selftest: net: Remove DCCP bits.
We will remove DCCP.

Let's remove DCCP bits from selftest.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250410023921.11307-2-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-11 18:58:10 -07:00
Anshuman Khandual 92868577d0 selftests/mm: fix compiler -Wmaybe-uninitialized warning
Following build warning comes up for cow test as 'transferred' variable has
not been initialized. Fix the warning via zero init for the variable.

  CC       cow
cow.c: In function `do_test_vmsplice_in_parent':
cow.c:365:61: warning: `transferred' may be used uninitialized [-Wmaybe-uninitialized]
  365 |                 cur = read(fds[0], new + total, transferred - total);
      |                                                 ~~~~~~~~~~~~^~~~~~~
cow.c:296:29: note: `transferred' was declared here
  296 |         ssize_t cur, total, transferred;
      |                             ^~~~~~~~~~~
  CC       compaction_test
  CC       gup_longterm

Link: https://lkml.kernel.org/r/20250409095006.1422620-1-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11 17:32:41 -07:00
Baolin Wang 8c583e538a selftests: mincore: fix tmpfs mincore test failure
When running mincore test cases, I encountered the following failures:

"
mincore_selftest.c:359:check_tmpfs_mmap:Expected ra_pages (511) == 0 (0)
mincore_selftest.c:360:check_tmpfs_mmap:Read-ahead pages found in memory
check_tmpfs_mmap: Test terminated by assertion
          FAIL  global.check_tmpfs_mmap
not ok 5 global.check_tmpfs_mmap
FAILED: 4 / 5 tests passed
"

The reason for the test case failure is that my system automatically enabled
tmpfs large folio allocation by adding the 'transparent_hugepage_tmpfs=always'
cmdline. However, the test case still expects the tmpfs mounted on /dev/shm to
allocate small folios, which leads to assertion failures when verifying readahead
pages.

As discussed with David, there's no reason to continue checking the readahead
logic for tmpfs. Drop it to fix this issue.

Link: https://lkml.kernel.org/r/9a00856cc6a8b4e46f4ab8b1af11ce5fc1a31851.1744025467.git.baolin.wang@linux.alibaba.com
Fixes: d635ccdb43 ("mm: shmem: add a kernel command line to change the default huge policy for tmpfs")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Barry Song <21cnbao@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11 17:32:40 -07:00
Mark Brown 9c02223e2d selftests/mm: generate a temporary mountpoint for cgroup filesystem
Currently if the filesystem for the cgroups version it wants to use is not
mounted charge_reserved_hugetlb.sh and hugetlb_reparenting_test.sh tests
will attempt to mount it on the hard coded path /dev/cgroup/memory,
deleting that directory when the test finishes.  This will fail if there
is not a preexisting directory at that path, and since the directory is
deleted subsequent runs of the test will fail.  Instead of relying on this
hard coded directory name use mktemp to generate a temporary directory to
use as a mountpoint, fixing both the assumption and the disruption caused
by deleting a preexisting directory.

This means that if the relevant cgroup filesystem is not already mounted
then we rely on having coreutils (which provides mktemp) installed.  I
suspect that many current users are relying on having things automounted
by default, and given that the script relies on bash it's probably not an
unreasonable requirement.

Link: https://lkml.kernel.org/r/20250404-kselftest-mm-cgroup2-detection-v1-1-3dba6d32ba8c@kernel.org
Fixes: 209376ed2a ("selftests/vm: make charge_reserved_hugetlb.sh work with existing cgroup setting")
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: Aishwarya TCV <aishwarya.tcv@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11 17:32:37 -07:00
Matthew Wilcox (Oracle) a30951d09c test suite: use %zu to print size_t
On 32-bit, we can't use %lu to print a size_t variable and gcc warns us
about it.  Shame it doesn't warn about it on 64-bit.

Link: https://lkml.kernel.org/r/20250403003311.359917-1-Liam.Howlett@oracle.com
Fixes: cc86e0c2f3 ("radix tree test suite: add support for slab bulk APIs")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11 17:32:35 -07:00
Daniel Gomez be8254f694 radix-tree: add missing cleanup.h
Add shared cleanup.h header for radix-tree testing tools.

Fixes build error found with kdevops [1]:

cc -I../shared -I. -I../../include -I../../../lib -g -Og -Wall
-D_LGPL_SOURCE -fsanitize=address -fsanitize=undefined   -c -o
radix-tree.o radix-tree.c
In file included from ../shared/linux/idr.h:1,
                 from radix-tree.c:18:
../shared/linux/../../../../include/linux/idr.h:18:10: fatal error:
linux/cleanup.h: No such file or directory
   18 | #include <linux/cleanup.h>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
make: *** [<builtin>: radix-tree.o] Error 1

[1] https://github.com/linux-kdevops/kdevops
https://github.com/linux-kdevops/linux-mm-kpd/
actions/runs/13971648496/job/39114756401

[akpm@linux-foundation.org: remove unneeded header guards, per Sidhartha]
Link: https://lkml.kernel.org/r/20250321-fix-radix-tree-build-v1-1-838a1e6540e2@samsung.com
Fixes: 6c8b0b835f ("perf/core: Simplify perf_pmu_register()")
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
Cc: Daniel Gomez <da.gomez@samsung.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Luis Chamberalin <mcgrof@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11 17:32:35 -07:00
Toke Høiland-Jørgensen 18c889a9a4 selftests/tc-testing: Add test for echo of big TC filters
Add a selftest that checks whether the kernel can successfully echo a
big tc filter, to test the fix introduced in commit:

369609fc62 ("tc: Ensure we have enough buffer space when sending filter netlink notifications")

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20250410104322.214620-1-toke@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-11 16:33:32 -07:00
Steven Rostedt a1fc89d409 tracing/selftest: Add test to better test subops filtering of function graph
A bug was discovered that showed the accounting of the subops of the
ftrace_ops filtering was incorrect. Add a new test to better test the
filtering.

This test creates two instances, where it will add various filters to both
the set_ftrace_filter and the set_ftrace_notrace files and enable
function_graph. Then it looks into the enabled_functions file to make sure
that the filters are behaving correctly.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Andy Chiu <andybnac@gmail.com>
Link: https://lore.kernel.org/20250409152720.380778379@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-11 16:02:08 -04:00
Thomas Weißschuh 8e1930296f tools/nolibc: Add support for SPARC
Add support for 32bit and 64bit SPARC to nolibc.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Tested-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> # UltraSparc T4 (Niagara4)
Link: https://lore.kernel.org/lkml/20250322-nolibc-sparc-v2-1-89af018c6296@weissschuh.net/
2025-04-11 20:00:20 +02:00
Thomas Weißschuh fd293cb81a selftests/nolibc: only consider XARCH for CFLAGS when requested
If no explicit XARCH is specified, use the toolchains default.

Suggested-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Link: https://lore.kernel.org/lkml/20250326205434.bPx_kVUx@breakpoint.cc/
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250402-nolibc-nolibc-test-native-v1-2-62f2f8585220@weissschuh.net
2025-04-11 20:00:19 +02:00
Thomas Weißschuh cdbf0f199e selftests/nolibc: drop dependency from sysroot to defconfig
The creation of the sysroot does not require a kernel configuration.

Drop the dependency.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250402-nolibc-nolibc-test-native-v1-1-62f2f8585220@weissschuh.net
2025-04-11 20:00:19 +02:00
Paul E. McKenney a3204f778c rcutorture: Make torture.sh --do-rt use CONFIG_PREEMPT_RT
The torture.sh --do-rt command-line parameter is intended to mimic -rt
kernels.  Now that CONFIG_PREEMPT_RT is upstream, this commit makes this
mimicking more precise.

Note that testing of RCU priority boosting is disabled in favor
of forward-progress testing of RCU callbacks.  If it turns out to be
possible to make kernels built with CONFIG_PREEMPT_RT=y to tolerate
testing of both, both will be enabled.

[ paulmck: Apply Sebastian Siewior feedback. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-04-11 08:58:33 -04:00
Mickaël Salaün 6b4566400a
selftests/landlock: Add PID tests for audit records
Add audit.thread tests to check that the PID tied to a domain is not a
thread ID but the thread group ID.  These new tests would not pass
without the previous TGID fix.

Extend matches_log_domain_allocated() to check against the PID that
created the domain.

Test coverage for security/landlock is 93.6% of 1524 lines according to
gcc/gcov-14.

Cc: Christian Brauner <brauner@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250410171725.1265860-3-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-04-11 12:53:22 +02:00
Mickaël Salaün e4a0f9e0ca
selftests/landlock: Factor out audit fixture in audit_test
The audit fixture needlessly stores and manages domain_stack.  Move it
to the audit.layers tests.  This will be useful to reuse the audit
fixture with the next patch.

Cc: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20250410171725.1265860-2-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-04-11 12:53:20 +02:00
Jakub Kicinski cd5e64fb95 netlink: specs: rename rtnetlink specs in accordance with family name
The rtnetlink family names are set to rt-$name within the YAML
but the files are called rt_$name. C codegen assumes that the
generated file name will match the family. The use of dashes
is in line with our general expectation that name properties
in the spec use dashes not underscores (even tho, as Donald
points out most genl families use underscores in the name).

We have 3 un-ideal options to choose from:

 - accept the slight inconsistency with old families using _, or
 - accept the slight annoyance with all languages having to do s/-/_/
   when looking up family ID, or
 - accept the inconsistency with all name properties in new YAML spec
   being separated with - and just the family name always using _.

Pick option 1 and rename the rtnl spec files.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250410014658.782120-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-10 20:14:40 -07:00
David Wei 6afd0a3c7e io_uring/zcrx: enable tcp-data-split in selftest
For bnxt when the agg ring is used then tcp-data-split is automatically
reported to be enabled, but __net_mp_open_rxq() requires tcp-data-split
to be explicitly enabled by the user.

Enable tcp-data-split explicitly in io_uring zc rx selftest.

Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://patch.msgid.link/20250409163153.2747918-1-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-10 17:27:58 -07:00
Jakub Kicinski cb7103298d Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.15-rc2).

Conflict:

Documentation/networking/netdevices.rst
net/core/lock_debug.c
  04efcee6ef ("net: hold instance lock during NETDEV_CHANGE")
  03df156dd3 ("xdp: double protect netdev->xdp_flags with netdev->lock")

No adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-10 16:51:07 -07:00
Thomas Weißschuh 5f40eef1c7 selftests/nolibc: drop unnecessary sys/io.h include
The include of sys/io.h is not necessary anymore since
commit 67eb617a8e ("selftests/nolibc: simplify call to ioperm").
It's existence is also problematic as the header does not exist on all
architectures.

Reported-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250324-nolibc-ioperm-v1-1-8a7cfb2876ae@weissschuh.net
2025-04-10 22:02:30 +02:00
Kumar Kartikeya Dwivedi 1ddb9ad2ac selftests/bpf: Make res_spin_lock AA test condition stronger
Let's make sure that we see a EDEADLK and ETIMEDOUT whenever checking
for the AA tests (in case of simple AA and AA after exhausting 31
entries).

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250410170023.2670683-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-10 12:45:37 -07:00
Tushar Vyavahare 4b30209255 selftests/xsk: Add tail adjustment tests and support check
Introduce tail adjustment functionality in xskxceiver using
bpf_xdp_adjust_tail(). Add `xsk_xdp_adjust_tail` to modify packet sizes
and drop unmodified packets. Implement `is_adjust_tail_supported` to check
helper availability. Develop packet resizing tests, including shrinking
and growing scenarios, with functions for both single-buffer and
multi-buffer cases. Update the test framework to handle various scenarios
and adjust MTU settings. These changes enhance the testing of packet tail
adjustments, improving AF_XDP framework reliability.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250410033116.173617-3-tushar.vyavahare@intel.com
2025-04-10 10:08:55 -07:00
Tushar Vyavahare 3e730fe2af selftests/xsk: Add packet stream replacement function
Add pkt_stream_replace_ifobject function to replace the packet stream for
a given ifobject.

Enable separate TX and RX packet replacement, allowing RX side packet
length adjustments using bpf_xdp_adjust_tail() in the upcoming patch.
Currently, pkt_stream_replace() works on both TX and RX packet streams,
and this new function provides the ability to modify one of them.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250410033116.173617-2-tushar.vyavahare@intel.com
2025-04-10 10:07:48 -07:00
Linus Torvalds ab59a86056 Including fixes from netfilter.
Current release - regressions:
 
   - core: hold instance lock during NETDEV_CHANGE
 
   - rtnetlink: fix bad unlock balance in do_setlink().
 
   - ipv6:
     - fix null-ptr-deref in addrconf_add_ifaddr().
     - align behavior across nexthops during path selection
 
 Previous releases - regressions:
 
   - sctp: prevent transport UaF in sendmsg
 
   - mptcp: only inc MPJoinAckHMacFailure for HMAC failures
 
 Previous releases - always broken:
 
   - sched:
     - make ->qlen_notify() idempotent
     - ensure sufficient space when sending filter netlink notifications
     - sch_sfq: really don't allow 1 packet limit
 
   - netfilter: fix incorrect avx2 match of 5th field octet
 
   - tls: explicitly disallow disconnect
 
   - eth: octeontx2-pf: fix VF root node parent queue priority
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmf3xusSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkud0P/iWOQB0oj0nvxl2ionPzgJEPduxuF0V6
 YPyDBUzLC7Gq6NmTcdDlNJt8fE6UmKUIneghUm9Ss7LRpKv0/TPvorKMSK44Zt53
 a5q49JeoI0TvnnhJesdHjiF31hrInqZmcX8OjSH8Q/SCKuy7rsgzao0vjvhd7lxm
 wA6LlWnJO1Pf991nNpbjUSoAZ7CMNlEIewGkdq0+6UADC7D9VagKTgIkFKw1BvRw
 2Eb2pzvdO9Pj02+l/mjdRhUzMZlr+FG+WBqXk5oKR0YZ2t3CS4O9/UUBoAn775tM
 gCfzepNuAUXGX0I6h+DANCNuswWuG/IvYTdhy+hRWblYeCkILU60E8eVMlh7tpII
 fUd5GSRhX1NpGNHUlDG/4b6IcjMO3ebtce2cm2Y9t2CUe7EqB0HZyvTczNroTxip
 KXrXcCBuEkzxXCZhaN/CrBu8Piu8vJk/rMH5ha1khce9CkmYY+m9ruvsYjZmPI+/
 P/SFkRdb/yV/SIOmay8FCJsy60t4FOtLnlDDrnygq4Q/9a7VwafebVpKS1fbTELG
 ZTiELN3/PN2GUnfREf0DVLPfn9sdqrMZaclLOJpp/Zi1/RZpo52WHceXJShiu9pe
 A8B+3SuPgOaLfhwyqiHlWm5moc9kNF26vlrWfFjK1GrJdxMisYwQoWD5eHfQFhDX
 UaxlAmndwZa9
 =wkGz
 -----END PGP SIGNATURE-----

Merge tag 'net-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from netfilter.

  Current release - regressions:

    - core: hold instance lock during NETDEV_CHANGE

    - rtnetlink: fix bad unlock balance in do_setlink()

    - ipv6:
       - fix null-ptr-deref in addrconf_add_ifaddr()
       - align behavior across nexthops during path selection

  Previous releases - regressions:

    - sctp: prevent transport UaF in sendmsg

    - mptcp: only inc MPJoinAckHMacFailure for HMAC failures

  Previous releases - always broken:

    - sched:
       - make ->qlen_notify() idempotent
       - ensure sufficient space when sending filter netlink notifications
       - sch_sfq: really don't allow 1 packet limit

    - netfilter: fix incorrect avx2 match of 5th field octet

    - tls: explicitly disallow disconnect

    - eth: octeontx2-pf: fix VF root node parent queue priority"

* tag 'net-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (38 commits)
  ethtool: cmis_cdb: Fix incorrect read / write length extension
  selftests: netfilter: add test case for recent mismatch bug
  nft_set_pipapo: fix incorrect avx2 match of 5th field octet
  net: ppp: Add bound checking for skb data on ppp_sync_txmung
  net: Fix null-ptr-deref by sock_lock_init_class_and_name() and rmmod.
  ipv6: Align behavior across nexthops during path selection
  net: phy: allow MDIO bus PM ops to start/stop state machine for phylink-controlled PHY
  net: phy: move phy_link_change() prior to mdio_bus_phy_may_suspend()
  selftests/tc-testing: sfq: check that a derived limit of 1 is rejected
  net_sched: sch_sfq: move the limit validation
  net_sched: sch_sfq: use a temporary work area for validating configuration
  net: libwx: handle page_pool_dev_alloc_pages error
  selftests: mptcp: validate MPJoin HMacFailure counters
  mptcp: only inc MPJoinAckHMacFailure for HMAC failures
  rtnetlink: Fix bad unlock balance in do_setlink().
  net: ethtool: Don't call .cleanup_data when prepare_data fails
  tc: Ensure we have enough buffer space when sending filter netlink notifications
  net: libwx: Fix the wrong Rx descriptor field
  octeontx2-pf: qos: fix VF root node parent queue index
  selftests: tls: check that disconnect does nothing
  ...
2025-04-10 08:52:18 -07:00
Linus Torvalds e4742a89cf block-6.15-20250410
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmf3xQAQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpvRqD/9lzvEoNKxL2DzJfnY9mNSkLErXsP9iRqOa
 vI3fh0xce+4TMqG7IOuOfPClizOkJRdaBQPWmXaGhC3wGopeVkFYlhSs1kydUhJy
 Sd+zr2xC9rcz+5FMwG5L6IiUCW4vPewwrSgcvLYONbiJAEVW85lFv0+r6eG0LAQA
 Y8B31+bqVxb9/dSxlFn1XhU/fKLvAXz7k+Gmwu/TGNRribCs6s4jnXoqt/2Ina2t
 0PqLNNRCUdgsdH6MAsFmDUPHBmJcFN1WgNSGvAY1AWawfoC6/ZXTX1hdDAdSa/xd
 i6Rb40pwx7rBYNaPHgaY08WtGiv109zTpx7vjovM5iTCEzyhvHRjzxIUwPtdypzZ
 0cUOUsWKunVowP/R9S+ZT5Fw1rPgT33J1QXof8ZZkQjZ8Vmd9mvorclZMxbtbmfV
 G965c4JUYDpbyPa0x06Q/Q3tdF4StAQiXq4ajLDMv5+gju4PDc0YjrxVVtTMS5Xa
 Z+xPpEnpygav02euVAo3Y2/xi99eRO94D48kA5PGSqFRoFWhQ4cDpGUSU3f0/vaI
 3jIXeORN7mqYdEQbhtAN7Qp1OwqZp25IFmRmQyDyYwS6oZ4xLIVO9lJmApgJ9ylC
 kP85z0RuP2KrIi/dBx4g6rujQ7kO6KKzgG7HB5L+gII7mj90rVQQBQOz9oaTQ8iU
 XTkJRIA7Zw==
 =XoSU
 -----END PGP SIGNATURE-----

Merge tag 'block-6.15-20250410' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

 - Add a missing ublk selftest script, from test additions added last
   week

 - Two fixes for ublk error recovery and reissue

 - Cleanup of ublk argument passing

* tag 'block-6.15-20250410' of git://git.kernel.dk/linux:
  ublk: pass ublksrv_ctrl_cmd * instead of io_uring_cmd *
  ublk: don't fail request for recovery & reissue in case of ubq->canceling
  ublk: fix handling recovery & reissue in ublk_abort_queue()
  selftests: ublk: fix test_stripe_04
2025-04-10 07:02:22 -07:00
Florian Westphal 27eb86e22f selftests: netfilter: add test case for recent mismatch bug
Without 'nft_set_pipapo: fix incorrect avx2 match of 5th field octet"
this fails:

TEST: reported issues
  Add two elements, flush, re-add    1s  [ OK ]
  net,mac with reload                0s  [ OK ]
  net,port,proto                     3s  [ OK ]
  avx2 false match                   0s  [FAIL]
False match for fe80:dead:01fe:0a02:0b03:6007:8009:a001

Other tests do not detect the kernel bug as they only alter parts in
the /64 netmask.

Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-04-10 12:33:55 +02:00
David Woodhouse 7615b94b63 selftests/kexec: Add x86_64 selftest for kexec-jump and exception handling
Add a self test which exercises both the kexec-jump facility, and the
kexec exception handling.

Invoke a trivial payload which just does an int3 and returns,
flip-flopping its entry point for the next invocation between two
implementations of the same thing.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250326142404.256980-5-dwmw2@infradead.org
2025-04-10 12:17:14 +02:00
Hou Tao 7c6fb1cf33 selftests/bpf: Add test case for atomic update of fd htab
Add a test case to verify the atomic update of existing elements in the
htab of maps. The test proceeds in three steps:

1) fill the outer map with keys in the range [0, 8]
For each inner array map, the value of its first element is set as the
key used to lookup the inner map.

2) create 16 threads to lookup these keys concurrently
Each lookup thread first lookups the inner map, then it checks whether
the first value of the inner array map is the same as the key used to
lookup the inner map.

3) create 8 threads to overwrite these keys concurrently
Each update thread first creates an inner array, it sets the first value
of the array to the key used to update the outer map, then it uses the
key and the inner map to update the outer map.

Without atomic update support, the lookup operation may return -ENOENT
during the lookup of outer map, or return -EINVAL during the comparison
of the first value in the inner map and the key used for inner map, and
the test will fail. After the atomic update change, both the lookup and
the comparison will succeed.

Given that the update of outer map is slow, the test case sets the loop
number for each thread as 5 to reduce the total running time. However,
the loop number could also be adjusted through FD_HTAB_LOOP_NR
environment variable.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250401062250.543403-7-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-09 20:12:54 -07:00
Willem de Bruijn fcd7132cb1 selftests/net: test sk_filter support for SKF_NET_OFF on frags
Verify that a classic BPF linux socket filter correctly matches
packet contents. Including when accessing contents in an
skb_frag.

1. Open a SOCK_RAW socket with a classic BPF filter on UDP dport 8000.
2. Open a tap device with IFF_NAPI_FRAGS to inject skbs with frags.
3. Send a packet for which the UDP header is in frag[0].
4. Receive this packet to demonstrate that the socket accepted it.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20250408132833.195491-3-willemdebruijn.kernel@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-09 20:02:51 -07:00
Jiayuan Chen 7b2fa44de5 selftest/bpf/benchs: Add benchmark for sockmap usage
Add TCP+sockmap-based benchmark.
Since sockmap's own update and delete operations are generally less
critical, the performance of the fast forwarding framework built upon
it is the key aspect.

Also with cgset/cgexec, we can observe the behavior of sockmap under
memory pressure.

The benchmark can be run with:
'''
./bench sockmap -c 2 -p 1 -a --rx-verdict-ingress
'''

In the future, we plan to move socket_helpers.h out of the prog_tests
directory to make it accessible for the benchmark. This will enable
better support for various socket types.

Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Link: https://lore.kernel.org/r/20250407142234.47591-5-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-09 19:59:00 -07:00
Jiayuan Chen 05ebde1bcb selftests/bpf: add ktls selftest
add ktls selftest for sockmap

Test results:
sockmap_ktls/sockmap_ktls disconnect_after_delete IPv4 SOCKMAP:OK
sockmap_ktls/sockmap_ktls update_fails_when_sock_has_ulp IPv4 SOCKMAP:OK
sockmap_ktls/sockmap_ktls disconnect_after_delete IPv4 SOCKMAP:OK
sockmap_ktls/sockmap_ktls update_fails_when_sock_has_ulp IPv4 SOCKMAP:OK
sockmap_ktls/sockmap_ktls disconnect_after_delete IPv4 SOCKMAP:OK
sockmap_ktls/sockmap_ktls update_fails_when_sock_has_ulp IPv4 SOCKMAP:OK
sockmap_ktls/sockmap_ktls disconnect_after_delete IPv4 SOCKMAP:OK
sockmap_ktls/sockmap_ktls update_fails_when_sock_has_ulp IPv4 SOCKMAP:OK
sockmap_ktls/tls simple offload:OK
sockmap_ktls/tls tx cork:OK
sockmap_ktls/tls tx cork with push:OK
sockmap_ktls/tls simple offload:OK
sockmap_ktls/tls tx cork:OK
sockmap_ktls/tls tx cork with push:OK
sockmap_ktls:OK

Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20250219052015.274405-3-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-09 19:57:14 -07:00
Amit Cohen 0ffb594212 selftests: test_bridge_neigh_suppress: Test unicast ARP/NS with suppression
Add test cases to check that unicast ARP/NS packets are replied once, even
if ARP/ND suppression is enabled.

Without the previous patch:
$ ./test_bridge_neigh_suppress.sh
...
Unicast ARP, per-port ARP suppression - VLAN 10
-----------------------------------------------
TEST: "neigh_suppress" is on                                        [ OK ]
TEST: Unicast ARP, suppression on, h1 filter                        [FAIL]
TEST: Unicast ARP, suppression on, h2 filter                        [ OK ]

Unicast ARP, per-port ARP suppression - VLAN 20
-----------------------------------------------
TEST: "neigh_suppress" is on                                        [ OK ]
TEST: Unicast ARP, suppression on, h1 filter                        [FAIL]
TEST: Unicast ARP, suppression on, h2 filter                        [ OK ]
...
Unicast NS, per-port NS suppression - VLAN 10
---------------------------------------------
TEST: "neigh_suppress" is on                                        [ OK ]
TEST: Unicast NS, suppression on, h1 filter                         [FAIL]
TEST: Unicast NS, suppression on, h2 filter                         [ OK ]

Unicast NS, per-port NS suppression - VLAN 20
---------------------------------------------
TEST: "neigh_suppress" is on                                        [ OK ]
TEST: Unicast NS, suppression on, h1 filter                         [FAIL]
TEST: Unicast NS, suppression on, h2 filter                         [ OK ]
...
Tests passed: 156
Tests failed:   4

With the previous patch:
$ ./test_bridge_neigh_suppress.sh
...
Unicast ARP, per-port ARP suppression - VLAN 10
-----------------------------------------------
TEST: "neigh_suppress" is on                                        [ OK ]
TEST: Unicast ARP, suppression on, h1 filter                        [ OK ]
TEST: Unicast ARP, suppression on, h2 filter                        [ OK ]

Unicast ARP, per-port ARP suppression - VLAN 20
-----------------------------------------------
TEST: "neigh_suppress" is on                                        [ OK ]
TEST: Unicast ARP, suppression on, h1 filter                        [ OK ]
TEST: Unicast ARP, suppression on, h2 filter                        [ OK ]
...
Unicast NS, per-port NS suppression - VLAN 10
---------------------------------------------
TEST: "neigh_suppress" is on                                        [ OK ]
TEST: Unicast NS, suppression on, h1 filter                         [ OK ]
TEST: Unicast NS, suppression on, h2 filter                         [ OK ]

Unicast NS, per-port NS suppression - VLAN 20
---------------------------------------------
TEST: "neigh_suppress" is on                                        [ OK ]
TEST: Unicast NS, suppression on, h1 filter                         [ OK ]
TEST: Unicast NS, suppression on, h2 filter                         [ OK ]
...
Tests passed: 160
Tests failed:   0

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/dc240b9649b31278295189f412223f320432c5f2.1744123493.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09 19:13:43 -07:00
Saket Kumar Bhaskar 967e8def11 selftests/bpf: Fix bpf_nf selftest failure
For systems with missing iptables-legacy tool this selftest fails.

Add check to find if iptables-legacy tool is available and skip the
test if the tool is missing.

Fixes: de9c8d848d ("selftests/bpf: S/iptables/iptables-legacy/ in the bpf_nf and xdp_synproxy test")
Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250409095633.33653-1-skb99@linux.ibm.com
2025-04-09 16:29:39 -07:00
Mykyta Yatsenko b8390dd1e0 selftests/bpf: Add BTF.ext line/func info getter tests
Add selftests checking that line and func info retrieved by newly added
libbpf APIs are the same as returned by kernel via bpf_prog_get_info_by_fd.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250408234417.452565-3-mykyta.yatsenko5@gmail.com
2025-04-09 16:16:56 -07:00
Mykyta Yatsenko 37b1b3ed20 selftests/bpf: Support struct/union presets in veristat
Extend commit e3c9abd0d1 ("selftests/bpf: Implement setting global
variables in veristat") to support applying presets to members of
the global structs or unions in veristat.
For example:
```
./veristat set_global_vars.bpf.o  -G "union1.struct3.var_u8_h = 0xBB"
```

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250408104544.140317-1-mykyta.yatsenko5@gmail.com
2025-04-09 16:16:12 -07:00
Linus Torvalds 3b07108ada linux_kselftest-fixes-6.15-rc2
Fixes tpm2, futex, and mincore tests. Creates a dedicated .gitignore
 for tpm2
 
 Details:
 
 selftests: tpm2: test_smoke: use POSIX-conformant expression operator
 selftests/futex: futex_waitv wouldblock test should fail
 selftests: tpm2: create a dedicated .gitignore
 selftests/mincore: Allow read-ahead pages to reach the end of the file
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmf271sACgkQCwJExA0N
 QxwvzxAAxdXGz11IfpCzIXQFZvAdcl+LYWEhfZ8hZg9O3xjLT3ULhzJ7MXVeGME1
 UagfjOeY9ZWMKspj2bIt3BRs5figTklTb/KAZ9fRv+3BHkflCF3TZNoJQTQGghi0
 7oMfQt6lwVIEOvLc6XZY4+xRabIPW57TAkzaQkUhSlyC68jwch90MC1JHCUibC+t
 92zc2VkVzdA9rmyiNXtnd9vTpsJyKGTXipUr+JsMlsoewd0U0bBu0nrxrgldwcEQ
 J7yi3OnuTmXIlQMTEdc3EDdXEEdKBK9+uGFHNQ1zQNgv46l6Wl9Px880iqPrb8Gx
 gJ6g2KNR30zXOFqaGIIANfUnlrnCV/nFFVHQB5k64oMDYonCX74JLtBBbNOpwgim
 dyxzMmCi5ldI6o1FsjJxfOCQPQfNnLMJsCM8LQhuVcU0c6TbTTmjY671A2oqMU42
 mKDDJg5ecNWNRbdwbgQnl7YSX4tuzBtgQhlGjUzk1mlwSGIMUJ/ep0D+bGh7OPQk
 4/p3Js3XPW+lu+gyxPuQkHstQS/esFuFZ9nZrgrdTBLM7wzngYEvQ8mOji4vpuFT
 f0WBRm29aIL8nz9oVtsYjYJxE+DayNk8foAAsFj4h0Ppo8Fg4wJ6IxLgWD5njfSa
 Z4RBMqVVvYrx58pwgxQ5nyyIpeWqQKFUxK6dAZKnvOj+J8/Xsvs=
 =KUYx
 -----END PGP SIGNATURE-----

Merge tag 'linux_kselftest-fixes-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fixes from Shuah Khan:

 - Fixes tpm2, futex, and mincore tests

 - Create a dedicated .gitignore for tpm2 tests

* tag 'linux_kselftest-fixes-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/mincore: Allow read-ahead pages to reach the end of the file
  selftests/futex: futex_waitv wouldblock test should fail
  selftests: tpm2: test_smoke: use POSIX-conformant expression operator
  selftests: tpm2: create a dedicated .gitignore
2025-04-09 16:02:44 -07:00
Ingo Molnar 78a84fbfa4 Linux 6.15-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmfy3/YeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG/ygIAItY5dzf5fVnVEPy
 UrF+EzIaWGWRw3N+41AyT5X7z77FPX7E0cA6MD4KxfWW/OYzeAoeZSyrM2xIsEh3
 26qiohvJjpHjfHdzvKmxNItvW8+xBv3km00U/CWWqJo89JsIVnJtrSBHOut2/gNp
 f6sGoOrrR4GXXz8JX3yG/pmizr23lN81ZkVdz0ayYEK4uY92hSsBspvyFWcdffgF
 o8NCtR+JVGac8xm+f3VPSLyunLMXsh8NWETumMHP6tHQif36I3BQqeU8DgXCgjEK
 pfZ8gEyRtXIKbEt+qniUetT+2Cwu/lAN2GjTu0LqIe9Ro3HzjtotwQdk5h6kC+Lc
 BogxIs8=
 =bf5G
 -----END PGP SIGNATURE-----

Merge tag 'v6.15-rc1' into x86/mm, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-04-09 22:00:25 +02:00
Malaya Kumar Rout 60567e93c0 selftests/x86/lam: Fix clean up fds in do_uring() and allocate_dsa_pasid()
Resolve minor fd leaks reported by cppcheck in lam.c.

Specifically, the 'file_fd' and 'fd' were not closed in do_uring()
and allocate_dsa_pasid() functions, respectively.

cppcheck output before this patch:

  tools/testing/selftests/x86/lam.c:685:3: error: Resource leak: file_fd [resourceLeak]
  tools/testing/selftests/x86/lam.c:693:3: error: Resource leak: file_fd [resourceLeak]
  tools/testing/selftests/x86/lam.c:1195:2: error: Resource leak: fd [resourceLeak]

cppcheck output after this patch:

  No resource leaks found

While this is a standalone test tool that doesn't really leak anything
in practice, as exit() cleans it up all, clean up resources nevertheless.

[ mingo: Updated the changelog. ]

Signed-off-by: Malaya Kumar Rout <malayarout91@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250409135341.28987-1-malayarout91@gmail.com
2025-04-09 21:30:37 +02:00
Koichiro Den 6d7f0c1103 selftests: gpio: add test cases for gpio-aggregator
Add a set of tests for gpio-aggregator module. This test covers both
pre-existing new_device/delete_device interface and new configfs-based
interface.

Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Link: https://lore.kernel.org/r/20250407043019.4105613-10-koichiro.den@canonical.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-04-09 16:57:30 +02:00
Octavian Purdila 26e705184e selftests/tc-testing: sfq: check that a derived limit of 1 is rejected
Because the limit is updated indirectly when other parameters are
updated, there are cases where even though the user requests a limit
of 2 it can actually be set to 1.

Add the following test cases to check that the kernel rejects them:
- limit 2 depth 1 flows 1
- limit 2 depth 1 divisor 1

Signed-off-by: Octavian Purdila <tavip@google.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-09 12:55:48 +01:00
Linus Torvalds a245882457 linux_kselftest-kunit-6.15-rc2
Fixes tool to report test count in case of a late test plan when tests
 are specified before the test plan. Fixes spelling error in the commit
 that went into 6.15-rc1.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmf1oWUACgkQCwJExA0N
 QxzvqRAAwnXdGUavaH2cqUQ9I6RAD/ncbn1p9RdsZrD6Yk47Wn/0HEh0NPko2clq
 36I6SXa30ev2kX5dJ2AVwPTKKHVYFQlpULd6LENhXRBCCiDHdvLK/JVT9nfAza7u
 oh5/MWG0CSzdwsP4XB+aWNCgzezT0n9Tzdo/wTF0vlEHAYwYQfDxLZcNgJ6CxnLJ
 njNxWhqidjGsUT/aNkCpO+mDx66jNbFnPizZzzsbacd6LHtuG9y2pLsxSuLwDHgc
 RjgTnttuUJiyFuxKqp5/ert9PHTHVlRHVJGaFGdPdhzA0kbWGFC2iruhzQi6Li7z
 HQ0giHlQ5L0RHYGpws2gbMqQ8O206Q1Xbpu6FjRRqbCAtXOtiQ4B1LberFKT2ax2
 4OR4YmutgjgP9LJYYH5ATl5H50PaeXwYB+qC0/+33ihnjiO6PCN283bYuX5LXFRJ
 6K/SqVO1MVvWueWFPQxakDDY29W1B1fMp9hVAyKuhFqTFFxZn82MCSa7P8+LGfFJ
 hsIwLalBH0noXgnhdmk8p6i81VcYZg7HCAQmcJKmtSBHFnS8TFYCrt1CTiKReLXR
 L3vGuoRuP/I/HOWoNc3b0KJKz+RluWbmCHbZatpUOhFn8nGfBBTvHDK9TrbzM0lz
 PZJW4KlCw2ebnCYb0m4TMirRtpc8qmf58yopvCz3xO1MSGx6FNU=
 =VdOJ
 -----END PGP SIGNATURE-----

Merge tag 'linux_kselftest-kunit-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kunit fixes from Shuah Khan:

 - Fix the tool to report test count in case of a late test plan when
   tests are specified before the test plan

 - Fix spelling error

* tag 'linux_kselftest-kunit-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  kunit: Spelling s/slowm/slow/
  kunit: tool: fix count of tests if late test plan
2025-04-08 17:16:43 -07:00
Matthieu Baerts (NGI0) 6767698cf9 selftests: mptcp: validate MPJoin HMacFailure counters
The parent commit fixes an issue around these counters where one of them
-- MPJoinAckHMacFailure -- was wrongly incremented in some cases.

This makes sure the counter is always 0. It should be incremented only
in case of corruption, or a wrong implementation, which should not be
the case in these selftests.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250407-net-mptcp-hmac-failure-mib-v1-2-3c9ecd0a3a50@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-08 16:16:17 -07:00
Victor Nogueira 5ac40e6b5b selftests: tc-testing: Pre-load IFE action and its submodules
Recently we had some issues in parallel TDC where some of IFE tests are
failing due to some of IFE's submodules (like act_meta_skbtcindex and
act_meta_skbprio) taking too long to load [1]. To avoid that issue,
pre-load IFE and all its submodules before running any of the tests in
tdc.sh

[1] https://lore.kernel.org/netdev/e909b2a0-244e-4141-9fa9-1b7d96ab7d71@mojatatu.com/T/#u

Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20250407215656.2535990-1-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-08 16:15:52 -07:00
Qiuxu Zhuo 197c1eaa7b selftests/mincore: Allow read-ahead pages to reach the end of the file
When running the mincore_selftest on a system with an XFS file system, it
failed the "check_file_mmap" test case due to the read-ahead pages reaching
the end of the file. The failure log is as below:

   RUN           global.check_file_mmap ...
  mincore_selftest.c:264:check_file_mmap:Expected i (1024) < vec_size (1024)
  mincore_selftest.c:265:check_file_mmap:Read-ahead pages reached the end of the file
  check_file_mmap: Test failed
           FAIL  global.check_file_mmap

This is because the read-ahead window size of the XFS file system on this
machine is 4 MB, which is larger than the size from the #PF address to the
end of the file. As a result, all the pages for this file are populated.

  blockdev --getra /dev/nvme0n1p5
    8192
  blockdev --getbsz /dev/nvme0n1p5
    512

This issue can be fixed by extending the current FILE_SIZE 4MB to a larger
number, but it will still fail if the read-ahead window size of the file
system is larger enough. Additionally, in the real world, read-ahead pages
reaching the end of the file can happen and is an expected behavior.
Therefore, allowing read-ahead pages to reach the end of the file is a
better choice for the "check_file_mmap" test case.

Link: https://lore.kernel.org/r/20250311080940.21413-1-qiuxu.zhuo@intel.com
Reported-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-04-08 17:08:50 -06:00
Edward Liaw 7d50e00fef selftests/futex: futex_waitv wouldblock test should fail
Testcase should fail if -EWOULDBLOCK is not returned when expected value
differs from actual value from the waiter.

Link: https://lore.kernel.org/r/20250404221225.1596324-1-edliaw@google.com
Fixes: 9d57f7c797 ("selftests: futex: Test sys_futex_waitv() wouldblock")
Signed-off-by: Edward Liaw <edliaw@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-04-08 16:50:19 -06:00
Rae Moar 14e594a1fc kunit: tool: fix count of tests if late test plan
Fix test count with late test plan.

For example,
  TAP version 13
  ok 1 test1
  1..4

Returns a count of 1 passed, 1 crashed (because it expects tests after
the test plan): returning the total count of 2 tests

Change this to be 1 passed, 1 error: total count of 1 test

Link: https://lore.kernel.org/r/20250319223351.1517262-1-rmoar@google.com
Signed-off-by: Rae Moar <rmoar@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <shuah@kernel.org>
2025-04-08 14:57:24 -06:00
Ahmed Salem 170ec11935 selftests: tpm2: test_smoke: use POSIX-conformant expression operator
Use POSIX-conformant expression operator symbol '='.

The use of the non POSIX-conformant symbol '==' would work
in bash, but not in sh where the unexpected operator error
would result in test_smoke.sh being skipped.

Instead of changing the shebang to use bash, which may not be
available on all systems, use the POSIX-conformant expression
symbol '=' to test for equality.

Without this patch:
===================
 # make -j8 TARGETS=tpm2 kselftest
 # selftests: tpm2: test_smoke.sh
 # ./test_smoke.sh: 9: [: 2: unexpected operator
 ok 1 selftests: tpm2: test_smoke.sh # SKIP

With this patch:
================
 # make -j8 TARGETS=tpm2 kselftest
 # selftests: tpm2: test_smoke.sh
 # Ran 9 tests in 9.236s
 ok 1 selftests: tpm2: test_smoke.sh

Link: https://lore.kernel.org/r/37ztyakgrrtgvec344mg7mspchwjpxxtsprtjidso3pwkmm4f4@awsa5mzgqmtb
Signed-off-by: Ahmed Salem <x0rw3ll@gmail.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-04-08 14:56:13 -06:00
Khaled Elnaggar 5cd2950359 selftests: tpm2: create a dedicated .gitignore
The tpm2 selftests produce two logs: SpaceTest.log and
AsyncTest.log. Only SpaceTest.log was listed in selftests/.gitignore,
while AsyncTest.log remained untracked.

This change creates a dedicated .gitignore in the tpm2/ directory to
manage these entries, keeping tpm2-specific patterns isolated from
parent .gitignore.

Fixed white-space errors during commit
Shuah Khan <skhan@linuxfoundation.org>

Link: https://lore.kernel.org/r/20250126195147.902608-1-khaledelnaggarlinux@gmail.com
Signed-off-by: Khaled Elnaggar <khaledelnaggarlinux@gmail.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-04-08 14:56:13 -06:00
Linus Torvalds 0e8863244e ARM:
* Rework heuristics for resolving the fault IPA (HPFAR_EL2 v. re-walk
   stage-1 page tables) to align with the architecture. This avoids
   possibly taking an SEA at EL2 on the page table walk or using an
   architecturally UNKNOWN fault IPA.
 
 * Use acquire/release semantics in the KVM FF-A proxy to avoid reading
   a stale value for the FF-A version.
 
 * Fix KVM guest driver to match PV CPUID hypercall ABI.
 
 * Use Inner Shareable Normal Write-Back mappings at stage-1 in KVM
   selftests, which is the only memory type for which atomic
   instructions are architecturally guaranteed to work.
 
 s390:
 
 * Don't use %pK for debug printing and tracepoints.
 
 x86:
 
 * Use a separate subclass when acquiring KVM's per-CPU posted interrupts
   wakeup lock in the scheduled out path, i.e. when adding a vCPU on
   the list of vCPUs to wake, to workaround a false positive deadlock.
   The schedule out code runs with a scheduler lock that the wakeup
   handler takes in the opposite order; but it does so with IRQs disabled
   and cannot run concurrently with a wakeup.
 
 * Explicitly zero-initialize on-stack CPUID unions
 
 * Allow building irqbypass.ko as as module when kvm.ko is a module
 
 * Wrap relatively expensive sanity check with KVM_PROVE_MMU
 
 * Acquire SRCU in KVM_GET_MP_STATE to protect guest memory accesses
 
 selftests:
 
 * Add more scenarios to the MONITOR/MWAIT test.
 
 * Add option to rseq test to override /dev/cpu_dma_latency
 
 * Bring list of exit reasons up to date
 
 * Cleanup Makefile to list once tests that are valid on all architectures
 
 Other:
 
 * Documentation fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmf083IUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroN1dgf/QwfpZcHoMNQSnrc1jMy2LHrArln2
 XfmsOGZTU7kyoLQsLWGAPNocOveGdiemTDsj5ZXoNMnqV8hCBr+tZuv2gWI1rr/o
 kiGerdIgSZ9piTjBlJkVAaOzbWhg2DUnr7qVVzEzFY9+rPNyQ81vgAfU7h56KhYB
 optecozmBrHHAxvQZwmPeL9UyPWFjOF1BY/8LTMx7X+aVuCX6qx1JqO3a3ylAw4J
 tGXv6qFJfuCnu1d1b4X0ILce0iMUTOjQzvTcIm+BKjYycecl+3j1aczC/BOorIgc
 mf0+XeauhcTduK73pirnvx2b05eOxntgkOpwJytO2RP6pE0uK+2Th/C3Qg==
 =ba/Y
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "ARM:

   - Rework heuristics for resolving the fault IPA (HPFAR_EL2 v. re-walk
     stage-1 page tables) to align with the architecture. This avoids
     possibly taking an SEA at EL2 on the page table walk or using an
     architecturally UNKNOWN fault IPA

   - Use acquire/release semantics in the KVM FF-A proxy to avoid
     reading a stale value for the FF-A version

   - Fix KVM guest driver to match PV CPUID hypercall ABI

   - Use Inner Shareable Normal Write-Back mappings at stage-1 in KVM
     selftests, which is the only memory type for which atomic
     instructions are architecturally guaranteed to work

  s390:

   - Don't use %pK for debug printing and tracepoints

  x86:

   - Use a separate subclass when acquiring KVM's per-CPU posted
     interrupts wakeup lock in the scheduled out path, i.e. when adding
     a vCPU on the list of vCPUs to wake, to workaround a false positive
     deadlock. The schedule out code runs with a scheduler lock that the
     wakeup handler takes in the opposite order; but it does so with
     IRQs disabled and cannot run concurrently with a wakeup

   - Explicitly zero-initialize on-stack CPUID unions

   - Allow building irqbypass.ko as as module when kvm.ko is a module

   - Wrap relatively expensive sanity check with KVM_PROVE_MMU

   - Acquire SRCU in KVM_GET_MP_STATE to protect guest memory accesses

  selftests:

   - Add more scenarios to the MONITOR/MWAIT test

   - Add option to rseq test to override /dev/cpu_dma_latency

   - Bring list of exit reasons up to date

   - Cleanup Makefile to list once tests that are valid on all
     architectures

  Other:

   - Documentation fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (26 commits)
  KVM: arm64: Use acquire/release to communicate FF-A version negotiation
  KVM: arm64: selftests: Explicitly set the page attrs to Inner-Shareable
  KVM: arm64: selftests: Introduce and use hardware-definition macros
  KVM: VMX: Use separate subclasses for PI wakeup lock to squash false positive
  KVM: VMX: Assert that IRQs are disabled when putting vCPU on PI wakeup list
  KVM: x86: Explicitly zero-initialize on-stack CPUID unions
  KVM: Allow building irqbypass.ko as as module when kvm.ko is a module
  KVM: x86/mmu: Wrap sanity check on number of TDP MMU pages with KVM_PROVE_MMU
  KVM: selftests: Add option to rseq test to override /dev/cpu_dma_latency
  KVM: x86: Acquire SRCU in KVM_GET_MP_STATE to protect guest memory accesses
  Documentation: kvm: remove KVM_CAP_MIPS_TE
  Documentation: kvm: organize capabilities in the right section
  Documentation: kvm: fix some definition lists
  Documentation: kvm: drop "Capability" heading from capabilities
  Documentation: kvm: give correct name for KVM_CAP_SPAPR_MULTITCE
  Documentation: KVM: KVM_GET_SUPPORTED_CPUID now exposes TSC_DEADLINE
  selftests: kvm: list once tests that are valid on all architectures
  selftests: kvm: bring list of exit reasons up to date
  selftests: kvm: revamp MONITOR/MWAIT tests
  KVM: arm64: Don't translate FAR if invalid/unsafe
  ...
2025-04-08 13:47:55 -07:00
Linus Torvalds e37f72b3b4 cgroup: Fixes for v6.15-rc1
- A number of cpuset remote partition related fixes and cleanups along with
   selftest updates.
 
 - A change from this merge window made cgroup_rstat_updated_list() called
   outside cgroup_rstat_lock leading to list corruptions. Fix it by
   relocating the call inside the lock.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZ/QMSQ4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGebUAP0bdg/hIX5OjhREbaDKWoUyAHnHqMdg3Dvngvhp
 d9aOqQD/b1jdVfDINFtb2qjOpizPjyI0ycQxrr9K3DrSYmUAKAs=
 =hFhq
 -----END PGP SIGNATURE-----

Merge tag 'cgroup-for-6.15-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:

 - A number of cpuset remote partition related fixes and cleanups along
   with selftest updates.

 - A change from this merge window made cgroup_rstat_updated_list()
   called outside cgroup_rstat_lock leading to list corruptions. Fix it
   by relocating the call inside the lock.

* tag 'cgroup-for-6.15-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup/cpuset: Fix race between newly created partition and dying one
  cgroup: rstat: call cgroup_rstat_updated_list with cgroup_rstat_lock
  selftest/cgroup: Add a remote partition transition test to test_cpuset_prs.sh
  selftest/cgroup: Clean up and restructure test_cpuset_prs.sh
  selftest/cgroup: Update test_cpuset_prs.sh to use | as effective CPUs and state separator
  cgroup/cpuset: Remove unneeded goto in sched_partition_write() and rename it
  cgroup/cpuset: Code cleanup and comment update
  cgroup/cpuset: Don't allow creation of local partition over a remote one
  cgroup/cpuset: Remove remote_partition_check() & make update_cpumasks_hier() handle remote partition
  cgroup/cpuset: Fix error handling in remote_partition_disable()
  cgroup/cpuset: Fix incorrect isolated_cpus update in update_parent_effective_cpumask()
2025-04-08 12:15:05 -07:00
Linus Torvalds 97c484ccb8 CRC cleanups for 6.15
Finish cleaning up the CRC kconfig options by removing the remaining
 unnecessary prompts and an unnecessary 'default y', removing
 CONFIG_LIBCRC32C, and documenting all the CRC library options.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCZ/P7QhQcZWJpZ2dlcnNA
 Z29vZ2xlLmNvbQAKCRDzXCl4vpKOKyoOAQCynFcS1dWuD27S+SdUREmBjMAoZo5M
 zdsIvlPv9KLycgD/QX5lXjW3KIYY6jQ8vHUuLVwfDl/JEp4GJS9dLGU+agg=
 =0R1T
 -----END PGP SIGNATURE-----

Merge tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux

Pull CRC cleanups from Eric Biggers:
 "Finish cleaning up the CRC kconfig options by removing the remaining
  unnecessary prompts and an unnecessary 'default y', removing
  CONFIG_LIBCRC32C, and documenting all the CRC library options"

* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
  lib/crc: remove CONFIG_LIBCRC32C
  lib/crc: document all the CRC library kconfig options
  lib/crc: remove unnecessary prompt for CONFIG_CRC_ITU_T
  lib/crc: remove unnecessary prompt for CONFIG_CRC_T10DIF
  lib/crc: remove unnecessary prompt for CONFIG_CRC16
  lib/crc: remove unnecessary prompt for CONFIG_CRC_CCITT
  lib/crc: remove unnecessary prompt for CONFIG_CRC32 and drop 'default y'
2025-04-08 12:09:28 -07:00
Paul E. McKenney 75d8bf48a8 rcutorture: Make srcu_lockdep.sh check reader-conflict handling
Mixing different flavors of RCU readers is forbidden, for example, you
should not use srcu_read_lock() and srcu_read_lock_nmisafe() on the same
srcu_struct structure.  There are checks for this, but these checks are
not tested on a regular basis.  This commit therefore adds such tests
to srcu_lockdep.sh.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-04-08 14:55:38 -04:00
Paul E. McKenney 31b7ce3d98 rcutorture: Make srcu_lockdep.sh check kernel Kconfig
The srcu_lockdep.sh currently blindly trusts the rcutorture SRCU-P
scenario to build its kernel with lockdep enabled.  Of course, this
dependency might not be obvious to someone rebalancing SRCU scenarios.
This commit therefore adds code to srcu_lockdep.sh that verifies that
the .config file has lockdep enabled.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-04-08 14:55:38 -04:00
Paolo Bonzini c478032df0 KVM/arm64: First batch of fixes for 6.15
- Rework heuristics for resolving the fault IPA (HPFAR_EL2 v. re-walk
    stage-1 page tables) to align with the architecture. This avoids
    possibly taking an SEA at EL2 on the page table walk or using an
    architecturally UNKNOWN fault IPA.
 
  - Use acquire/release semantics in the KVM FF-A proxy to avoid reading
    a stale value for the FF-A version.
 
  - Fix KVM guest driver to match PV CPUID hypercall ABI.
 
  - Use Inner Shareable Normal Write-Back mappings at stage-1 in KVM
    selftests, which is the only memory type for which atomic
    instructions are architecturally guaranteed to work.
 -----BEGIN PGP SIGNATURE-----
 
 iI0EABYIADUWIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCZ/RO9hccb2xpdmVyLnVw
 dG9uQGxpbnV4LmRldgAKCRCivnWIJHzdFmRuAP0YajO4qHJe1vHtCkamuPnEY0Kp
 E+t2TwPafPbrPdQ1PgEAq6lHuSdUnid1r/uhRKIT+ywW8tE97eNwQAa1LFma0Ac=
 =d4G5
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-fixes-6.15-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64: First batch of fixes for 6.15

 - Rework heuristics for resolving the fault IPA (HPFAR_EL2 v. re-walk
   stage-1 page tables) to align with the architecture. This avoids
   possibly taking an SEA at EL2 on the page table walk or using an
   architecturally UNKNOWN fault IPA.

 - Use acquire/release semantics in the KVM FF-A proxy to avoid reading
   a stale value for the FF-A version.

 - Fix KVM guest driver to match PV CPUID hypercall ABI.

 - Use Inner Shareable Normal Write-Back mappings at stage-1 in KVM
   selftests, which is the only memory type for which atomic
   instructions are architecturally guaranteed to work.
2025-04-08 05:49:31 -04:00
Jakub Kicinski a1328a671e selftests: tls: check that disconnect does nothing
"Inspired" by syzbot test, pre-queue some data, disconnect()
and try to receive(). This used to trigger a warning in TLS's strp.
Now we expect the disconnect() to have almost no effect.

Link: https://lore.kernel.org/67e6be74.050a0220.2f068f.007e.GAE@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250404180334.3224206-2-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-08 11:38:49 +02:00
Cong Wang ce94507f5f selftests/tc-testing: Add a test case for FQ_CODEL with ETS parent
Add a test case for FQ_CODEL with ETS parent to verify packet drop
behavior when the queue becomes empty. This helps ensure proper
notification mechanisms between qdiscs.

Note this is best-effort, it is hard to play with those parameters
perfectly to always trigger ->qlen_notify().

Cc: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20250403211636.166257-6-xiyou.wangcong@gmail.com
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-08 10:58:16 +02:00
Cong Wang 0d5c27ecb6 selftests/tc-testing: Add a test case for FQ_CODEL with DRR parent
Add a test case for FQ_CODEL with DRR parent to verify packet drop
behavior when the queue becomes empty. This helps ensure proper
notification mechanisms between qdiscs.

Note this is best-effort, it is hard to play with those parameters
perfectly to always trigger ->qlen_notify().

Cc: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20250403211636.166257-5-xiyou.wangcong@gmail.com
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-08 10:58:16 +02:00
Cong Wang 72b05c1bf7 selftests/tc-testing: Add a test case for FQ_CODEL with HFSC parent
Add a test case for FQ_CODEL with HFSC parent to verify packet drop
behavior when the queue becomes empty. This helps ensure proper
notification mechanisms between qdiscs.

Note this is best-effort, it is hard to play with those parameters
perfectly to always trigger ->qlen_notify().

Cc: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20250403211636.166257-4-xiyou.wangcong@gmail.com
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-08 10:58:15 +02:00
Cong Wang 4cb1837ac5 selftests/tc-testing: Add a test case for FQ_CODEL with QFQ parent
Add a test case for FQ_CODEL with QFQ parent to verify packet drop
behavior when the queue becomes empty. This helps ensure proper
notification mechanisms between qdiscs.

Note this is best-effort, it is hard to play with those parameters
perfectly to always trigger ->qlen_notify().

Cc: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20250403211636.166257-3-xiyou.wangcong@gmail.com
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-08 10:58:15 +02:00
Cong Wang cbe9588b12 selftests/tc-testing: Add a test case for FQ_CODEL with HTB parent
Add a test case for FQ_CODEL with HTB parent to verify packet drop
behavior when the queue becomes empty. This helps ensure proper
notification mechanisms between qdiscs.

Note this is best-effort, it is hard to play with those parameters
perfectly to always trigger ->qlen_notify().

Cc: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20250403211636.166257-2-xiyou.wangcong@gmail.com
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-08 10:58:15 +02:00
Taehee Yoo 22d3a63d53 selftests: drv-net: test random value for hds-thresh
hds.py has been testing 0(set_hds_thresh_zero()),
MAX(set_hds_thresh_max()), GT(set_hds_thresh_gt()) values for hds-thresh.
However if a hds-thresh value was already 0, set_hds_thresh_zero()
can't test properly.
So, it tests random value first and then tests 0, MAX, GT values.

Testing bnxt:
    TAP version 13
    1..13
    ok 1 hds.get_hds
    ok 2 hds.get_hds_thresh
    ok 3 hds.set_hds_disable # SKIP disabling of HDS not supported by
    the device
    ok 4 hds.set_hds_enable
    ok 5 hds.set_hds_thresh_random
    ok 6 hds.set_hds_thresh_zero
    ok 7 hds.set_hds_thresh_max
    ok 8 hds.set_hds_thresh_gt
    ok 9 hds.set_xdp
    ok 10 hds.enabled_set_xdp
    ok 11 hds.ioctl
    ok 12 hds.ioctl_set_xdp
    ok 13 hds.ioctl_enabled_set_xdp
    # Totals: pass:12 fail:0 xfail:0 xpass:0 skip:1 error:0

Testing lo:
    TAP version 13
    1..13
    ok 1 hds.get_hds # SKIP tcp-data-split not supported by device
    ok 2 hds.get_hds_thresh # SKIP hds-thresh not supported by device
    ok 3 hds.set_hds_disable # SKIP ring-set not supported by the device
    ok 4 hds.set_hds_enable # SKIP ring-set not supported by the device
    ok 5 hds.set_hds_thresh_random # SKIP hds-thresh not supported by
    device
    ok 6 hds.set_hds_thresh_zero # SKIP ring-set not supported by the
    device
    ok 7 hds.set_hds_thresh_max # SKIP hds-thresh not supported by
    device
    ok 8 hds.set_hds_thresh_gt # SKIP hds-thresh not supported by device
    ok 9 hds.set_xdp # SKIP tcp-data-split not supported by device
    ok 10 hds.enabled_set_xdp # SKIP tcp-data-split not supported by
    device
    ok 11 hds.ioctl # SKIP tcp-data-split not supported by device
    ok 12 hds.ioctl_set_xdp # SKIP tcp-data-split not supported by
    device
    ok 13 hds.ioctl_enabled_set_xdp # SKIP tcp-data-split not supported
    by device
    # Totals: pass:0 fail:0 xfail:0 xpass:0 skip:13 error:0

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Link: https://patch.msgid.link/20250404122126.1555648-3-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-07 11:00:00 -07:00
Andrea Righi 01d541baed selftests/sched_ext: Add test for scx_bpf_select_cpu_and()
Add a selftest to validate the behavior of the built-in idle CPU
selection policy applied to a subset of allowed CPUs, using
scx_bpf_select_cpu_and().

Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-04-07 07:13:52 -10:00
Christian Brauner 25a6cc9a63
selftests/filesystems: add open() test for anonymous inodes
Test that anonymous inodes cannot be open()ed.

Link: https://lore.kernel.org/20250407-work-anon_inode-v1-9-53a44c20d44e@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-07 16:20:15 +02:00
Christian Brauner f8ca403ae7
selftests/filesystems: add exec() test for anonymous inodes
Test that anonymous inodes cannot be exec()ed.

Link: https://lore.kernel.org/20250407-work-anon_inode-v1-8-53a44c20d44e@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-07 16:20:14 +02:00
Christian Brauner fcf31ec7ca
selftests/filesystems: add chmod() test for anonymous inodes
Test that anonymous inodes cannot be chmod()ed.

Link: https://lore.kernel.org/20250407-work-anon_inode-v1-7-53a44c20d44e@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-07 16:20:14 +02:00
Christian Brauner c784159750
selftests/filesystems: add chown() test for anonymous inodes
Test that anonymous inodes cannot be chown()ed.

Link: https://lore.kernel.org/20250407-work-anon_inode-v1-6-53a44c20d44e@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-07 16:20:14 +02:00
Christian Brauner 4fc3f73c16
selftest/pidfd: add test for thread-group leader pidfd open for thread
Verify that we report ENOENT when userspace tries to create a
thread-group leader pidfd for a thread pidfd that isn't a thread-group
leader.

Link: https://lore.kernel.org/r/20250403-work-pidfd-fixes-v1-4-a123b6ed6716@kernel.org
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-07 09:38:24 +02:00
Christian Brauner 76d2d75ddc
selftests/pidfd: adapt to recent changes
Adapt to changes in commit 9133607de3 ("exit: fix the usage of
delay_group_leader->exit_code in do_notify_parent() and pidfs_exit()").

Even if the thread-group leader exited early and succesfully it's exit
status will only be reported once the whole thread-group has exited and
it will share the exit code of the thread-group. So if the thread-group
was SIGKILLed the thread-group leader will also be reported as having
been SIGKILLed.

Link: https://lore.kernel.org/r/20250403-work-pidfd-fixes-v1-1-a123b6ed6716@kernel.org
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-07 09:38:24 +02:00
Wei Yang 3b394dff15 memblock tests: add test for memblock_set_node
Add a test to check memblock_set_node() behavior.

And create a corner case in which the memblock.reserved array is doubled
during memblock_set_node(). And finally make sure all regions in
memblock.reserved are with valid node id.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
CC: Mike Rapoport <rppt@kernel.org>
CC: Yajun Deng <yajun.deng@linux.dev>
Link: https://lore.kernel.org/r/20250318071948.23854-4-richard.weiyang@gmail.com
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2025-04-07 09:28:01 +03:00
Masami Hiramatsu (Google) ed471e1984 memblock tests: Fix mutex related build error
Fix mutex and free_reserved_area() related build errors which have
been introduced by commit 74e2498ccf ("mm/memblock: Add reserved
memory release function").

Fixes: 74e2498ccf ("mm/memblock: Add reserved memory release function")
Reported-by: Wei Yang <richard.weiyang@gmail.com>
Closes: https://lore.kernel.org/all/20250405023018.g2ae52nrz2757b3n@master/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/174399023133.47537.7375975856054461445.stgit@devnote2
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2025-04-07 09:28:01 +03:00
Raghavendra Rao Ananta c8631ea59b KVM: arm64: selftests: Explicitly set the page attrs to Inner-Shareable
Atomic instructions such as 'ldset' in the guest have been observed to
cause an EL1 data abort with FSC 0x35 (IMPLEMENTATION DEFINED fault
(Unsupported Exclusive or Atomic access)) on Neoverse-N3.

Per DDI0487L.a B2.2.6, atomic instructions are only architecturally
guaranteed for Inner/Outer Shareable Normal Write-Back memory. For
anything else the behavior is IMPLEMENTATION DEFINED and can lose
atomicity, or, in this case, generate an abort.

It would appear that selftests sets up the stage-1 mappings as Non
Shareable, leading to the observed abort. Explicitly set the
Shareability field to Inner Shareable for non-LPA2 page tables. Note
that for the LPA2 page table format, translations for cacheable memory
inherit the shareability attribute of the PTW, i.e. TCR_ELx.SH{0,1}.

Suggested-by: Oliver Upton <oupton@google.com>
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Link: https://lore.kernel.org/r/20250405001042.1470552-3-rananta@google.com
[oliver: Rephrase changelog]
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-04-06 11:13:50 -07:00
Raghavendra Rao Ananta d8d78398e5 KVM: arm64: selftests: Introduce and use hardware-definition macros
The kvm selftest library for arm64 currently configures the hardware
fields, such as shift and mask in the page-table entries and registers,
directly with numbers. While it add comments at places, it's better to
rewrite them with appropriate macros to improve the readability and
reduce the risk of errors. Hence, introduce macros to define the
hardware fields and use them in the arm64 processor library.

Most of the definitions are primary copied from the Linux's header,
arch/arm64/include/asm/pgtable-hwdef.h.

No functional change intended.

Suggested-by: Oliver Upton <oupton@google.com>
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Link: https://lore.kernel.org/r/20250405001042.1470552-2-rananta@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-04-06 11:13:41 -07:00
Kumar Kartikeya Dwivedi 9bae8f4f21 selftests/bpf: Make res_spin_lock test less verbose
Currently, the res_spin_lock test is too chatty as it constantly prints
the test_run results for each iteration in each thread, so in case
verbose output is requested or things go wrong, it will flood the logs
of CI and other systems with repeated messages that offer no valuable
insight. Reduce this by doing assertions when the condition actually
flips, and proceed to break out and exit the threads. We still assert
to mark the test as failed and print the expected and reported values.

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250403220841.66654-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-04 22:56:32 -07:00
JP Kobryn a97915559f cgroup: change rstat function signatures from cgroup-based to css-based
This non-functional change serves as preparation for moving to
subsystem-based rstat trees. To simplify future commits, change the
signatures of existing cgroup-based rstat functions to become css-based and
rename them to reflect that.

Though the signatures have changed, the implementations have not. Within
these functions use the css->cgroup pointer to obtain the associated cgroup
and allow code to function the same just as it did before this patch. At
applicable call sites, pass the subsystem-specific css pointer as an
argument or pass a pointer to cgroup::self if not in subsystem context.

Note that cgroup_rstat_updated_list() and cgroup_rstat_push_children()
are not altered yet since there would be a larger amount of css to
cgroup conversions which may overcomplicate the code at this
intermediate phase.

Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-04-04 10:06:25 -10:00
Eric Biggers a6d0dbba95 lib/crc: remove unnecessary prompt for CONFIG_CRC_T10DIF
All modules that need CONFIG_CRC_T10DIF already select it, so there is no
need to bother users about the option.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250401221600.24878-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-04-04 11:31:42 -07:00
Linus Torvalds 4a1d8ababd RISC-V Patches for the 6.15 Merge Window, Part 1
* The sub-architecture selection Kconfig system has been cleaned up,
   the documentation has been improved, and various detections have been
   fixed.
 * The vector-related extensions dependencies are now validated when
   parsing from device tree and in the DT bindings.
 * Misaligned access probing can be overridden via a kernel command-line
   parameter, along with various fixes to misalign access handling.
 * Support for relocatable !MMU kernels builds.
 * Support for hpge pfnmaps, which should improve TLB utilization.
 * Support for runtime constants, which improves the d_hash()
   performance.
 * Support for bfloat16, Zicbom, Zaamo, Zalrsc, Zicntr, Zihpm.
 * Various fixes, including:
       - We were missing a secondary mmu notifier call when flushing the
 	tlb which is required for IOMMU.
       - Fix ftrace panics by saving the registers as expected by ftrace.
       - Fix a couple of stimecmp usage related to cpu hotplug.
       - purgatory_start is now aligned as per the STVEC requirements.
       - A fix for hugetlb when calculating the size of non-present PTEs.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmfv/soTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYierZEACDwI9lJFCEbQPon3z8rAy1moTj0+AZ
 bMfZFqMphUTrJ0cMm2+Bc+XZgck12zHCyu1UljDcZVYMCHA9aOoj5C5NkBBVLCuL
 uLYrhIoQXtJaVIANiFl0SHAZmh2s2OoSgmUzrEZ8JGlHpKCF7EVX5bHEsOvzn9ir
 B2W992W6q3ISuKXHKsTpa7rmTtf7swGYg6zW3pX3l6HmY+EMEQOcQl0tAB383J/T
 lm0K4+YvLpRJdm2ARpNGWlcFXj9/UXUM5hplK3aBAHpPKQ5/83/4tMDsfRvhpEVC
 VJXNgK+H4XLD542aQ8d4ZROguyhwn9e2n6Dkv0OqfNk4lg5pUBcJUZftQ+rB7AWg
 VYB1KVpxhwcruheXJFz8S3EzjZTcS+JrcD80vvx8JmHdXkZwHTfYUgiFwe/TR7yr
 b518fEbXpVwDZiCbaAe3Cmpw0mlNnSVmU4hgNbiwt0fu9DGdPN9WQbyds68RKb7A
 TWwDmmD6kV2BTWl0mHPtu9VhX58CDG+0WYbHA7r82p2T50187766C92GYfN2UPpz
 lH0iMRDkmucclZ3fEoosJ+HsDntc4oe6Bhdzuj52Q7vBpDd/QB6t5cfrlDpEEdgU
 3qoWMN5mb5l1rbvrqENh5ZgmEpzV8K0R5F5quiXh/9wO0y1kepDslTqC2oXK/m0p
 DzsvvD6UnNMOUQ==
 =nCJo
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.15-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:

 - The sub-architecture selection Kconfig system has been cleaned up,
   the documentation has been improved, and various detections have been
   fixed

 - The vector-related extensions dependencies are now validated when
   parsing from device tree and in the DT bindings

 - Misaligned access probing can be overridden via a kernel command-line
   parameter, along with various fixes to misalign access handling

 - Support for relocatable !MMU kernels builds

 - Support for hpge pfnmaps, which should improve TLB utilization

 - Support for runtime constants, which improves the d_hash()
   performance

 - Support for bfloat16, Zicbom, Zaamo, Zalrsc, Zicntr, Zihpm

 - Various fixes, including:
      - We were missing a secondary mmu notifier call when flushing the
        tlb which is required for IOMMU
      - Fix ftrace panics by saving the registers as expected by ftrace
      - Fix a couple of stimecmp usage related to cpu hotplug
      - purgatory_start is now aligned as per the STVEC requirements
      - A fix for hugetlb when calculating the size of non-present PTEs

* tag 'riscv-for-linus-6.15-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (65 commits)
  riscv: Add norvc after .option arch in runtime const
  riscv: Make sure toolchain supports zba before using zba instructions
  riscv/purgatory: 4B align purgatory_start
  riscv/kexec_file: Handle R_RISCV_64 in purgatory relocator
  selftests: riscv: fix v_exec_initval_nolibc.c
  riscv: Fix hugetlb retrieval of number of ptes in case of !present pte
  riscv: print hartid on bringup
  riscv: Add norvc after .option arch in runtime const
  riscv: Remove CONFIG_PAGE_OFFSET
  riscv: Support CONFIG_RELOCATABLE on riscv32
  asm-generic: Always define Elf_Rel and Elf_Rela
  riscv: Support CONFIG_RELOCATABLE on NOMMU
  riscv: Allow NOMMU kernels to access all of RAM
  riscv: Remove duplicate CONFIG_PAGE_OFFSET definition
  RISC-V: errata: Use medany for relocatable builds
  dt-bindings: riscv: document vector crypto requirements
  dt-bindings: riscv: add vector sub-extension dependencies
  dt-bindings: riscv: d requires f
  RISC-V: add f & d extension validation checks
  RISC-V: add vector crypto extension validation checks
  ...
2025-04-04 09:49:17 -07:00
Linus Torvalds 61f96e684e Including fixes from netfilter.
Current release - regressions:
 
  - 4 fixes for the netdev per-instance locking
 
 Current release - new code bugs:
 
  - consolidate more code between existing Rx zero-copy and uring so that
    the latter doesn't miss / have to duplicate the safety checks
 
 Previous releases - regressions:
 
  - ipv6: fix omitted Netlink attributes when using SKIP_STATS
 
 Previous releases - always broken:
 
  - net: fix geneve_opt length integer overflow
 
  - udp: fix multiple wrap arounds of sk->sk_rmem_alloc when it
    approaches INT_MAX
 
  - dsa: mvpp2: add a lock to avoid corruption of the shared TCAM
 
  - dsa: airoha: fix issues with traffic QoS configuration / offload,
    and flow table offload
 
 Misc:
 
  - touch up the Netlink YAML specs of old families to make them usable
    for user space C codegen
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmfv+nEACgkQMUZtbf5S
 Irs01A//d20bpdDVz2sRLdAzzIaGDLdOmw4T92e9eW7WkhUSGNAG7vZCv5lanIxq
 toQMLAahyJrMdizGPLfhH+csz3eQMwYUwlIRXNfJTEdk9o/+naWdtzbPDJdjcAu/
 jRwOKx44JtbXIwmzFe/vNwP8ex+JMZqjvdcCZcJONc4XVpHeAeKbPsd9c8aX8DR2
 pSMR/3mpAHXFd54mFVUSEDXCZBClpAT0sjZ4RMt3pZKELp+8N2AAi0nFt9r0W+YB
 ZPhYX2hSJ+msuUa24jeBHWhrxvV/PVbKDg7S58F6+Us2hDKyYx9k6IEQeadntd9c
 EzZSboSgzjf1ew6Yuitv1o9b/C1NCdzflES7kXgibFGUJ+6bP2pv5bgOc4mDhTz4
 zeY9EqxguN1dpFX+Y7gyCQcUe/6UACi6Y4h1aCmdZkCoenf9FsJPoeSWWqmttDNN
 5DEx3szJZKY+O4okmfpCFJ1SnfEe9E4Ek/+s6aIWNXu6C3EsnX6Q8Kj4Qz74UuLP
 LpGFCqRwpDLyfqZIEaX6Ed6sWykLg6TWU0/B2jWmFyQ/KQCCjhL79iaDllAMOOoT
 hN5sJAUiHk1QoMBW37nEu/WYWX5vqCVhltJBfPVtVS9dgJQChDCp/mrJ9ZJi3wof
 FyPeLaOh9N6IhR+L4Iipvuu/94dfPHtj8o5dnPkrh1fwxueUFFI=
 =phQZ
 -----END PGP SIGNATURE-----

Merge tag 'net-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from netfilter.

  Current release - regressions:

   - four fixes for the netdev per-instance locking

  Current release - new code bugs:

   - consolidate more code between existing Rx zero-copy and uring so
     that the latter doesn't miss / have to duplicate the safety checks

  Previous releases - regressions:

   - ipv6: fix omitted Netlink attributes when using SKIP_STATS

  Previous releases - always broken:

   - net: fix geneve_opt length integer overflow

   - udp: fix multiple wrap arounds of sk->sk_rmem_alloc when it
     approaches INT_MAX

   - dsa: mvpp2: add a lock to avoid corruption of the shared TCAM

   - dsa: airoha: fix issues with traffic QoS configuration / offload,
     and flow table offload

  Misc:

   - touch up the Netlink YAML specs of old families to make them usable
     for user space C codegen"

* tag 'net-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (56 commits)
  selftests: net: amt: indicate progress in the stress test
  netlink: specs: rt_route: pull the ifa- prefix out of the names
  netlink: specs: rt_addr: pull the ifa- prefix out of the names
  netlink: specs: rt_addr: fix get multi command name
  netlink: specs: rt_addr: fix the spec format / schema failures
  net: avoid false positive warnings in __net_mp_close_rxq()
  net: move mp dev config validation to __net_mp_open_rxq()
  net: ibmveth: make veth_pool_store stop hanging
  arcnet: Add NULL check in com20020pci_probe()
  ipv6: Do not consider link down nexthops in path selection
  ipv6: Start path selection from the first nexthop
  usbnet:fix NPE during rx_complete
  net: octeontx2: Handle XDP_ABORTED and XDP invalid as XDP_DROP
  net: fix geneve_opt length integer overflow
  io_uring/zcrx: fix selftests w/ updated netdev Python helpers
  selftests: net: use netdevsim in netns test
  docs: net: document netdev notifier expectations
  net: dummy: request ops lock
  netdevsim: add dummy device notifiers
  net: rename rtnl_net_debug to lock_debug
  ...
2025-04-04 09:15:35 -07:00
Chen Ni c966139485 selftests/bpf: Convert comma to semicolon
Replace comma between expressions with semicolons.

Using a ',' in place of a ';' can have unintended side effects.
Although that is not the case here, it is seems best to use ';'
unless ',' is intended.

Found by inspection.
No functional change intended.
Compile tested only.

Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/bpf/20250401061546.1990156-1-nichen@iscas.ac.cn
2025-04-04 08:53:57 -07:00
Anton Protopopov dafae1ae2a libbpf: Add likely/unlikely macros and use them in selftests
A few selftests and, more importantly, consequent changes to the
bpf_helpers.h file, use likely/unlikely macros, so define them here
and remove duplicate definitions from existing selftests.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250331203618.1973691-3-a.s.protopopov@gmail.com
2025-04-04 08:53:24 -07:00
Jakub Kicinski 94f68c0f99 selftests: net: amt: indicate progress in the stress test
Our CI expects output from the test at least once every 10 minutes.
The AMT test when running on debug kernel is just on the edge
of that time for the stress test. Improve the output:
 - print the name of the test first, before starting it,
 - output a dot every 10% of the way.

Output after:

  TEST: amt discovery                                                 [ OK ]
  TEST: IPv4 amt multicast forwarding                                 [ OK ]
  TEST: IPv6 amt multicast forwarding                                 [ OK ]
  TEST: IPv4 amt traffic forwarding torture               ..........  [ OK ]
  TEST: IPv6 amt traffic forwarding torture               ..........  [ OK ]

Reviewed-by: Taehee Yoo <ap420073@gmail.com>
Link: https://patch.msgid.link/20250403145636.2891166-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04 08:02:09 -07:00
Jakub Kicinski 0c8e30252d netlink: specs: rt_addr: pull the ifa- prefix out of the names
YAML specs don't normally include the C prefix name in the name
of the YAML attr. Remove the ifa- prefix from all attributes
in addr-attrs and specify name-prefix instead.

This is a bit risky, hopefully there aren't many users out there.

Fixes: dfb0f7d9d9 ("doc/netlink: Add spec for rt addr messages")
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250403013706.2828322-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04 07:36:06 -07:00
Jakub Kicinski 524c03585f netlink: specs: rt_addr: fix get multi command name
Command names should match C defines, codegens may depend on it.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Fixes: 4f280376e5 ("selftests/net: Add selftest for IPv4 RTM_GETMULTICAST support")
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250403013706.2828322-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04 07:36:06 -07:00
Paolo Bonzini 369348e1d8 Merge branch 'kvm-6.15-rc2-fixes' into HEAD 2025-04-04 07:16:54 -04:00
Sean Christopherson 0297cdc12a KVM: selftests: Add option to rseq test to override /dev/cpu_dma_latency
Add a "-l <latency>" param to the rseq test so that the user can override
/dev/cpu_dma_latency, as described by the test's suggested workaround for
not being able to complete enough migrations.

cpu_dma_latency is not a normal file, even as far as procfs files go.
Writes to cpu_dma_latency only persist so long as the file is open, e.g.
so that the kernel automatically reverts back to a power-optimized state
once the sensitive workload completes.  Provide the necessary functionality
instead of effectively forcing the user to write a non-obvious wrapper.

Cc: Dongsheng Zhang <dongsheng.x.zhang@intel.com>
Cc: Zide Chen <zide.chen@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250401142238.819487-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-04-04 07:07:39 -04:00
Paolo Bonzini c57047f6f3 selftests: kvm: list once tests that are valid on all architectures
Several tests cover infrastructure from virt/kvm/ and userspace APIs that have
only minimal requirements from architecture-specific code.  As such, they are
available on all architectures that have libkvm support, and this presumably
will apply also in the future (for example if loongarch gets selftests support).
Put them in a separate variable and list them only once.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20250401141327.785520-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-04-04 06:23:25 -04:00
Paolo Bonzini 11934771e7 selftests: kvm: bring list of exit reasons up to date
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20250331221851.614582-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-04-04 06:22:17 -04:00
Paolo Bonzini 80fd663590 selftests: kvm: revamp MONITOR/MWAIT tests
Run each testcase in a separate VMs to cover more possibilities;
move WRMSR close to MONITOR/MWAIT to test updating CPUID bits
while in the VM.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-04-04 06:20:27 -04:00
Ming Lei 72070e57b0 selftests: ublk: fix test_stripe_04
Commit 57ed58c132 ("selftests: ublk: enable zero copy for stripe target")
added test entry of test_stripe_04, but forgot to add the test script.

So fix the test by adding the script file.

Reported-by: Uday Shankar <ushankar@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Uday Shankar <ushankar@purestorage.com>
Link: https://lore.kernel.org/r/20250404001849.1443064-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-03 20:13:38 -06:00
Linus Torvalds 7930edcc3a io_uring-6.15-20250403
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmfvB5IQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpgnAEACYhVXXsIsoRqzhp2OTDEUPU8OLxe285h1s
 3u+erGnFiWUAxs54fbJjGCzqQSeAIcr2EuAWEyMdhixTwykoAdrlnKZzuH7jnHZ4
 FVkvOVlP6+7lRshJzr5FEsg5I7HxNMvFmnVKHoAsJRVqfiYbrRbIzSQokiWPE3lb
 a6ptr0itZgXjWftUIIioG6JBYuvZLY0OAqRQPvONB784hSwKxaqTBIY4LaWv+TBb
 SIwe26lXONSGtbQ7g/Vpkta6edcymEqiU1V4XPmjlal0DCMmN9jXY6gTBi/8JfwA
 DITSsDzXNvjAJfXclYzPk8sbNHlgZIha0+yqkVTQpxK8M9gahAAI1Z8TzkjXAG/G
 ttsXKq4DfP9tvytsaOc/blO/NSXiPLcPEmr9m1GYFL3c95tpYwhjSEZwcsyXvRbK
 Ooc3+s6ZYBWT8xO0pS1kRArYTILr6MR6fNbWO0H+f5+kNHSjXWl+0PwuYCpI5UW9
 z9BXVNqM6des7XI+7JRuaLhXoRTn6s5Wz8o4IP/xaNsKQLaosiRDjUxoAE/ZXmNx
 IvR+GCJsLU9xTOd97TIHuj+zfgnRbUiheRbCfpUCoTnmzkcWdqJhPCajtIEus5dI
 g5t2R5TnjRQObgOLTD0DVC2kdDjsOx1y0uDxBUNTBdUR+7TINX7jlkKurmSJMvEC
 MZp77CFcDA==
 =kf65
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-6.15-20250403' of git://git.kernel.dk/linux

Pull more io_uring updates from Jens Axboe:
 "Set of fixes/updates for io_uring that should go into this release.

  The ublk bits could've gone via either tree - usually I put them in
  block, but they got a bit mixed this series with the zero-copy
  supported that ended up dipping into both trees.

  This contains:

   - Fix for sendmsg zc, include in pinned pages accounting like we do
     for the other zc types

   - Series for ublk fixing request aborting, doing various little
     cleanups, fixing some zc issues, and adding queue_rqs support

   - Another ublk series doing some code cleanups

   - Series cleaning up the io_uring send path, mostly in preparation
     for registered buffers

   - Series doing little MSG_RING cleanups

   - Fix for the newly added zc rx, fixing len being 0 for the last
     invocation of the callback

   - Add vectored registered buffer support for ublk. With that, then
     ublk also supports this feature in the kernel revision where it
     could generically introduced for rw/net

   - A bunch of selftest additions for ublk. This is the majority of the
     diffstat

   - Silence a KCSAN data race warning for io-wq

   - Various little cleanups and fixes"

* tag 'io_uring-6.15-20250403' of git://git.kernel.dk/linux: (44 commits)
  io_uring: always do atomic put from iowq
  selftests: ublk: enable zero copy for stripe target
  io_uring: support vectored kernel fixed buffer
  block: add for_each_mp_bvec()
  io_uring: add validate_fixed_range() for validate fixed buffer
  selftests: ublk: kublk: fix an error log line
  selftests: ublk: kublk: use ioctl-encoded opcodes
  io_uring/zcrx: return early from io_zcrx_recv_skb if readlen is 0
  io_uring/net: avoid import_ubuf for regvec send
  io_uring/rsrc: check size when importing reg buffer
  io_uring: cleanup {g,s]etsockopt sqe reading
  io_uring: hide caches sqes from drivers
  io_uring: make zcrx depend on CONFIG_IO_URING
  io_uring: add req flag invariant build assertion
  Documentation: ublk: remove dead footnote
  selftests: ublk: specify io_cmd_buf pointer type
  ublk: specify io_cmd_buf pointer type
  io_uring: don't pass ctx to tw add remote helper
  io_uring/msg: initialise msg request opcode
  io_uring/msg: rename io_double_lock_ctx()
  ...
2025-04-03 15:48:58 -07:00
David Wei c0f21784bc io_uring/zcrx: fix selftests w/ updated netdev Python helpers
Fix io_uring zero copy rx selftest with updated netdev Python helpers.

Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://patch.msgid.link/20250402172414.895276-1-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03 15:45:30 -07:00
Stanislav Fomichev 56c8a23f8a selftests: net: use netdevsim in netns test
Netdevsim has extra register_netdevice_notifier_dev_net notifiers,
use netdevim instead of dummy device to test them out.

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250401163452.622454-9-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03 15:32:09 -07:00
Linus Torvalds 5916a6fbc0 RTC for 6.15
Core:
  - setdate is removed as it has better replacements
  - skip alarms with a second resolution when we know the RTC doesn't support
    those.
 
 Subsystem:
  - remove unnecessary private struct members
  - use devm_pm_set_wake_irq were relevant
 
 Drivers:
  - ds1307: stop disabling alarms on probe for DS1337, DS1339, DS1341 and DS3231
  - max31335: add max31331 support
  - pcf50633 is removed as support for the related SoC has been removed
  - pcf85063: properly handle POR failures
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEBqsFVZXh8s/0O5JiY6TcMGxwOjIFAmftwMgACgkQY6TcMGxw
 OjKk1g//dwDeinPyXC8+isDYCf07U6xy1OmvZcQSqG1qL64rONY1KoUKVq92mQUs
 1+rijwIJMi4MNJqaCwYHAEhvXxYJNoQLcV3uHkqVjrRfGQY0Mgl4/pLALGNp6P4U
 QixG8qJiVzAMolTVUozqp/amTc0zztFT6Fnr1EbrLkx0JZX5D09Na5pgdbvoBFX3
 pH5kxYQotpBD8x8CUHFU0oz8dEeSAbISEVJKX1Ct9xTqhYX9/OB92jQvvg46STPU
 2J6n9Yl9eH77itX8GmaDyNyKIIzAZktWuZofiPkni090W/H+uVIdSo0/pHRinvsA
 hoZBLc9CjUDFfqK9uuFOszl1lW/zpVkLdz3VR9OYxjIFzDk5KKnAK9g669VIBm3P
 yPlYQL9TzW+uacpzdN7YhUW0Oy5opRcYjjCrTg5znTtFFplxrMucveyXb8wAP3DG
 m68C1LJzCOxzYAqnfzh59UYSr+JexDJgEH1u79d0GYFrTXZ4mY6i94yva4lIOofX
 uaUuTOsUyQY4ZxMEXw2FlUzvSmQBaALj7ycMFmSBWYa5efI7UWZ2r2HZ0jX30HQU
 m+bG/+eMMZq9gYsiCrYxo0J+afpZ0lTRmyecnuCqP79rYPTCrOui/2n3fZryGfjV
 cDUYbFl5VbsYpwT/sBUrsKU8e9YYb85ACep3WbRlnS7YI5qkM+Y=
 =QJeg
 -----END PGP SIGNATURE-----

Merge tag 'rtc-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux

Pull RTC updates from Alexandre Belloni:
 "We see a net reduction of the number of lines of code thanks to the
  removal of a now unused driver and a testing tool that is not used
  anymore. Apart from this, the max31335 driver gets support for a new
  part number and pm8xxx gets UEFI support.

  Core:

   - setdate is removed as it has better replacements

   - skip alarms with a second resolution when we know the RTC doesn't
     support those.

  Subsystem:

   - remove unnecessary private struct members

   - use devm_pm_set_wake_irq were relevant

  Drivers:

   - ds1307: stop disabling alarms on probe for DS1337, DS1339, DS1341
     and DS3231

   - max31335: add max31331 support

   - pcf50633 is removed as support for the related SoC has been removed

   - pcf85063: properly handle POR failures"

* tag 'rtc-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (50 commits)
  rtc: remove 'setdate' test program
  selftest: rtc: skip some tests if the alarm only supports minutes
  rtc: mt6397: drop unused defines
  rtc: pcf85063: replace dev_err+return with return dev_err_probe
  rtc: pcf85063: do a SW reset if POR failed
  rtc: max31335: Add driver support for max31331
  dt-bindings: rtc: max31335: Add max31331 support
  rtc: cros-ec: Avoid a couple of -Wflex-array-member-not-at-end warnings
  dt-bindings: rtc: pcf2127: Reference spi-peripheral-props.yaml
  rtc: rzn1: implement one-second accuracy for alarms
  rtc: pcf50633: Remove
  rtc: pm8xxx: implement qcom,no-alarm flag for non-HLOS owned alarm
  rtc: pm8xxx: mitigate flash wear
  rtc: pm8xxx: add support for uefi offset
  dt-bindings: rtc: qcom-pm8xxx: document qcom,no-alarm flag
  rtc: rv3032: drop WADA
  rtc: rv3032: fix EERD location
  rtc: pm8xxx: switch to devm_device_init_wakeup
  rtc: pm8xxx: fix possible race condition
  rtc: mpfs: switch to devm_device_init_wakeup
  ...
2025-04-03 15:31:14 -07:00
Dmitry Safonov e5ddf19dbc net/selftests: Add loopback link local route for self-connect
self-connect-ipv6 got slightly flaky on netdev:
> # timeout set to 120
> # selftests: net/tcp_ao: self-connect_ipv6
> # 1..5
> # # 708[lib/setup.c:250] rand seed 1742872572
> # TAP version 13
> # # 708[lib/proc.c:213]    Snmp6            Ip6OutNoRoutes: 0 => 1
> # not ok 1 # error 708[self-connect.c:70] failed to connect()
> # ok 2 No unexpected trace events during the test run
> # # Planned tests != run tests (5 != 2)
> # # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:1
> ok 1 selftests: net/tcp_ao: self-connect_ipv6

I can not reproduce it on my machines, but judging by "Ip6OutNoRoutes"
there is no route to the local_addr (::1).

Looking at the kernel code, I see that kernel does add link-local
address automatically in init_loopback(), but that is called from
ipv6 notifier block. So, in turn the userspace that brought up
the loopback interface may see rtnetlink ACK earlier than
addrconf_notify() does it's job (at least, on a slow VM such as netdev).
Probably, for ipv4 it's the same, judging by inetdev_event().

The fix is quite simple: set the link-local route straight after
bringing the loopback interface. That will make it synchronous.

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250402-tcp-ao-selfconnect-flake-v1-1-8388d629ef3d@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03 15:08:31 -07:00
Linus Torvalds 531a62f223 bpf-fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmfupgEACgkQ6rmadz2v
 bTqZlw/+KBnC0B+DYRzPTfFAbmezVwnPIIclk2MxMV4MPGv05lG3HcjLaNmZPNSz
 QPC6vonmgiOI1cHxzy235wleT9qrOtxRzk27kvfZfMXRbxyyHUQMaOYdVaEksSHb
 UPjHH+vSHeVNt1ILIMIg5NhmJPzbUKZWqRBjPhF6Ihv0xkiIFa47gPl2Y2ERFVbS
 ywhjofh5OQ+BiE34Kd2MTsyeVvblrJEsGBgjbgAaVKe6A7Ja0z6aaiVOexuAN4nJ
 CJ7aRNLtXlCUlQQ2a+OhcmF+je4fHniftxpDtA0FbzjX2L2m/OcoGhd2fn0OpLvV
 DjoAJEk0d8Wp7rNUAA4VeaeJhyxtoQChvMV2zxL29o9zhqbHmBIFOwTdcKHKFnSv
 TQTnNYnTY3b6ZkKyCvKmUjrKeTOW3Qxe17uVvuuKB5Ivne4l+HksSW/wxSwek1dq
 ZBg3NbL85tCOrYT5oCWsBnxWqWgwi7g6C0cki4vp+MgqjQwSp/A1RxLhIPyLpWpr
 AUljaYfO3DlKFugXKebe4Qtsw+2zFzn0FHQ0n+Sd4P6xjW8ntSBgwXMNLfTmm2+U
 XW9Fu4JBcyrY4TuQvsxLvN5ZNokEy7wIrD1mtUnAE/RcNNkf/lePl6MbKf7ZDhNA
 lBgq3dCyaN7eUAfET3Zffp06SMlRt3TEln1NL9XYOBTBXqqJlHs=
 =/mR3
 -----END PGP SIGNATURE-----

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

 - Fix BPF selftests expectations of assembler output and struct layout
   (Song Liu and Yonghong Song)

 - Fix XSK error code when queue is full (Wang Liang)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Fix verifier_private_stack test failure
  selftests/bpf: Fix verifier_bpf_fastcall test
  selftests/bpf: Fix tests after fields reorder in struct file
  xsk: Fix __xsk_generic_xmit() error code when cq is full
2025-04-03 11:55:41 -07:00
Linus Torvalds 8c7c1b5506 - The 2 patch series "mm: fixes for fallouts from mem_init() cleanup"
from Mike Rapoport fixes a couple of issues with the just-merged "arch,
   mm: reduce code duplication in mem_init()" series.
 
 - The 4 patch series "MAINTAINERS: add my isub-entries to MM part." from
   Mike Rapoport does some maintenance on MAINTAINERS.
 
 - The 6 patch series "remove tlb_remove_page_ptdesc()" from Qi Zheng
   does some cleanup work to the page mapping code.
 
 - The 7 patch series "mseal system mappings" from Jeff Xu permits
   sealing of "system mappings", such as vdso, vvar, vvar_vclock, vectors
   (arm compat-mode), sigpage (arm compat-mode).
 
 - Plus the usual shower of singleton patches.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ+4XpgAKCRDdBJ7gKXxA
 jnwtAP43Rp3zyWf034fEypea36xQqcsy4I7YUTdZEgnFS7LCZwEApM97JvGHsYEr
 Ns9Zhnh+E3RWASfOAzJoVZVrAaMovg4=
 =MyVR
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2025-04-02-22-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull more MM updates from Andrew Morton:

 - The series "mm: fixes for fallouts from mem_init() cleanup" from Mike
   Rapoport fixes a couple of issues with the just-merged "arch, mm:
   reduce code duplication in mem_init()" series

 - The series "MAINTAINERS: add my isub-entries to MM part." from Mike
   Rapoport does some maintenance on MAINTAINERS

 - The series "remove tlb_remove_page_ptdesc()" from Qi Zheng does some
   cleanup work to the page mapping code

 - The series "mseal system mappings" from Jeff Xu permits sealing of
   "system mappings", such as vdso, vvar, vvar_vclock, vectors (arm
   compat-mode), sigpage (arm compat-mode)

 - Plus the usual shower of singleton patches

* tag 'mm-stable-2025-04-02-22-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (31 commits)
  mseal sysmap: add arch-support txt
  mseal sysmap: enable s390
  selftest: test system mappings are sealed
  mseal sysmap: update mseal.rst
  mseal sysmap: uprobe mapping
  mseal sysmap: enable arm64
  mseal sysmap: enable x86-64
  mseal sysmap: generic vdso vvar mapping
  selftests: x86: test_mremap_vdso: skip if vdso is msealed
  mseal sysmap: kernel config and header change
  mm: pgtable: remove tlb_remove_page_ptdesc()
  x86: pgtable: convert to use tlb_remove_ptdesc()
  riscv: pgtable: unconditionally use tlb_remove_ptdesc()
  mm: pgtable: convert some architectures to use tlb_remove_ptdesc()
  mm: pgtable: change pt parameter of tlb_remove_ptdesc() to struct ptdesc*
  mm: pgtable: make generic tlb_remove_table() use struct ptdesc
  microblaze/mm: put mm_cmdline_setup() in .init.text section
  mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range
  MAINTAINERS: mm: add entry for secretmem
  MAINTAINERS: mm: add entry for numa memblocks and numa emulation
  ...
2025-04-03 11:10:00 -07:00
Yonghong Song 3f8ad18f81 selftests/bpf: Fix verifier_private_stack test failure
Several verifier_private_stack tests failed with latest bpf-next.
For example, for 'Private stack, single prog' subtest, the
jitted code:
  func #0:
  0:      f3 0f 1e fa                             endbr64
  4:      0f 1f 44 00 00                          nopl    (%rax,%rax)
  9:      0f 1f 00                                nopl    (%rax)
  c:      55                                      pushq   %rbp
  d:      48 89 e5                                movq    %rsp, %rbp
  10:     f3 0f 1e fa                             endbr64
  14:     49 b9 58 74 8a 8f 7d 60 00 00           movabsq $0x607d8f8a7458, %r9
  1e:     65 4c 03 0c 25 28 c0 48 87              addq    %gs:-0x78b73fd8, %r9
  27:     bf 2a 00 00 00                          movl    $0x2a, %edi
  2c:     49 89 b9 00 ff ff ff                    movq    %rdi, -0x100(%r9)
  33:     31 c0                                   xorl    %eax, %eax
  35:     c9                                      leave
  36:     e9 20 5d 0f e1                          jmp     0xffffffffe10f5d5b

The insn 'addq %gs:-0x78b73fd8, %r9' does not match the expected
regex 'addq %gs:0x{{.*}}, %r9' and this caused test failure.

Fix it by changing '%gs:0x{{.*}}' to '%gs:{{.*}}' to accommodate the
possible negative offset. A few other subtests are fixed in a similar way.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250331033828.365077-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-02 21:55:44 -07:00
Song Liu 14d84357a0 selftests/bpf: Fix verifier_bpf_fastcall test
Commit [1] moves percpu data on x86 from address 0x000... to address
0xfff...

Before [1]:

159020: 0000000000030700     0 OBJECT  GLOBAL DEFAULT   23 pcpu_hot

After [1]:

152602: ffffffff83a3e034     4 OBJECT  GLOBAL DEFAULT   35 pcpu_hot

As a result, verifier_bpf_fastcall tests should now expect a negative
value for pcpu_hot, IOW, the disassemble should show "r=" instead of
"w=".

Fix this in the test.

Note that, a later change created a new variable "cpu_number" for
bpf_get_smp_processor_id() [2]. The inlining logic is updated properly
as part of this change, so there is no need to fix anything on the
kernel side.

[1] commit 9d7de2aa8b ("x86/percpu/64: Use relative percpu offsets")
[2] commit 01c7bc5198 ("x86/smp: Move cpu number to percpu hot section")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250328193124.808784-1-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-02 21:55:43 -07:00
Song Liu 00387808d3 selftests/bpf: Fix tests after fields reorder in struct file
The change in struct file [1] moved f_ref to the 3rd cache line.
It made *(u64 *)file dereference invalid from the verifier point of view,
because btf_struct_walk() walks into f_lock field, which is 4-byte long.

Fix the selftests to deference the file pointer as a 4-byte access.

[1] commit e249056c91 ("fs: place f_ref to 3rd cache line in struct file to resolve false sharing")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250327185528.1740787-1-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-02 21:55:43 -07:00
Linus Torvalds 01ecadbe09 cxl for v6.15
- Add support for Global Persistent Flush (GPF)
 - Cleanup of DPA partition metadata handling
 	- Remove the CXL_DECODER_MIXED enum that's not needed anymore
 	- Introduce helpers to access resource and perf meta data
 	- Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info'
 	- Make cxl_dpa_alloc() DPA partition number agnostic
 	- Remove cxl_decoder_mode
 	- Cleanup partition size and perf helpers
 - Remove unused CXL partition values
 - Add logging support for CXL CPER endpoint and port protocol errors
 	- Prefix protocol error struct and function names with cxl_
 	- Move protocol error definitions and structures to a common location
 	- Remove drivers/firmware/efi/cper_cxl.h to include/linux/cper.h
 	- Add support in GHES to process CXL CPER protocol errors
 	- Process CXL CPER protocol errors
 	- Add trace logging for CXL PCIe port RAS errors
 - Remove redundant gp_port init
 - Add validation of cxl device serial number
 - CXL ABI documentation updates/fixups
 - A series that uses guard() to clean up open coded mutex lockings and remove gotos for error
   handling.
 - Some followup patches to support dirty shutdown accounting
 	- Add helper to retrieve DVSEC offset for dirty shutdown registers
 	- Rename cxl_get_dirty_shutdown() to cxl_arm_dirty_shutdown()
 	- Add support for dirty shutdown count via sysfs
 	- cxl_test support for dirty shutdown
 - A series to support CXL mailbox Features commands. Mostly in preparation for CXL EDAC
   code to utilize the Features commands. It's also in preparation for CXL fwctl support
   to utilize the CXL Features. The commands include "Get Supported Features", "Get Feature",
   and "Set Feature".
 - A series to support extended linear cache support described by the ACPI HMAT table. The
   addition helps enumerate the cache and also provides additional RAS reporting support for
   configuration with extended linear cache. (and related fixes for the
   series).
 - An update to cxl_test to support a 3-way capable CFMWS.
 - A documentation fix to remove unused "mixed mode".
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE5DAy15EJMCV1R6v9YGjFFmlTOEoFAmfqtP4ACgkQYGjFFmlT
 OEqx9A//UsCWf1CH8bvjKXxSTlQmtPlNpcXe+gVR0sc5cL2VFxKf93AY8Zo1Br5A
 b40gtZJz9QwjwGwIvDiki9U2bopOyX3aMOyBJMYmLuL/irY8ENx2ra7ODbxe7uGn
 oZwpwG2sEGQxIAG2bCpVuCDIt8JjNvsTJo45TICs07w9TWTmH4Swpbz1g8VGpDz/
 kCQcXXHSHZleR5BzqVRKxjjqGEUFj2xDMzAI8VSL+7izMMoPLbjwnl2c1fwaLBPd
 iJTMboTXDj7eVMta/qqGkG7pshM81SnkSzy8cxImj3r4SRgRTZg9U8vhrR3K1kdH
 F05Ozd12tljtNXLWthENZPUbfcovy9oTxzMt/gVut7j6C7H3s3KCSbV7zhz5BmfD
 XcapOX4Cu7ptn88KLqE5a98oLuq2DXrLOcX5vKPYBfAO+68rC+gSAPSbzfZlSHa0
 1/TsxVvzDQUBVZWL94DeHvemyQb58GQBOypeNZbH8P4gAhWJqk3hZEO+wlSxpfd+
 R7wgabfKJUJ82KusCZHIW1Wg3/IrXb4yC+UyiObS5RgIJWpRmOkuJEHDvEUje+Dj
 aOWw/H3vZgeZnpW87FRxzvDJx1/0jZI1vsxH65m2wrvz6n5aGIA/Q6pgqCdU/m6c
 I231bl1bmZzJ8u3+vOZL4tFHcYHh4XCwQp+ZQt1uDa0fA5LbLhc=
 =ZME1
 -----END PGP SIGNATURE-----

Merge tag 'cxl-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull Compute Express Link (CXL)  updates from Dave Jiang:

 - Add support for Global Persistent Flush (GPF)

 - Cleanup of DPA partition metadata handling:
     - Remove the CXL_DECODER_MIXED enum that's not needed anymore
     - Introduce helpers to access resource and perf meta data
     - Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info'
     - Make cxl_dpa_alloc() DPA partition number agnostic
     - Remove cxl_decoder_mode
     - Cleanup partition size and perf helpers

 - Remove unused CXL partition values

 - Add logging support for CXL CPER endpoint and port protocol errors:
     - Prefix protocol error struct and function names with cxl_
     - Move protocol error definitions and structures to a common location
     - Remove drivers/firmware/efi/cper_cxl.h to include/linux/cper.h
     - Add support in GHES to process CXL CPER protocol errors
     - Process CXL CPER protocol errors
     - Add trace logging for CXL PCIe port RAS errors

 - Remove redundant gp_port init

 - Add validation of cxl device serial number

 - CXL ABI documentation updates/fixups

 - A series that uses guard() to clean up open coded mutex lockings and
   remove gotos for error handling.

 - Some followup patches to support dirty shutdown accounting:
     - Add helper to retrieve DVSEC offset for dirty shutdown registers
     - Rename cxl_get_dirty_shutdown() to cxl_arm_dirty_shutdown()
     - Add support for dirty shutdown count via sysfs
     - cxl_test support for dirty shutdown

 - A series to support CXL mailbox Features commands.

   Mostly in preparation for CXL EDAC code to utilize the Features
   commands. It's also in preparation for CXL fwctl support to utilize
   the CXL Features. The commands include "Get Supported Features", "Get
   Feature", and "Set Feature".

 - A series to support extended linear cache support described by the
   ACPI HMAT table.

   The addition helps enumerate the cache and also provides additional
   RAS reporting support for configuration with extended linear cache.
   (and related fixes for the series).

 - An update to cxl_test to support a 3-way capable CFMWS

 - A documentation fix to remove unused "mixed mode"

* tag 'cxl-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (39 commits)
  cxl/region: Fix the first aliased address miscalculation
  cxl/region: Quiet some dev_warn()s in extended linear cache setup
  cxl/Documentation: Remove 'mixed' from sysfs mode doc
  cxl: Fix warning from emitting resource_size_t as long long int on 32bit systems
  cxl/test: Define a CFMWS capable of a 3 way HB interleave
  cxl/mem: Do not return error if CONFIG_CXL_MCE unset
  tools/testing/cxl: Set Shutdown State support
  cxl/pmem: Export dirty shutdown count via sysfs
  cxl/pmem: Rename cxl_dirty_shutdown_state()
  cxl/pci: Introduce cxl_gpf_get_dvsec()
  cxl/pci: Support Global Persistent Flush (GPF)
  cxl: Document missing sysfs files
  cxl: Plug typos in ABI doc
  cxl/pmem: debug invalid serial number data
  cxl/cdat: Remove redundant gp_port initialization
  cxl/memdev: Remove unused partition values
  cxl/region: Drop goto pattern of construct_region()
  cxl/region: Drop goto pattern in cxl_dax_region_alloc()
  cxl/core: Use guard() to drop goto pattern of cxl_dpa_alloc()
  cxl/core: Use guard() to drop the goto pattern of cxl_dpa_free()
  ...
2025-04-02 20:04:43 -07:00
Pedro Tammela ca9e5d3d9a selftests: tc-testing: fix nat regex matching
In iproute 6.14, the nat ip mask logic was fixed to remove an undefined
behaviour[1]. So now instead of reporting '0.0.0.0/32' on x86 and potentially
'0.0.0.0/0' in other platforms, it reports '0.0.0.0/0' in all platforms.

[1] https://lore.kernel.org/netdev/20250306112520.188728-1-torben.nielsen@prevas.dk/

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Link: https://patch.msgid.link/20250401144908.568140-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-02 16:34:38 -07:00
Linus Torvalds 4b06c990c1 vfs-6.15-rc1.fixes
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ+1ZqAAKCRCRxhvAZXjc
 oqO7AP4jdW03PHsk5zkbuUTzTTngkZQ1AypIJLTCYIoKPATMowD+ILMjOTsVOfxA
 h38ziAM3tubsz1pwkGuIlsU+drwz5Ao=
 =QfJV
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.15-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:

 - Add a new maintainer for configfs

 - Fix exportfs module description

 - Place flexible array memeber at the end of an internal struct in the
   mount code

 - Add new maintainer for netfslib as Jeff Layton is stepping down as
   current co-maintainer

 - Fix error handling in cachefiles_get_directory()

 - Cleanup do_notify_pidfd()

 - Fix syscall number definitions in pidfd selftests

 - Fix racy usage of fs_struct->in exec during multi-threaded exec

 - Ensure correct exit code is reported when pidfs_exit() is called from
   release_task() for a delayed thread-group leader exit

 - Fix conflicting iomap flag definitions

* tag 'vfs-6.15-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  iomap: Fix conflicting values of iomap flags
  fs: namespace: Avoid -Wflex-array-member-not-at-end warning
  MAINTAINERS: configfs: add Andreas Hindborg as maintainer
  exportfs: add module description
  exit: fix the usage of delay_group_leader->exit_code in do_notify_parent() and pidfs_exit()
  netfs: add Paulo as maintainer and remove myself as Reviewer
  cachefiles: Fix oops in vfs_mkdir from cachefiles_get_directory
  exec: fix the racy usage of fs_struct->in_exec
  selftests/pidfd: fixes syscall number defines
  pidfs: cleanup the usage of do_notify_pidfd()
2025-04-02 16:05:21 -07:00
Cong Wang 076c700988 selftests: tc-testing: Add TBF with SKBPRIO queue length corner case test
Add a test case to validate the interaction between TBF and SKBPRIO queueing
disciplines, specifically targeting queue length accounting corner cases.

This test complements the fix for the queue length accounting issue in the
SKBPRIO qdisc. This is still best-effort, as timing and manipulating enqueue
and dequeue from user-space is very hard.

Cc: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/20250329222536.696204-3-xiyou.wangcong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-02 16:03:33 -07:00
Ming Lei 57ed58c132 selftests: ublk: enable zero copy for stripe target
Use io_uring vectored fixed kernel buffer for handling stripe IO.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250325135155.935398-5-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-02 07:07:00 -06:00
Linus Torvalds acc4d5ff0b Rather tiny PR, mostly so that we can get into our trees your fix
to the x86 Makefile.
 
 Current release - regressions:
 
  - Revert "tcp: avoid atomic operations on sk->sk_rmem_alloc",
    error queue accounting was missed
 
 Current release - new code bugs:
 
  - 5 fixes for the netdevice instance locking work
 
 Previous releases - regressions:
 
  - usbnet: restore usb%d name exception for local mac addresses
 
 Previous releases - always broken:
 
  - rtnetlink: allocate vfinfo size for VF GUIDs when supported,
    avoid spurious GET_LINK failures
 
  - eth: mana: Switch to page pool for jumbo frames
 
  - phy: broadcom: Correct BCM5221 PHY model detection
 
 Misc:
 
  - selftests: drv-net: replace helpers for referring to other files
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmfrLiQACgkQMUZtbf5S
 IrvDsg//VH5VmkMau/TUF1WpwA03Wb/X0UqtM72lufVCMGYCogqjHZzs9popfhuu
 CCi4ZiBofEJHfN9WYJoumUL8aOdux0Q875o7h5gvfqKCokkUbJDk3W32QiLj1RqZ
 L14TorNOyS9Tg9m60pLa/6H3WS0kduhUGLuLzgTmOtT/YVJrOGmBtk/tRXFAKclH
 f3CPucvZGS5Fr4HfUc3yWXiocjibDJnv+jE+xW6S6YHpghvsK7anUvqIGTqQV8Vp
 s6AuvFULGO00968hFHcO0N8f8MnaCCJr2bcHCOXyjssdEbdvDOqzhFN4KhxWEPbK
 kCl3rLkPdkXo+ekOC7gIxXXaKVz3IVm28pegtEws8fda/iLuqZTp+BKo7kdrf3Iy
 br0rP/iK3eFN0M1XpVUIbmEuJ6VamztCzK88uvDKdI+Ol3GLAfy9v5NmkbzJI+aE
 cw+SyE6NgbeDeHBxvOu2F7G2sWMBkTEGaHMNXCv7I/VAvQsbk48onTVnpA+GlFD5
 vFqMxiZHLBUfFOfUcHxmw8KAkZ44pc1xEkpw/4s8GZOfq+1oWz2LrSQ8M8Hjs2VN
 NTuE44OOsBDPOJ54iDAIOr5jyF+ZDeWEuPbvWbHGri80gII+2iF2788N2ToyAkyR
 R0t4VtJwXF+8D6IxTG41OIvAuq9Zc8AqM6O7VnOyFhZtKqezBTI=
 =BDoM
 -----END PGP SIGNATURE-----

Merge tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Rather tiny pull request, mostly so that we can get into our trees
  your fix to the x86 Makefile.

  Current release - regressions:

   - Revert "tcp: avoid atomic operations on sk->sk_rmem_alloc", error
     queue accounting was missed

  Current release - new code bugs:

   - 5 fixes for the netdevice instance locking work

  Previous releases - regressions:

   - usbnet: restore usb%d name exception for local mac addresses

  Previous releases - always broken:

   - rtnetlink: allocate vfinfo size for VF GUIDs when supported, avoid
     spurious GET_LINK failures

   - eth: mana: Switch to page pool for jumbo frames

   - phy: broadcom: Correct BCM5221 PHY model detection

  Misc:

   - selftests: drv-net: replace helpers for referring to other files"

* tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (22 commits)
  Revert "tcp: avoid atomic operations on sk->sk_rmem_alloc"
  bnxt_en: bring back rtnl lock in bnxt_shutdown
  eth: gve: add missing netdev locks on reset and shutdown paths
  selftests: mptcp: ignore mptcp_diag binary
  selftests: mptcp: close fd_in before returning in main_loop
  selftests: mptcp: fix incorrect fd checks in main_loop
  mptcp: fix NULL pointer in can_accept_new_subflow
  octeontx2-af: Free NIX_AF_INT_VEC_GEN irq
  octeontx2-af: Fix mbox INTR handler when num VFs > 64
  net: fix use-after-free in the netdev_nl_sock_priv_destroy()
  selftests: net: use Path helpers in ping
  selftests: net: use the dummy bpf from net/lib
  selftests: drv-net: replace the rpath helper with Path objects
  net: lapbether: use netdev_lockdep_set_classes() helper
  net: phy: broadcom: Correct BCM5221 PHY model detection
  net: usb: usbnet: restore usb%d name exception for local mac addresses
  net/mlx5e: SHAMPO, Make reserved size independent of page size
  net: mana: Switch to page pool for jumbo frames
  MAINTAINERS: Add dedicated entries for phy_link_topology
  net: move replay logic to tc_modify_qdisc
  ...
2025-04-01 20:00:51 -07:00
Linus Torvalds 48552153cf iommufd 6.15 merge window pull
Two significant new items:
 
 - Allow reporting IOMMU HW events to userspace when the events are clearly
   linked to a device. This is linked to the VIOMMU object and is intended to
   be used by a VMM to forward HW events to the virtual machine as part of
   emulating a vIOMMU. ARM SMMUv3 is the first driver to use this
   mechanism. Like the existing fault events the data is delivered through
   a simple FD returning event records on read().
 
 - PASID support in VFIO. "Process Address Space ID" is a PCI feature that
   allows the device to tag all PCI DMA operations with an ID. The IOMMU
   will then use the ID to select a unique translation for those DMAs. This
   is part of Intel's vIOMMU support as VT-D HW requires the hypervisor to
   manage each PASID entry. The support is generic so any VFIO user could
   attach any translation to a PASID, and the support should work on ARM
   SMMUv3 as well. AMD requires additional driver work.
 
 Some minor updates, along with fixes:
 
 - Prevent using nested parents with fault's, no driver support today
 
 - Put a single "cookie_type" value in the iommu_domain to indicate what
   owns the various opaque owner fields
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZ+q6NgAKCRCFwuHvBreF
 YZ3zAQDbl4/Z0O+CLN2AXq4Zeiyq1HTSoF94hzqmm7lQ17zTIwD8CCdyLXHvupaq
 tkBIv5IovpaxlrSk6M0kh2K8vPCk9Qk=
 =CIM3
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd updates from Jason Gunthorpe:
 "Two significant new items:

   - Allow reporting IOMMU HW events to userspace when the events are
     clearly linked to a device.

     This is linked to the VIOMMU object and is intended to be used by a
     VMM to forward HW events to the virtual machine as part of
     emulating a vIOMMU. ARM SMMUv3 is the first driver to use this
     mechanism. Like the existing fault events the data is delivered
     through a simple FD returning event records on read().

   - PASID support in VFIO.

     The "Process Address Space ID" is a PCI feature that allows the
     device to tag all PCI DMA operations with an ID. The IOMMU will
     then use the ID to select a unique translation for those DMAs. This
     is part of Intel's vIOMMU support as VT-D HW requires the
     hypervisor to manage each PASID entry.

     The support is generic so any VFIO user could attach any
     translation to a PASID, and the support should work on ARM SMMUv3
     as well. AMD requires additional driver work.

  Some minor updates, along with fixes:

   - Prevent using nested parents with fault's, no driver support today

   - Put a single "cookie_type" value in the iommu_domain to indicate
     what owns the various opaque owner fields"

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (49 commits)
  iommufd: Test attach before detaching pasid
  iommufd: Fix iommu_vevent_header tables markup
  iommu: Convert unreachable() to BUG()
  iommufd: Balance veventq->num_events inc/dec
  iommufd: Initialize the flags of vevent in iommufd_viommu_report_event()
  iommufd/selftest: Add coverage for reporting max_pasid_log2 via IOMMU_HW_INFO
  iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability
  vfio: VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT support pasid
  vfio-iommufd: Support pasid [at|de]tach for physical VFIO devices
  ida: Add ida_find_first_range()
  iommufd/selftest: Add coverage for iommufd pasid attach/detach
  iommufd/selftest: Add test ops to test pasid attach/detach
  iommufd/selftest: Add a helper to get test device
  iommufd/selftest: Add set_dev_pasid in mock iommu
  iommufd: Allow allocating PASID-compatible domain
  iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  iommufd: Enforce PASID-compatible domain for RID
  iommufd: Support pasid attach/replace
  iommufd: Enforce PASID-compatible domain in PASID path
  iommufd/device: Add pasid_attach array to track per-PASID attach
  ...
2025-04-01 18:03:46 -07:00
Jeff Xu b481341e4c selftest: test system mappings are sealed
Add sysmap_is_sealed.c to test system mappings are sealed.

Note: CONFIG_MSEAL_SYSTEM_MAPPINGS must be set, as indicated in
config file.

Link: https://lkml.kernel.org/r/20250305021711.3867874-8-jeffxu@google.com
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Berg <benjamin@sipsolutions.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Elliot Hughes <enh@google.com>
Cc: Florian Faineli <f.fainelli@gmail.com>
Cc: Greg Ungerer <gerg@kernel.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Linus Waleij <linus.walleij@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <mike.rapoport@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:16 -07:00
Jeff Xu 7b0141daf3 selftests: x86: test_mremap_vdso: skip if vdso is msealed
Add code to detect if the vdso is memory sealed, skip the test if it is.

Link: https://lkml.kernel.org/r/20250305021711.3867874-3-jeffxu@google.com
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Berg <benjamin@sipsolutions.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Elliot Hughes <enh@google.com>
Cc: Florian Faineli <f.fainelli@gmail.com>
Cc: Greg Ungerer <gerg@kernel.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Waleij <linus.walleij@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <mike.rapoport@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:15 -07:00
Li Wang 983e760bcd selftest/mm: va_high_addr_switch: add ppc64 support check
Add PPC64 Radix MMU support to the va_high_addr_switch.sh by introducing
check_supported_ppc64().  The function verifies:

  - 5-level paging (PGTABLE_LEVELS >= 5) enable in kernel config
  - Radix MMU (required for PPC64 5-level translation)
  - HugePages availability (needed for some tests)

If any check fails, the test is skipped (ksft_skip).  This ensures
compatibility with Power9/Power10 systems running in Radix MMU mode.

Avoid failures on 4-level paging system:

  # mmap(NULL, MAP_HUGETLB): 0xffffffffffffffff - FAILED
  # mmap(LOW_ADDR, MAP_HUGETLB): 0xffffffffffffffff - FAILED
  # mmap(HIGH_ADDR, MAP_HUGETLB): 0xffffffffffffffff - FAILED
  # mmap(HIGH_ADDR, MAP_HUGETLB) again: 0xffffffffffffffff - FAILED
  # mmap(HIGH_ADDR, MAP_FIXED | MAP_HUGETLB): 0xffffffffffffffff - FAILED
  # mmap(-1, MAP_HUGETLB): 0xffffffffffffffff - FAILED
  # mmap(-1, MAP_HUGETLB) again: 0xffffffffffffffff - FAILED
  # mmap(ADDR_SWITCH_HINT - PAGE_SIZE, 2*HUGETLB_SIZE, MAP_HUGETLB): 0xffffffffffffffff - FAILED
  # mmap(ADDR_SWITCH_HINT , 2*HUGETLB_SIZE, MAP_FIXED | MAP_HUGETLB): 0xffffffffffffffff - FAILED

Link: https://lkml.kernel.org/r/20250327114813.25980-1-liwang@redhat.com
Signed-off-by: Li Wang <liwang@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:11 -07:00
Uday Shankar f8554f512b selftests: ublk: kublk: fix an error log line
When doing io_uring operations using liburing, errno is not used to
indicate errors, so the %m format specifier does not provide any
relevant information for failed io_uring commands. Fix a log line
emitted on get_params failure to translate the error code returned in
the cqe->res field instead.

Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250401-ublk_selftests-v1-2-98129c9bc8bb@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-01 14:52:38 -06:00
Uday Shankar 53c959295b selftests: ublk: kublk: use ioctl-encoded opcodes
There are a couple of places in the kublk selftests ublk server which
use the legacy ublk opcodes. These operations fail (with -EOPNOTSUPP) on
a kernel compiled without CONFIG_BLKDEV_UBLK_LEGACY_OPCODES set. We
could easily require it to be set as a prerequisite for these selftests,
but since new applications should not be using the legacy opcodes, use
the ioctl-encoded opcodes everywhere in kublk.

Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250401-ublk_selftests-v1-1-98129c9bc8bb@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-01 14:52:38 -06:00
Linus Torvalds 2cd5769fb0 Driver core updates for 6.15-rc1
Here is the big set of driver core updates for 6.15-rc1.  Lots of stuff
 happened this development cycle, including:
   - kernfs scaling changes to make it even faster thanks to rcu
   - bin_attribute constify work in many subsystems
   - faux bus minor tweaks for the rust bindings
   - rust binding updates for driver core, pci, and platform busses,
     making more functionaliy available to rust drivers.  These are all
     due to people actually trying to use the bindings that were in 6.14.
   - make Rafael and Danilo full co-maintainers of the driver core
     codebase
   - other minor fixes and updates.
 
 This has been in linux-next for a while now, with the only reported
 issue being some merge conflicts with the rust tree.  Depending on which
 tree you pull first, you will have conflicts in one of them.  The merge
 resolution has been in linux-next as an example of what to do, or can be
 found here:
 	https://lore.kernel.org/r/CANiq72n3Xe8JcnEjirDhCwQgvWoE65dddWecXnfdnbrmuah-RQ@mail.gmail.com
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZ+mMrg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylRgwCdH58OE3BgL0uoFY5vFImStpmPtqUAoL5HpVWI
 jtbJ+UuXGsnmO+JVNBEv
 =gy6W
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updatesk from Greg KH:
 "Here is the big set of driver core updates for 6.15-rc1. Lots of stuff
  happened this development cycle, including:

   - kernfs scaling changes to make it even faster thanks to rcu

   - bin_attribute constify work in many subsystems

   - faux bus minor tweaks for the rust bindings

   - rust binding updates for driver core, pci, and platform busses,
     making more functionaliy available to rust drivers. These are all
     due to people actually trying to use the bindings that were in
     6.14.

   - make Rafael and Danilo full co-maintainers of the driver core
     codebase

   - other minor fixes and updates"

* tag 'driver-core-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (52 commits)
  rust: platform: require Send for Driver trait implementers
  rust: pci: require Send for Driver trait implementers
  rust: platform: impl Send + Sync for platform::Device
  rust: pci: impl Send + Sync for pci::Device
  rust: platform: fix unrestricted &mut platform::Device
  rust: pci: fix unrestricted &mut pci::Device
  rust: device: implement device context marker
  rust: pci: use to_result() in enable_device_mem()
  MAINTAINERS: driver core: mark Rafael and Danilo as co-maintainers
  rust/kernel/faux: mark Registration methods inline
  driver core: faux: only create the device if probe() succeeds
  rust/faux: Add missing parent argument to Registration::new()
  rust/faux: Drop #[repr(transparent)] from faux::Registration
  rust: io: fix devres test with new io accessor functions
  rust: io: rename `io::Io` accessors
  kernfs: Move dput() outside of the RCU section.
  efi: rci2: mark bin_attribute as __ro_after_init
  rapidio: constify 'struct bin_attribute'
  firmware: qemu_fw_cfg: constify 'struct bin_attribute'
  powerpc/perf/hv-24x7: Constify 'struct bin_attribute'
  ...
2025-04-01 11:02:03 -07:00
Linus Torvalds d6b02199cd - The 7 patch series "powerpc/crash: use generic crashkernel
reservation" from Sourabh Jain changes powerpc's kexec code to use more
   of the generic layers.
 
 - The 2 patch series "get_maintainer: report subsystem status
   separately" from Vlastimil Babka makes some long-requested improvements
   to the get_maintainer output.
 
 - The 4 patch series "ucount: Simplify refcounting with rcuref_t" from
   Sebastian Siewior cleans up and optimizing the refcounting in the ucount
   code.
 
 - The 12 patch series "reboot: support runtime configuration of
   emergency hw_protection action" from Ahmad Fatoum improves the ability
   for a driver to perform an emergency system shutdown or reboot.
 
 - The 16 patch series "Converge on using secs_to_jiffies() part two"
   from Easwar Hariharan performs further migrations from
   msecs_to_jiffies() to secs_to_jiffies().
 
 - The 7 patch series "lib/interval_tree: add some test cases and
   cleanup" from Wei Yang permits more userspace testing of kernel library
   code, adds some more tests and performs some cleanups.
 
 - The 2 patch series "hung_task: Dump the blocking task stacktrace" from
   Masami Hiramatsu arranges for the hung_task detector to dump the stack
   of the blocking task and not just that of the blocked task.
 
 - The 4 patch series "resource: Split and use DEFINE_RES*() macros" from
   Andy Shevchenko provides some cleanups to the resource definition
   macros.
 
 - Plus the usual shower of singleton patches - please see the individual
   changelogs for details.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ+nuqwAKCRDdBJ7gKXxA
 jtNqAQDxqJpjWkzn4yN9CNSs1ivVx3fr6SqazlYCrt3u89WQvwEA1oRrGpETzUGq
 r6khQUIcQImPPcjFqEFpuiSOU0MBZA0=
 =Kii8
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - The series "powerpc/crash: use generic crashkernel reservation" from
   Sourabh Jain changes powerpc's kexec code to use more of the generic
   layers.

 - The series "get_maintainer: report subsystem status separately" from
   Vlastimil Babka makes some long-requested improvements to the
   get_maintainer output.

 - The series "ucount: Simplify refcounting with rcuref_t" from
   Sebastian Siewior cleans up and optimizing the refcounting in the
   ucount code.

 - The series "reboot: support runtime configuration of emergency
   hw_protection action" from Ahmad Fatoum improves the ability for a
   driver to perform an emergency system shutdown or reboot.

 - The series "Converge on using secs_to_jiffies() part two" from Easwar
   Hariharan performs further migrations from msecs_to_jiffies() to
   secs_to_jiffies().

 - The series "lib/interval_tree: add some test cases and cleanup" from
   Wei Yang permits more userspace testing of kernel library code, adds
   some more tests and performs some cleanups.

 - The series "hung_task: Dump the blocking task stacktrace" from Masami
   Hiramatsu arranges for the hung_task detector to dump the stack of
   the blocking task and not just that of the blocked task.

 - The series "resource: Split and use DEFINE_RES*() macros" from Andy
   Shevchenko provides some cleanups to the resource definition macros.

 - Plus the usual shower of singleton patches - please see the
   individual changelogs for details.

* tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits)
  mailmap: consolidate email addresses of Alexander Sverdlin
  fs/procfs: fix the comment above proc_pid_wchan()
  relay: use kasprintf() instead of fixed buffer formatting
  resource: replace open coded variant of DEFINE_RES()
  resource: replace open coded variants of DEFINE_RES_*_NAMED()
  resource: replace open coded variant of DEFINE_RES_NAMED_DESC()
  resource: split DEFINE_RES_NAMED_DESC() out of DEFINE_RES_NAMED()
  samples: add hung_task detector mutex blocking sample
  hung_task: show the blocker task if the task is hung on mutex
  kexec_core: accept unaccepted kexec segments' destination addresses
  watchdog/perf: optimize bytes copied and remove manual NUL-termination
  lib/interval_tree: fix the comment of interval_tree_span_iter_next_gap()
  lib/interval_tree: skip the check before go to the right subtree
  lib/interval_tree: add test case for span iteration
  lib/interval_tree: add test case for interval_tree_iter_xxx() helpers
  lib/rbtree: add random seed
  lib/rbtree: split tests
  lib/rbtree: enable userland test suite for rbtree related data structure
  checkpatch: describe --min-conf-desc-length
  scripts/gdb/symbols: determine KASLR offset on s390
  ...
2025-04-01 10:06:52 -07:00
Linus Torvalds eb0ece1602 - The 6 patch series "Enable strict percpu address space checks" from
Uros Bizjak uses x86 named address space qualifiers to provide
   compile-time checking of percpu area accesses.
 
   This has caused a small amount of fallout - two or three issues were
   reported.  In all cases the calling code was founf to be incorrect.
 
 - The 4 patch series "Some cleanup for memcg" from Chen Ridong
   implements some relatively monir cleanups for the memcontrol code.
 
 - The 17 patch series "mm: fixes for device-exclusive entries (hmm)"
   from David Hildenbrand fixes a boatload of issues which David found then
   using device-exclusive PTE entries when THP is enabled.  More work is
   needed, but this makes thins better - our own HMM selftests now succeed.
 
 - The 2 patch series "mm: zswap: remove z3fold and zbud" from Yosry
   Ahmed remove the z3fold and zbud implementations.  They have been
   deprecated for half a year and nobody has complained.
 
 - The 5 patch series "mm: further simplify VMA merge operation" from
   Lorenzo Stoakes implements numerous simplifications in this area.  No
   runtime effects are anticipated.
 
 - The 4 patch series "mm/madvise: remove redundant mmap_lock operations
   from process_madvise()" from SeongJae Park rationalizes the locking in
   the madvise() implementation.  Performance gains of 20-25% were observed
   in one MADV_DONTNEED microbenchmark.
 
 - The 12 patch series "Tiny cleanup and improvements about SWAP code"
   from Baoquan He contains a number of touchups to issues which Baoquan
   noticed when working on the swap code.
 
 - The 2 patch series "mm: kmemleak: Usability improvements" from Catalin
   Marinas implements a couple of improvements to the kmemleak user-visible
   output.
 
 - The 2 patch series "mm/damon/paddr: fix large folios access and
   schemes handling" from Usama Arif provides a couple of fixes for DAMON's
   handling of large folios.
 
 - The 3 patch series "mm/damon/core: fix wrong and/or useless
   damos_walk() behaviors" from SeongJae Park fixes a few issues with the
   accuracy of kdamond's walking of DAMON regions.
 
 - The 3 patch series "expose mapping wrprotect, fix fb_defio use" from
   Lorenzo Stoakes changes the interaction between framebuffer deferred-io
   and core MM.  No functional changes are anticipated - this is
   preparatory work for the future removal of page structure fields.
 
 - The 4 patch series "mm/damon: add support for hugepage_size DAMOS
   filter" from Usama Arif adds a DAMOS filter which permits the filtering
   by huge page sizes.
 
 - The 4 patch series "mm: permit guard regions for file-backed/shmem
   mappings" from Lorenzo Stoakes extends the guard region feature from its
   present "anon mappings only" state.  The feature now covers shmem and
   file-backed mappings.
 
 - The 4 patch series "mm: batched unmap lazyfree large folios during
   reclamation" from Barry Song cleans up and speeds up the unmapping for
   pte-mapped large folios.
 
 - The 18 patch series "reimplement per-vma lock as a refcount" from
   Suren Baghdasaryan puts the vm_lock back into the vma.  Our reasons for
   pulling it out were largely bogus and that change made the code more
   messy.  This patchset provides small (0-10%) improvements on one
   microbenchmark.
 
 - The 5 patch series "Docs/mm/damon: misc DAMOS filters documentation
   fixes and improves" from SeongJae Park does some maintenance work on the
   DAMON docs.
 
 - The 27 patch series "hugetlb/CMA improvements for large systems" from
   Frank van der Linden addresses a pile of issues which have been observed
   when using CMA on large machines.
 
 - The 2 patch series "mm/damon: introduce DAMOS filter type for unmapped
   pages" from SeongJae Park enables users of DMAON/DAMOS to filter my the
   page's mapped/unmapped status.
 
 - The 19 patch series "zsmalloc/zram: there be preemption" from Sergey
   Senozhatsky teaches zram to run its compression and decompression
   operations preemptibly.
 
 - The 12 patch series "selftests/mm: Some cleanups from trying to run
   them" from Brendan Jackman fixes a pile of unrelated issues which
   Brendan encountered while runnimg our selftests.
 
 - The 2 patch series "fs/proc/task_mmu: add guard region bit to pagemap"
   from Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to
   determine whether a particular page is a guard page.
 
 - The 7 patch series "mm, swap: remove swap slot cache" from Kairui Song
   removes the swap slot cache from the allocation path - it simply wasn't
   being effective.
 
 - The 5 patch series "mm: cleanups for device-exclusive entries (hmm)"
   from David Hildenbrand implements a number of unrelated cleanups in this
   code.
 
 - The 5 patch series "mm: Rework generic PTDUMP configs" from Anshuman
   Khandual implements a number of preparatoty cleanups to the
   GENERIC_PTDUMP Kconfig logic.
 
 - The 8 patch series "mm/damon: auto-tune aggregation interval" from
   SeongJae Park implements a feedback-driven automatic tuning feature for
   DAMON's aggregation interval tuning.
 
 - The 5 patch series "Fix lazy mmu mode" from Ryan Roberts fixes some
   issues in powerpc, sparc and x86 lazy MMU implementations.  Ryan did
   this in preparation for implementing lazy mmu mode for arm64 to optimize
   vmalloc.
 
 - The 2 patch series "mm/page_alloc: Some clarifications for migratetype
   fallback" from Brendan Jackman reworks some commentary to make the code
   easier to follow.
 
 - The 3 patch series "page_counter cleanup and size reduction" from
   Shakeel Butt cleans up the page_counter code and fixes a size increase
   which we accidentally added late last year.
 
 - The 3 patch series "Add a command line option that enables control of
   how many threads should be used to allocate huge pages" from Thomas
   Prescher does that.  It allows the careful operator to significantly
   reduce boot time by tuning the parallalization of huge page
   initialization.
 
 - The 3 patch series "Fix calculations in trace_balance_dirty_pages()
   for cgwb" from Tang Yizhou fixes the tracing output from the dirty page
   balancing code.
 
 - The 9 patch series "mm/damon: make allow filters after reject filters
   useful and intuitive" from SeongJae Park improves the handling of allow
   and reject filters.  Behaviour is made more consistent and the
   documention is updated accordingly.
 
 - The 5 patch series "Switch zswap to object read/write APIs" from Yosry
   Ahmed updates zswap to the new object read/write APIs and thus permits
   the removal of some legacy code from zpool and zsmalloc.
 
 - The 6 patch series "Some trivial cleanups for shmem" from Baolin Wang
   does as it claims.
 
 - The 20 patch series "fs/dax: Fix ZONE_DEVICE page reference counts"
   from Alistair Popple regularizes the weird ZONE_DEVICE page refcount
   handling in DAX, permittig the removal of a number of special-case
   checks.
 
 - The 4 patch series "refactor mremap and fix bug" from Lorenzo Stoakes
   is a preparatoty refactoring and cleanup of the mremap() code.
 
 - The 20 patch series "mm: MM owner tracking for large folios (!hugetlb)
   + CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in
   which we determine whether a large folio is known to be mapped
   exclusively into a single MM.
 
 - The 8 patch series "mm/damon: add sysfs dirs for managing DAMOS
   filters based on handling layers" from SeongJae Park adds a couple of
   new sysfs directories to ease the management of DAMON/DAMOS filters.
 
 - The 13 patch series "arch, mm: reduce code duplication in mem_init()"
   from Mike Rapoport consolidates many per-arch implementations of
   mem_init() into code generic code, where that is practical.
 
 - The 13 patch series "mm/damon/sysfs: commit parameters online via
   damon_call()" from SeongJae Park continues the cleaning up of sysfs
   access to DAMON internal data.
 
 - The 3 patch series "mm: page_ext: Introduce new iteration API" from
   Luiz Capitulino reworks the page_ext initialization to fix a boot-time
   crash which was observed with an unusual combination of compile and
   cmdline options.
 
 - The 8 patch series "Buddy allocator like (or non-uniform) folio split"
   from Zi Yan reworks the code to split a folio into smaller folios.  The
   main benefit is lessened memory consumption: fewer post-split folios are
   generated.
 
 - The 2 patch series "Minimize xa_node allocation during xarry split"
   from Zi Yan reduces the number of xarray xa_nodes which are generated
   during an xarray split.
 
 - The 2 patch series "drivers/base/memory: Two cleanups" from Gavin Shan
   performs some maintenance work on the drivers/base/memory code.
 
 - The 3 patch series "Add tracepoints for lowmem reserves, watermarks
   and totalreserve_pages" from Martin Liu adds some more tracepoints to
   the page allocator code.
 
 - The 4 patch series "mm/madvise: cleanup requests validations and
   classifications" from SeongJae Park cleans up some warts which SeongJae
   observed during his earlier madvise work.
 
 - The 3 patch series "mm/hwpoison: Fix regressions in memory failure
   handling" from Shuai Xue addresses two quite serious regressions which
   Shuai has observed in the memory-failure implementation.
 
 - The 5 patch series "mm: reliable huge page allocator" from Johannes
   Weiner makes huge page allocations cheaper and more reliable by reducing
   fragmentation.
 
 - The 5 patch series "Minor memcg cleanups & prep for memdescs" from
   Matthew Wilcox is preparatory work for the future implementation of
   memdescs.
 
 - The 4 patch series "track memory used by balloon drivers" from Nico
   Pache introduces a way to track memory used by our various balloon
   drivers.
 
 - The 2 patch series "mm/damon: introduce DAMOS filter type for active
   pages" from Nhat Pham permits users to filter for active/inactive pages,
   separately for file and anon pages.
 
 - The 2 patch series "Adding Proactive Memory Reclaim Statistics" from
   Hao Jia separates the proactive reclaim statistics from the direct
   reclaim statistics.
 
 - The 2 patch series "mm/vmscan: don't try to reclaim hwpoison folio"
   from Jinjiang Tu fixes our handling of hwpoisoned pages within the
   reclaim code.
 -----BEGIN PGP SIGNATURE-----
 
 iHQEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ+nZaAAKCRDdBJ7gKXxA
 jsOWAPiP4r7CJHMZRK4eyJOkvS1a1r+TsIarrFZtjwvf/GIfAQCEG+JDxVfUaUSF
 Ee93qSSLR1BkNdDw+931Pu0mXfbnBw==
 =Pn2K
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - The series "Enable strict percpu address space checks" from Uros
   Bizjak uses x86 named address space qualifiers to provide
   compile-time checking of percpu area accesses.

   This has caused a small amount of fallout - two or three issues were
   reported. In all cases the calling code was found to be incorrect.

 - The series "Some cleanup for memcg" from Chen Ridong implements some
   relatively monir cleanups for the memcontrol code.

 - The series "mm: fixes for device-exclusive entries (hmm)" from David
   Hildenbrand fixes a boatload of issues which David found then using
   device-exclusive PTE entries when THP is enabled. More work is
   needed, but this makes thins better - our own HMM selftests now
   succeed.

 - The series "mm: zswap: remove z3fold and zbud" from Yosry Ahmed
   remove the z3fold and zbud implementations. They have been deprecated
   for half a year and nobody has complained.

 - The series "mm: further simplify VMA merge operation" from Lorenzo
   Stoakes implements numerous simplifications in this area. No runtime
   effects are anticipated.

 - The series "mm/madvise: remove redundant mmap_lock operations from
   process_madvise()" from SeongJae Park rationalizes the locking in the
   madvise() implementation. Performance gains of 20-25% were observed
   in one MADV_DONTNEED microbenchmark.

 - The series "Tiny cleanup and improvements about SWAP code" from
   Baoquan He contains a number of touchups to issues which Baoquan
   noticed when working on the swap code.

 - The series "mm: kmemleak: Usability improvements" from Catalin
   Marinas implements a couple of improvements to the kmemleak
   user-visible output.

 - The series "mm/damon/paddr: fix large folios access and schemes
   handling" from Usama Arif provides a couple of fixes for DAMON's
   handling of large folios.

 - The series "mm/damon/core: fix wrong and/or useless damos_walk()
   behaviors" from SeongJae Park fixes a few issues with the accuracy of
   kdamond's walking of DAMON regions.

 - The series "expose mapping wrprotect, fix fb_defio use" from Lorenzo
   Stoakes changes the interaction between framebuffer deferred-io and
   core MM. No functional changes are anticipated - this is preparatory
   work for the future removal of page structure fields.

 - The series "mm/damon: add support for hugepage_size DAMOS filter"
   from Usama Arif adds a DAMOS filter which permits the filtering by
   huge page sizes.

 - The series "mm: permit guard regions for file-backed/shmem mappings"
   from Lorenzo Stoakes extends the guard region feature from its
   present "anon mappings only" state. The feature now covers shmem and
   file-backed mappings.

 - The series "mm: batched unmap lazyfree large folios during
   reclamation" from Barry Song cleans up and speeds up the unmapping
   for pte-mapped large folios.

 - The series "reimplement per-vma lock as a refcount" from Suren
   Baghdasaryan puts the vm_lock back into the vma. Our reasons for
   pulling it out were largely bogus and that change made the code more
   messy. This patchset provides small (0-10%) improvements on one
   microbenchmark.

 - The series "Docs/mm/damon: misc DAMOS filters documentation fixes and
   improves" from SeongJae Park does some maintenance work on the DAMON
   docs.

 - The series "hugetlb/CMA improvements for large systems" from Frank
   van der Linden addresses a pile of issues which have been observed
   when using CMA on large machines.

 - The series "mm/damon: introduce DAMOS filter type for unmapped pages"
   from SeongJae Park enables users of DMAON/DAMOS to filter my the
   page's mapped/unmapped status.

 - The series "zsmalloc/zram: there be preemption" from Sergey
   Senozhatsky teaches zram to run its compression and decompression
   operations preemptibly.

 - The series "selftests/mm: Some cleanups from trying to run them" from
   Brendan Jackman fixes a pile of unrelated issues which Brendan
   encountered while runnimg our selftests.

 - The series "fs/proc/task_mmu: add guard region bit to pagemap" from
   Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to
   determine whether a particular page is a guard page.

 - The series "mm, swap: remove swap slot cache" from Kairui Song
   removes the swap slot cache from the allocation path - it simply
   wasn't being effective.

 - The series "mm: cleanups for device-exclusive entries (hmm)" from
   David Hildenbrand implements a number of unrelated cleanups in this
   code.

 - The series "mm: Rework generic PTDUMP configs" from Anshuman Khandual
   implements a number of preparatoty cleanups to the GENERIC_PTDUMP
   Kconfig logic.

 - The series "mm/damon: auto-tune aggregation interval" from SeongJae
   Park implements a feedback-driven automatic tuning feature for
   DAMON's aggregation interval tuning.

 - The series "Fix lazy mmu mode" from Ryan Roberts fixes some issues in
   powerpc, sparc and x86 lazy MMU implementations. Ryan did this in
   preparation for implementing lazy mmu mode for arm64 to optimize
   vmalloc.

 - The series "mm/page_alloc: Some clarifications for migratetype
   fallback" from Brendan Jackman reworks some commentary to make the
   code easier to follow.

 - The series "page_counter cleanup and size reduction" from Shakeel
   Butt cleans up the page_counter code and fixes a size increase which
   we accidentally added late last year.

 - The series "Add a command line option that enables control of how
   many threads should be used to allocate huge pages" from Thomas
   Prescher does that. It allows the careful operator to significantly
   reduce boot time by tuning the parallalization of huge page
   initialization.

 - The series "Fix calculations in trace_balance_dirty_pages() for cgwb"
   from Tang Yizhou fixes the tracing output from the dirty page
   balancing code.

 - The series "mm/damon: make allow filters after reject filters useful
   and intuitive" from SeongJae Park improves the handling of allow and
   reject filters. Behaviour is made more consistent and the documention
   is updated accordingly.

 - The series "Switch zswap to object read/write APIs" from Yosry Ahmed
   updates zswap to the new object read/write APIs and thus permits the
   removal of some legacy code from zpool and zsmalloc.

 - The series "Some trivial cleanups for shmem" from Baolin Wang does as
   it claims.

 - The series "fs/dax: Fix ZONE_DEVICE page reference counts" from
   Alistair Popple regularizes the weird ZONE_DEVICE page refcount
   handling in DAX, permittig the removal of a number of special-case
   checks.

 - The series "refactor mremap and fix bug" from Lorenzo Stoakes is a
   preparatoty refactoring and cleanup of the mremap() code.

 - The series "mm: MM owner tracking for large folios (!hugetlb) +
   CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in
   which we determine whether a large folio is known to be mapped
   exclusively into a single MM.

 - The series "mm/damon: add sysfs dirs for managing DAMOS filters based
   on handling layers" from SeongJae Park adds a couple of new sysfs
   directories to ease the management of DAMON/DAMOS filters.

 - The series "arch, mm: reduce code duplication in mem_init()" from
   Mike Rapoport consolidates many per-arch implementations of
   mem_init() into code generic code, where that is practical.

 - The series "mm/damon/sysfs: commit parameters online via
   damon_call()" from SeongJae Park continues the cleaning up of sysfs
   access to DAMON internal data.

 - The series "mm: page_ext: Introduce new iteration API" from Luiz
   Capitulino reworks the page_ext initialization to fix a boot-time
   crash which was observed with an unusual combination of compile and
   cmdline options.

 - The series "Buddy allocator like (or non-uniform) folio split" from
   Zi Yan reworks the code to split a folio into smaller folios. The
   main benefit is lessened memory consumption: fewer post-split folios
   are generated.

 - The series "Minimize xa_node allocation during xarry split" from Zi
   Yan reduces the number of xarray xa_nodes which are generated during
   an xarray split.

 - The series "drivers/base/memory: Two cleanups" from Gavin Shan
   performs some maintenance work on the drivers/base/memory code.

 - The series "Add tracepoints for lowmem reserves, watermarks and
   totalreserve_pages" from Martin Liu adds some more tracepoints to the
   page allocator code.

 - The series "mm/madvise: cleanup requests validations and
   classifications" from SeongJae Park cleans up some warts which
   SeongJae observed during his earlier madvise work.

 - The series "mm/hwpoison: Fix regressions in memory failure handling"
   from Shuai Xue addresses two quite serious regressions which Shuai
   has observed in the memory-failure implementation.

 - The series "mm: reliable huge page allocator" from Johannes Weiner
   makes huge page allocations cheaper and more reliable by reducing
   fragmentation.

 - The series "Minor memcg cleanups & prep for memdescs" from Matthew
   Wilcox is preparatory work for the future implementation of memdescs.

 - The series "track memory used by balloon drivers" from Nico Pache
   introduces a way to track memory used by our various balloon drivers.

 - The series "mm/damon: introduce DAMOS filter type for active pages"
   from Nhat Pham permits users to filter for active/inactive pages,
   separately for file and anon pages.

 - The series "Adding Proactive Memory Reclaim Statistics" from Hao Jia
   separates the proactive reclaim statistics from the direct reclaim
   statistics.

 - The series "mm/vmscan: don't try to reclaim hwpoison folio" from
   Jinjiang Tu fixes our handling of hwpoisoned pages within the reclaim
   code.

* tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (431 commits)
  mm/page_alloc: remove unnecessary __maybe_unused in order_to_pindex()
  x86/mm: restore early initialization of high_memory for 32-bits
  mm/vmscan: don't try to reclaim hwpoison folio
  mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper
  cgroup: docs: add pswpin and pswpout items in cgroup v2 doc
  mm: vmscan: split proactive reclaim statistics from direct reclaim statistics
  selftests/mm: speed up split_huge_page_test
  selftests/mm: uffd-unit-tests support for hugepages > 2M
  docs/mm/damon/design: document active DAMOS filter type
  mm/damon: implement a new DAMOS filter type for active pages
  fs/dax: don't disassociate zero page entries
  MM documentation: add "Unaccepted" meminfo entry
  selftests/mm: add commentary about 9pfs bugs
  fork: use __vmalloc_node() for stack allocation
  docs/mm: Physical Memory: Populate the "Zones" section
  xen: balloon: update the NR_BALLOON_PAGES state
  hv_balloon: update the NR_BALLOON_PAGES state
  balloon_compaction: update the NR_BALLOON_PAGES state
  meminfo: add a per node counter for balloon drivers
  mm: remove references to folio in __memcg_kmem_uncharge_page()
  ...
2025-04-01 09:29:18 -07:00
Wolfram Sang 424dfcd441 rtc: remove 'setdate' test program
The tool is not embedded in the testing framework. 'rtc' from rtc-tools
serves as a much better programming example. No need to carry this tool
in the kernel tree.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Link: https://lore.kernel.org/r/20250320103433.11673-1-wsa+renesas@sang-engineering.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2025-04-01 15:25:15 +02:00
Wolfram Sang 0cd73ab4df selftest: rtc: skip some tests if the alarm only supports minutes
There are alarms which have only minute-granularity. The RTC core
already has a flag to describe them. Use this flag to skip tests which
require the alarm to support seconds.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Link: https://lore.kernel.org/r/20250218101548.6514-1-wsa+renesas@sang-engineering.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2025-04-01 15:03:13 +02:00
Ignacio Encinas 79ba5c1c77
selftests: riscv: fix v_exec_initval_nolibc.c
Vector registers are zero initialized by the kernel. Stop accepting
"all ones" as a clean value.

Note that this was not working as expected given that
	value == 0xff
can be assumed to be always false by the compiler as value's range is
[-128, 127]. Both GCC (-Wtype-limits) and clang
(-Wtautological-constant-out-of-range-compare) warn about this.

Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Tested-by: Charlie Jenkins <charlie@rivosinc.com>
Signed-off-by: Ignacio Encinas <ignacio@iencinas.com>
Link: https://lore.kernel.org/r/20250306-fix-v_exec_initval_nolibc-v2-1-97f9dc8a7faf@iencinas.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
2025-04-01 07:03:04 +00:00
Matthieu Baerts (NGI0) b44a4c2822 selftests: mptcp: ignore mptcp_diag binary
A new binary is now generated by the MPTCP selftests: mptcp_diag.

Like the other binaries from this directory, there is no need to track
this in Git, it should then be ignored.

Fixes: 00f5e338cf ("selftests: mptcp: Add a tool to get specific msk_info")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250328-net-mptcp-misc-fixes-6-15-v1-4-34161a482a7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-31 16:52:39 -07:00
Geliang Tang c183165f87 selftests: mptcp: close fd_in before returning in main_loop
The file descriptor 'fd_in' is opened when cfg_input is configured, but
not closed in main_loop(), this patch fixes it.

Fixes: 05be5e273c ("selftests: mptcp: add disconnect tests")
Cc: stable@vger.kernel.org
Co-developed-by: Cong Liu <liucong2@kylinos.cn>
Signed-off-by: Cong Liu <liucong2@kylinos.cn>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250328-net-mptcp-misc-fixes-6-15-v1-3-34161a482a7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-31 16:52:39 -07:00
Cong Liu 7335d4ac81 selftests: mptcp: fix incorrect fd checks in main_loop
Fix a bug where the code was checking the wrong file descriptors
when opening the input files. The code was checking 'fd' instead
of 'fd_in', which could lead to incorrect error handling.

Fixes: 05be5e273c ("selftests: mptcp: add disconnect tests")
Cc: stable@vger.kernel.org
Fixes: ca7ae89160 ("selftests: mptcp: mptfo Initiator/Listener")
Co-developed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Cong Liu <liucong2@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250328-net-mptcp-misc-fixes-6-15-v1-2-34161a482a7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-31 16:52:39 -07:00
Jakub Kicinski 88dec030df selftests: net: use Path helpers in ping
Now that net and net-next have converged we can use the Path
helpers in the ping test without conflicts.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250327222315.1098596-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-31 16:44:26 -07:00
Jakub Kicinski c231e12ecd selftests: net: use the dummy bpf from net/lib
Commit 29b036be1b ("selftests: drv-net: test XDP, HDS auto and
the ioctl path") added an sample XDP_PASS prog in net/lib, so
that we can reuse it in various sub-directories. Delete the old
sample and use the one from the lib in existing tests.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250327222315.1098596-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-31 16:44:25 -07:00
Jakub Kicinski e514d77334 selftests: drv-net: replace the rpath helper with Path objects
The single letter + "path" helpers do not have many fans (see Link).
Use a Path object with a better name. test_dir is the replacement
for rpath(), net_lib_dir is a new path of the $ksft/net/lib directory.

The Path() class overloads the "/" operator and can be cast to string
automatically, so to get a path to a file tests can do:

    path = env.test_dir / "binary"

Link: https://lore.kernel.org/CA+FuTSemTNVZ5MxXkq8T9P=DYm=nSXcJnL7CJBPZNAT_9UFisQ@mail.gmail.com
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250327222315.1098596-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-31 16:44:25 -07:00
Waiman Long e8a457b735 selftest/cgroup: Add a remote partition transition test to test_cpuset_prs.sh
The current cgroup directory layout for running the partition state
transition tests is mainly suitable for testing local partitions as
well as with a mix of local and remote partitions. It is not that
suitable for doing extensive remote partition and nested remote/local
partition testing.

Add a new set of remote partition tests REMOTE_TEST_MATRIX with another
cgroup directory structure more tailored for remote partition testing
to provide better code coverage.

Also add a few new test cases as well as adjusting existig ones for
the original TEST_MATRIX.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-31 13:28:19 -10:00
Waiman Long b2b2b4d058 selftest/cgroup: Clean up and restructure test_cpuset_prs.sh
Cleaning up the test_cpuset_prs.sh script and restructure some of the
functions so that a new test matrix with a different cgroup directory
structure can be added in the next patch.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-31 13:28:05 -10:00
Waiman Long 65046b5e0a selftest/cgroup: Update test_cpuset_prs.sh to use | as effective CPUs and state separator
Currently, ',' is used as the cgroup separator of the expected effective
CPUs and partition root states in the test matrix. However, ',' can be
part of the output of the cpuset.cpus*.effective and cpuset.cpus.isolated
files. Change the separator to '|' so that ',' can appear as part of
the expected values.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-31 13:27:49 -10:00
Waiman Long f62a5d3936 cgroup/cpuset: Remove remote_partition_check() & make update_cpumasks_hier() handle remote partition
Currently, changes in exclusive CPUs are being handled in
remote_partition_check() by disabling conflicting remote partitions.
However, that may lead to results unexpected by the users. Fix
this problem by removing remote_partition_check() and making
update_cpumasks_hier() handle changes in descendant remote partitions
properly.

The compute_effective_exclusive_cpumask() function is enhanced to check
the exclusive_cpus and effective_xcpus from siblings and excluded them
in its effective exclusive CPUs computation and return a value to show if
there is any sibling conflicts.  This is somewhat like the cpu_exclusive
flag check in validate_change(). This is the initial step to enable us
to retire the use of cpu_exclusive flag in cgroup v2 in the future.

One of the tests in the TEST_MATRIX of the test_cpuset_prs.sh
script has to be updated due to changes in the way a child remote
partition root is being handled (updated instead of invalidation)
in update_cpumasks_hier().

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-31 13:25:56 -10:00
Linus Torvalds 494e7fe591 bpf_res_spin_lock
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmfcq3kACgkQ6rmadz2v
 bToxkw/8DHIqjVnzU2O9hbRM1anYo6yM8e34IxCt0ajHTSEVJ93+C161QDWo/6Dk
 +RNlaeGekaBUk+QOLb4u+rzZ2eR/pWSm37xuDRAiBCQ+3MgR60gGRaSljpS3IUem
 0FvS6C1HObBCEUXMU2rNv/5cJB5/qrQYa9FEEjRvBTLqgQkdS7yaW/KKuZaNb+Ts
 KiEeWvPrPSZXStfRGy8Wr4eS2rYhxPAikUR+xde9CM+HtMWwKTCTSp8qXrqA92Dj
 Cz9ix01scznuf78QCRDZp09im3lZys8ZQprmPgMxyEscN+CDL7n68wAhmTJq0uo3
 3NqIv7zBQ8wMChj0f0HjwZ0Wrj7BJAveY2Q0RterxdzT4vMKdtNkThX46ISaCoX/
 XQAAhZHemK6MvBJk+LKkqqMgrD+3FAzvY7O+SCyUBAMs4FK1myRJQihdLXHGfiBU
 DMDZE1jsE8qBaeUbz4LIuCy8fx2LhtVwVNwbNIBUZHdyfjxIXnQT/8Cnrgklwy2i
 tnYekhAsHDQY+QDkrvJpc4E1vUtiXwSDI5ErcnWdSzctEOyVeUg7OuuGD4riCd1c
 emdJmtASM1z9Ajqa1dytDxVaF6wjKlbhQgnKamuex5JLGCK6makk8ZoB+DBfKYHD
 VoWummTu8ldf+Dp4ehBh7AbeF2vn4kLqcF1PLRsBO6ytJs4HIt8=
 =5O7h
 -----END PGP SIGNATURE-----

Merge tag 'bpf_res_spin_lock' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Pull bpf relisient spinlock support from Alexei Starovoitov:
 "This patch set introduces Resilient Queued Spin Lock (or rqspinlock
  with res_spin_lock() and res_spin_unlock() APIs).

  This is a qspinlock variant which recovers the kernel from a stalled
  state when the lock acquisition path cannot make forward progress.
  This can occur when a lock acquisition attempt enters a deadlock
  situation (e.g. AA, or ABBA), or more generally, when the owner of the
  lock (which we’re trying to acquire) isn’t making forward progress.
  Deadlock detection is the main mechanism used to provide instant
  recovery, with the timeout mechanism acting as a final line of
  defense. Detection is triggered immediately when beginning the waiting
  loop of a lock slow path.

  Additionally, BPF programs attached to different parts of the kernel
  can introduce new control flow into the kernel, which increases the
  likelihood of deadlocks in code not written to handle reentrancy.
  There have been multiple syzbot reports surfacing deadlocks in
  internal kernel code due to the diverse ways in which BPF programs can
  be attached to different parts of the kernel. By switching the BPF
  subsystem’s lock usage to rqspinlock, all of these issues are
  mitigated at runtime.

  This spin lock implementation allows BPF maps to become safer and
  remove mechanisms that have fallen short in assuring safety when
  nesting programs in arbitrary ways in the same context or across
  different contexts.

  We run benchmarks that stress locking scalability and perform
  comparison against the baseline (qspinlock). For the rqspinlock case,
  we replace the default qspinlock with it in the kernel, such that all
  spin locks in the kernel use the rqspinlock slow path. As such,
  benchmarks that stress kernel spin locks end up exercising rqspinlock.

  More details in the cover letter in commit 6ffb9017e9 ("Merge branch
  'resilient-queued-spin-lock'")"

* tag 'bpf_res_spin_lock' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (24 commits)
  selftests/bpf: Add tests for rqspinlock
  bpf: Maintain FIFO property for rqspinlock unlock
  bpf: Implement verifier support for rqspinlock
  bpf: Introduce rqspinlock kfuncs
  bpf: Convert lpm_trie.c to rqspinlock
  bpf: Convert percpu_freelist.c to rqspinlock
  bpf: Convert hashtab.c to rqspinlock
  rqspinlock: Add locktorture support
  rqspinlock: Add entry to Makefile, MAINTAINERS
  rqspinlock: Add macros for rqspinlock usage
  rqspinlock: Add basic support for CONFIG_PARAVIRT
  rqspinlock: Add a test-and-set fallback
  rqspinlock: Add deadlock detection and recovery
  rqspinlock: Protect waiters in trylock fallback from stalls
  rqspinlock: Protect waiters in queue from stalls
  rqspinlock: Protect pending bit owners from stalls
  rqspinlock: Hardcode cond_acquire loops for arm64
  rqspinlock: Add support for timeouts
  rqspinlock: Drop PV and virtualization support
  rqspinlock: Add rqspinlock.h header
  ...
2025-03-30 13:06:27 -07:00
Linus Torvalds fa593d0f96 bpf-next-6.15
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmfi6ZAACgkQ6rmadz2v
 bTpLOg/+J7xUddPMhlpFAUlifQEadE5hmw6v1tXpM3zyKHzUWJiv/qsx3j8/ckgD
 D+d4P8bqIbI9SSuIS4oZ0+D9pr/g7GYztnoYZmPiYJ7v2AijPuof5dsagFQE8E2y
 rhfbt9KHTMzzkdkTvaAZaITS/HWAoJ2YVRB6gfLex2ghcXYHcgmtKRZniQrbBiFZ
 MIXBN8Rg6HP+pUdIVllSXFcQCb3XIgjPONRAos4hr5tIm+3Ku7Jvkgk2H/9vUcoF
 bdXAcg8xygyH7eY+1l3e7nEPQlG0jUZEsL+tq+vpdoLRLqlIpAUYmwUvqcmq4dPS
 QGFjiUcpDbXlxsUFpzjXHIFto7fXCfND7HEICQPwAncdflIIfYaATSQUfkEexn0a
 wBCFlAChrEzAmg2vFl4EeEr0fdSe/3jswrgKx0m6ctKieMjgloBUeeH4fXOpfkhS
 9tvhuduVFuronlebM8ew4w9T/mBgbyxkE5KkvP4hNeB3ni3N0K6Mary5/u2HyN1e
 lqTlnZxRA4p6lrvxce/mDrR4VSwlKLcSeQVjxAL1afD5KRkuZJnUv7bUhS361vkG
 IjNrQX30EisDAz+X7tMn3ndBf9vVatwFT4+c3yaxlQRor1WofhDfT88HPiyB4QqQ
 Kdx2EHgbQxJp4vkzhp4/OXlTfkihsMEn8egzZuphdPEQ9Y+Jdwg=
 =aN/V
 -----END PGP SIGNATURE-----

Merge tag 'bpf-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Pull bpf updates from Alexei Starovoitov:
 "For this merge window we're splitting BPF pull request into three for
  higher visibility: main changes, res_spin_lock, try_alloc_pages.

  These are the main BPF changes:

   - Add DFA-based live registers analysis to improve verification of
     programs with loops (Eduard Zingerman)

   - Introduce load_acquire and store_release BPF instructions and add
     x86, arm64 JIT support (Peilin Ye)

   - Fix loop detection logic in the verifier (Eduard Zingerman)

   - Drop unnecesary lock in bpf_map_inc_not_zero() (Eric Dumazet)

   - Add kfunc for populating cpumask bits (Emil Tsalapatis)

   - Convert various shell based tests to selftests/bpf/test_progs
     format (Bastien Curutchet)

   - Allow passing referenced kptrs into struct_ops callbacks (Amery
     Hung)

   - Add a flag to LSM bpf hook to facilitate bpf program signing
     (Blaise Boscaccy)

   - Track arena arguments in kfuncs (Ihor Solodrai)

   - Add copy_remote_vm_str() helper for reading strings from remote VM
     and bpf_copy_from_user_task_str() kfunc (Jordan Rome)

   - Add support for timed may_goto instruction (Kumar Kartikeya
     Dwivedi)

   - Allow bpf_get_netns_cookie() int cgroup_skb programs (Mahe Tardy)

   - Reduce bpf_cgrp_storage_busy false positives when accessing cgroup
     local storage (Martin KaFai Lau)

   - Introduce bpf_dynptr_copy() kfunc (Mykyta Yatsenko)

   - Allow retrieving BTF data with BTF token (Mykyta Yatsenko)

   - Add BPF kfuncs to set and get xattrs with 'security.bpf.' prefix
     (Song Liu)

   - Reject attaching programs to noreturn functions (Yafang Shao)

   - Introduce pre-order traversal of cgroup bpf programs (Yonghong
     Song)"

* tag 'bpf-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (186 commits)
  selftests/bpf: Add selftests for load-acquire/store-release when register number is invalid
  bpf: Fix out-of-bounds read in check_atomic_load/store()
  libbpf: Add namespace for errstr making it libbpf_errstr
  bpf: Add struct_ops context information to struct bpf_prog_aux
  selftests/bpf: Sanitize pointer prior fclose()
  selftests/bpf: Migrate test_xdp_vlan.sh into test_progs
  selftests/bpf: test_xdp_vlan: Rename BPF sections
  bpf: clarify a misleading verifier error message
  selftests/bpf: Add selftest for attaching fexit to __noreturn functions
  bpf: Reject attaching fexit/fmod_ret to __noreturn functions
  bpf: Only fails the busy counter check in bpf_cgrp_storage_get if it creates storage
  bpf: Make perf_event_read_output accessible in all program types.
  bpftool: Using the right format specifiers
  bpftool: Add -Wformat-signedness flag to detect format errors
  selftests/bpf: Test freplace from user namespace
  libbpf: Pass BPF token from find_prog_btf_id to BPF_BTF_GET_FD_BY_ID
  bpf: Return prog btf_id without capable check
  bpf: BPF token support for BPF_BTF_GET_FD_BY_ID
  bpf, x86: Fix objtool warning for timed may_goto
  bpf: Check map->record at the beginning of check_and_free_fields()
  ...
2025-03-30 12:43:03 -07:00
Linus Torvalds 0ccff074d6 fwctl first pull request
fwctl is a new subsystem intended to bring some common rules and order to
 the growing pattern of exposing a secure FW interface directly to
 userspace. Unlike existing places like RDMA/DRM/VFIO/uacce that are
 exposing a device for datapath operations fwctl is focused on debugging,
 configuration and provisioning of the device. It will not have the
 necessary features like interrupt delivery to support a datapath.
 
 This concept is similar to the long standing practice in the "HW" RAID
 space of having a device specific misc device to manage the RAID
 controller FW. fwctl generalizes this notion of a companion debug and
 management interface that goes along with a dataplane implemented in an
 appropriate subsystem.
 
 There have been three LWN articles written discussing various aspects of
 this:
 
  https://lwn.net/Articles/955001/
  https://lwn.net/Articles/969383/
  https://lwn.net/Articles/990802/
 
 This pull requests includes three drivers to launch the subsystem:
 
  - CXL provides a vendor scheme for executing commands and a way to learn
    the 'command effects' (ie the security properties) of such
    commands. The fwctl driver allows access to these mechanism within the
    fwctl security model
 
  - mlx5 is family of networking products, the driver supports all current
    Mellanox HW still receiving FW feature updates. This includes RDMA
    multiprotocol NICs like ConnectX and the Bluefield family of Smart
    NICs.
 
  - AMD/Pensando Distributed Services card is a multi protocol Smart NIC
    with a multi PCI function design. fwctl works on the management PCI
    function following a 'command effects' model similar to CXL.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZ939zQAKCRCFwuHvBreF
 YdOoAQCJq59/UC7lXU+sOsR6LISaDVTT5jAweBo0o036P9+DNAEA1iQdZ/GK2yCJ
 Ub33Xo9L+hzIpIbCouI3BtCXqymybAg=
 =f5YG
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-fwctl' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull fwctl subsystem from Jason Gunthorpe:
 "fwctl is a new subsystem intended to bring some common rules and order
  to the growing pattern of exposing a secure FW interface directly to
  userspace.

  Unlike existing places like RDMA/DRM/VFIO/uacce that are exposing a
  device for datapath operations fwctl is focused on debugging,
  configuration and provisioning of the device. It will not have the
  necessary features like interrupt delivery to support a datapath.

  This concept is similar to the long standing practice in the "HW" RAID
  space of having a device specific misc device to manage the RAID
  controller FW. fwctl generalizes this notion of a companion debug and
  management interface that goes along with a dataplane implemented in
  an appropriate subsystem.

  There have been three LWN articles written discussing various aspects
  of this:

	https://lwn.net/Articles/955001/
	https://lwn.net/Articles/969383/
	https://lwn.net/Articles/990802/

  This includes three drivers to launch the subsystem:

   - CXL provides a vendor scheme for executing commands and a way to
     learn the 'command effects' (ie the security properties) of such
     commands. The fwctl driver allows access to these mechanism within
     the fwctl security model

   - mlx5 is family of networking products, the driver supports all
     current Mellanox HW still receiving FW feature updates. This
     includes RDMA multiprotocol NICs like ConnectX and the Bluefield
     family of Smart NICs.

   - AMD/Pensando Distributed Services card is a multi protocol Smart
     NIC with a multi PCI function design. fwctl works on the management
     PCI function following a 'command effects' model similar to CXL"

* tag 'for-linus-fwctl' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (30 commits)
  pds_fwctl: add Documentation entries
  pds_fwctl: add rpc and query support
  pds_fwctl: initial driver framework
  pds_core: add new fwctl auxiliary_device
  pds_core: specify auxiliary_device to be created
  pds_core: make pdsc_auxbus_dev_del() void
  cxl: Fixup kdoc issues for include/cxl/features.h
  fwctl/cxl: Add documentation to FWCTL CXL
  cxl/test: Add Set Feature support to cxl_test
  cxl/test: Add Get Feature support to cxl_test
  cxl: Add support to handle user feature commands for set feature
  cxl: Add support to handle user feature commands for get feature
  cxl: Add support for fwctl RPC command to enable CXL feature commands
  cxl: Move cxl feature command structs to user header
  cxl: Add FWCTL support to CXL
  mlx5: Create an auxiliary device for fwctl_mlx5
  fwctl/mlx5: Support for communicating with mlx5 fw
  fwctl: Add documentation
  fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmware
  taint: Add TAINT_FWCTL
  ...
2025-03-29 10:45:20 -07:00
Linus Torvalds e5e0e6bebe This update includes the following changes:
API:
 
 - Remove legacy compression interface.
 - Improve scatterwalk API.
 - Add request chaining to ahash and acomp.
 - Add virtual address support to ahash and acomp.
 - Add folio support to acomp.
 - Remove NULL dst support from acomp.
 
 Algorithms:
 
 - Library options are fuly hidden (selected by kernel users only).
 - Add Kerberos5 algorithms.
 - Add VAES-based ctr(aes) on x86.
 - Ensure LZO respects output buffer length on compression.
 - Remove obsolete SIMD fallback code path from arm/ghash-ce.
 
 Drivers:
 
 - Add support for PCI device 0x1134 in ccp.
 - Add support for rk3588's standalone TRNG in rockchip.
 - Add Inside Secure SafeXcel EIP-93 crypto engine support in eip93.
 - Fix bugs in tegra uncovered by multi-threaded self-test.
 - Fix corner cases in hisilicon/sec2.
 
 Others:
 
 - Add SG_MITER_LOCAL to sg miter.
 - Convert ubifs, hibernate and xfrm_ipcomp from legacy API to acomp.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmfiQ9kACgkQxycdCkmx
 i6fFZg/9GWjC1FLEV66vNlYAIzFGwzwWdFGyQzXyP235Cphhm4qt9gx7P91N6Lvc
 pplVjNEeZHoP8lMw+AIeGc2cRhIwsvn8C+HA3tCBOoC1qSe8T9t7KHAgiRGd/0iz
 UrzVBFLYlR9i4tc0T5peyQwSctv8DfjWzduTmI3Ts8i7OQcfeVVgj3sGfWam7kjF
 1GJWIQH7aPzT8cwFtk8gAK1insuPPZelT1Ppl9kUeZe0XUibrP7Gb5G9simxXAyi
 B+nLCaJYS6Hc1f47cfR/qyZSeYQN35KTVrEoKb1pTYXfEtMv6W9fIvQVLJRYsqpH
 RUBdDJUseE+WckR6glX9USrh+Fv9d+HfsTXh1fhpApKU5sQJ7pDbUm4ge8p6htNG
 MIszbJPdqajYveRLuPUjFlUXaqomos8eT6BZA+RLHm1cogzEOm+5bjspbfRNAVPj
 x9KiDu5lXNiFj02v/MkLKUe3bnGIyVQnZNi7Rn0Rpxjv95tIjVpksZWMPJarxUC6
 5zdyM2I5X0Z9+teBpbfWyqfzSbAs/KpzV8S/xNvWDUT6NlpYGBeNXrCDTXcwJLAh
 PRW0w1EJUwsZbPi8GEh5jNzo/YK1cGsUKrihKv7YgqSSopMLI8e/WVr8nKZMVDFA
 O+6F6ec5lR7KsOIMGUqrBGFU1ccAeaLLvLK3H5J8//gMMg82Uik=
 =aQNt
 -----END PGP SIGNATURE-----

Merge tag 'v6.15-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto updates from Herbert Xu:
 "API:
   - Remove legacy compression interface
   - Improve scatterwalk API
   - Add request chaining to ahash and acomp
   - Add virtual address support to ahash and acomp
   - Add folio support to acomp
   - Remove NULL dst support from acomp

  Algorithms:
   - Library options are fuly hidden (selected by kernel users only)
   - Add Kerberos5 algorithms
   - Add VAES-based ctr(aes) on x86
   - Ensure LZO respects output buffer length on compression
   - Remove obsolete SIMD fallback code path from arm/ghash-ce

  Drivers:
   - Add support for PCI device 0x1134 in ccp
   - Add support for rk3588's standalone TRNG in rockchip
   - Add Inside Secure SafeXcel EIP-93 crypto engine support in eip93
   - Fix bugs in tegra uncovered by multi-threaded self-test
   - Fix corner cases in hisilicon/sec2

  Others:
   - Add SG_MITER_LOCAL to sg miter
   - Convert ubifs, hibernate and xfrm_ipcomp from legacy API to acomp"

* tag 'v6.15-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (187 commits)
  crypto: testmgr - Add multibuffer acomp testing
  crypto: acomp - Fix synchronous acomp chaining fallback
  crypto: testmgr - Add multibuffer hash testing
  crypto: hash - Fix synchronous ahash chaining fallback
  crypto: arm/ghash-ce - Remove SIMD fallback code path
  crypto: essiv - Replace memcpy() + NUL-termination with strscpy()
  crypto: api - Call crypto_alg_put in crypto_unregister_alg
  crypto: scompress - Fix incorrect stream freeing
  crypto: lib/chacha - remove unused arch-specific init support
  crypto: remove obsolete 'comp' compression API
  crypto: compress_null - drop obsolete 'comp' implementation
  crypto: cavium/zip - drop obsolete 'comp' implementation
  crypto: zstd - drop obsolete 'comp' implementation
  crypto: lzo - drop obsolete 'comp' implementation
  crypto: lzo-rle - drop obsolete 'comp' implementation
  crypto: lz4hc - drop obsolete 'comp' implementation
  crypto: lz4 - drop obsolete 'comp' implementation
  crypto: deflate - drop obsolete 'comp' implementation
  crypto: 842 - drop obsolete 'comp' implementation
  crypto: nx - Migrate to scomp API
  ...
2025-03-29 10:01:55 -07:00
Caleb Sander Mateos 25aaa81371 selftests: ublk: specify io_cmd_buf pointer type
Matching the ublk driver, change the type of io_cmd_buf from char * to
struct ublksrv_io_desc *.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250328194230.2726862-3-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-29 05:56:36 -06:00
Linus Torvalds 7d06015d93 pci-v6.15-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmfmt1AUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vxqNw//cC1BlVe4UUVR5r9nfpoFeGAZeDJz
 32naGvZKjwL0tR6dStO/BEZx4QrBp+smVfJfuxtQxfzLHLgMigM2jVhfa7XUmaun
 7yZlJZu4Jmydc57sPf56CFOYOFP6zyPzSaE8u1Eb4IIqpvuoYpvDayDt6PSsLmFS
 PDzqmicT3nuNbbcfE4rYLyL6JsXooKCR1h+NNcDjy7Run9DvQbE6N2PPvXCu6O97
 aC3+kYUydEpgn9DfjBDghe+GBQCkBPldwnWqXBxKDFmYj5bKFujNccS9/IDSEuuX
 oWntDRAXgLWg048sBC1AuJQajF3UaqffRGJkzUBaZWbU/jB9t5N/Z3GpYlXzizRx
 CAqnt/ciGUKVbaESKwoeeIqgK+wG1bnrmoEaJHXFGqjr6sjm2A2T5EzyBMJ1hFwE
 wUq6SDnkp5igG7rWtsBPo/lGa5h/pNlaXng11570ikD2ZfHVfRgwy2MpXYxChrkt
 X2q/lRYU7yyfNJQ8O5LQJ6bYztatjxT0TxXNFv+cxfVrdI7vnQMuaeBE352jn+Lo
 aZ8fTJDFbRXtrZolcoetZjBdHwPIS42wYQtvxo/ylUl64xKEzZEzN09XODjwn74v
 nuOzDtbM0TAjZWxi6bwRFPnemTEAQxPuv1i4VdPQ7yblvC0at2agNBsz6Fdq60GS
 s80YMJVUU0UcVoc=
 =gM1i
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Enable Configuration RRS SV, which makes device readiness visible,
     early instead of during child bus scanning (Bjorn Helgaas)

   - Log debug messages about reset methods being used (Bjorn Helgaas)

   - Avoid reset when it has been disabled via sysfs (Nishanth
     Aravamudan)

   - Add common pci-ep-bus.yaml schema for exporting several peripherals
     of a single PCI function via devicetree (Andrea della Porta)

   - Create DT nodes for PCI host bridges to enable loading device tree
     overlays to create platform devices for PCI devices that have
     several features that require multiple drivers (Herve Codina)

  Resource management:

   - Enlarge devres table[] to accommodate bridge windows, ROM, IOV
     BARs, etc., and validate BAR index in devres interfaces (Philipp
     Stanner)

   - Fix typo that repeatedly distributed resources to a bridge instead
     of iterating over subordinate bridges, which resulted in too little
     space to assign some BARs (Kai-Heng Feng)

   - Relax bridge window tail sizing for optional resources, e.g., IOV
     BARs, to avoid failures when removing and re-adding devices (Ilpo
     Järvinen)

   - Allow drivers to enable devices even if we haven't assigned
     optional IOV resources to them (Ilpo Järvinen)

   - Rework handling of optional resources (IOV BARs, ROMs) to reduce
     failures if we can't allocate them (Ilpo Järvinen)

   - Fix a NULL dereference in the SR-IOV VF creation error path (Shay
     Drory)

   - Fix s390 mmio_read/write syscalls, which didn't cause page faults
     in some cases, which broke vfio-pci lazy mapping on first access
     (Niklas Schnelle)

   - Add pdev->non_mappable_bars to replace CONFIG_VFIO_PCI_MMAP, which
     was disabled only for s390 (Niklas Schnelle)

   - Support mmap of PCI resources on s390 except for ISM devices
     (Niklas Schnelle)

  ASPM:

   - Delay pcie_link_state deallocation to avoid dangling pointers that
     cause invalid references during hot-unplug (Daniel Stodden)

  Power management:

   - Allow PCI bridges to go to D3Hot when suspending on all non-x86
     systems (Manivannan Sadhasivam)

  Power control:

   - Create pwrctrl devices in pci_scan_device() to make it more
     symmetric with pci_pwrctrl_unregister() and make pwrctrl devices
     for PCI bridges possible (Manivannan Sadhasivam)

   - Unregister pwrctrl devices in pci_destroy_dev() so DOE, ASPM, etc.
     can still access devices after pci_stop_dev() (Manivannan
     Sadhasivam)

   - If there's a pwrctrl device for a PCI device, skip scanning it
     because the pwrctrl core will rescan the bus after the device is
     powered on (Manivannan Sadhasivam)

   - Add a pwrctrl driver for PCI slots based on voltage regulators
     described via devicetree (Manivannan Sadhasivam)

  Bandwidth control:

   - Add set_pcie_speed.sh to TEST_PROGS to fix issue when executing the
     set_pcie_cooling_state.sh test case (Yi Lai)

   - Avoid a NULL pointer dereference when we run out of bus numbers to
     assign for a bridge secondary bus (Lukas Wunner)

  Hotplug:

   - Drop superfluous pci_hotplug_slot_list, try_module_get() calls, and
     NULL pointer checks (Lukas Wunner)

   - Drop shpchp module init/exit logging, replace shpchp dbg() with
     ctrl_dbg(), and remove unused dbg(), err(), info(), warn() wrappers
     (Ilpo Järvinen)

   - Drop 'shpchp_debug' module parameter in favor of standard dynamic
     debugging (Ilpo Järvinen)

   - Drop unused cpcihp .get_power(), .set_power() function pointers
     (Guilherme Giacomo Simoes)

   - Disable hotplug interrupts in portdrv only when pciehp is not
     enabled to avoid issuing two hotplug commands too close together
     (Feng Tang)

   - Skip pciehp 'device replaced' check if the device has been removed
     to address a deadlock when resuming after a device was removed
     during system sleep (Lukas Wunner)

   - Don't enable pciehp hotplug interupt when resuming in poll mode
     (Ilpo Järvinen)

  Virtualization:

   - Fix bugs in 'pci=config_acs=' kernel command line parameter (Tushar
     Dave)

  DOE:

   - Expose supported DOE features via sysfs (Alistair Francis)

   - Allow DOE support to be enabled even if CXL isn't enabled (Alistair
     Francis)

  Endpoint framework:

   - Convert PCI device data so pci-epf-test works correctly on
     big-endian endpoint systems (Niklas Cassel)

   - Add BAR_RESIZABLE type to endpoint framework and add DWC core
     support for EPF drivers to set BAR_RESIZABLE type and size (Niklas
     Cassel)

   - Fix pci-epf-test double free that causes an oops if the host
     reboots and PERST# deassertion restarts endpoint BAR allocation
     (Christian Bruel)

   - Fix endpoint BAR testing so tests can skip disabled BARs instead of
     reporting them as failures (Niklas Cassel)

   - Widen endpoint test BAR size variable to accommodate BARs larger
     than INT_MAX (Niklas Cassel)

   - Remove unused tools 'pci' build target left over after moving tests
     to tools/testing/selftests/pci_endpoint (Jianfeng Liu)

  Altera PCIe controller driver:

   - Add DT binding and driver support for Agilex family (P-Tile,
     F-Tile, R-Tile) (Matthew Gerlach and D M, Sharath Kumar)

  AMD MDB PCIe controller driver:

   - Add DT binding and driver for AMD MDB (Multimedia DMA Bridge)
     (Thippeswamy Havalige)

  Broadcom STB PCIe controller driver:

   - Add BCM2712 MSI-X DT binding and interrupt controller drivers and
     add softdep on irq_bcm2712_mip driver to ensure that it is loaded
     first (Stanimir Varbanov)

   - Expand inbound window map to 64GB so it can accommodate BCM2712
     (Stanimir Varbanov)

   - Add BCM2712 support and DT updates (Stanimir Varbanov)

   - Apply link speed restriction before bringing link up, not after
     (Jim Quinlan)

   - Update Max Link Speed in Link Capabilities via the internal
     writable register, not the read-only config register (Jim Quinlan)

   - Handle regulator_bulk_get() error to avoid panic when we call
     regulator_bulk_free() later (Jim Quinlan)

   - Disable regulators only when removing the bus immediately below a
     Root Port because we don't support regulators deeper in the
     hierarchy (Jim Quinlan)

   - Make const read-only arrays static (Colin Ian King)

  Cadence PCIe endpoint driver:

   - Correct MSG TLP generation so endpoints can generate INTx messages
     (Hans Zhang)

  Freescale i.MX6 PCIe controller driver:

   - Identify the second controller on i.MX8MQ based on devicetree
     'linux,pci-domain' instead of DBI 'reg' address (Richard Zhu)

   - Remove imx_pcie_cpu_addr_fixup() since dwc core can now derive the
     ATU input address (using parent_bus_offset) from devicetree (Frank
     Li)

  Freescale Layerscape PCIe controller driver:

   - Drop deprecated 'num-ib-windows' and 'num-ob-windows' and
     unnecessary 'status' from example (Krzysztof Kozlowski)

   - Correct the syscon_regmap_lookup_by_phandle_args("fsl,pcie-scfg")
     arg_count to fix probe failure on LS1043A (Ioana Ciornei)

  HiSilicon STB PCIe controller driver:

   - Call phy_exit() to clean up if histb_pcie_probe() fails (Christophe
     JAILLET)

  Intel Gateway PCIe controller driver:

   - Remove intel_pcie_cpu_addr() since dwc core can now derive the ATU
     input address (using parent_bus_offset) from devicetree (Frank Li)

  Intel VMD host bridge driver:

   - Convert vmd_dev.cfg_lock from spinlock_t to raw_spinlock_t so
     pci_ops.read() will never sleep, even on PREEMPT_RT where
     spinlock_t becomes a sleepable lock, to avoid calling a sleeping
     function from invalid context (Ryo Takakura)

  MediaTek PCIe Gen3 controller driver:

   - Remove leftover mac_reset assert for Airoha EN7581 SoC (Lorenzo
     Bianconi)

   - Add EN7581 PBUS controller 'mediatek,pbus-csr' DT property and
     program host bridge memory aperture to this syscon node (Lorenzo
     Bianconi)

  Qualcomm PCIe controller driver:

   - Add qcom,pcie-ipq5332 binding (Varadarajan Narayanan)

   - Add qcom i.MX8QM and i.MX8QXP/DXP optional DMA interrupt (Alexander
     Stein)

   - Add optional dma-coherent DT property for Qualcomm SA8775P (Dmitry
     Baryshkov)

   - Make DT iommu property required for SA8775P and prohibited for
     SDX55 (Dmitry Baryshkov)

   - Add DT IOMMU and DMA-related properties for Qualcomm SM8450 (Dmitry
     Baryshkov)

   - Add endpoint DT properties for SAR2130P and enable endpoint mode in
     driver (Dmitry Baryshkov)

   - Describe endpoint BAR0 and BAR2 as 64-bit only and BAR1 and BAR3 as
     RESERVED (Manivannan Sadhasivam)

  Rockchip DesignWare PCIe controller driver:

   - Describe rk3568 and rk3588 BARs as Resizable, not Fixed (Niklas
     Cassel)

  Synopsys DesignWare PCIe controller driver:

   - Add debugfs-based Silicon Debug, Error Injection, Statistical
     Counter support for DWC (Shradha Todi)

   - Add debugfs property to expose LTSSM status of DWC PCIe link (Hans
     Zhang)

   - Add Rockchip support for DWC debugfs features (Niklas Cassel)

   - Add dw_pcie_parent_bus_offset() to look up the parent bus address
     of a specified 'reg' property and return the offset from the CPU
     physical address (Frank Li)

   - Use dw_pcie_parent_bus_offset() to derive CPU -> ATU addr offset
     via 'reg[config]' for host controllers and 'reg[addr_space]' for
     endpoint controllers (Frank Li)

   - Apply struct dw_pcie.parent_bus_offset in ATU users to remove use
     of .cpu_addr_fixup() when programming ATU (Frank Li)

  TI J721E PCIe driver:

   - Correct the 'link down' interrupt bit for J784S4 (Siddharth
     Vadapalli)

  TI Keystone PCIe controller driver:

   - Describe AM65x BARs 2 and 5 as Resizable (not Fixed) and reduce
     alignment requirement from 1MB to 64KB (Niklas Cassel)

  Xilinx Versal CPM PCIe controller driver:

   - Free IRQ domain in probe error path to avoid leaking it
     (Thippeswamy Havalige)

   - Add DT .compatible "xlnx,versal-cpm5nc-host" and driver support for
     Versal Net CPM5NC Root Port controller (Thippeswamy Havalige)

   - Add driver support for CPM5_HOST1 (Thippeswamy Havalige)

  Miscellaneous:

   - Convert fsl,mpc83xx-pcie binding to YAML (J. Neuschäfer)

   - Use for_each_available_child_of_node_scoped() to simplify apple,
     kirin, mediatek, mt7621, tegra drivers (Zhang Zekun)"

* tag 'pci-v6.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (197 commits)
  PCI: layerscape: Fix arg_count to syscon_regmap_lookup_by_phandle_args()
  PCI: j721e: Fix the value of .linkdown_irq_regfield for J784S4
  misc: pci_endpoint_test: Add support for PCITEST_IRQ_TYPE_AUTO
  PCI: endpoint: pci-epf-test: Expose supported IRQ types in CAPS register
  PCI: dw-rockchip: Endpoint mode cannot raise INTx interrupts
  PCI: endpoint: Add intx_capable to epc_features struct
  dt-bindings: PCI: Add common schema for devices accessible through PCI BARs
  PCI: intel-gw: Remove intel_pcie_cpu_addr()
  PCI: imx6: Remove imx_pcie_cpu_addr_fixup()
  PCI: dwc: Use parent_bus_offset to remove need for .cpu_addr_fixup()
  PCI: dwc: ep: Ensure proper iteration over outbound map windows
  PCI: dwc: ep: Use devicetree 'reg[addr_space]' to derive CPU -> ATU addr offset
  PCI: dwc: ep: Consolidate devicetree handling in dw_pcie_ep_get_resources()
  PCI: dwc: ep: Call epc_create() early in dw_pcie_ep_init()
  PCI: dwc: Use devicetree 'reg[config]' to derive CPU -> ATU addr offset
  PCI: dwc: Add dw_pcie_parent_bus_offset() checking and debug
  PCI: dwc: Add dw_pcie_parent_bus_offset()
  PCI/bwctrl: Fix NULL pointer dereference on bus number exhaustion
  PCI: xilinx-cpm: Add cpm_csr register mapping for CPM5_HOST1 variant
  PCI: brcmstb: Make const read-only arrays static
  ...
2025-03-28 19:36:53 -07:00
Ming Lei c78ae7b71e selftests: ublk: add test for checking zero copy related parameter
ublk zero copy usually requires to set dma and segment parameter correctly,
so hard-code null target's dma & segment parameter in non-default value,
and verify if they are setup correctly by ublk driver.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250327095123.179113-12-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-28 16:15:43 -06:00
Ming Lei 8c77861436 selftests: ublk: add more tests for covering MQ
Add test test_generic_02.sh for covering IO dispatch order in case of MQ.

Especially we just support ->queue_rqs() which may affect IO dispatch
order.

Add test_loop_05.sh and test_stripe_03.sh for covering MQ.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250327095123.179113-11-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-28 16:15:43 -06:00
Linus Torvalds ca0b04ba0b for-6.15/io_uring-rx-zc-20250325
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmfjTP8QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpm6oEACnpGL52FAKTVj14GDqFo6Pq0Jmnh07x8qj
 mpHFPwxfWAzRiuNyji2iS9ecS2cnlkixNyMWZipXRi4KJAUjJH6YDd7IofUI3Glf
 6v7b6srFSvsWJIJ8LdkJHLHAJuzYnJvFZ8apwgQczEDqgHq7BAunM1sVQ+mydjYk
 EXT4kN6DSBOPzwr9GAay52f8nXhbqdHfT+YTGHPHg+QToojL6gD7vvW57w/QqD/x
 91hJef1z01cSIsDZOxA0EUeD+9bBAHpoamr/e3IOOCVYCN6hy0dGa9g0QGbbpVyE
 AeU4FGZLV9J8OOfvHVraDt5Wn3IXxYaMu22dSn1S6tVinwjXhJR2LAA+t4fGHAkt
 i38LjOsIbopSQn/cNhzwC8UZcHLqnVsdDolHlHzsVFVfcpck2/4JFpUeP8QhWgrk
 f9tY12QUf/oEaWm0/sUCHZNFxpIGeFA5FFXf0Z92clnzBuiuWoesBNvxqY/2DeZn
 IDNXiv+Trxr6kFEjTpzPeuxbWrn4PJ7afQSAFcEmOCguk5riM+zJZNIKg0TxUHSS
 tt6sfxmTP1DhgDKad5kT3MLyzOcx47Kbjf4dj6KmRnD+3DGwwN2F7X7R1GJylPSp
 RLOzJ+Ouuy9UmBN6JMsT4BmR9+FJTVirADU926d/ZqCTtRV8Tnq/6HHmKmmr4CR0
 THJ0PJqQjg==
 =MOve
 -----END PGP SIGNATURE-----

Merge tag 'for-6.15/io_uring-rx-zc-20250325' of git://git.kernel.dk/linux

Pull io_uring zero-copy receive support from Jens Axboe:
 "This adds support for zero-copy receive with io_uring, enabling fast
  bulk receive of data directly into application memory, rather than
  needing to copy the data out of kernel memory.

  While this version only supports host memory as that was the initial
  target, other memory types are planned as well, with notably GPU
  memory coming next.

  This work depends on some networking components which were queued up
  on the networking side, but have now landed in your tree.

  This is the work of Pavel Begunkov and David Wei. From the v14 posting:

    'We configure a page pool that a driver uses to fill a hw rx queue
     to hand out user pages instead of kernel pages. Any data that ends
     up hitting this hw rx queue will thus be dma'd into userspace
     memory directly, without needing to be bounced through kernel
     memory. 'Reading' data out of a socket instead becomes a
     _notification_ mechanism, where the kernel tells userspace where
     the data is. The overall approach is similar to the devmem TCP
     proposal

     This relies on hw header/data split, flow steering and RSS to
     ensure packet headers remain in kernel memory and only desired
     flows hit a hw rx queue configured for zero copy. Configuring this
     is outside of the scope of this patchset.

     We share netdev core infra with devmem TCP. The main difference is
     that io_uring is used for the uAPI and the lifetime of all objects
     are bound to an io_uring instance. Data is 'read' using a new
     io_uring request type. When done, data is returned via a new shared
     refill queue. A zero copy page pool refills a hw rx queue from this
     refill queue directly. Of course, the lifetime of these data
     buffers are managed by io_uring rather than the networking stack,
     with different refcounting rules.

     This patchset is the first step adding basic zero copy support. We
     will extend this iteratively with new features e.g. dynamically
     allocated zero copy areas, THP support, dmabuf support, improved
     copy fallback, general optimisations and more'

  In a local setup, I was able to saturate a 200G link with a single CPU
  core, and at netdev conf 0x19 earlier this month, Jamal reported
  188Gbit of bandwidth using a single core (no HT, including soft-irq).

  Safe to say the efficiency is there, as bigger links would be needed
  to find the per-core limit, and it's considerably more efficient and
  faster than the existing devmem solution"

* tag 'for-6.15/io_uring-rx-zc-20250325' of git://git.kernel.dk/linux:
  io_uring/zcrx: add selftest case for recvzc with read limit
  io_uring/zcrx: add a read limit to recvzc requests
  io_uring: add missing IORING_MAP_OFF_ZCRX_REGION in io_uring_mmap
  io_uring: Rename KConfig to Kconfig
  io_uring/zcrx: fix leaks on failed registration
  io_uring/zcrx: recheck ifq on shutdown
  io_uring/zcrx: add selftest
  net: add documentation for io_uring zcrx
  io_uring/zcrx: add copy fallback
  io_uring/zcrx: throttle receive requests
  io_uring/zcrx: set pp memory provider for an rx queue
  io_uring/zcrx: add io_recvzc request
  io_uring/zcrx: dma-map area for the device
  io_uring/zcrx: implement zerocopy receive pp memory provider
  io_uring/zcrx: grab a net device
  io_uring/zcrx: add io_zcrx_area
  io_uring/zcrx: add interface queue and refill queue
2025-03-28 13:45:52 -07:00
Linus Torvalds 7288511606 Landlock update for v6.15-rc1
-----BEGIN PGP SIGNATURE-----
 
 iIYEABYKAC4WIQSVyBthFV4iTW/VU1/l49DojIL20gUCZ+bGgBAcbWljQGRpZ2lr
 b2QubmV0AAoJEOXj0OiMgvbSKmgBAICZsmQTuKMHIXdB7kwA+BX5k++SZcyA+qHN
 0hrJTSMsAP0Uv6NpiPT4CTduqBMRbuMwNhujBczRiok6yaHDbC8eCw==
 =K8XL
 -----END PGP SIGNATURE-----

Merge tag 'landlock-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux

Pull landlock updates from Mickaël Salaün:
 "This brings two main changes to Landlock:

   - A signal scoping fix with a new interface for user space to know if
     it is compatible with the running kernel.

   - Audit support to give visibility on why access requests are denied,
     including the origin of the security policy, missing access rights,
     and description of object(s). This was designed to limit log spam
     as much as possible while still alerting about unexpected blocked
     access.

  With these changes come new and improved documentation, and a lot of
  new tests"

* tag 'landlock-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux: (36 commits)
  landlock: Add audit documentation
  selftests/landlock: Add audit tests for network
  selftests/landlock: Add audit tests for filesystem
  selftests/landlock: Add audit tests for abstract UNIX socket scoping
  selftests/landlock: Add audit tests for ptrace
  selftests/landlock: Test audit with restrict flags
  selftests/landlock: Add tests for audit flags and domain IDs
  selftests/landlock: Extend tests for landlock_restrict_self(2)'s flags
  selftests/landlock: Add test for invalid ruleset file descriptor
  samples/landlock: Enable users to log sandbox denials
  landlock: Add LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF
  landlock: Add LANDLOCK_RESTRICT_SELF_LOG_*_EXEC_* flags
  landlock: Log scoped denials
  landlock: Log TCP bind and connect denials
  landlock: Log truncate and IOCTL denials
  landlock: Factor out IOCTL hooks
  landlock: Log file-related denials
  landlock: Log mount-related denials
  landlock: Add AUDIT_LANDLOCK_DOMAIN and log domain status
  landlock: Add AUDIT_LANDLOCK_ACCESS and log ptrace denials
  ...
2025-03-28 12:37:13 -07:00
Yi Liu 7be11d34f6 iommufd: Test attach before detaching pasid
Check if the pasid has been attached before going further in the detach
path. This fixes a crash found by syzkaller. Add a selftest as well.

   Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASI
   KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
   CPU: 1 UID: 0 PID: 668 Comm: repro Not tainted 6.14.0-next-20250325-eb4bc4b07f66 #1 PREEMPT(voluntary)
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org4
   RIP: 0010:iommufd_hw_pagetable_detach+0x8a/0x4d0
   Code: 00 00 00 44 89 ee 48 89 c7 48 89 75 c8 48 89 45 c0 e8 ca 55 17 02 48 89 c2 49 89 c4 48 b8 00 00 00b
   RSP: 0018:ffff888021b17b78 EFLAGS: 00010246
   RAX: dffffc0000000000 RBX: ffff888014b5a000 RCX: ffff888021b17a64
   RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88801dad07fc
   RBP: ffff888021b17bc8 R08: 0000000000000001 R09: 0000000000000001
   R10: 0000000000000001 R11: ffff88801dad0e58 R12: 0000000000000000
   R13: 0000000000000001 R14: ffff888021b17e18 R15: ffff8880132d3008
   FS:  00007fca52013600(0000) GS:ffff8880e3684000(0000) knlGS:0000000000000000
   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   CR2: 00000000200006c0 CR3: 00000000112d0005 CR4: 0000000000770ef0
   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
   DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
   PKRU: 55555554
   Call Trace:
    <TASK>
    iommufd_device_detach+0x2a/0x2e0
    iommufd_test+0x2f99/0x5cd0
    iommufd_fops_ioctl+0x38e/0x520
    __x64_sys_ioctl+0x1ba/0x220
    x64_sys_call+0x122e/0x2150
    do_syscall_64+0x6d/0x150
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

Link: https://patch.msgid.link/r/20250328133448.22052-1-yi.l.liu@intel.com
Reported-by: Lai Yi <yi1.lai@linux.intel.com>
Closes: https://lore.kernel.org/linux-iommu/Z+X0tzxhiaupJT7b@ly-workstation
Fixes: c0e301b297 ("iommufd/device: Add pasid_attach array to track per-PASID attach")
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-28 11:40:41 -03:00
Yi Liu 6d9500bb1f iommufd/selftest: Add coverage for reporting max_pasid_log2 via IOMMU_HW_INFO
IOMMU_HW_INFO is extended to report max_pasid_log2, hence add coverage
for it.

Link: https://patch.msgid.link/r/20250321180143.8468-6-yi.l.liu@intel.com
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-28 10:07:23 -03:00
Linus Torvalds 7b667acd69 powerpc updates for 6.15
- Removal of support for IBM Cell Blades
 
  - SMP support for microwatt platform
 
  - Support for inline static calls on PPC32
 
  - Enable pmu selftests for power11 platform
 
  - Enable hardware trace macro (HTM) hcall support
 
  - Support for limited address mode capability
 
  - Changes to RMA size from 512 MB to 768 MB to handle fadump
 
  - Misc fixes and cleanups
 
 Thanks to: Abhishek Dubey, Amit Machhiwal, Andreas Schwab, Arnd Bergmann,
 Athira Rajeev, Avnish Chouhan, Christophe Leroy, Disha Goel, Donet Tom, Gaurav
 Batra, Gautam Menghani, Hari Bathini, Kajol Jain, Kees Cook, Mahesh Salgaonkar,
 Michael Ellerman, Paul Mackerras, Ritesh Harjani (IBM), Sathvika Vasireddy,
 Segher Boessenkool, Sourabh Jain, Vaibhav Jain, Venkat Rao Bagalkote.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEqX2DNAOgU8sBX3pRpnEsdPSHZJQFAmfjZnYACgkQpnEsdPSH
 ZJTmKg/+NGW7wyFY9d8Iai9ncYY7GSzsMSDTaan7qg0QWOd5gjHbsdbava7TM/DW
 8p9XsC+17kSeftRNUjtc52bSN8Ei2gBdsXIagQG1alfB2X2e6wkNauifK+dz3Su6
 usMEZZTO5R/jFDotFXNM1nsUj+8dvjnPgOUrji/P8k7PT5295wpza0hz1fy5SrOA
 hM5cliBP36UgFe5Efvgm4OUX2gQIhbc3stt9MVfymW/k0Mit5f41UIPuVGiTWowY
 s0cUJGkhxUlGXT3VfOVKuZfn4u9KMha7UCl9afSceJzXOdnUIKIbskui1VEv6cD/
 iSIxi839uErAobFHlsLYprgYFciYLII3xe2qNZCA/ZxeIMS/Mm6xokESeWLhBnfa
 P7ke6l0z3GDtTvgI2eSeU9BdrVveF1NgbP9GYSKgT6gtw/kRRnxgHF8tzmLON5PT
 KXpQlzz8VuSBRtF2jnLFU89+FFwSA1bRUhDrp89HyYFqw1B5g4N7kFFTUJWHOuKS
 fwPGy+cveKehmCUBedeTRFqHvvqdwpD/WnPlQzCly3WxqdL8U/eTXYftMiAwuK28
 ovLuSs3vRThKRQ8DnUa5oB0UGsjMpRV5LdvYkhw+x8mZKUR59oj4fx2ae4TtPakg
 dbAYuPPkCORdaSga/nV6vQgsLprFpcGX3dq6E19+BVBAY5D+1PE=
 =GFcj
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Madhavan Srinivasan:

 - Remove support for IBM Cell Blades

 - SMP support for microwatt platform

 - Support for inline static calls on PPC32

 - Enable pmu selftests for power11 platform

 - Enable hardware trace macro (HTM) hcall support

 - Support for limited address mode capability

 - Changes to RMA size from 512 MB to 768 MB to handle fadump

 - Misc fixes and cleanups

Thanks to Abhishek Dubey, Amit Machhiwal, Andreas Schwab, Arnd Bergmann,
Athira Rajeev, Avnish Chouhan, Christophe Leroy, Disha Goel, Donet Tom,
Gaurav Batra, Gautam Menghani, Hari Bathini, Kajol Jain, Kees Cook,
Mahesh Salgaonkar, Michael Ellerman, Paul Mackerras, Ritesh Harjani
(IBM), Sathvika Vasireddy, Segher Boessenkool, Sourabh Jain, Vaibhav
Jain, and Venkat Rao Bagalkote.

* tag 'powerpc-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (61 commits)
  powerpc/kexec: fix physical address calculation in clear_utlb_entry()
  crypto: powerpc: Mark ghashp8-ppc.o as an OBJECT_FILES_NON_STANDARD
  powerpc: Fix 'intra_function_call not a direct call' warning
  powerpc/perf: Fix ref-counting on the PMU 'vpa_pmu'
  KVM: PPC: Enable CAP_SPAPR_TCE_VFIO on pSeries KVM guests
  powerpc/prom_init: Fixup missing #size-cells on PowerBook6,7
  powerpc/microwatt: Add SMP support
  powerpc: Define config option for processors with broadcast TLBIE
  powerpc/microwatt: Define an idle power-save function
  powerpc/microwatt: Device-tree updates
  powerpc/microwatt: Select COMMON_CLK in order to get the clock framework
  net: toshiba: Remove reference to PPC_IBM_CELL_BLADE
  net: spider_net: Remove powerpc Cell driver
  cpufreq: ppc_cbe: Remove powerpc Cell driver
  genirq: Remove IRQ_EDGE_EOI_HANDLER
  docs: Remove reference to removed CBE_CPUFREQ_SPU_GOVERNOR
  powerpc: Remove UDBG_RTAS_CONSOLE
  powerpc/io: Use standard barrier macros in io.c
  powerpc/io: Rename _insw_ns() etc.
  powerpc/io: Use generic raw accessors
  ...
2025-03-27 19:39:08 -07:00
Linus Torvalds a7e135fe59 Probes updates for v6.15:
- probe-events: Add comments about entry data storing code to clarify
   where and how the entry data is stored for function return events.
 
 - probe-events: Log error for exceeding the number of arguments to help
   user to identify error reason via tracefs/error_log file.
 
 - selftests/ftrace: Improve the ftracetest to add followngs.
   . Expand the tprobe event test to check if it can correctly find
     the wrong format tracepoint name.
   . Add new syntax error test to check whether error_log correctly
     indicates a wrong character in the tracepoint name.
   . Add a new dynamic events argument limitation test case which checks
     max number of probe arguments.
 -----BEGIN PGP SIGNATURE-----
 
 iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmflTlobHG1hc2FtaS5o
 aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8ba9UIALzzZQxzUhJYO/B/XaCz
 KTuSrU2594cLr3nCrmCfL3UHUCr5IcjXKCfUdrgzE9mckWF+nVRXWwbp29KpOQky
 fzU9Ardbr7ksGAYFk4My+P/BeYa7vh9LwofXzWlJibANVxWvq66+GfKnWnh1P8Bl
 /zjov61DAEQmDfXNGZ2oTmKnYYMoPkJU4voRvAEUgiP02SrF04UUa00uC7hmZJJV
 aI5b6TatE5FDEmAoHWh0cf04HoyevRyLVOQd5GSswH/oWpyhs90M7a9WENHamBx6
 RXEZ3xYyuH9zBSvYVeRn2H1eHnGCR4RMctZiRGXF8EdLWo7RXRQ1Hp9CgvwDrU8Z
 IL0=
 =vZ3e
 -----END PGP SIGNATURE-----

Merge tag 'probes-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull probes updates from Masami Hiramatsu:

 - probe-events: Add comments about entry data storing code to clarify
   where and how the entry data is stored for function return events.

 - probe-events: Log error for exceeding the number of arguments to help
   user to identify error reason via tracefs/error_log file.

 - Improve the ftracetest selftests:
    - Expand the tprobe event test to check if it can correctly find the
      wrong format tracepoint name.
    - Add new syntax error test to check whether error_log correctly
      indicates a wrong character in the tracepoint name.
    - Add a new dynamic events argument limitation test case which
      checks max number of probe arguments.

* tag 'probes-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: probe-events: Add comments about entry data storing code
  selftests/ftrace: Add dynamic events argument limitation test case
  selftests/ftrace: Add new syntax error test
  selftests/ftrace: Expand the tprobe event test to check wrong format
  tracing: probe-events: Log error for exceeding the number of arguments
2025-03-27 19:31:34 -07:00
Linus Torvalds dcf9f31c62 Livepatching changes for 6.15
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmflMGMACgkQUqAMR0iA
 lPKHhg/8DjxkQ3tvUpW53pfABBdKELJWOl4hH+IMvGcdh2dZrcFf6OkEB4JeXXVn
 duSODaefalrdU0jV21LChYpEkG40wZqStmamH9im+VbyCc8JwMwvWckrw45RNmjO
 AC4oVh0K9SJcrTj7XWsFSeS38KmQKhtp2zTHpSqCW9FZkfn8si11E8Xzue2kTtEH
 7CESq6N0SMahVfiNkLI1n0x0mUyWiTE43hQ3e+tmmBgtmv3gU4FpQqeTGc/MQUxy
 B6AaI1vChovLmfNhw4KM8GvyPCTIn4xRQ09WXZRbK2ZzAxgFfm7QXGL3E93ra929
 juSoIWnNwzw81RB3apzclOHFTbctHJWL+85KEefqCjrCuEbaoLNGoLkVqNuj0U+D
 LmLqmm6NfEtPzDVBY232KepjFUoIb890eKRF3Wq1NnHonaVxmUSMEMRSdlKeHpSo
 uImBhr5XhXDJLHt4+lAa4m0JSm3RH2AK83xIrdJq/YUcKscCzJMHLqqsp8pCa7mb
 L7iOhHFKs2r6GJCt5kLEw1jkNEg5Lpk08fY5AekjScvNcxXWXF8E8Jl9DnBBoz+5
 e15mvqsiQtcAGHyR1Ohmw8UPYnWY2fSZTmru9aClwHCvC0q+cyCxIZgcqLxFYFIr
 TKIgQ6zSepA+j7SzI1SaDY4Z9cNWMTU+9GqrNO3xO4ddpKrp5C8=
 =ggzJ
 -----END PGP SIGNATURE-----

Merge tag 'livepatching-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching

Pull livepatching updates from Petr Mladek:

 - Add a selftest for tracing of a livepatched function

 - Skip a selftest when kprobes are not using ftrace

 - Some documentation clean up

* tag 'livepatching-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
  selftests: livepatch: test if ftrace can trace a livepatched function
  selftests: livepatch: add new ftrace helpers functions
  selftest/livepatch: Only run test-kprobe with CONFIG_KPROBES_ON_FTRACE
  docs: livepatch: move text out of code block
  livepatch: Add comment to clarify klp_add_nops()
2025-03-27 19:26:10 -07:00
Linus Torvalds a10c7949ad linux_kselftest-kunit-6.15-rc1
kunit tool:
 - Changes to kunit tool to use qboot on QEMU x86_64, and build GDB scripts.
 - Fixes kunit tool bug in parsing test plan.
 - Adds test to kunit tool to check parsing late test plan.
 
 kunit:
 - Clarifies kunit_skip() argument name.
 - Adds Kunit check for the longest symbol length.
 - Changes qemu_configs for sparc to use Zilog console.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmfkvDYACgkQCwJExA0N
 Qxwljg//ZRoF/Jncvlb0vapnOIYywHbJEPRVTKfNurRjhb7stAX7CpLKXing4Gtq
 ewy3UXRaAZKg1BvugDYWoUsDDD5o7jx6y9rOMOWM+aAHPzYgxY6gbIzyUVolNZg/
 50/ANMhT0bvME8KBB2k2l6p1NAblzOpH3zH35CCDL/40eVodwMPrhq0V5AqccOaE
 C5Bn+tDiviS6Icw+b/mVUw8fvmoJSTSKvdjaSeRAqThJN3KtqBVyX383++A1zNqy
 Y6tItu9wG06FDjuQ1miOlSMwhgMEYK4TS4GwbX4PUucR8ETaZNUXVviMRou7vMEa
 GGOdtsBG3CBgFNtO2VK1qJLWbJesw2G9+w2oIZ2KQKtyfoF7nDMj+DBO2QD/T+GB
 u2g/xlSDJ5PTzZBMVKENDMy+C9Q+ux8Y2PsQ0fTCdpYgadytKYBFA23EAiZaMdKa
 d1AweNvFS5gi8WkpS8SyMjs0D5pZnKMgHQqOIfRFjCi0HXsGE9RJfkOjLOzRnaOc
 zldLAgDcrhtdG8Xin08bux5UuCoqg/e/RJiXF+xQLLJkE7cltN/CuWMrHX4kija+
 8xmJtj4Oe0p7JCwnIaXjLAQDuFfxHYHM9wM0nKm+YpVJLPSWqSXk4+xtQEOlvZhN
 DJW61ez+pYVCmXuIZ/bgeRzpwXJMfALmI3kn+UtCYwqdTt6Xhp8=
 =h8xS
 -----END PGP SIGNATURE-----

Merge tag 'linux_kselftest-kunit-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kunit updates from Shuah Khan:
 "kunit tool:
   - Changes to kunit tool to use qboot on QEMU x86_64, and build GDB
     scripts
   - Fixes kunit tool bug in parsing test plan
   - Adds test to kunit tool to check parsing late test plan

  kunit:
   - Clarifies kunit_skip() argument name
   - Adds Kunit check for the longest symbol length
   - Changes qemu_configs for sparc to use Zilog console"

* tag 'linux_kselftest-kunit-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  kunit: tool: add test to check parsing late test plan
  kunit: tool: Fix bug in parsing test plan
  Kunit to check the longest symbol length
  kunit: Clarify kunit_skip() argument name
  kunit: tool: Build GDB scripts
  kunit: qemu_configs: sparc: use Zilog console
  kunit: tool: Use qboot on QEMU x86_64
2025-03-27 19:06:07 -07:00
Linus Torvalds 8e324a5c98 linux_kselftest-next-6.15-rc1
Fixes bugs and cleans up code in tracing, ftrace, and user_events tests.
 Adds missing executables to ftrace gitignore.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmfkreMACgkQCwJExA0N
 QxxgKBAAh8oO0zPvbiVDYcOqQBJ4qBpCLr3YWHM92v/nbUbj5teyrPJLGhTstYHy
 khWWzOQBN+Kz91w355CLqiYZPYTTDEVaZDvxt43cBnU3yc8O72wBgo6adzQA1gzP
 83TGB6B4CVT/6loHGtG40mHA0FHbB0imoCH39+5d9Nb63rOPUriyA5xWvACh4Dnl
 AOMOmtHNAxUWAaTljMrNI5ZiYVsGSIwjwXPYbMg3WUI8qMqKsFWPNScrsCHlQoMX
 EZOwPINwHVYO5IDKWdzwAAuGFWhesmydmnAI+6Ji1xBkCgG/YIIqmu/p0ru8vI/q
 96GqxeuE9tgKZgEcOinOVKvclCbPr2wp9oUSFVzDuxuDBW9ErgyAZqCTYwBDhsxq
 5P+O0qQBgBicYtrD437KF+nEvHdj9yOuhBrAftV70bVeCsCfgUWlAgFqB0bSvrX9
 Fp/apOxc5LBqsDJJmvNiGMIUMWfoL05u83vtrkVhkhA7ZhUl6JFj2Tekgu29urnM
 pMM03hTVOpDRMQ1enV65XPwWkwOakFTnJU1lnGCEdwsklgy73DLueRI0iAc8DyeS
 +m9DwJNWmELHXQ4ugvNZ1A3RAX2E1E95QfQAt/Hmv+t0eRWTpL5idu98GQ7M7Tm9
 6BjpbjUljTh/wGnQ2TIxmflHqIjqkKXYLKgT1+gjjf/6ihFvElU=
 =5VeG
 -----END PGP SIGNATURE-----

Merge tag 'linux_kselftest-next-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull Kselftest updates from Shuah Khan:

 - Fix bugs and clean up code in tracing, ftrace, and user_events tests

 - Add missing executables to ftrace gitignore

* tag 'linux_kselftest-next-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/ftrace: add 'poll' binary to gitignore
  selftests/ftrace: Use readelf to find entry point in uprobe test
  selftests/user_events: Fix failures caused by test code
  selftests/tracing: Allow some more tests to run in instances
  selftests/ftrace: Clean up triggers after setting them
  selftests/tracing: Test only toplevel README file not the instances
2025-03-27 18:57:58 -07:00
Linus Torvalds 68f090f09b ktest update for v6.15:
- Fix failure of directory of log file not existing
 
   If a LOG_FILE option is set for ktest to log its messages, and the
   directory path does not exist. Then ktest fails. Have ktest attempt
   to create the directory where the log file exists and if that succeeds
   continue on testing.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ+WDNxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6quknAQDGW0MUyG3OSP1mdTm/a/kvAc4nnx8i
 OBfT8ZqDPMXAhgEA2jdn/pvOmh0Nm+cofUC8tzWIdaY0Ey1r/QlCOnx4TAQ=
 =QgSs
 -----END PGP SIGNATURE-----

Merge tag 'ktest-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest

Pull ktest update from Steven Rostedt:

 - Fix failure of directory of log file not existing

   If a LOG_FILE option is set for ktest to log its messages, and the
   directory path does not exist. Then ktest fails. Have ktest attempt
   to create the directory where the log file exists and if that
   succeeds continue on testing.

* tag 'ktest-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
  ktest: Fix Test Failures Due to Missing LOG_FILE Directories
2025-03-27 18:22:46 -07:00
Bjorn Helgaas cc28c0e5e7 Merge branch 'pci/endpoint-test'
- Fix endpoint BAR testing so the test can skip disabled BARs instead of
  reporting them as failures (Niklas Cassel)

- Verify that pci_endpoint interrupt tests set the correct IRQ type
  (Kunihiko Hayashi)

- Fix interpretation of pci_endpoint_test_bars_read_bar() error returns
  (Niklas Cassel)

- Fix potential string truncation in pci_endpoint_test_probe() (Niklas
  Cassel)

- Increase endpoint test BAR size variable to accommodate BARs larger than
  INT_MAX (Niklas Cassel)

- Release IRQs to avoid leak in pci_endpoint interrupt tests (Kunihiko
  Hayashi)

- Log the correct IRQ type when pci_endpoint IRQ request test fails
  (Kunihiko Hayashi)

- Remove pci_endpoint_test irq_type and no_msi globals; instead use
  test->irq_type (Kunihiko Hayashi)

- Remove unnecessary use of managed IRQ functions in pci_endpoint_test
  (Kunihiko Hayashi)

- Add and use IRQ_TYPE_* defines in pci_endpoint_test (Niklas Cassel)

- Add struct pci_epc_features.intx_capable and note that RK3568 and RK3588
  can't raise INTx interrupts (Niklas Cassel)

- Expose supported IRQ types in CAPS so pci_endpoint_test can set
  appropriate type (Niklas Cassel)

- Add PCITEST_IRQ_TYPE_AUTO to pci_endpoint_test for cases where the IRQ
  type doesn't matter (Niklas Cassel)

* pci/endpoint-test:
  misc: pci_endpoint_test: Add support for PCITEST_IRQ_TYPE_AUTO
  PCI: endpoint: pci-epf-test: Expose supported IRQ types in CAPS register
  PCI: dw-rockchip: Endpoint mode cannot raise INTx interrupts
  PCI: endpoint: Add intx_capable to epc_features struct
  selftests: pci_endpoint: Use IRQ_TYPE_* defines from UAPI header
  misc: pci_endpoint_test: Use IRQ_TYPE_* defines from UAPI header
  PCI: endpoint: pcitest: Add IRQ_TYPE_* defines to UAPI header
  misc: pci_endpoint_test: Do not use managed IRQ functions
  misc: pci_endpoint_test: Remove global 'irq_type' and 'no_msi'
  misc: pci_endpoint_test: Fix 'irq_type' to convey the correct type
  misc: pci_endpoint_test: Fix displaying 'irq_type' after 'request_irq' error
  misc: pci_endpoint_test: Avoid issue of interrupts remaining after request_irq error
  misc: pci_endpoint_test: Handle BAR sizes larger than INT_MAX
  misc: pci_endpoint_test: Give disabled BARs a distinct error code
  misc: pci_endpoint_test: Fix potential truncation in pci_endpoint_test_probe()
  misc: pci_endpoint_test: Fix pci_endpoint_test_bars_read_bar() error handling
  selftests: pci_endpoint: Add GET_IRQTYPE checks to each interrupt test
  selftests: pci_endpoint: Skip disabled BARs
2025-03-27 13:14:46 -05:00
Ayush Jain 5a1bed2327 ktest: Fix Test Failures Due to Missing LOG_FILE Directories
Handle missing parent directories for LOG_FILE path to prevent test
failures. If the parent directories don't exist, create them to ensure
the tests proceed successfully.

Cc: <warthog9@eaglescrag.net>
Link: https://lore.kernel.org/20250307043854.2518539-1-Ayush.jain3@amd.com
Signed-off-by: Ayush Jain <Ayush.jain3@amd.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2025-03-27 09:34:00 -04:00
Masami Hiramatsu (Google) 581a7b26ab selftests/ftrace: Add dynamic events argument limitation test case
Add argument limitation test case for dynamic events.
This is a boudary check for the maximum number of the probe
event arguments.

Link: https://lore.kernel.org/all/174055078295.4079315.14702008939511417359.stgit@mhiramat.tok.corp.google.com/

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-03-27 21:19:54 +09:00
Masami Hiramatsu (Google) 168ccc9b99 selftests/ftrace: Add new syntax error test
Add BAD_TP_NAME syntax error message check.

Link: https://lore.kernel.org/all/174055077485.4079315.3624012056141021755.stgit@mhiramat.tok.corp.google.com/

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-03-27 21:19:54 +09:00
Masami Hiramatsu (Google) 381af2ab91 selftests/ftrace: Expand the tprobe event test to check wrong format
Expand the tprobe event test case to check wrong tracepoint
format.

Link: https://lore.kernel.org/all/174055076681.4079315.16941322116874021804.stgit@mhiramat.tok.corp.google.com/

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-03-27 21:19:54 +09:00
Petr Mladek d11f0d172a Merge branch 'for-6.15/ftrace-test' into for-linus 2025-03-27 12:01:51 +01:00
Linus Torvalds 1a9239bb42 Networking changes for 6.15.
Core & protocols
 ----------------
 
  - Continue Netlink conversions to per-namespace RTNL lock
    (IPv4 routing, routing rules, routing next hops, ARP ioctls).
 
  - Continue extending the use of netdev instance locks. As a driver
    opt-in protect queue operations and (in due course) ethtool
    operations with the instance lock and not RTNL lock.
 
  - Support collecting TCP timestamps (data submitted, sent, acked)
    in BPF, allowing for transparent (to the application) and lower
    overhead tracking of TCP RPC performance.
 
  - Tweak existing networking Rx zero-copy infra to support zero-copy
    Rx via io_uring.
 
  - Optimize MPTCP performance in single subflow mode by 29%.
 
  - Enable GRO on packets which went thru XDP CPU redirect (were queued
    for processing on a different CPU). Improving TCP stream performance
    up to 2x.
 
  - Improve performance of contended connect() by 200% by searching
    for an available 4-tuple under RCU rather than a spin lock.
    Bring an additional 229% improvement by tweaking hash distribution.
 
  - Avoid unconditionally touching sk_tsflags on RX, improving
    performance under UDP flood by as much as 10%.
 
  - Avoid skb_clone() dance in ping_rcv() to improve performance under
    ping flood.
 
  - Avoid FIB lookup in netfilter if socket is available, 20% perf win.
 
  - Rework network device creation (in-kernel) API to more clearly
    identify network namespaces and their roles.
    There are up to 4 namespace roles but we used to have just 2 netns
    pointer arguments, interpreted differently based on context.
 
  - Use sysfs_break_active_protection() instead of trylock to avoid
    deadlocks between unregistering objects and sysfs access.
 
  - Add a new sysctl and sockopt for capping max retransmit timeout
    in TCP.
 
  - Support masking port and DSCP in routing rule matches.
 
  - Support dumping IPv4 multicast addresses with RTM_GETMULTICAST.
 
  - Support specifying at what time packet should be sent on AF_XDP
    sockets.
 
  - Expose TCP ULP diagnostic info (for TLS and MPTCP) to non-admin users.
 
  - Add Netlink YAML spec for WiFi (nl80211) and conntrack.
 
  - Introduce EXPORT_IPV6_MOD() and EXPORT_IPV6_MOD_GPL() for symbols
    which only need to be exported when IPv6 support is built as a module.
 
  - Age FDB entries based on Rx not Tx traffic in VxLAN, similar
    to normal bridging.
 
  - Allow users to specify source port range for GENEVE tunnels.
 
  - netconsole: allow attaching kernel release, CPU ID and task name
    to messages as metadata
 
 Driver API
 ----------
 
  - Continue rework / fixing of Energy Efficient Ethernet (EEE) across
    the SW layers. Delegate the responsibilities to phylink where possible.
    Improve its handling in phylib.
 
  - Support symmetric OR-XOR RSS hashing algorithm.
 
  - Support tracking and preserving IRQ affinity by NAPI itself.
 
  - Support loopback mode speed selection for interface selftests.
 
 Device drivers
 --------------
 
  - Remove the IBM LCS driver for s390.
 
  - Remove the sb1000 cable modem driver.
 
  - Add support for SFP module access over SMBus.
 
  - Add MCTP transport driver for MCTP-over-USB.
 
  - Enable XDP metadata support in multiple drivers.
 
  - Ethernet high-speed NICs:
    - Broadcom (bnxt):
      - add PCIe TLP Processing Hints (TPH) support for new AMD platforms
      - support dumping RoCE queue state for debug
      - opt into instance locking
    - Intel (100G, ice, idpf):
      - ice: rework MSI-X IRQ management and distribution
      - ice: support for E830 devices
      - iavf: add support for Rx timestamping
      - iavf: opt into instance locking
    - nVidia/Mellanox:
      - mlx4: use page pool memory allocator for Rx
      - mlx5: support for one PTP device per hardware clock
      - mlx5: support for 200Gbps per-lane link modes
      - mlx5: move IPSec policy check after decryption
    - AMD/Solarflare:
      - support FW flashing via devlink
    - Cisco (enic):
      - use page pool memory allocator for Rx
      - enable 32, 64 byte CQEs
      - get max rx/tx ring size from the device
    - Meta (fbnic):
      - support flow steering and RSS configuration
      - report queue stats
      - support TCP segmentation
      - support IRQ coalescing
      - support ring size configuration
    - Marvell/Cavium:
      - support AF_XDP
    - Wangxun:
      - support for PTP clock and timestamping
    - Huawei (hibmcge):
      - checksum offload
      - add more statistics
 
  - Ethernet virtual:
    - VirtIO net:
      - aggressively suppress Tx completions, improve perf by 96% with
        1 CPU and 55% with 2 CPUs
      - expose NAPI to IRQ mapping and persist NAPI settings
    - Google (gve):
      - support XDP in DQO RDA Queue Format
      - opt into instance locking
    - Microsoft vNIC:
      - support BIG TCP
 
  - Ethernet NICs consumer, and embedded:
    - Synopsys (stmmac):
      - cleanup Tx and Tx clock setting and other link-focused cleanups
      - enable SGMII and 2500BASEX mode switching for Intel platforms
      - support Sophgo SG2044
    - Broadcom switches (b53):
      - support for BCM53101
    - TI:
      - iep: add perout configuration support
      - icssg: support XDP
    - Cadence (macb):
      - implement BQL
    - Xilinx (axinet):
      - support dynamic IRQ moderation and changing coalescing at runtime
      - implement BQL
      - report standard stats
    - MediaTek:
      - support phylink managed EEE
    - Intel:
      - igc: don't restart the interface on every XDP program change
    - RealTek (r8169):
      - support reading registers of internal PHYs directly
      - increase max jumbo packet size on RTL8125/RTL8126
    - Airoha:
      - support for RISC-V NPU packet processing unit
      - enable scatter-gather and support MTU up to 9kB
    - Tehuti (tn40xx):
      - support cards with TN4010 MAC and an Aquantia AQR105 PHY
 
  - Ethernet PHYs:
    - support for TJA1102S, TJA1121
    - dp83tg720: add randomized polling intervals for link detection
    - dp83822: support changing the transmit amplitude voltage
    - support for LEDs on 88q2xxx
 
  - CAN:
    - canxl: support Remote Request Substitution bit access
    - flexcan: add S32G2/S32G3 SoC
 
  - WiFi:
    - remove cooked monitor support
    - strict mode for better AP testing
    - basic EPCS support
    - OMI RX bandwidth reduction support
    - batman-adv: add support for jumbo frames
 
  - WiFi drivers:
    - RealTek (rtw88):
      - support RTL8814AE and RTL8814AU
    - RealTek (rtw89):
      - switch using wiphy_lock and wiphy_work
      - add BB context to manipulate two PHY as preparation of MLO
      - improve BT-coexistence mechanism to play A2DP smoothly
    - Intel (iwlwifi):
      - add new iwlmld sub-driver for latest HW/FW combinations
    - MediaTek (mt76):
      - preparation for mt7996 Multi-Link Operation (MLO) support
    - Qualcomm/Atheros (ath12k):
      - continued work on MLO
    - Silabs (wfx):
      - Wake-on-WLAN support
 
  - Bluetooth:
    - add support for skb TX SND/COMPLETION timestamping
    - hci_core: enable buffer flow control for SCO/eSCO
    - coredump: log devcd dumps into the monitor
 
  - Bluetooth drivers:
    - intel: add support to configure TX power
    - nxp: handle bootloader error during cmd5 and cmd7
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmfkLC8ACgkQMUZtbf5S
 Irsb5g/+L7oKOf0ALbaV9kxFsoz8AymZfAW9i/27F07omGJGpks8oX6j6rQLgIRO
 OQOFcp7XEdDh1+jh82gHVuPrw2/6lchLtW8ARtzdiQKFr5DRjrsbtua6GRc8iBqA
 DIRCBFoV2HuMkF39Vr09HMa9AZAT7QR2RLsRGpSq8E8Z8xxKz0X7oujs10PFpMTE
 IVKhTrVrk+NDot/IU2hzVpnpup+0ld+T2/ZaBklJGcU8uDffImsqNepHRyCG5UC3
 xz74Ju23MAj24Gct+og0yFUooF+lUltKyVm0FYCDCY3bASTwgY01NR3kEH/0NQvM
 cywLzd/ngHm/SMD2ggVAHkjZUieiIVHdaZ53dgjDeBOQoVP6p0dgUK7EumXX8Mx4
 8ReR2UiGoYRPaq9c4o+IjG4K027MwVK2p+mF1a6MLa+20XcyMbev8FIRbbHtC/V4
 z5/FsOAxcuICWkA1hU9bODrrGzIqemmdRgKG8sGuTJCt/kYGAn72/TCATGNSaCJ0
 00n2jN1aepa7wtywHJ5MhVzxN9iQX7+geUHXz0BI+lK4e1Pmk+vjGksymb9ai2fk
 eQAUV9ekub6q68/J16scD7XeOUM37bTLiMBQeIF8UtZBOJscKiS71zn9QP9Twwxv
 P2pm01RDZUI+z5ZX3hc12Pm1vjRHaAh9S1JpAw/pTOVlQ+mAJEM=
 =XY0S
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core & protocols:

   - Continue Netlink conversions to per-namespace RTNL lock
     (IPv4 routing, routing rules, routing next hops, ARP ioctls)

   - Continue extending the use of netdev instance locks. As a driver
     opt-in protect queue operations and (in due course) ethtool
     operations with the instance lock and not RTNL lock.

   - Support collecting TCP timestamps (data submitted, sent, acked) in
     BPF, allowing for transparent (to the application) and lower
     overhead tracking of TCP RPC performance.

   - Tweak existing networking Rx zero-copy infra to support zero-copy
     Rx via io_uring.

   - Optimize MPTCP performance in single subflow mode by 29%.

   - Enable GRO on packets which went thru XDP CPU redirect (were queued
     for processing on a different CPU). Improving TCP stream
     performance up to 2x.

   - Improve performance of contended connect() by 200% by searching for
     an available 4-tuple under RCU rather than a spin lock. Bring an
     additional 229% improvement by tweaking hash distribution.

   - Avoid unconditionally touching sk_tsflags on RX, improving
     performance under UDP flood by as much as 10%.

   - Avoid skb_clone() dance in ping_rcv() to improve performance under
     ping flood.

   - Avoid FIB lookup in netfilter if socket is available, 20% perf win.

   - Rework network device creation (in-kernel) API to more clearly
     identify network namespaces and their roles. There are up to 4
     namespace roles but we used to have just 2 netns pointer arguments,
     interpreted differently based on context.

   - Use sysfs_break_active_protection() instead of trylock to avoid
     deadlocks between unregistering objects and sysfs access.

   - Add a new sysctl and sockopt for capping max retransmit timeout in
     TCP.

   - Support masking port and DSCP in routing rule matches.

   - Support dumping IPv4 multicast addresses with RTM_GETMULTICAST.

   - Support specifying at what time packet should be sent on AF_XDP
     sockets.

   - Expose TCP ULP diagnostic info (for TLS and MPTCP) to non-admin
     users.

   - Add Netlink YAML spec for WiFi (nl80211) and conntrack.

   - Introduce EXPORT_IPV6_MOD() and EXPORT_IPV6_MOD_GPL() for symbols
     which only need to be exported when IPv6 support is built as a
     module.

   - Age FDB entries based on Rx not Tx traffic in VxLAN, similar to
     normal bridging.

   - Allow users to specify source port range for GENEVE tunnels.

   - netconsole: allow attaching kernel release, CPU ID and task name to
     messages as metadata

  Driver API:

   - Continue rework / fixing of Energy Efficient Ethernet (EEE) across
     the SW layers. Delegate the responsibilities to phylink where
     possible. Improve its handling in phylib.

   - Support symmetric OR-XOR RSS hashing algorithm.

   - Support tracking and preserving IRQ affinity by NAPI itself.

   - Support loopback mode speed selection for interface selftests.

  Device drivers:

   - Remove the IBM LCS driver for s390

   - Remove the sb1000 cable modem driver

   - Add support for SFP module access over SMBus

   - Add MCTP transport driver for MCTP-over-USB

   - Enable XDP metadata support in multiple drivers

   - Ethernet high-speed NICs:
      - Broadcom (bnxt):
         - add PCIe TLP Processing Hints (TPH) support for new AMD
           platforms
         - support dumping RoCE queue state for debug
         - opt into instance locking
      - Intel (100G, ice, idpf):
         - ice: rework MSI-X IRQ management and distribution
         - ice: support for E830 devices
         - iavf: add support for Rx timestamping
         - iavf: opt into instance locking
      - nVidia/Mellanox:
         - mlx4: use page pool memory allocator for Rx
         - mlx5: support for one PTP device per hardware clock
         - mlx5: support for 200Gbps per-lane link modes
         - mlx5: move IPSec policy check after decryption
      - AMD/Solarflare:
         - support FW flashing via devlink
      - Cisco (enic):
         - use page pool memory allocator for Rx
         - enable 32, 64 byte CQEs
         - get max rx/tx ring size from the device
      - Meta (fbnic):
         - support flow steering and RSS configuration
         - report queue stats
         - support TCP segmentation
         - support IRQ coalescing
         - support ring size configuration
      - Marvell/Cavium:
         - support AF_XDP
      - Wangxun:
         - support for PTP clock and timestamping
      - Huawei (hibmcge):
         - checksum offload
         - add more statistics

   - Ethernet virtual:
      - VirtIO net:
         - aggressively suppress Tx completions, improve perf by 96%
           with 1 CPU and 55% with 2 CPUs
         - expose NAPI to IRQ mapping and persist NAPI settings
      - Google (gve):
         - support XDP in DQO RDA Queue Format
         - opt into instance locking
      - Microsoft vNIC:
         - support BIG TCP

   - Ethernet NICs consumer, and embedded:
      - Synopsys (stmmac):
         - cleanup Tx and Tx clock setting and other link-focused
           cleanups
         - enable SGMII and 2500BASEX mode switching for Intel platforms
         - support Sophgo SG2044
      - Broadcom switches (b53):
         - support for BCM53101
      - TI:
         - iep: add perout configuration support
         - icssg: support XDP
      - Cadence (macb):
         - implement BQL
      - Xilinx (axinet):
         - support dynamic IRQ moderation and changing coalescing at
           runtime
         - implement BQL
         - report standard stats
      - MediaTek:
         - support phylink managed EEE
      - Intel:
         - igc: don't restart the interface on every XDP program change
      - RealTek (r8169):
         - support reading registers of internal PHYs directly
         - increase max jumbo packet size on RTL8125/RTL8126
      - Airoha:
         - support for RISC-V NPU packet processing unit
         - enable scatter-gather and support MTU up to 9kB
      - Tehuti (tn40xx):
         - support cards with TN4010 MAC and an Aquantia AQR105 PHY

   - Ethernet PHYs:
      - support for TJA1102S, TJA1121
      - dp83tg720: add randomized polling intervals for link detection
      - dp83822: support changing the transmit amplitude voltage
      - support for LEDs on 88q2xxx

   - CAN:
      - canxl: support Remote Request Substitution bit access
      - flexcan: add S32G2/S32G3 SoC

   - WiFi:
      - remove cooked monitor support
      - strict mode for better AP testing
      - basic EPCS support
      - OMI RX bandwidth reduction support
      - batman-adv: add support for jumbo frames

   - WiFi drivers:
      - RealTek (rtw88):
         - support RTL8814AE and RTL8814AU
      - RealTek (rtw89):
         - switch using wiphy_lock and wiphy_work
         - add BB context to manipulate two PHY as preparation of MLO
         - improve BT-coexistence mechanism to play A2DP smoothly
      - Intel (iwlwifi):
         - add new iwlmld sub-driver for latest HW/FW combinations
      - MediaTek (mt76):
         - preparation for mt7996 Multi-Link Operation (MLO) support
      - Qualcomm/Atheros (ath12k):
         - continued work on MLO
      - Silabs (wfx):
         - Wake-on-WLAN support

   - Bluetooth:
      - add support for skb TX SND/COMPLETION timestamping
      - hci_core: enable buffer flow control for SCO/eSCO
      - coredump: log devcd dumps into the monitor

   - Bluetooth drivers:
      - intel: add support to configure TX power
      - nxp: handle bootloader error during cmd5 and cmd7"

* tag 'net-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1681 commits)
  unix: fix up for "apparmor: add fine grained af_unix mediation"
  mctp: Fix incorrect tx flow invalidation condition in mctp-i2c
  net: usb: asix: ax88772: Increase phy_name size
  net: phy: Introduce PHY_ID_SIZE — minimum size for PHY ID string
  net: libwx: fix Tx L4 checksum
  net: libwx: fix Tx descriptor content for some tunnel packets
  atm: Fix NULL pointer dereference
  net: tn40xx: add pci-id of the aqr105-based Tehuti TN4010 cards
  net: tn40xx: prepare tn40xx driver to find phy of the TN9510 card
  net: tn40xx: create swnode for mdio and aqr105 phy and add to mdiobus
  net: phy: aquantia: add essential functions to aqr105 driver
  net: phy: aquantia: search for firmware-name in fwnode
  net: phy: aquantia: add probe function to aqr105 for firmware loading
  net: phy: Add swnode support to mdiobus_scan
  gve: add XDP DROP and PASS support for DQ
  gve: update XDP allocation path support RX buffer posting
  gve: merge packet buffer size fields
  gve: update GQ RX to use buf_size
  gve: introduce config-based allocation for XDP
  gve: remove xdp_xsk_done and xdp_xsk_wakeup statistics
  ...
2025-03-26 21:48:21 -07:00
Linus Torvalds 592329e5e9 Summary
* Move vm_table members out of kernel/sysctl.c
 
   All vm_table array members have moved to their respective subsystems leading
   to the removal of vm_table from kernel/sysctl.c. This increases modularity by
   placing the ctl_tables closer to where they are actually used and at the same
   time reducing the chances of merge conflicts in kernel/sysctl.c.
 
 * ctl_table range fixes
 
   Replace the proc_handler function that checks variable ranges in
   coredump_sysctls and vdso_table with the one that actually uses the extra{1,2}
   pointers as min/max values. This tightens the range of the values that users
   can pass into the kernel effectively preventing {under,over}flows.
 
 * Misc fixes
 
   Correct grammar errors and typos in test messages. Update sysctl files in
   MAINTAINERS. Constified and removed array size in declaration for
   alignment_tbl
 
 * Testing
 
   - These have all been in linux-next for at least 1 month
   - They have gone through 0-day
   - Ran all these through sysctl selftests in x86_64
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEErkcJVyXmMSXOyyeQupfNUreWQU8FAmfhV8EACgkQupfNUreW
 QU/udAv/VCXGkndQsJ5biXpXYFnokX0gIEaYzzHiqrFycZqr8ys0/wWzc+ar1LjF
 Jvanl2uKB0mUviLKt7Gk0+Hri+PJlYIrbx+5K5eo2wsKUUxFykqLLm59y/orPODl
 gyPQjKNpHJb7COsnEc3Lrq/fvol4NPHlcBPXG8NwehccTeBHZ1ninfo+pSnxh3o8
 kI3GSLLxD4K9AgBl5QuVWH4gU7o//u7lUkKzy03NW+2jmuRv3dRcYF7IdgMINNee
 AeXnygdSBxLzECBvmkfNdyg+AmL8hdsmzbsIh7UuJDvxLlQOInVLZa+sXBotCOIc
 TImCrr1Ws1OuGrD0kpH+21tJvc8pNFWt61QlulObQdrLndWHdZEGyGOusLpXTwbn
 jIWZmMvzk1foSwdgzwPFzUqPEpW3FrBVDo4Z4kenBDrCp56QTX7hGRvkNYJNKvot
 Ue+i8BeHR/Gm/p+UMqgsSTOaNJXTqZhFqwJQVzxU/9LN/vkS0On6fbjgBd5X6Pn+
 a5dlc9gy
 =0bcX
 -----END PGP SIGNATURE-----

Merge tag 'sysctl-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl

Pull sysctl updates from Joel Granados:

 - Move vm_table members out of kernel/sysctl.c

   All vm_table array members have moved to their respective subsystems
   leading to the removal of vm_table from kernel/sysctl.c. This
   increases modularity by placing the ctl_tables closer to where they
   are actually used and at the same time reducing the chances of merge
   conflicts in kernel/sysctl.c.

 - ctl_table range fixes

   Replace the proc_handler function that checks variable ranges in
   coredump_sysctls and vdso_table with the one that actually uses the
   extra{1,2} pointers as min/max values. This tightens the range of the
   values that users can pass into the kernel effectively preventing
   {under,over}flows.

 - Misc fixes

   Correct grammar errors and typos in test messages. Update sysctl
   files in MAINTAINERS. Constified and removed array size in
   declaration for alignment_tbl

* tag 'sysctl-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: (22 commits)
  selftests/sysctl: fix wording of help messages
  selftests: fix spelling/grammar errors in sysctl/sysctl.sh
  MAINTAINERS: Update sysctl file list in MAINTAINERS
  sysctl: Fix underflow value setting risk in vm_table
  coredump: Fixes core_pipe_limit sysctl proc_handler
  sysctl: remove unneeded include
  sysctl: remove the vm_table
  sh: vdso: move the sysctl to arch/sh/kernel/vsyscall/vsyscall.c
  x86: vdso: move the sysctl to arch/x86/entry/vdso/vdso32-setup.c
  fs: dcache: move the sysctl to fs/dcache.c
  sunrpc: simplify rpcauth_cache_shrink_count()
  fs: drop_caches: move sysctl to fs/drop_caches.c
  fs: fs-writeback: move sysctl to fs/fs-writeback.c
  mm: nommu: move sysctl to mm/nommu.c
  security: min_addr: move sysctl to security/min_addr.c
  mm: mmap: move sysctl to mm/mmap.c
  mm: util: move sysctls to mm/util.c
  mm: vmscan: move vmscan sysctls to mm/vmscan.c
  mm: swap: move sysctl to mm/swap.c
  mm: filemap: move sysctl to mm/filemap.c
  ...
2025-03-26 21:02:05 -07:00
Linus Torvalds 2e3fcbcc3b SCSI misc on 20250326
Updates to the usual drivers (scsi_debug, ufs, lpfc, st, fnic, mpi3mr,
 mpt3sas) and the removal of cxlflash. The only non-trivial core change
 is an addition to unit attention handling to recognize UAs for power
 on/reset and new media so the tape driver can use it.
 
 Signed-off-by: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZ+RQ2yYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishe6DAQCdW/21
 S1Y6BDlJLQfpWChGv6GIzanC+5sMfylw4d6ULgEA8upOE5L3fC29IY958jXig0o1
 uLjxylwYEfVLDf8gwJ0=
 =mkM+
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "Updates to the usual drivers (scsi_debug, ufs, lpfc, st, fnic, mpi3mr,
  mpt3sas) and the removal of cxlflash.

  The only non-trivial core change is an addition to unit attention
  handling to recognize UAs for power on/reset and new media so the tape
  driver can use it"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (107 commits)
  scsi: st: Tighten the page format heuristics with MODE SELECT
  scsi: st: ERASE does not change tape location
  scsi: st: Fix array overflow in st_setup()
  scsi: target: tcm_loop: Fix wrong abort tag
  scsi: lpfc: Restore clearing of NLP_UNREG_INP in ndlp->nlp_flag
  scsi: hisi_sas: Fixed failure to issue vendor specific commands
  scsi: fnic: Remove unnecessary NUL-terminations
  scsi: fnic: Remove redundant flush_workqueue() calls
  scsi: core: Use a switch statement when attaching VPD pages
  scsi: ufs: renesas: Add initialization code for R-Car S4-8 ES1.2
  scsi: ufs: renesas: Add reusable functions
  scsi: ufs: renesas: Refactor 0x10ad/0x10af PHY settings
  scsi: ufs: renesas: Remove register control helper function
  scsi: ufs: renesas: Add register read to remove save/set/restore
  scsi: ufs: renesas: Replace init data by init code
  scsi: ufs: dt-bindings: renesas,ufs: Add calibration data
  scsi: mpi3mr: Task Abort EH Support
  scsi: storvsc: Don't report the host packet status as the hv status
  scsi: isci: Make most module parameters static
  scsi: megaraid_sas: Make most module parameters static
  ...
2025-03-26 19:57:34 -07:00
Linus Torvalds 91928e0d3c for-6.15/io_uring-20250322
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmfe8DcQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgprzNEADPOOfi05F0f4KOMLYOiRlB+D9yGtnbcu8k
 foPATU++L5fFDUpjAGobi38AXqHtF/Bx0CjCkPCvy4GjygcszYDY81FQmRlSFdeH
 CnYV/bm5myyupc8xANcWGJYpxf/ZAw7ugOAgcR3Xxk/qXdD5hYJ1jG4477o8v4IJ
 jyzE4vQCnWBogIoqpWTQyY3FSP5fdjeQlYdFGIFCx68rFTO9zEpuHhFr2OMfJtSu
 y1dLkeqbu1BZ0fhgRog9k9ZWk0uRQJI7d7EUXf9iwXOfkUyzCaV+nHvNGRdhJ/XO
 zj/qePw4lFyCgc+kuIejjIaPf/g84paCzshz/PIxOZTsvo6X78zj72RPM0Makhga
 amgtEqegWHtKSya1zew57oDwqwaIe6b+56PtNSP7K8IrwcJLheszoPKJIwiiynIL
 ZqvWseqYQXs58S9qZzSZlyEpwLVFcvyNt7Z/q7b56vhvXHabxOccPkUHt+UscVoQ
 qpG0R9+zmIiDfzY6Vfu7JyH2Uspdq3WjGYiaoVUko1rJXY+HFPd3be9pIbckyaA8
 ZQHIxB+h8IlxhpQ8HJKQpZhtYiluxEu6wDdvtpwC0KiGp5g4Q5eg9SkVyvETIEN8
 yW2tHqMAJJtJjDAPwJOV+GIyWQMIexzLmMP7Fq8R5VIoAox+Xn3flgNNEnMZdl/r
 kTc/hRjHvQ==
 =VEQr
 -----END PGP SIGNATURE-----

Merge tag 'for-6.15/io_uring-20250322' of git://git.kernel.dk/linux

Pull io_uring updates from Jens Axboe:
 "This is the first of the io_uring pull requests for the 6.15 merge
  window, there will be others once the net tree has gone in. This
  contains:

   - Cleanup and unification of cancelation handling across various
     request types.

   - Improvement for bundles, supporting them both for incrementally
     consumed buffers, and for non-multishot requests.

   - Enable toggling of using iowait while waiting on io_uring events or
     not. Unfortunately this is still tied with CPU frequency boosting
     on short waits, as the scheduler side has not been very receptive
     to splitting the (useless) iowait stat from the cpufreq implied
     boost.

   - Add support for kbuf nodes, enabling zero-copy support for the ublk
     block driver.

   - Various cleanups for resource node handling.

   - Series greatly cleaning up the legacy provided (non-ring based)
     buffers. For years, we've been pushing the ring provided buffers as
     the way to go, and that is what people have been using. Reduce the
     complexity and code associated with legacy provided buffers.

   - Series cleaning up the compat handling.

   - Series improving and cleaning up the recvmsg/sendmsg iovec and msg
     handling.

   - Series of cleanups for io-wq.

   - Start adding a bunch of selftests. The liburing repository
     generally carries feature and regression tests for everything, but
     at least for ublk initially, we'll try and go the route of having
     it in selftests as well. We'll see how this goes, might decide to
     migrate more tests this way in the future.

   - Various little cleanups and fixes"

* tag 'for-6.15/io_uring-20250322' of git://git.kernel.dk/linux: (108 commits)
  selftests: ublk: add stripe target
  selftests: ublk: simplify loop io completion
  selftests: ublk: enable zero copy for null target
  selftests: ublk: prepare for supporting stripe target
  selftests: ublk: move common code into common.c
  selftests: ublk: increase max buffer size to 1MB
  selftests: ublk: add single sqe allocator helper
  selftests: ublk: add generic_01 for verifying sequential IO order
  selftests: ublk: fix starting ublk device
  io_uring: enable toggle of iowait usage when waiting on CQEs
  selftests: ublk: fix write cache implementation
  selftests: ublk: add variable for user to not show test result
  selftests: ublk: don't show `modprobe` failure
  selftests: ublk: add one dependency header
  io_uring/kbuf: enable bundles for incrementally consumed buffers
  Revert "io_uring/rsrc: simplify the bvec iter count calculation"
  selftests: ublk: improve test usability
  selftests: ublk: add stress test for covering IO vs. killing ublk server
  selftests: ublk: add one stress test for covering IO vs. removing device
  selftests: ublk: load/unload ublk_drv when preparing & cleaning up tests
  ...
2025-03-26 17:56:00 -07:00
Mickaël Salaün a5c369e45b
selftests/landlock: Add audit tests for network
Test all network blockers:
- net.bind_tcp
- net.connect_tcp

Test coverage for security/landlock is 94.0% of 1525 lines according to
gcc/gcov-14.

Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-28-mic@digikod.net
[mic: Update test coverage]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-03-26 13:59:48 +01:00
Mickaël Salaün 316d06b011
selftests/landlock: Add audit tests for filesystem
Test all filesystem blockers, including events with several records, and
record with several blockers:
- fs.execute
- fs.write_file
- fs.read_file
- fs_read_dir
- fs.remove_dir
- fs.remove_file
- fs.make_char
- fs.make_dir
- fs.make_reg
- fs.make_sock
- fs.make_fifo
- fs.make_block
- fs.make_sym
- fs.refer
- fs.truncate
- fs.ioctl_dev
- fs.change_topology

Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-27-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-03-26 13:59:48 +01:00
Mickaël Salaün e1156872ef
selftests/landlock: Add audit tests for abstract UNIX socket scoping
Add a new scoped_audit.connect_to_child test to check the abstract UNIX
socket blocker.

Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-26-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-03-26 13:59:47 +01:00
Mickaël Salaün e2893c0a69
selftests/landlock: Add audit tests for ptrace
Add tests for all ptrace actions checking "blockers=ptrace" records.

This also improves PTRACE_TRACEME and PTRACE_ATTACH tests by making sure
that the restrictions comes from Landlock, and with the expected
process.  These extended tests are like enhanced errno checks that make
sure Landlock enforcement is consistent.

Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-25-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-03-26 13:59:47 +01:00
Mickaël Salaün 960ed6ca4c
selftests/landlock: Test audit with restrict flags
Add audit_exec tests to filter Landlock denials according to
cross-execution or muted subdomains.

Add a wait-pipe-sandbox.c test program to sandbox itself and send a
(denied) signals to its parent.

Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-24-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-03-26 13:59:46 +01:00
Mickaël Salaün 6a500b2297
selftests/landlock: Add tests for audit flags and domain IDs
Add audit_test.c to check with and without LANDLOCK_RESTRICT_SELF_*
flags against the two Landlock audit record types:
AUDIT_LANDLOCK_ACCESS and AUDIT_LANDLOCK_DOMAIN.

Check consistency of domain IDs per layer in AUDIT_LANDLOCK_ACCESS and
AUDIT_LANDLOCK_DOMAIN messages: denied access, domain allocation, and
domain deallocation.

These tests use signal scoping to make it simple.  They are not in the
scoped_signal_test.c file but in the new dedicated audit_test.c file.

Tests are run with audit filters to ensure the audit records come from
the test program.  Moreover, because there can only be one audit
process, tests would failed if run in parallel.  Because of audit
limitations, tests can only be run in the initial namespace.

The audit test helpers were inspired by libaudit and
tools/testing/selftests/net/netfilter/audit_logread.c

Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Phil Sutter <phil@nwl.cc>
Link: https://lore.kernel.org/r/20250320190717.2287696-23-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-03-26 13:59:45 +01:00
Mickaël Salaün e178b404ea
selftests/landlock: Extend tests for landlock_restrict_self(2)'s flags
Add the base_test's restrict_self_fd_flags tests to align with previous
restrict_self_fd tests but with the new
LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF flag.

Add the restrict_self_flags tests to check that
LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF,
LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON, and
LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF are valid but not the next
bit.  Some checks are similar to restrict_self_checks_ordering's ones.

Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-22-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-03-26 13:59:45 +01:00
Mickaël Salaün ec12a8d4c1
selftests/landlock: Add test for invalid ruleset file descriptor
To align with fs_test's layout1.inval and layout0.proc_nsfs which test
EBADFD for landlock_add_rule(2), create a new base_test's
restrict_self_fd which test EBADFD for landlock_restrict_self(2).

Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-21-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-03-26 13:59:44 +01:00
Mickaël Salaün 12bfcda73a
landlock: Add LANDLOCK_RESTRICT_SELF_LOG_*_EXEC_* flags
Most of the time we want to log denied access because they should not
happen and such information helps diagnose issues.  However, when
sandboxing processes that we know will try to access denied resources
(e.g. unknown, bogus, or malicious binary), we might want to not log
related access requests that might fill up logs.

By default, denied requests are logged until the task call execve(2).

If the LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF flag is set, denied
requests will not be logged for the same executed file.

If the LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON flag is set, denied
requests from after an execve(2) call will be logged.

The rationale is that a program should know its own behavior, but not
necessarily the behavior of other programs.

Because LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF is set for a specific
Landlock domain, it makes it possible to selectively mask some access
requests that would be logged by a parent domain, which might be handy
for unprivileged processes to limit logs.  However, system
administrators should still use the audit filtering mechanism.  There is
intentionally no audit nor sysctl configuration to re-enable these logs.
This is delegated to the user space program.

Increment the Landlock ABI version to reflect this interface change.

Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-18-mic@digikod.net
[mic: Rename variables and fix __maybe_unused]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-03-26 13:59:42 +01:00
Mickaël Salaün d9d2a68ed4
landlock: Add unique ID generator
Landlock IDs can be generated to uniquely identify Landlock objects.
For now, only Landlock domains get an ID at creation time.  These IDs
map to immutable domain hierarchies.

Landlock IDs have important properties:
- They are unique during the lifetime of the running system thanks to
  the 64-bit values: at worse, 2^60 - 2*2^32 useful IDs.
- They are always greater than 2^32 and must then be stored in 64-bit
  integer types.
- The initial ID (at boot time) is randomly picked between 2^32 and
  2^33, which limits collisions in logs across different boots.
- IDs are sequential, which enables users to order them.
- IDs may not be consecutive but increase with a random 2^4 step, which
  limits side channels.

Such IDs can be exposed to unprivileged processes, even if it is not the
case with this audit patch series.  The domain IDs will be useful for
user space to identify sandboxes and get their properties.

These Landlock IDs are more secure that other absolute kernel IDs such
as pipe's inodes which rely on a shared global counter.

For checkpoint/restore features (i.e. CRIU), we could easily implement a
privileged interface (e.g. sysfs) to set the next ID counter.

IDR/IDA are not used because we only need a bijection from Landlock
objects to Landlock IDs, and we must not recycle IDs.  This enables us
to identify all Landlock objects during the lifetime of the system (e.g.
in logs), but not to access an object from an ID nor know if an ID is
assigned.   Using a counter is simpler, it scales (i.e. avoids growing
memory footprint), and it does not require locking.  We'll use proper
file descriptors (with IDs used as inode numbers) to access Landlock
objects.

Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-3-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-03-26 13:59:34 +01:00
Mickaël Salaün c5efa393d8
selftests/landlock: Add a new test for setuid()
The new signal_scoping_thread_setuid tests check that the libc's
setuid() function works as expected even when a thread is sandboxed with
scoped signal restrictions.

Before the signal scoping fix, this test would have failed with the
setuid() call:

  [pid    65] getpid()                    = 65
  [pid    65] tgkill(65, 66, SIGRT_1)     = -1 EPERM (Operation not permitted)
  [pid    65] futex(0x40a66cdc, FUTEX_WAKE_PRIVATE, 1) = 0
  [pid    65] setuid(1001)                = 0

After the fix, tgkill(2) is successfully leveraged to synchronize
credentials update across threads:

  [pid    65] getpid()                    = 65
  [pid    65] tgkill(65, 66, SIGRT_1)     = 0
  [pid    66] <... read resumed>0x40a65eb7, 1) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
  [pid    66] --- SIGRT_1 {si_signo=SIGRT_1, si_code=SI_TKILL, si_pid=65, si_uid=1000} ---
  [pid    66] getpid()                    = 65
  [pid    66] setuid(1001)                = 0
  [pid    66] futex(0x40a66cdc, FUTEX_WAKE_PRIVATE, 1) = 0
  [pid    66] rt_sigreturn({mask=[]})     = 0
  [pid    66] read(3,  <unfinished ...>
  [pid    65] setuid(1001)                = 0

Test coverage for security/landlock is 92.9% of 1137 lines according to
gcc/gcov-14.

Fixes: c899496501 ("selftests/landlock: Test signal scoping for threads")
Cc: Günther Noack <gnoack@google.com>
Cc: Tahera Fahimi <fahimitahera@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250318161443.279194-8-mic@digikod.net
[mic: Update test coverage]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-03-26 13:59:32 +01:00
Mickaël Salaün bbe7227403
selftests/landlock: Split signal_scoping_threads tests
Split signal_scoping_threads tests into signal_scoping_thread_before
and signal_scoping_thread_after.

Use local variables for thread synchronization.  Fix exported function.
Replace some asserts with expects.

Fixes: c899496501 ("selftests/landlock: Test signal scoping for threads")
Cc: Günther Noack <gnoack@google.com>
Cc: Tahera Fahimi <fahimitahera@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250318161443.279194-7-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-03-26 13:59:32 +01:00
Mickaël Salaün 18eb75f3af
landlock: Always allow signals between threads of the same process
Because Linux credentials are managed per thread, user space relies on
some hack to synchronize credential update across threads from the same
process.  This is required by the Native POSIX Threads Library and
implemented by set*id(2) wrappers and libcap(3) to use tgkill(2) to
synchronize threads.  See nptl(7) and libpsx(3).  Furthermore, some
runtimes like Go do not enable developers to have control over threads
[1].

To avoid potential issues, and because threads are not security
boundaries, let's relax the Landlock (optional) signal scoping to always
allow signals sent between threads of the same process.  This exception
is similar to the __ptrace_may_access() one.

hook_file_set_fowner() now checks if the target task is part of the same
process as the caller.  If this is the case, then the related signal
triggered by the socket will always be allowed.

Scoping of abstract UNIX sockets is not changed because kernel objects
(e.g. sockets) should be tied to their creator's domain at creation
time.

Note that creating one Landlock domain per thread puts each of these
threads (and their future children) in their own scope, which is
probably not what users expect, especially in Go where we do not control
threads.  However, being able to drop permissions on all threads should
not be restricted by signal scoping.  We are working on a way to make it
possible to atomically restrict all threads of a process with the same
domain [2].

Add erratum for signal scoping.

Closes: https://github.com/landlock-lsm/go-landlock/issues/36
Fixes: 54a6e6bbf3 ("landlock: Add signal scoping")
Fixes: c899496501 ("selftests/landlock: Test signal scoping for threads")
Depends-on: 26f204380a ("fs: Fix file_set_fowner LSM hook inconsistencies")
Link: https://pkg.go.dev/kernel.org/pub/linux/libs/security/libcap/psx [1]
Link: https://github.com/landlock-lsm/linux/issues/2 [2]
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Tahera Fahimi <fahimitahera@gmail.com>
Cc: stable@vger.kernel.org
Acked-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20250318161443.279194-6-mic@digikod.net
[mic: Add extra pointer check and RCU guard, and ease backport]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-03-26 13:59:29 +01:00
Niklas Cassel 08818c6d7f
misc: pci_endpoint_test: Add support for PCITEST_IRQ_TYPE_AUTO
For PCITEST_MSI we really want to set PCITEST_SET_IRQTYPE explicitly
to PCITEST_IRQ_TYPE_MSI, since we want to test if MSI works.

For PCITEST_MSIX we really want to set PCITEST_SET_IRQTYPE explicitly
to PCITEST_IRQ_TYPE_MSIX, since we want to test if MSI works.

For PCITEST_LEGACY_IRQ we really want to set PCITEST_SET_IRQTYPE
explicitly to PCITEST_IRQ_TYPE_INTX, since we want to test if INTx
works.

However, for PCITEST_WRITE, PCITEST_READ, PCITEST_COPY, we really don't
care which IRQ type that is used, we just want to use a IRQ type that is
supported by the EPC.

The old behavior was to always use MSI for PCITEST_WRITE, PCITEST_READ,
PCITEST_COPY, was to always set IRQ type to MSI before doing the actual
test, however, there are EPC drivers that do not support MSI.

Add a new PCITEST_IRQ_TYPE_AUTO, that will use the CAPS register to see
which IRQ types the endpoint supports, and use one of the supported IRQ
types.

For backwards compatibility, if the endpoint does not expose any supported
IRQ type in the CAPS register, simply fallback to using MSI, as it was
unconditionally done before.

Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Link: https://lore.kernel.org/r/20250310111016.859445-16-cassel@kernel.org
2025-03-26 06:11:54 +00:00
Linus Torvalds ee6740fd34 CRC updates for 6.15
Another set of improvements to the kernel's CRC (cyclic redundancy
 check) code:
 
 - Rework the CRC64 library functions to be directly optimized, like what
   I did last cycle for the CRC32 and CRC-T10DIF library functions.
 
 - Rewrite the x86 PCLMULQDQ-optimized CRC code, and add VPCLMULQDQ
   support and acceleration for crc64_be and crc64_nvme.
 
 - Rewrite the riscv Zbc-optimized CRC code, and add acceleration for
   crc_t10dif, crc64_be, and crc64_nvme.
 
 - Remove crc_t10dif and crc64_rocksoft from the crypto API, since they
   are no longer needed there.
 
 - Rename crc64_rocksoft to crc64_nvme, as the old name was incorrect.
 
 - Add kunit test cases for crc64_nvme and crc7.
 
 - Eliminate redundant functions for calculating the Castagnoli CRC32,
   settling on just crc32c().
 
 - Remove unnecessary prompts from some of the CRC kconfig options.
 
 - Further optimize the x86 crc32c code.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCZ+CGGhQcZWJpZ2dlcnNA
 Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK3wRAP4tbnzawUmlIHIF0hleoADXehUgAhMt
 NZn15mGvyiuwIQEA8W9qvnLdFXZkdxhxAEvDDFjyrRauL6eGtr/GvCx4AQY=
 =wmKG
 -----END PGP SIGNATURE-----

Merge tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux

Pull CRC updates from Eric Biggers:
 "Another set of improvements to the kernel's CRC (cyclic redundancy
  check) code:

   - Rework the CRC64 library functions to be directly optimized, like
     what I did last cycle for the CRC32 and CRC-T10DIF library
     functions

   - Rewrite the x86 PCLMULQDQ-optimized CRC code, and add VPCLMULQDQ
     support and acceleration for crc64_be and crc64_nvme

   - Rewrite the riscv Zbc-optimized CRC code, and add acceleration for
     crc_t10dif, crc64_be, and crc64_nvme

   - Remove crc_t10dif and crc64_rocksoft from the crypto API, since
     they are no longer needed there

   - Rename crc64_rocksoft to crc64_nvme, as the old name was incorrect

   - Add kunit test cases for crc64_nvme and crc7

   - Eliminate redundant functions for calculating the Castagnoli CRC32,
     settling on just crc32c()

   - Remove unnecessary prompts from some of the CRC kconfig options

   - Further optimize the x86 crc32c code"

* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (36 commits)
  x86/crc: drop the avx10_256 functions and rename avx10_512 to avx512
  lib/crc: remove unnecessary prompt for CONFIG_CRC64
  lib/crc: remove unnecessary prompt for CONFIG_LIBCRC32C
  lib/crc: remove unnecessary prompt for CONFIG_CRC8
  lib/crc: remove unnecessary prompt for CONFIG_CRC7
  lib/crc: remove unnecessary prompt for CONFIG_CRC4
  lib/crc7: unexport crc7_be_syndrome_table
  lib/crc_kunit.c: update comment in crc_benchmark()
  lib/crc_kunit.c: add test and benchmark for crc7_be()
  x86/crc32: optimize tail handling for crc32c short inputs
  riscv/crc64: add Zbc optimized CRC64 functions
  riscv/crc-t10dif: add Zbc optimized CRC-T10DIF function
  riscv/crc32: reimplement the CRC32 functions using new template
  riscv/crc: add "template" for Zbc optimized CRC functions
  x86/crc: add ANNOTATE_NOENDBR to suppress objtool warnings
  x86/crc32: improve crc32c_arch() code generation with clang
  x86/crc64: implement crc64_be and crc64_nvme using new template
  x86/crc-t10dif: implement crc_t10dif using new template
  x86/crc32: implement crc32_le using new template
  x86/crc: add "template" for [V]PCLMULQDQ based CRC functions
  ...
2025-03-25 18:33:04 -07:00
Linus Torvalds edb0e8f6e2 ARM:
* Nested virtualization support for VGICv3, giving the nested
 hypervisor control of the VGIC hardware when running an L2 VM
 
 * Removal of 'late' nested virtualization feature register masking,
   making the supported feature set directly visible to userspace
 
 * Support for emulating FEAT_PMUv3 on Apple silicon, taking advantage
   of an IMPLEMENTATION DEFINED trap that covers all PMUv3 registers
 
 * Paravirtual interface for discovering the set of CPU implementations
   where a VM may run, addressing a longstanding issue of guest CPU
   errata awareness in big-little systems and cross-implementation VM
   migration
 
 * Userspace control of the registers responsible for identifying a
   particular CPU implementation (MIDR_EL1, REVIDR_EL1, AIDR_EL1),
   allowing VMs to be migrated cross-implementation
 
 * pKVM updates, including support for tracking stage-2 page table
   allocations in the protected hypervisor in the 'SecPageTable' stat
 
 * Fixes to vPMU, ensuring that userspace updates to the vPMU after
   KVM_RUN are reflected into the backing perf events
 
 LoongArch:
 
 * Remove unnecessary header include path
 
 * Assume constant PGD during VM context switch
 
 * Add perf events support for guest VM
 
 RISC-V:
 
 * Disable the kernel perf counter during configure
 
 * KVM selftests improvements for PMU
 
 * Fix warning at the time of KVM module removal
 
 x86:
 
 * Add support for aging of SPTEs without holding mmu_lock.  Not taking mmu_lock
   allows multiple aging actions to run in parallel, and more importantly avoids
   stalling vCPUs.  This includes an implementation of per-rmap-entry locking;
   aging the gfn is done with only a per-rmap single-bin spinlock taken, whereas
   locking an rmap for write requires taking both the per-rmap spinlock and
   the mmu_lock.
 
   Note that this decreases slightly the accuracy of accessed-page information,
   because changes to the SPTE outside aging might not use atomic operations
   even if they could race against a clear of the Accessed bit.  This is
   deliberate because KVM and mm/ tolerate false positives/negatives for
   accessed information, and testing has shown that reducing the latency of
   aging is far more beneficial to overall system performance than providing
   "perfect" young/old information.
 
 * Defer runtime CPUID updates until KVM emulates a CPUID instruction, to
   coalesce updates when multiple pieces of vCPU state are changing, e.g. as
   part of a nested transition.
 
 * Fix a variety of nested emulation bugs, and add VMX support for synthesizing
   nested VM-Exit on interception (instead of injecting #UD into L2).
 
 * Drop "support" for async page faults for protected guests that do not set
   SEND_ALWAYS (i.e. that only want async page faults at CPL3)
 
 * Bring a bit of sanity to x86's VM teardown code, which has accumulated
   a lot of cruft over the years.  Particularly, destroy vCPUs before
   the MMU, despite the latter being a VM-wide operation.
 
 * Add common secure TSC infrastructure for use within SNP and in the
   future TDX
 
 * Block KVM_CAP_SYNC_REGS if guest state is protected.  It does not make
   sense to use the capability if the relevant registers are not
   available for reading or writing.
 
 * Don't take kvm->lock when iterating over vCPUs in the suspend notifier to
   fix a largely theoretical deadlock.
 
 * Use the vCPU's actual Xen PV clock information when starting the Xen timer,
   as the cached state in arch.hv_clock can be stale/bogus.
 
 * Fix a bug where KVM could bleed PVCLOCK_GUEST_STOPPED across different
   PV clocks; restrict PVCLOCK_GUEST_STOPPED to kvmclock, as KVM's suspend
   notifier only accounts for kvmclock, and there's no evidence that the
   flag is actually supported by Xen guests.
 
 * Clean up the per-vCPU "cache" of its reference pvclock, and instead only
   track the vCPU's TSC scaling (multipler+shift) metadata (which is moderately
   expensive to compute, and rarely changes for modern setups).
 
 * Don't write to the Xen hypercall page on MSR writes that are initiated by
   the host (userspace or KVM) to fix a class of bugs where KVM can write to
   guest memory at unexpected times, e.g. during vCPU creation if userspace has
   set the Xen hypercall MSR index to collide with an MSR that KVM emulates.
 
 * Restrict the Xen hypercall MSR index to the unofficial synthetic range to
   reduce the set of possible collisions with MSRs that are emulated by KVM
   (collisions can still happen as KVM emulates Hyper-V MSRs, which also reside
   in the synthetic range).
 
 * Clean up and optimize KVM's handling of Xen MSR writes and xen_hvm_config.
 
 * Update Xen TSC leaves during CPUID emulation instead of modifying the CPUID
   entries when updating PV clocks; there is no guarantee PV clocks will be
   updated between TSC frequency changes and CPUID emulation, and guest reads
   of the TSC leaves should be rare, i.e. are not a hot path.
 
 x86 (Intel):
 
 * Fix a bug where KVM unnecessarily reads XFD_ERR from hardware and thus
   modifies the vCPU's XFD_ERR on a #NM due to CR0.TS=1.
 
 * Pass XFD_ERR as the payload when injecting #NM, as a preparatory step
   for upcoming FRED virtualization support.
 
 * Decouple the EPT entry RWX protection bit macros from the EPT Violation
   bits, both as a general cleanup and in anticipation of adding support for
   emulating Mode-Based Execution Control (MBEC).
 
 * Reject KVM_RUN if userspace manages to gain control and stuff invalid guest
   state while KVM is in the middle of emulating nested VM-Enter.
 
 * Add a macro to handle KVM's sanity checks on entry/exit VMCS control pairs
   in anticipation of adding sanity checks for secondary exit controls (the
   primary field is out of bits).
 
 x86 (AMD):
 
 * Ensure the PSP driver is initialized when both the PSP and KVM modules are
   built-in (the initcall framework doesn't handle dependencies).
 
 * Use long-term pins when registering encrypted memory regions, so that the
   pages are migrated out of MIGRATE_CMA/ZONE_MOVABLE and don't lead to
   excessive fragmentation.
 
 * Add macros and helpers for setting GHCB return/error codes.
 
 * Add support for Idle HLT interception, which elides interception if the vCPU
   has a pending, unmasked virtual IRQ when HLT is executed.
 
 * Fix a bug in INVPCID emulation where KVM fails to check for a non-canonical
   address.
 
 * Don't attempt VMRUN for SEV-ES+ guests if the vCPU's VMSA is invalid, e.g.
   because the vCPU was "destroyed" via SNP's AP Creation hypercall.
 
 * Reject SNP AP Creation if the requested SEV features for the vCPU don't
   match the VM's configured set of features.
 
 Selftests:
 
 * Fix again the Intel PMU counters test; add a data load and do CLFLUSH{OPT} on the data
   instead of executing code.  The theory is that modern Intel CPUs have
   learned new code prefetching tricks that bypass the PMU counters.
 
 * Fix a flaw in the Intel PMU counters test where it asserts that an event is
   counting correctly without actually knowing what the event counts on the
   underlying hardware.
 
 * Fix a variety of flaws, bugs, and false failures/passes dirty_log_test, and
   improve its coverage by collecting all dirty entries on each iteration.
 
 * Fix a few minor bugs related to handling of stats FDs.
 
 * Add infrastructure to make vCPU and VM stats FDs available to tests by
   default (open the FDs during VM/vCPU creation).
 
 * Relax an assertion on the number of HLT exits in the xAPIC IPI test when
   running on a CPU that supports AMD's Idle HLT (which elides interception of
   HLT if a virtual IRQ is pending and unmasked).
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmfcTkEUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMnQAf/cPx72hJOdNy4Qrm8M33YLXVRVV00
 yEZ8eN8TWdOclr0ltE/w/ELGh/qS4CU8pjURAk0A6lPioU+mdcTn3dPEqMDMVYom
 uOQ2lusEHw0UuSnGZSEjvZJsE/Ro2NSAsHIB6PWRqig1ZBPJzyu0frce34pMpeQH
 diwriJL9lKPAhBWXnUQ9BKoi1R0P5OLW9ahX4SOWk7cAFg4DLlDE66Nqf6nKqViw
 DwEucTiUEg5+a3d93gihdD4JNl+fb3vI2erxrMxjFjkacl0qgqRu3ei3DG0MfdHU
 wNcFSG5B1n0OECKxr80lr1Ip1KTVNNij0Ks+w6Gc6lSg9c4PptnNkfLK3A==
 =nnCN
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Nested virtualization support for VGICv3, giving the nested
     hypervisor control of the VGIC hardware when running an L2 VM

   - Removal of 'late' nested virtualization feature register masking,
     making the supported feature set directly visible to userspace

   - Support for emulating FEAT_PMUv3 on Apple silicon, taking advantage
     of an IMPLEMENTATION DEFINED trap that covers all PMUv3 registers

   - Paravirtual interface for discovering the set of CPU
     implementations where a VM may run, addressing a longstanding issue
     of guest CPU errata awareness in big-little systems and
     cross-implementation VM migration

   - Userspace control of the registers responsible for identifying a
     particular CPU implementation (MIDR_EL1, REVIDR_EL1, AIDR_EL1),
     allowing VMs to be migrated cross-implementation

   - pKVM updates, including support for tracking stage-2 page table
     allocations in the protected hypervisor in the 'SecPageTable' stat

   - Fixes to vPMU, ensuring that userspace updates to the vPMU after
     KVM_RUN are reflected into the backing perf events

  LoongArch:

   - Remove unnecessary header include path

   - Assume constant PGD during VM context switch

   - Add perf events support for guest VM

  RISC-V:

   - Disable the kernel perf counter during configure

   - KVM selftests improvements for PMU

   - Fix warning at the time of KVM module removal

  x86:

   - Add support for aging of SPTEs without holding mmu_lock.

     Not taking mmu_lock allows multiple aging actions to run in
     parallel, and more importantly avoids stalling vCPUs. This includes
     an implementation of per-rmap-entry locking; aging the gfn is done
     with only a per-rmap single-bin spinlock taken, whereas locking an
     rmap for write requires taking both the per-rmap spinlock and the
     mmu_lock.

     Note that this decreases slightly the accuracy of accessed-page
     information, because changes to the SPTE outside aging might not
     use atomic operations even if they could race against a clear of
     the Accessed bit.

     This is deliberate because KVM and mm/ tolerate false
     positives/negatives for accessed information, and testing has shown
     that reducing the latency of aging is far more beneficial to
     overall system performance than providing "perfect" young/old
     information.

   - Defer runtime CPUID updates until KVM emulates a CPUID instruction,
     to coalesce updates when multiple pieces of vCPU state are
     changing, e.g. as part of a nested transition

   - Fix a variety of nested emulation bugs, and add VMX support for
     synthesizing nested VM-Exit on interception (instead of injecting
     #UD into L2)

   - Drop "support" for async page faults for protected guests that do
     not set SEND_ALWAYS (i.e. that only want async page faults at CPL3)

   - Bring a bit of sanity to x86's VM teardown code, which has
     accumulated a lot of cruft over the years. Particularly, destroy
     vCPUs before the MMU, despite the latter being a VM-wide operation

   - Add common secure TSC infrastructure for use within SNP and in the
     future TDX

   - Block KVM_CAP_SYNC_REGS if guest state is protected. It does not
     make sense to use the capability if the relevant registers are not
     available for reading or writing

   - Don't take kvm->lock when iterating over vCPUs in the suspend
     notifier to fix a largely theoretical deadlock

   - Use the vCPU's actual Xen PV clock information when starting the
     Xen timer, as the cached state in arch.hv_clock can be stale/bogus

   - Fix a bug where KVM could bleed PVCLOCK_GUEST_STOPPED across
     different PV clocks; restrict PVCLOCK_GUEST_STOPPED to kvmclock, as
     KVM's suspend notifier only accounts for kvmclock, and there's no
     evidence that the flag is actually supported by Xen guests

   - Clean up the per-vCPU "cache" of its reference pvclock, and instead
     only track the vCPU's TSC scaling (multipler+shift) metadata (which
     is moderately expensive to compute, and rarely changes for modern
     setups)

   - Don't write to the Xen hypercall page on MSR writes that are
     initiated by the host (userspace or KVM) to fix a class of bugs
     where KVM can write to guest memory at unexpected times, e.g.
     during vCPU creation if userspace has set the Xen hypercall MSR
     index to collide with an MSR that KVM emulates

   - Restrict the Xen hypercall MSR index to the unofficial synthetic
     range to reduce the set of possible collisions with MSRs that are
     emulated by KVM (collisions can still happen as KVM emulates
     Hyper-V MSRs, which also reside in the synthetic range)

   - Clean up and optimize KVM's handling of Xen MSR writes and
     xen_hvm_config

   - Update Xen TSC leaves during CPUID emulation instead of modifying
     the CPUID entries when updating PV clocks; there is no guarantee PV
     clocks will be updated between TSC frequency changes and CPUID
     emulation, and guest reads of the TSC leaves should be rare, i.e.
     are not a hot path

  x86 (Intel):

   - Fix a bug where KVM unnecessarily reads XFD_ERR from hardware and
     thus modifies the vCPU's XFD_ERR on a #NM due to CR0.TS=1

   - Pass XFD_ERR as the payload when injecting #NM, as a preparatory
     step for upcoming FRED virtualization support

   - Decouple the EPT entry RWX protection bit macros from the EPT
     Violation bits, both as a general cleanup and in anticipation of
     adding support for emulating Mode-Based Execution Control (MBEC)

   - Reject KVM_RUN if userspace manages to gain control and stuff
     invalid guest state while KVM is in the middle of emulating nested
     VM-Enter

   - Add a macro to handle KVM's sanity checks on entry/exit VMCS
     control pairs in anticipation of adding sanity checks for secondary
     exit controls (the primary field is out of bits)

  x86 (AMD):

   - Ensure the PSP driver is initialized when both the PSP and KVM
     modules are built-in (the initcall framework doesn't handle
     dependencies)

   - Use long-term pins when registering encrypted memory regions, so
     that the pages are migrated out of MIGRATE_CMA/ZONE_MOVABLE and
     don't lead to excessive fragmentation

   - Add macros and helpers for setting GHCB return/error codes

   - Add support for Idle HLT interception, which elides interception if
     the vCPU has a pending, unmasked virtual IRQ when HLT is executed

   - Fix a bug in INVPCID emulation where KVM fails to check for a
     non-canonical address

   - Don't attempt VMRUN for SEV-ES+ guests if the vCPU's VMSA is
     invalid, e.g. because the vCPU was "destroyed" via SNP's AP
     Creation hypercall

   - Reject SNP AP Creation if the requested SEV features for the vCPU
     don't match the VM's configured set of features

  Selftests:

   - Fix again the Intel PMU counters test; add a data load and do
     CLFLUSH{OPT} on the data instead of executing code. The theory is
     that modern Intel CPUs have learned new code prefetching tricks
     that bypass the PMU counters

   - Fix a flaw in the Intel PMU counters test where it asserts that an
     event is counting correctly without actually knowing what the event
     counts on the underlying hardware

   - Fix a variety of flaws, bugs, and false failures/passes
     dirty_log_test, and improve its coverage by collecting all dirty
     entries on each iteration

   - Fix a few minor bugs related to handling of stats FDs

   - Add infrastructure to make vCPU and VM stats FDs available to tests
     by default (open the FDs during VM/vCPU creation)

   - Relax an assertion on the number of HLT exits in the xAPIC IPI test
     when running on a CPU that supports AMD's Idle HLT (which elides
     interception of HLT if a virtual IRQ is pending and unmasked)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (216 commits)
  RISC-V: KVM: Optimize comments in kvm_riscv_vcpu_isa_disable_allowed
  RISC-V: KVM: Teardown riscv specific bits after kvm_exit
  LoongArch: KVM: Register perf callbacks for guest
  LoongArch: KVM: Implement arch-specific functions for guest perf
  LoongArch: KVM: Add stub for kvm_arch_vcpu_preempted_in_kernel()
  LoongArch: KVM: Remove PGD saving during VM context switch
  LoongArch: KVM: Remove unnecessary header include path
  KVM: arm64: Tear down vGIC on failed vCPU creation
  KVM: arm64: PMU: Reload when resetting
  KVM: arm64: PMU: Reload when user modifies registers
  KVM: arm64: PMU: Fix SET_ONE_REG for vPMC regs
  KVM: arm64: PMU: Assume PMU presence in pmu-emul.c
  KVM: arm64: PMU: Set raw values from user to PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}
  KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu
  KVM: arm64: Factor out pKVM hyp vcpu creation to separate function
  KVM: arm64: Initialize HCRX_EL2 traps in pKVM
  KVM: arm64: Factor out setting HCRX_EL2 traps into separate function
  KVM: x86: block KVM_CAP_SYNC_REGS if guest state is protected
  KVM: x86: Add infrastructure for secure TSC
  KVM: x86: Push down setting vcpu.arch.user_set_tsc
  ...
2025-03-25 14:22:07 -07:00
Linus Torvalds 2d09a9449e arm64 updates for 6.15:
Perf and PMUs:
 
  - Support for the "Rainier" CPU PMU from Arm
 
  - Preparatory driver changes and cleanups that pave the way for BRBE
    support
 
  - Support for partial virtualisation of the Apple-M1 PMU
 
  - Support for the second event filter in Arm CSPMU designs
 
  - Minor fixes and cleanups (CMN and DWC PMUs)
 
  - Enable EL2 requirements for FEAT_PMUv3p9
 
 Power, CPU topology:
 
  - Support for AMUv1-based average CPU frequency
 
  - Run-time SMT control wired up for arm64 (CONFIG_HOTPLUG_SMT). It adds
    a generic topology_is_primary_thread() function overridden by x86 and
    powerpc
 
 New(ish) features:
 
  - MOPS (memcpy/memset) support for the uaccess routines
 
 Security/confidential compute:
 
  - Fix the DMA address for devices used in Realms with Arm CCA. The
    CCA architecture uses the address bit to differentiate between shared
    and private addresses
 
  - Spectre-BHB: assume CPUs Linux doesn't know about vulnerable by
    default
 
 Memory management clean-ups:
 
  - Drop the P*D_TABLE_BIT definition in preparation for 128-bit PTEs
 
  - Some minor page table accessor clean-ups
 
  - PIE/POE (permission indirection/overlay) helpers clean-up
 
 Kselftests:
 
  - MTE: skip hugetlb tests if MTE is not supported on such mappings and
    user correct naming for sync/async tag checking modes
 
 Miscellaneous:
 
  - Add a PKEY_UNRESTRICTED definition as 0 to uapi (toolchain people
    request)
 
  - Sysreg updates for new register fields
 
  - CPU type info for some Qualcomm Kryo cores
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmfjB2QACgkQa9axLQDI
 XvGrfg//W3Bx9+jw1G/XHHEQqGEVFmvltvxZUkvgV0Qki0rPSMnappJhZRL9n0Nm
 V6PvGd2KoKHZuL3g5ViZb3cs2R9BiD2JB6PncwBKuxumHGh3vz3kk1JMkDVfWdHv
 qAceOckFJD9rXjPZn+PDsfYiEi2i3RRWIP5VglZ14ue8j3prHQ6DJXLUQF2GYvzE
 /bgLSq44wp5N59ddy23+qH9rxrHzz3bgpbVv/F56W/LErvE873mRmyFwiuGJm+M0
 Pn8ra572rI6a4sgSwrMTeNPBU+F9o5AbqwauVhkz428RdMvgfEuW6qHUBnGWJDmt
 HotXmu+4Eb2KJks/iQkDo4OTJ38yUqvvZZJtP171ms3E4yqESSJngWP6O2A6LF+y
 xhe0sESF/Ew6jLhM6/hvOmBcE2AyB14JE3ymqLkXbWub4NXddBn2AF1WXFjF4CBw
 F8KSUhNLekrCYKv1k9M3nhvkcpoS9FkTF/TI+zEg546alI/GLPih6uDRkgMAODh1
 RDJYixHsf2NDDRQbfwvt9Xua/KKpDF6qNkHLA4OiqqVUwh1hkas24Lrnp8vmce4o
 wIpWCLqYWey8Rl3XWuWgWz2Xu58fHH4Dl2k72Z8I0pwp3abCDa9xEj79G0Svk7Si
 Q+FCYrNlpKee1RXBC+1MUD/Gl5r/28dEUFkAzPD80F7AgafXPd0=
 =Kc9c
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:
 "Nothing major this time around.

  Apart from the usual perf/PMU updates, some page table cleanups, the
  notable features are average CPU frequency based on the AMUv1
  counters, CONFIG_HOTPLUG_SMT and MOPS instructions (memcpy/memset) in
  the uaccess routines.

  Perf and PMUs:

   - Support for the 'Rainier' CPU PMU from Arm

   - Preparatory driver changes and cleanups that pave the way for BRBE
     support

   - Support for partial virtualisation of the Apple-M1 PMU

   - Support for the second event filter in Arm CSPMU designs

   - Minor fixes and cleanups (CMN and DWC PMUs)

   - Enable EL2 requirements for FEAT_PMUv3p9

  Power, CPU topology:

   - Support for AMUv1-based average CPU frequency

   - Run-time SMT control wired up for arm64 (CONFIG_HOTPLUG_SMT). It
     adds a generic topology_is_primary_thread() function overridden by
     x86 and powerpc

  New(ish) features:

   - MOPS (memcpy/memset) support for the uaccess routines

  Security/confidential compute:

   - Fix the DMA address for devices used in Realms with Arm CCA. The
     CCA architecture uses the address bit to differentiate between
     shared and private addresses

   - Spectre-BHB: assume CPUs Linux doesn't know about vulnerable by
     default

  Memory management clean-ups:

   - Drop the P*D_TABLE_BIT definition in preparation for 128-bit PTEs

   - Some minor page table accessor clean-ups

   - PIE/POE (permission indirection/overlay) helpers clean-up

  Kselftests:

   - MTE: skip hugetlb tests if MTE is not supported on such mappings
     and user correct naming for sync/async tag checking modes

  Miscellaneous:

   - Add a PKEY_UNRESTRICTED definition as 0 to uapi (toolchain people
     request)

   - Sysreg updates for new register fields

   - CPU type info for some Qualcomm Kryo cores"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (72 commits)
  arm64: mm: Don't use %pK through printk
  perf/arm_cspmu: Fix missing io.h include
  arm64: errata: Add newer ARM cores to the spectre_bhb_loop_affected() lists
  arm64: cputype: Add MIDR_CORTEX_A76AE
  arm64: errata: Add KRYO 2XX/3XX/4XX silver cores to Spectre BHB safe list
  arm64: errata: Assume that unknown CPUs _are_ vulnerable to Spectre BHB
  arm64: errata: Add QCOM_KRYO_4XX_GOLD to the spectre_bhb_k24_list
  arm64/sysreg: Enforce whole word match for open/close tokens
  arm64/sysreg: Fix unbalanced closing block
  arm64: Kconfig: Enable HOTPLUG_SMT
  arm64: topology: Support SMT control on ACPI based system
  arch_topology: Support SMT control for OF based system
  cpu/SMT: Provide a default topology_is_primary_thread()
  arm64/mm: Define PTDESC_ORDER
  perf/arm_cspmu: Add PMEVFILT2R support
  perf/arm_cspmu: Generalise event filtering
  perf/arm_cspmu: Move register definitons to header
  arm64/kernel: Always use level 2 or higher for early mappings
  arm64/mm: Drop PXD_TABLE_BIT
  arm64/mm: Check pmd_table() in pmd_trans_huge()
  ...
2025-03-25 13:16:16 -07:00
Catalin Marinas 8cc14fdcc1 Merge branches 'for-next/amuv1-avg-freq', 'for-next/pkey_unrestricted', 'for-next/sysreg', 'for-next/misc', 'for-next/pgtable-cleanups', 'for-next/kselftest', 'for-next/uaccess-mops', 'for-next/pie-poe-cleanup', 'for-next/cputype-kryo', 'for-next/cca-dma-address', 'for-next/drop-pxd_table_bit' and 'for-next/spectre-bhb-assume-vulnerable', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf:
  perf/arm_cspmu: Fix missing io.h include
  perf/arm_cspmu: Add PMEVFILT2R support
  perf/arm_cspmu: Generalise event filtering
  perf/arm_cspmu: Move register definitons to header
  drivers/perf: apple_m1: Support host/guest event filtering
  drivers/perf: apple_m1: Refactor event select/filter configuration
  perf/dwc_pcie: fix duplicate pci_dev devices
  perf/dwc_pcie: fix some unreleased resources
  perf/arm-cmn: Minor event type housekeeping
  perf: arm_pmu: Move PMUv3-specific data
  perf: apple_m1: Don't disable counter in m1_pmu_enable_event()
  perf: arm_v7_pmu: Don't disable counter in (armv7|krait_|scorpion_)pmu_enable_event()
  perf: arm_v7_pmu: Drop obvious comments for enabling/disabling counters and interrupts
  perf: arm_pmuv3: Don't disable counter in armv8pmu_enable_event()
  perf: arm_pmu: Don't disable counter in armpmu_add()
  perf: arm_pmuv3: Call kvm_vcpu_pmu_resync_el0() before enabling counters
  perf: arm_pmuv3: Add support for ARM Rainier PMU

* for-next/amuv1-avg-freq:
  : Add support for AArch64 AMUv1-based average freq
  arm64: Utilize for_each_cpu_wrap for reference lookup
  arm64: Update AMU-based freq scale factor on entering idle
  arm64: Provide an AMU-based version of arch_freq_get_on_cpu
  cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry
  cpufreq: Allow arch_freq_get_on_cpu to return an error
  arch_topology: init capacity_freq_ref to 0

* for-next/pkey_unrestricted:
  : mm/pkey: Add PKEY_UNRESTRICTED macro
  selftest/powerpc/mm/pkey: fix build-break introduced by commit 00894c3fc9
  selftests/powerpc: Use PKEY_UNRESTRICTED macro
  selftests/mm: Use PKEY_UNRESTRICTED macro
  mm/pkey: Add PKEY_UNRESTRICTED macro

* for-next/sysreg:
  : arm64 sysreg updates
  arm64/sysreg: Enforce whole word match for open/close tokens
  arm64/sysreg: Fix unbalanced closing block
  arm64/sysreg: Add register fields for HFGWTR2_EL2
  arm64/sysreg: Add register fields for HFGRTR2_EL2
  arm64/sysreg: Add register fields for HFGITR2_EL2
  arm64/sysreg: Add register fields for HDFGWTR2_EL2
  arm64/sysreg: Add register fields for HDFGRTR2_EL2
  arm64/sysreg: Update register fields for ID_AA64MMFR0_EL1

* for-next/misc:
  : Miscellaneous arm64 patches
  arm64: mm: Don't use %pK through printk
  arm64/fpsimd: Remove unused declaration fpsimd_kvm_prepare()

* for-next/pgtable-cleanups:
  : arm64 pgtable accessors cleanup
  arm64/mm: Define PTDESC_ORDER
  arm64/kernel: Always use level 2 or higher for early mappings
  arm64/hugetlb: Consistently use pud_sect_supported()
  arm64/mm: Convert __pte_to_phys() and __phys_to_pte_val() as functions

* for-next/kselftest:
  : arm64 kselftest updates
  kselftest/arm64: mte: Skip the hugetlb tests if MTE not supported on such mappings
  kselftest/arm64: mte: Use the correct naming for tag check modes in check_hugetlb_options.c

* for-next/uaccess-mops:
  : Implement the uaccess memory copy/set using MOPS instructions
  arm64: lib: Use MOPS for usercopy routines
  arm64: mm: Handle PAN faults on uaccess CPY* instructions
  arm64: extable: Add fixup handling for uaccess CPY* instructions

* for-next/pie-poe-cleanup:
  : PIE/POE helpers cleanup
  arm64/sysreg: Move POR_EL0_INIT to asm/por.h
  arm64/sysreg: Rename POE_RXW to POE_RWX
  arm64/sysreg: Improve PIR/POR helpers

* for-next/cputype-kryo:
  : Add cputype info for some Qualcomm Kryo cores
  arm64: cputype: Add comments about Qualcomm Kryo 5XX and 6XX cores
  arm64: cputype: Add QCOM_CPU_PART_KRYO_3XX_GOLD

* for-next/cca-dma-address:
  : Fix DMA address for devices used in realms with Arm CCA
  arm64: realm: Use aliased addresses for device DMA to shared buffers
  dma: Introduce generic dma_addr_*crypted helpers
  dma: Fix encryption bit clearing for dma_to_phys

* for-next/drop-pxd_table_bit:
  : Drop the arm64 PXD_TABLE_BIT (clean-up in preparation for 128-bit PTEs)
  arm64/mm: Drop PXD_TABLE_BIT
  arm64/mm: Check pmd_table() in pmd_trans_huge()
  arm64/mm: Check PUD_TYPE_TABLE in pud_bad()
  arm64/mm: Check PXD_TYPE_TABLE in [p4d|pgd]_bad()
  arm64/mm: Clear PXX_TYPE_MASK and set PXD_TYPE_SECT in [pmd|pud]_mkhuge()
  arm64/mm: Clear PXX_TYPE_MASK in mk_[pmd|pud]_sect_prot()
  arm64/ptdump: Test PMD_TYPE_MASK for block mapping
  KVM: arm64: ptdump: Test PMD_TYPE_MASK for block mapping

* for-next/spectre-bhb-assume-vulnerable:
  : Rework Spectre BHB mitigations to not assume "safe"
  arm64: errata: Add newer ARM cores to the spectre_bhb_loop_affected() lists
  arm64: cputype: Add MIDR_CORTEX_A76AE
  arm64: errata: Add KRYO 2XX/3XX/4XX silver cores to Spectre BHB safe list
  arm64: errata: Assume that unknown CPUs _are_ vulnerable to Spectre BHB
  arm64: errata: Add QCOM_KRYO_4XX_GOLD to the spectre_bhb_k24_list
2025-03-25 19:32:03 +00:00
Linus Torvalds 317a76a996 Updates for the VDSO infrastructure:
- Consolidate the VDSO storage
 
     The VDSO data storage and data layout has been largely architecture
     specific for historical reasons. That increases the maintenance effort
     and causes inconsistencies over and over.
 
     There is no real technical reason for architecture specific layouts and
     implementations. The architecture specific details can easily be
     integrated into a generic layout, which also reduces the amount of
     duplicated code for managing the mappings.
 
     Convert all architectures over to a unified layout and common mapping
     infrastructure. This splits the VDSO data layout into subsystem
     specific blocks, timekeeping, random and architecture parts, which
     provides a better structure and allows to improve and update the
     functionalities without conflict and interaction.
 
   - Rework the timekeeping data storage
 
     The current implementation is designed for exposing system timekeeping
     accessors, which was good enough at the time when it was designed.
 
     PTP and Time Sensitive Networking (TSN) change that as there are
     requirements to expose independent PTP clocks, which are not related to
     system timekeeping.
 
     Replace the monolithic data storage by a structured layout, which
     allows to add support for independent PTP clocks on top while reusing
     both the data structures and the time accessor implementations.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmfgSWUTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoYGED/0f/M8YyacAyErDYW4ufW+zh2sUidSf
 GVlK0Jn5BMljOoye+y2XfTxuvvXxEDjJNYiJm2uKGPdV29tjNXreGK39XyNqXPu5
 jwR4f/IN/QVSM2nCO6jyydMz8ympJ2k6M4RewwmxXBL2KsUzzJWSKTgRNqM5Tdjs
 1RhJMjkQVTiiSYerBpHXYCeZLM7/VEfZ120uuzVAYPXo0/R6zuyF7IBgIao9hbfO
 IQeCMLLfpDQHQhwquTA8ZbWqQusiEoSYHT+kTDa3eXDDbE/2UklAUs9gaatI979x
 73zs0Yqxyx2iIGaghACWOAbKdcBWBeCYDw5fFwYVKn4VMQi1+wcxbtOYL767jp9o
 vfkLXGilXcVkvDjv4fH+e1NoJXXBxq1Ug1silKdOeJzenQF8Q1i3tavkWUVCNfwH
 qyOIM72NiCEWbYBDcz0lwBxEAyO4o0E6NP1bDc4y50VedEYIbXwSh0QGrdev1abn
 rjY9vsuUR9oznmZ6BRPPxMTY87gOSHoKvqydgSZUACEgLV9346f5qZf341OReYai
 MXUmXOM4+LdyaM1+Mec8ppvjMbLw+736NZyZtT2InusEBE+Ddp25L3hYiWnklJu8
 2uwv0AoyrwaJ8y6ADOX4thcLZq0gND0Z/Ayz/XvpeI30eftsGUCt5KOVlqwfwOkI
 4EQKvk2fAixPxg==
 =rwei
 -----END PGP SIGNATURE-----

Merge tag 'timers-vdso-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull VDSO infrastructure updates from Thomas Gleixner:

 - Consolidate the VDSO storage

   The VDSO data storage and data layout has been largely architecture
   specific for historical reasons. That increases the maintenance
   effort and causes inconsistencies over and over.

   There is no real technical reason for architecture specific layouts
   and implementations. The architecture specific details can easily be
   integrated into a generic layout, which also reduces the amount of
   duplicated code for managing the mappings.

   Convert all architectures over to a unified layout and common mapping
   infrastructure. This splits the VDSO data layout into subsystem
   specific blocks, timekeeping, random and architecture parts, which
   provides a better structure and allows to improve and update the
   functionalities without conflict and interaction.

 - Rework the timekeeping data storage

   The current implementation is designed for exposing system
   timekeeping accessors, which was good enough at the time when it was
   designed.

   PTP and Time Sensitive Networking (TSN) change that as there are
   requirements to expose independent PTP clocks, which are not related
   to system timekeeping.

   Replace the monolithic data storage by a structured layout, which
   allows to add support for independent PTP clocks on top while reusing
   both the data structures and the time accessor implementations.

* tag 'timers-vdso-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits)
  sparc/vdso: Always reject undefined references during linking
  x86/vdso: Always reject undefined references during linking
  vdso: Rework struct vdso_time_data and introduce struct vdso_clock
  vdso: Move architecture related data before basetime data
  powerpc/vdso: Prepare introduction of struct vdso_clock
  arm64/vdso: Prepare introduction of struct vdso_clock
  x86/vdso: Prepare introduction of struct vdso_clock
  time/namespace: Prepare introduction of struct vdso_clock
  vdso/namespace: Rename timens_setup_vdso_data() to reflect new vdso_clock struct
  vdso/vsyscall: Prepare introduction of struct vdso_clock
  vdso/gettimeofday: Prepare helper functions for introduction of struct vdso_clock
  vdso/gettimeofday: Prepare do_coarse_timens() for introduction of struct vdso_clock
  vdso/gettimeofday: Prepare do_coarse() for introduction of struct vdso_clock
  vdso/gettimeofday: Prepare do_hres_timens() for introduction of struct vdso_clock
  vdso/gettimeofday: Prepare do_hres() for introduction of struct vdso_clock
  vdso/gettimeofday: Prepare introduction of struct vdso_clock
  vdso/helpers: Prepare introduction of struct vdso_clock
  vdso/datapage: Define vdso_clock to prepare for multiple PTP clocks
  vdso: Make vdso_time_data cacheline aligned
  arm64: Make asm/cache.h compatible with vDSO
  ...
2025-03-25 11:30:42 -07:00
Eric Dumazet a7c428ee8f tcp/dccp: remove icsk->icsk_timeout
icsk->icsk_timeout can be replaced by icsk->icsk_retransmit_timer.expires

This saves 8 bytes in TCP/DCCP sockets and helps for better cache locality.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250324203607.703850-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25 10:34:33 -07:00
Linus Torvalds d5048d1176 Updates for the core time/timer subsystem:
- Fix a memory ordering issue in posix-timers
 
     Posix-timer lookup is lockless and reevaluates the timer validity under
     the timer lock, but the update which validates the timer is not
     protected by the timer lock. That allows the store to be reordered
     against the initialization stores, so that the lookup side can observe
     a partially initialized timer. That's mostly a theoretical problem, but
     incorrect nevertheless.
 
   - Fix a long standing inconsistency of the coarse time getters
 
     The coarse time getters read the base time of the current update cycle
     without reading the actual hardware clock. NTP frequency adjustment can
     set the base time backwards. The fine grained interfaces compensate
     this by reading the clock and applying the new conversion factor, but
     the coarse grained time getters use the base time directly. That allows
     the user to observe time going backwards.
 
     Cure it by always forwarding base time, when NTP changes the frequency
     with an immediate step.
 
   - Rework of posix-timer hashing
 
     The posix-timer hash is not scalable and due to the CRIU timer restore
     mechanism prone to massive contention on the global hash bucket lock.
 
     Replace the global hash lock with a fine grained per bucket locking
     scheme to address that.
 
   - Rework the proc/$PID/timers interface.
 
     /proc/$PID/timers is provided for CRIU to be able to restore a
     timer. The printout happens with sighand lock held and interrupts
     disabled. That's not required as this can be done with RCU protection
     as well.
 
   - Provide a sane mechanism for CRIU to restore a timer ID
 
     CRIU restores timers by creating and deleting them until the kernel
     internal per process ID counter reached the requested ID. That's
     horribly slow for sparse timer IDs.
 
     Provide a prctl() which allows CRIU to restore a timer with a given
     ID. When enabled the ID pointer is used as input pointer to read the
     requested ID from user space. When disabled, the normal allocation
     scheme (next ID) is active as before. This is backwards compatible for
     both kernel and user space.
 
   - Make hrtimer_update_function() less expensive.
 
     The sanity checks are valuable, but expensive for high frequency usage
     in io/uring. Make the debug checks conditional and enable them only
     when lockdep is enabled.
 
   - Small updates, cleanups and improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmfgQ6wTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoeQzD/9p+EuUGrMbSNaLVMCYFULBbR0lersJ
 hrGGoKUsNt5T+f6hEEbSLBnkjZcMIj0J+mdIEUiRa73ryw1KmwLk/8MBu0c6u6q3
 musDvJqt3dLTG98yN0YeWK3tJDxhSjxIpwcAXusPQ04j16I2fVXFzDQ/kGPq6MTI
 tdMYzsS3wjuWpi+CbgRSP2HEwu08fIDVsQ7Grynh4Kmd31apne4ZgF2UVp6UiZyp
 8yJHZgVzJcFs7Y3MS6XTgezHnuADxMY1irzbXmok19941X8mZz2QRIpGQX+oMh6o
 g7SG2lj9i8YbLqU9/5RbC5ppjRcWfogDpW0Lk+OmdOpr0RiXTmx5Lz8Egxex9wG5
 pUJszeTY+bLw7mmYmkGZyBz+PNoGgVM5KFZRe5ENvYM8Gy8LUW5DA9zvxeHqDDz1
 FiMmKdYrwr8VCKqx+8hJQdzlzRbepxq9sNzDdMKVOUcFdGUVWekfG6ZFkfLKxwzA
 XDTKJilzXbAAj4r57vEvOCYLUZH/ZsFK4yyg0O53fEg6fj87EbTDb5+YUGazb3+C
 yNTEOQIT8LtutzLR9+xeLi92k+6zlJ4c1PfqBx5Kv/TwBrIfV1P8N2c6TCOWDoRM
 AOvo2SXEA/jEPix2GjT5jalSV1mROEXo2T9/G7kz4H7K+DkI/dGgS9mXyUDO2mMd
 ouOxYN0GohVqTQ==
 =XUGH
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer core updates from Thomas Gleixner:

 - Fix a memory ordering issue in posix-timers

   Posix-timer lookup is lockless and reevaluates the timer validity
   under the timer lock, but the update which validates the timer is not
   protected by the timer lock. That allows the store to be reordered
   against the initialization stores, so that the lookup side can
   observe a partially initialized timer. That's mostly a theoretical
   problem, but incorrect nevertheless.

 - Fix a long standing inconsistency of the coarse time getters

   The coarse time getters read the base time of the current update
   cycle without reading the actual hardware clock. NTP frequency
   adjustment can set the base time backwards. The fine grained
   interfaces compensate this by reading the clock and applying the new
   conversion factor, but the coarse grained time getters use the base
   time directly. That allows the user to observe time going backwards.

   Cure it by always forwarding base time, when NTP changes the
   frequency with an immediate step.

 - Rework of posix-timer hashing

   The posix-timer hash is not scalable and due to the CRIU timer
   restore mechanism prone to massive contention on the global hash
   bucket lock.

   Replace the global hash lock with a fine grained per bucket locking
   scheme to address that.

 - Rework the proc/$PID/timers interface.

   /proc/$PID/timers is provided for CRIU to be able to restore a timer.
   The printout happens with sighand lock held and interrupts disabled.
   That's not required as this can be done with RCU protection as well.

 - Provide a sane mechanism for CRIU to restore a timer ID

   CRIU restores timers by creating and deleting them until the kernel
   internal per process ID counter reached the requested ID. That's
   horribly slow for sparse timer IDs.

   Provide a prctl() which allows CRIU to restore a timer with a given
   ID. When enabled the ID pointer is used as input pointer to read the
   requested ID from user space. When disabled, the normal allocation
   scheme (next ID) is active as before. This is backwards compatible
   for both kernel and user space.

 - Make hrtimer_update_function() less expensive.

   The sanity checks are valuable, but expensive for high frequency
   usage in io/uring. Make the debug checks conditional and enable them
   only when lockdep is enabled.

 - Small updates, cleanups and improvements

* tag 'timers-core-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
  selftests/timers: Improve skew_consistency by testing with other clockids
  timekeeping: Fix possible inconsistencies in _COARSE clockids
  posix-timers: Drop redundant memset() invocation
  selftests/timers/posix-timers: Add a test for exact allocation mode
  posix-timers: Provide a mechanism to allocate a given timer ID
  posix-timers: Dont iterate /proc/$PID/timers with sighand:: Siglock held
  posix-timers: Make per process list RCU safe
  posix-timers: Avoid false cacheline sharing
  posix-timers: Switch to jhash32()
  posix-timers: Improve hash table performance
  posix-timers: Make signal_struct:: Next_posix_timer_id an atomic_t
  posix-timers: Make lock_timer() use guard()
  posix-timers: Rework timer removal
  posix-timers: Simplify lock/unlock_timer()
  posix-timers: Use guards in a few places
  posix-timers: Remove SLAB_PANIC from kmem cache
  posix-timers: Remove a few paranoid warnings
  posix-timers: Cleanup includes
  posix-timers: Add cond_resched() to posix_timer_add() search loop
  posix-timers: Initialise timer before adding it to the hash table
  ...
2025-03-25 10:33:23 -07:00
Oleg Nesterov 8661bb9c71
selftests/pidfd: fixes syscall number defines
I had to spend some (a lot;) time to understand why pidfd_info_test
(and more) fails with my patch under qemu on my machine ;) Until I
applied the patch below.

I think it is a bad idea to do the things like

	#ifndef __NR_clone3
	#define __NR_clone3 -1
	#endif

because this can hide a problem. My working laptop runs Fedora-23 which
doesn't have __NR_clone3/etc in /usr/include/. So "make" happily succeeds,
but everything fails and it is not clear why.

Link: https://lore.kernel.org/r/20250323174518.GB834@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-25 14:59:05 +01:00
Filipe Xavier 474eecc882 selftests: livepatch: test if ftrace can trace a livepatched function
This new test makes sure that ftrace can trace a
function that was introduced by a livepatch.

Signed-off-by: Filipe Xavier <felipeaggger@gmail.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20250324-ftrace-sftest-livepatch-v3-2-d9d7cc386c75@gmail.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-03-25 14:52:32 +01:00
Filipe Xavier 2ca7cd8020 selftests: livepatch: add new ftrace helpers functions
Add new ftrace helpers functions cleanup_tracing, trace_function and
check_traced_functions.

Signed-off-by: Filipe Xavier <felipeaggger@gmail.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20250324-ftrace-sftest-livepatch-v3-1-d9d7cc386c75@gmail.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-03-25 14:46:02 +01:00
Yi Liu d57a1fb342 iommufd/selftest: Add coverage for iommufd pasid attach/detach
This tests iommufd pasid attach/replace/detach.

Link: https://patch.msgid.link/r/20250321171940.7213-19-yi.l.liu@intel.com
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-25 10:18:31 -03:00
Dmitry Safonov edbac739e4 selftests/net: Drop timeout argument from test_client_verify()
It's always TEST_TIMEOUT_SEC, with an unjustified exception in rst test,
that is more paranoia-long timeout rather than based on requirements.

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-7-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25 06:10:30 -07:00
Dmitry Safonov 1e1738faa2 selftests/net: Delete timeout from test_connect_socket()
Unused: it's always either the default timeout or asynchronous
connect().

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-6-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25 06:10:30 -07:00
Dmitry Safonov 266ed1ace8 selftests/net: Print the testing side in unsigned-md5
As both client and server print the same test name on failure or pass,
add "[server]" so that it's more obvious from a log which side printed
"ok" or "not ok".

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-5-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25 06:10:30 -07:00
Dmitry Safonov 3f36781e57 selftests/net: Add mixed select()+polling mode to TCP-AO tests
Currently, tcp_ao tests have two timeouts: TEST_RETRANSMIT_SEC and
TEST_TIMEOUT_SEC [by default 1 and 5 seconds]. The first one,
TEST_RETRANSMIT_SEC is used for operations that are expected to succeed
in order for a test to pass. It is usually not consumed and exists only
to avoid indefinite test run if the operation didn't complete.
The second one, TEST_RETRANSMIT_SEC exists for the tests that checking
operations, that are expected to fail/timeout. It is shorter as it is
fully consumed, with an expectation that if operation didn't succeed
during that period, it will timeout. And the related test that expects
the timeout is passing. The actual operation failure is then
cross-verified by other means like counters checks.

The issue with TEST_RETRANSMIT_SEC timeout is that 1 second is the exact
initial TCP timeout. So, in case the initial segment gets lost (quite
unlikely on local veth interface between two net namespaces, yet happens
in slow VMs), the retransmission never happens and as a result, the test
is not actually testing the functionality. Which in the end fails
counters checks.

As I want tcp_ao selftests to be fast and finishing in a reasonable
amount of time on manual run, I didn't consider increasing
TEST_RETRANSMIT_SEC.

Rather, initially, BPF_SOCK_OPS_TIMEOUT_INIT looked promising as a lever
to make the initial TCP timeout shorter. But as it's not a socket bpf
attached thing, but sock_ops (attaches to cgroups), the selftests would
have to use libbpf, which I wanted to avoid if not absolutely required.

Instead, use a mixed select() and counters polling mode with the longer
TEST_TIMEOUT_SEC timeout to detect running-away failed tests. It
actually not only allows losing segments and succeeding after
the previous TEST_RETRANSMIT_SEC timeout was consumed, but makes
the tests expecting timeout/failure pass faster.

The only test case taking longer (TEST_TIMEOUT_SEC) now is connect-deny
"wrong snd id", which checks for no key on SYN-ACK for which there is no
counter in the kernel (see tcp_make_synack()). Yet it can be speed up
by poking skpair from the trace event (see trace_tcp_ao_synack_no_key).

Fixes: ed9d09b309 ("selftests/net: Add a test for TCP-AO keys matching")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20241205070656.6ef344d7@kernel.org/
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-4-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25 06:10:30 -07:00
Dmitry Safonov 5a0a3193f6 selftests/net: Fetch and check TCP-MD5 counters
There are related TCP-MD5 <=> TCP and TCP-MD5 <=> TCP-AO tests
that can benefit from checking the related counters, not only from
validating operations timeouts.

It also prepares the code for introduction of mixed select()+poll mode,
see the follow-up patches.

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-3-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25 06:10:30 -07:00
Dmitry Safonov 1fe4221093 selftests/net: Provide tcp-ao counters comparison helper
Rename __test_tcp_ao_counters_cmp() into test_assert_counters_ao() and
test_tcp_ao_key_counters_cmp() into test_assert_counters_key() as they
are asserts, rather than just compare functions.

Provide test_cmp_counters() helper, that's going to be used to compare
ao_info and netns counters as a stop condition for polling the sockets.

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-2-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25 06:10:29 -07:00
Dmitry Safonov 65ffdf31be selftests/net: Print TCP flags in more common format
Before:
># 13145[lib/ftrace-tcp.c:427] trace event filter tcp_ao_key_not_found [2001:db8:1::1:-1 => 2001:db8:254::1:7010, L3index 0, flags: !FS!R!P!., keyid: 100, rnext: 100, maclen: -1, sne: -1] = 1

After:
># 13487[lib/ftrace-tcp.c:427] trace event filter tcp_ao_key_not_found [2001:db8:1::1:-1 => 2001:db8:254::1:7010, L3index 0, flags: S, keyid: 100, rnext: 100, maclen: -1, sne: -1] = 1

For the history, I think the initial format was to emphasize the absence
of flags as well as their presence (!R meant no RST flag). But looking
again, it's just unreadable and hard to understand.
Make it the standard/expected one.

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-1-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25 06:10:29 -07:00
Song Liu 59481b8bd0 selftest/livepatch: Only run test-kprobe with CONFIG_KPROBES_ON_FTRACE
CONFIG_KPROBES_ON_FTRACE is required for test-kprobe. Skip test-kprobe
when CONFIG_KPROBES_ON_FTRACE is not set. Since some kernel may not have
/proc/config.gz, grep for kprobe_ftrace_ops from /proc/kallsyms to check
whether CONFIG_KPROBES_ON_FTRACE is enabled.

Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lore.kernel.org/r/20250318181518.1055532-1-song@kernel.org
[pmladek@suse.com: Call grep with -q option.]
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-03-25 13:36:40 +01:00
Linus Torvalds a49a879f0a Miscellaneous x86 cleanups by Arnd Bergmann, Charles Han,
Mirsad Todorovac, Randy Dunlap, Thorsten Blum and Zhang Kunbo.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmfeo6ERHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gl9w//ZRxGhf3eXYVyPW35aoqaRe7W3fBJhaPn
 eLP0JuuglZO3JSTJfmbTmSXrNN4kkzI3nnuTnchK0iYZ+2Tn4E0iaz4tm5sjL5T1
 KW9ajv7QQxMOwXS69hrklY/eGxVimfs4mIyhxEhxLjPvzbGJOVS6WFgvaYH5klQQ
 7sPYXzrNGZJhGCmRqCelHSPhl7b6YVS/yWtMe+BpPBn1AqIQN+O+S/Kbu7dQokmq
 7MNy4eUe7GYgnSB7Qhq/jNSqjmGWsVwhMiuoDxx7GvShM73+kYatQZT0H+qzTxzp
 90rIPcPTNCOlKJO3xSWqXStcMH00MbplOhKdEU6SzeM+xC2aD2t4GoIG0EFxIQdS
 X0wh40Psm5GehThhpmMBJdmM4Le4TTSkHBhedpQ+sp+6BIX4A9hoeCoybIlcngaJ
 W89YKHqC5ruPSRDOzyaTMSypaEGH5VHP6AMj0CmkuyHRbxADsMazifpOVOw9AvVk
 IR0USCzyenjLMiugRrJgRpy8R2Q1jp35bP0e7Y4QvmydEGC1Wl5mW5Y8B9Zdlk0C
 iuxoqwntem7PAkEA2JejW6rzuRICxXloKv+xjtQjKzXpK+pwhhSp830zT8gaWZ1Z
 hdQRz7gSlJg3JzWinX8+XNusdHw9TxCUqwgLw8486nzNp01GU62gby5DWgECEdae
 2IyRRSd0FyA=
 =GsK4
 -----END PGP SIGNATURE-----

Merge tag 'x86-cleanups-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cleanups from Ingo Molnar:
 "Miscellaneous x86 cleanups by Arnd Bergmann, Charles Han, Mirsad
  Todorovac, Randy Dunlap, Thorsten Blum and Zhang Kunbo"

* tag 'x86-cleanups-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/coco: Replace 'static const cc_mask' with the newly introduced cc_get_mask() function
  x86/delay: Fix inconsistent whitespace
  selftests/x86/syscall: Fix coccinelle WARNING recommending the use of ARRAY_SIZE()
  x86/platform: Fix missing declaration of 'x86_apple_machine'
  x86/irq: Fix missing declaration of 'io_apic_irqs'
  x86/usercopy: Fix kernel-doc func param name in clean_cache_range()'s description
  x86/apic: Use str_disabled_enabled() helper in print_ipi_mode()
2025-03-24 22:39:53 -07:00
Linus Torvalds 71b639af06 x86/fpu updates for v6.15:
- Improve crypto performance by making kernel-mode FPU reliably usable
    in softirqs ((Eric Biggers)
 
  - Fully optimize out WARN_ON_FPU() (Eric Biggers)
 
  - Initial steps to support Support Intel APX (Advanced Performance Extensions)
    (Chang S. Bae)
 
  - Fix KASAN for arch_dup_task_struct() (Benjamin Berg)
 
  - Refine and simplify the FPU magic number check during signal return
    (Chang S. Bae)
 
  - Fix inconsistencies in guest FPU xfeatures (Chao Gao, Stanislav Spassov)
 
  - selftests/x86/xstate: Introduce common code for testing extended states
    (Chang S. Bae)
 
  - Misc fixes and cleanups (Borislav Petkov, Colin Ian King, Uros Bizjak)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmfepZQRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1grLw/+Lt71PQWu/uVt3/dU0VkpppTqRmUTujuR
 XVwFlBVxMFw9A8wi56bLD3WT28mKWr0SepynBt1Wr7/pZCnaW4SH/91ERChRqTzU
 ifawmqQqhrVhFlXziDOKp9lcDy9f7NjJVKBfEBa1VBQGC0Q8FzeasOhCwTJw8qXa
 Lj2z8zenhDq6NBkk22VlScc1CvsUBiXppvm8uk3md7HKaDqdzmCakm/a0FgaEppt
 K6P/u1iaVvGXme2CHSskCshkYoEFIF6LRNYnkVloXsVV4AeWeLaJ54xW+syPd/4H
 EX6oMLifdXCmxmsmi3LPRUjBgdSMsDaAkLsPNoX1w4uWjsihqyUl4wTEpobMIuu1
 PWOrCxQKxtaGoECJ0nsE4uR7MZEQ0sYCGKv4JSiiXeUVf/EDRuUBfvBCoGlDWXYA
 GZcoMmH+BtcYuQdzbrc/vWkfo3bpTxL3x5zsgtFkQL3xyzhugRPNgeOn9yuSZ9x6
 AD8QS0G6uRcc9a5ZeeTEcYxoIE/+vvfnh1wfMjisehkVn179ixfZdgcSGrZJUNyH
 a7LmUmwLvLYNO5DUUGs1upN8YHP7jiohNc/r2ZC1IX5LHbuK1gyXA4xo6hAZ8r/7
 XZ2FfRp0dr3PvuWIwq2v6T5JHvALADsKCAjqvNdwHLkxF87ygixT+B87wTA3H6ov
 LSY10A/eRx4=
 =U7PR
 -----END PGP SIGNATURE-----

Merge tag 'x86-fpu-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86/fpu updates from Ingo Molnar:

 - Improve crypto performance by making kernel-mode FPU reliably usable
   in softirqs ((Eric Biggers)

 - Fully optimize out WARN_ON_FPU() (Eric Biggers)

 - Initial steps to support Support Intel APX (Advanced Performance
   Extensions) (Chang S. Bae)

 - Fix KASAN for arch_dup_task_struct() (Benjamin Berg)

 - Refine and simplify the FPU magic number check during signal return
   (Chang S. Bae)

 - Fix inconsistencies in guest FPU xfeatures (Chao Gao, Stanislav
   Spassov)

 - selftests/x86/xstate: Introduce common code for testing extended
   states (Chang S. Bae)

 - Misc fixes and cleanups (Borislav Petkov, Colin Ian King, Uros
   Bizjak)

* tag 'x86-fpu-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu/xstate: Fix inconsistencies in guest FPU xfeatures
  x86/fpu: Clarify the "xa" symbolic name used in the XSTATE* macros
  x86/fpu: Use XSAVE{,OPT,C,S} and XRSTOR{,S} mnemonics in xstate.h
  x86/fpu: Improve crypto performance by making kernel-mode FPU reliably usable in softirqs
  x86/fpu/xstate: Simplify print_xstate_features()
  x86/fpu: Refine and simplify the magic number check during signal return
  selftests/x86/xstate: Fix spelling mistake "hader" -> "header"
  x86/fpu: Avoid copying dynamic FP state from init_task in arch_dup_task_struct()
  vmlinux.lds.h: Remove entry to place init_task onto init_stack
  selftests/x86/avx: Add AVX tests
  selftests/x86/xstate: Clarify supported xstates
  selftests/x86/xstate: Consolidate test invocations into a single entry
  selftests/x86/xstate: Introduce signal ABI test
  selftests/x86/xstate: Refactor ptrace ABI test
  selftests/x86/xstate: Refactor context switching test
  selftests/x86/xstate: Enumerate and name xstate components
  selftests/x86/xstate: Refactor XSAVE helpers for general use
  selftests/x86: Consolidate redundant signal helper functions
  x86/fpu: Fix guest FPU state buffer allocation size
  x86/fpu: Fully optimize out WARN_ON_FPU()
2025-03-24 22:27:18 -07:00
Linus Torvalds e34c38057a [ Merge note: this pull request depends on you having merged
two locking commits in the locking tree,
 	      part of the locking-core-2025-03-22 pull request. ]
 
 x86 CPU features support:
   - Generate the <asm/cpufeaturemasks.h> header based on build config
     (H. Peter Anvin, Xin Li)
   - x86 CPUID parsing updates and fixes (Ahmed S. Darwish)
   - Introduce the 'setcpuid=' boot parameter (Brendan Jackman)
   - Enable modifying CPU bug flags with '{clear,set}puid='
     (Brendan Jackman)
   - Utilize CPU-type for CPU matching (Pawan Gupta)
   - Warn about unmet CPU feature dependencies (Sohil Mehta)
   - Prepare for new Intel Family numbers (Sohil Mehta)
 
 Percpu code:
   - Standardize & reorganize the x86 percpu layout and
     related cleanups (Brian Gerst)
   - Convert the stackprotector canary to a regular percpu
     variable (Brian Gerst)
   - Add a percpu subsection for cache hot data (Brian Gerst)
   - Unify __pcpu_op{1,2}_N() macros to __pcpu_op_N() (Uros Bizjak)
   - Construct __percpu_seg_override from __percpu_seg (Uros Bizjak)
 
 MM:
   - Add support for broadcast TLB invalidation using AMD's INVLPGB instruction
     (Rik van Riel)
   - Rework ROX cache to avoid writable copy (Mike Rapoport)
   - PAT: restore large ROX pages after fragmentation
     (Kirill A. Shutemov, Mike Rapoport)
   - Make memremap(MEMREMAP_WB) map memory as encrypted by default
     (Kirill A. Shutemov)
   - Robustify page table initialization (Kirill A. Shutemov)
   - Fix flush_tlb_range() when used for zapping normal PMDs (Jann Horn)
   - Clear _PAGE_DIRTY for kernel mappings when we clear _PAGE_RW
     (Matthew Wilcox)
 
 KASLR:
   - x86/kaslr: Reduce KASLR entropy on most x86 systems,
     to support PCI BAR space beyond the 10TiB region
     (CONFIG_PCI_P2PDMA=y) (Balbir Singh)
 
 CPU bugs:
   - Implement FineIBT-BHI mitigation (Peter Zijlstra)
   - speculation: Simplify and make CALL_NOSPEC consistent (Pawan Gupta)
   - speculation: Add a conditional CS prefix to CALL_NOSPEC (Pawan Gupta)
   - RFDS: Exclude P-only parts from the RFDS affected list (Pawan Gupta)
 
 System calls:
   - Break up entry/common.c (Brian Gerst)
   - Move sysctls into arch/x86 (Joel Granados)
 
 Intel LAM support updates: (Maciej Wieczor-Retman)
   - selftests/lam: Move cpu_has_la57() to use cpuinfo flag
   - selftests/lam: Skip test if LAM is disabled
   - selftests/lam: Test get_user() LAM pointer handling
 
 AMD SMN access updates:
   - Add SMN offsets to exclusive region access (Mario Limonciello)
   - Add support for debugfs access to SMN registers (Mario Limonciello)
   - Have HSMP use SMN through AMD_NODE (Yazen Ghannam)
 
 Power management updates: (Patryk Wlazlyn)
   - Allow calling mwait_play_dead with an arbitrary hint
   - ACPI/processor_idle: Add FFH state handling
   - intel_idle: Provide the default enter_dead() handler
   - Eliminate mwait_play_dead_cpuid_hint()
 
 Bootup:
 
 Build system:
   - Raise the minimum GCC version to 8.1 (Brian Gerst)
   - Raise the minimum LLVM version to 15.0.0
     (Nathan Chancellor)
 
 Kconfig: (Arnd Bergmann)
   - Add cmpxchg8b support back to Geode CPUs
   - Drop 32-bit "bigsmp" machine support
   - Rework CONFIG_GENERIC_CPU compiler flags
   - Drop configuration options for early 64-bit CPUs
   - Remove CONFIG_HIGHMEM64G support
   - Drop CONFIG_SWIOTLB for PAE
   - Drop support for CONFIG_HIGHPTE
   - Document CONFIG_X86_INTEL_MID as 64-bit-only
   - Remove old STA2x11 support
   - Only allow CONFIG_EISA for 32-bit
 
 Headers:
   - Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI and non-UAPI headers
     (Thomas Huth)
 
 Assembly code & machine code patching:
   - x86/alternatives: Simplify alternative_call() interface (Josh Poimboeuf)
   - x86/alternatives: Simplify callthunk patching (Peter Zijlstra)
   - KVM: VMX: Use named operands in inline asm (Josh Poimboeuf)
   - x86/hyperv: Use named operands in inline asm (Josh Poimboeuf)
   - x86/traps: Cleanup and robustify decode_bug() (Peter Zijlstra)
   - x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h>
     (Uros Bizjak)
   - Use named operands in inline asm (Uros Bizjak)
   - Improve performance by using asm_inline() for atomic locking instructions
     (Uros Bizjak)
 
 Earlyprintk:
   - Harden early_serial (Peter Zijlstra)
 
 NMI handler:
   - Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus()
     (Waiman Long)
 
 Miscellaneous fixes and cleanups:
 
   - by Ahmed S. Darwish, Andy Shevchenko, Ard Biesheuvel,
     Artem Bityutskiy, Borislav Petkov, Brendan Jackman, Brian Gerst,
     Dan Carpenter, Dr. David Alan Gilbert, H. Peter Anvin,
     Ingo Molnar, Josh Poimboeuf, Kevin Brodsky, Mike Rapoport,
     Lukas Bulwahn, Maciej Wieczor-Retman, Max Grobecker,
     Patryk Wlazlyn, Pawan Gupta, Peter Zijlstra,
     Philip Redkin, Qasim Ijaz, Rik van Riel, Thomas Gleixner,
     Thorsten Blum, Tom Lendacky, Tony Luck, Uros Bizjak,
     Vitaly Kuznetsov, Xin Li, liuye.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmfenkQRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1g1FRAAi6OFTSn/5aeLMI0IMNBxJ6ddQiFc3imd
 7+C/vU5nul4CyDs8mKyj/+f/DDrbkG9lKz3VG631Yl237lXHjD8XWcVMeC/1z/q0
 3zInDIloE9/nBHRPkF6F7fARBLBZ0LFgaBsGrCo7mwpGybiQdqGcqcxllvTbtXaw
 OHta4q6ok+lBDNlfc0v6H4cRnzhmmlKu6Ng0j6UI3V7uFhi3vtxas32ltDQtzorq
 2+jbV6/+kbrrv+xPC+jlzOFhTEKRupNPQXmvyQteoQg6G3kqAKMDvBthGXd1rHuX
 Qa+BoDIifE/2NiVeRwNrhoqYH/pHCzUzDREW5IW8+ca+4XNKuzAC6EuC8CeCzyK1
 q8ZjZjooQW4zEeVFeJYllHONzJYfxfSH5CLsnbcuhq99yfGlrQhF1qL72/Omn1w/
 DfPJM8Zt5zyKvLqUg3Md+fkVCO2wyDNhB61QPzRgHF+yD+rvuDpoqvUWir+w7cSn
 fwEDVZGXlFx6dumtSrqRaTd1nvFt80s8yP2ll09DMvGQ8D/yruS7hndGAmmJVCSW
 NAfd8pSjq5v2+ux2UR92/Cc3VF3SjaUqHBOp/Nq9rESya18ZVa3cJpHhVYYtPIVf
 THW0h07RIkGVKs1uq+5ekLCr/8uAZg58UPIqmhTuW0ttymRHCNfohR45FQZzy+0M
 tJj1oc2TIZw=
 =Dcb3
 -----END PGP SIGNATURE-----

Merge tag 'x86-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull core x86 updates from Ingo Molnar:
 "x86 CPU features support:
   - Generate the <asm/cpufeaturemasks.h> header based on build config
     (H. Peter Anvin, Xin Li)
   - x86 CPUID parsing updates and fixes (Ahmed S. Darwish)
   - Introduce the 'setcpuid=' boot parameter (Brendan Jackman)
   - Enable modifying CPU bug flags with '{clear,set}puid=' (Brendan
     Jackman)
   - Utilize CPU-type for CPU matching (Pawan Gupta)
   - Warn about unmet CPU feature dependencies (Sohil Mehta)
   - Prepare for new Intel Family numbers (Sohil Mehta)

  Percpu code:
   - Standardize & reorganize the x86 percpu layout and related cleanups
     (Brian Gerst)
   - Convert the stackprotector canary to a regular percpu variable
     (Brian Gerst)
   - Add a percpu subsection for cache hot data (Brian Gerst)
   - Unify __pcpu_op{1,2}_N() macros to __pcpu_op_N() (Uros Bizjak)
   - Construct __percpu_seg_override from __percpu_seg (Uros Bizjak)

  MM:
   - Add support for broadcast TLB invalidation using AMD's INVLPGB
     instruction (Rik van Riel)
   - Rework ROX cache to avoid writable copy (Mike Rapoport)
   - PAT: restore large ROX pages after fragmentation (Kirill A.
     Shutemov, Mike Rapoport)
   - Make memremap(MEMREMAP_WB) map memory as encrypted by default
     (Kirill A. Shutemov)
   - Robustify page table initialization (Kirill A. Shutemov)
   - Fix flush_tlb_range() when used for zapping normal PMDs (Jann Horn)
   - Clear _PAGE_DIRTY for kernel mappings when we clear _PAGE_RW
     (Matthew Wilcox)

  KASLR:
   - x86/kaslr: Reduce KASLR entropy on most x86 systems, to support PCI
     BAR space beyond the 10TiB region (CONFIG_PCI_P2PDMA=y) (Balbir
     Singh)

  CPU bugs:
   - Implement FineIBT-BHI mitigation (Peter Zijlstra)
   - speculation: Simplify and make CALL_NOSPEC consistent (Pawan Gupta)
   - speculation: Add a conditional CS prefix to CALL_NOSPEC (Pawan
     Gupta)
   - RFDS: Exclude P-only parts from the RFDS affected list (Pawan
     Gupta)

  System calls:
   - Break up entry/common.c (Brian Gerst)
   - Move sysctls into arch/x86 (Joel Granados)

  Intel LAM support updates: (Maciej Wieczor-Retman)
   - selftests/lam: Move cpu_has_la57() to use cpuinfo flag
   - selftests/lam: Skip test if LAM is disabled
   - selftests/lam: Test get_user() LAM pointer handling

  AMD SMN access updates:
   - Add SMN offsets to exclusive region access (Mario Limonciello)
   - Add support for debugfs access to SMN registers (Mario Limonciello)
   - Have HSMP use SMN through AMD_NODE (Yazen Ghannam)

  Power management updates: (Patryk Wlazlyn)
   - Allow calling mwait_play_dead with an arbitrary hint
   - ACPI/processor_idle: Add FFH state handling
   - intel_idle: Provide the default enter_dead() handler
   - Eliminate mwait_play_dead_cpuid_hint()

  Build system:
   - Raise the minimum GCC version to 8.1 (Brian Gerst)
   - Raise the minimum LLVM version to 15.0.0 (Nathan Chancellor)

  Kconfig: (Arnd Bergmann)
   - Add cmpxchg8b support back to Geode CPUs
   - Drop 32-bit "bigsmp" machine support
   - Rework CONFIG_GENERIC_CPU compiler flags
   - Drop configuration options for early 64-bit CPUs
   - Remove CONFIG_HIGHMEM64G support
   - Drop CONFIG_SWIOTLB for PAE
   - Drop support for CONFIG_HIGHPTE
   - Document CONFIG_X86_INTEL_MID as 64-bit-only
   - Remove old STA2x11 support
   - Only allow CONFIG_EISA for 32-bit

  Headers:
   - Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI and non-UAPI
     headers (Thomas Huth)

  Assembly code & machine code patching:
   - x86/alternatives: Simplify alternative_call() interface (Josh
     Poimboeuf)
   - x86/alternatives: Simplify callthunk patching (Peter Zijlstra)
   - KVM: VMX: Use named operands in inline asm (Josh Poimboeuf)
   - x86/hyperv: Use named operands in inline asm (Josh Poimboeuf)
   - x86/traps: Cleanup and robustify decode_bug() (Peter Zijlstra)
   - x86/kexec: Merge x86_32 and x86_64 code using macros from
     <asm/asm.h> (Uros Bizjak)
   - Use named operands in inline asm (Uros Bizjak)
   - Improve performance by using asm_inline() for atomic locking
     instructions (Uros Bizjak)

  Earlyprintk:
   - Harden early_serial (Peter Zijlstra)

  NMI handler:
   - Add an emergency handler in nmi_desc & use it in
     nmi_shootdown_cpus() (Waiman Long)

  Miscellaneous fixes and cleanups:
   - by Ahmed S. Darwish, Andy Shevchenko, Ard Biesheuvel, Artem
     Bityutskiy, Borislav Petkov, Brendan Jackman, Brian Gerst, Dan
     Carpenter, Dr. David Alan Gilbert, H. Peter Anvin, Ingo Molnar,
     Josh Poimboeuf, Kevin Brodsky, Mike Rapoport, Lukas Bulwahn, Maciej
     Wieczor-Retman, Max Grobecker, Patryk Wlazlyn, Pawan Gupta, Peter
     Zijlstra, Philip Redkin, Qasim Ijaz, Rik van Riel, Thomas Gleixner,
     Thorsten Blum, Tom Lendacky, Tony Luck, Uros Bizjak, Vitaly
     Kuznetsov, Xin Li, liuye"

* tag 'x86-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (211 commits)
  zstd: Increase DYNAMIC_BMI2 GCC version cutoff from 4.8 to 11.0 to work around compiler segfault
  x86/asm: Make asm export of __ref_stack_chk_guard unconditional
  x86/mm: Only do broadcast flush from reclaim if pages were unmapped
  perf/x86/intel, x86/cpu: Replace Pentium 4 model checks with VFM ones
  perf/x86/intel, x86/cpu: Simplify Intel PMU initialization
  x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-UAPI headers
  x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI headers
  x86/locking/atomic: Improve performance by using asm_inline() for atomic locking instructions
  x86/asm: Use asm_inline() instead of asm() in clwb()
  x86/asm: Use CLFLUSHOPT and CLWB mnemonics in <asm/special_insns.h>
  x86/hweight: Use asm_inline() instead of asm()
  x86/hweight: Use ASM_CALL_CONSTRAINT in inline asm()
  x86/hweight: Use named operands in inline asm()
  x86/stackprotector/64: Only export __ref_stack_chk_guard on CONFIG_SMP
  x86/head/64: Avoid Clang < 17 stack protector in startup code
  x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h>
  x86/runtime-const: Add the RUNTIME_CONST_PTR assembly macro
  x86/cpu/intel: Limit the non-architectural constant_tsc model checks
  x86/mm/pat: Replace Intel x86_model checks with VFM ones
  x86/cpu/intel: Fix fast string initialization for extended Families
  ...
2025-03-24 22:06:11 -07:00
Linus Torvalds 32b22538be Scheduler updates for v6.15:
[ Merge note, these two commits are identical:
 
    - f3fa0e40df ("sched/clock: Don't define sched_clock_irqtime as static key")
    - b9f2b29b94 ("sched: Don't define sched_clock_irqtime as static key")
 
   The first one is a cherry-picked version of the second, and the first one
   is already upstream. ]
 
 Core & fair scheduler changes:
 
   - Cancel the slice protection of the idle entity (Zihan Zhou)
   - Reduce the default slice to avoid tasks getting an extra tick
     (Zihan Zhou)
   - Force propagating min_slice of cfs_rq when {en,de}queue tasks
     (Tianchen Ding)
   - Refactor can_migrate_task() to elimate looping (I Hsin Cheng)
   - Add unlikey branch hints to several system calls (Colin Ian King)
   - Optimize current_clr_polling() on certain architectures (Yujun Dong)
 
 Deadline scheduler: (Juri Lelli)
 
   - Remove redundant dl_clear_root_domain call
   - Move dl_rebuild_rd_accounting to cpuset.h
 
 Uclamp:
 
   - Use the uclamp_is_used() helper instead of open-coding it (Xuewen Yan)
   - Optimize sched_uclamp_used static key enabling (Xuewen Yan)
 
 Scheduler topology support: (Juri Lelli)
 
   - Ignore special tasks when rebuilding domains
   - Add wrappers for sched_domains_mutex
   - Generalize unique visiting of root domains
   - Rebuild root domain accounting after every update
   - Remove partition_and_rebuild_sched_domains
   - Stop exposing partition_sched_domains_locked
 
 RSEQ: (Michael Jeanson)
 
   - Update kernel fields in lockstep with CONFIG_DEBUG_RSEQ=y
   - Fix segfault on registration when rseq_cs is non-zero
   - selftests: Add rseq syscall errors test
   - selftests: Ensure the rseq ABI TLS is actually 1024 bytes
 
 Membarriers:
 
   - Fix redundant load of membarrier_state (Nysal Jan K.A.)
 
 Scheduler debugging:
 
   - Introduce and use preempt_model_str() (Sebastian Andrzej Siewior)
   - Make CONFIG_SCHED_DEBUG unconditional (Ingo Molnar)
 
 Fixes and cleanups:
 
   - Always save/restore x86 TSC sched_clock() on suspend/resume
    (Guilherme G. Piccoli)
 
   - Misc fixes and cleanups (Thorsten Blum, Juri Lelli,
     Sebastian Andrzej Siewior)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmfejsoRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1ivkhAAwBF2tYRBS1oIHcC/OKK3JJoHVDp2LFbU
 9sm5S3ZlGD/Ns2fbpY+9A8UFgUFfjYiTSV7hvf2B9Vge0XSxTmMNFu/MdxLBbo9r
 w6GSeNcNDQKpjEGLkrmPFsa2fiYI4dmH0IzDbS9V2cNPk470QBKjAKXNPaSER691
 n2wLnQq+m5o4gXnPjnSz6RrrzisRnm2GOWnDV/iqR47pZFNlX2wWlo3s5r7//Hw0
 a+QfEfpgKehhy/VSDXmSAgpqnNffjc78yBV6LNoVUddwahnOWiQMS3XViOqgy5VO
 jUGBrzW+sKkdRMBppxwJ/0XWgHGC27amIgnU0ZE5u+eiUEu8H9qWl1cRCFxyeB0O
 8+WNfwmkH+FPWUdsn84kdePhSsZy6HfM6h44Xe0hx1V7tQXEXfbPzK3TnQg8Ktt1
 Ky6ctbZt4cGpqGQuIqvba21A/racrD/DgvB7mHeZksnqZoKTDwxhT/nlQGpuwPoy
 SJYd1ynFVJvfC69SwMdwnaglimvEZx1GfT0o5XtCMslY5NkWCou5u+e65WX7ccU5
 94wBCwI1/+KiFMJZp6TlPw07Q/Hsj9dDryxLc3OunU3zMVt3++1GZnBS/eb5FN1A
 5TlAEpxgH9c5Q4/XJFCvx21DrTVHSuIrR6naK91bgqHo0cEfaHGtO/ejPtJGSfxe
 YIFnnu1dhRw=
 =SuQE
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:
 "Core & fair scheduler changes:

   - Cancel the slice protection of the idle entity (Zihan Zhou)
   - Reduce the default slice to avoid tasks getting an extra tick
     (Zihan Zhou)
   - Force propagating min_slice of cfs_rq when {en,de}queue tasks
     (Tianchen Ding)
   - Refactor can_migrate_task() to elimate looping (I Hsin Cheng)
   - Add unlikey branch hints to several system calls (Colin Ian King)
   - Optimize current_clr_polling() on certain architectures (Yujun
     Dong)

  Deadline scheduler: (Juri Lelli)
   - Remove redundant dl_clear_root_domain call
   - Move dl_rebuild_rd_accounting to cpuset.h

  Uclamp:
   - Use the uclamp_is_used() helper instead of open-coding it (Xuewen
     Yan)
   - Optimize sched_uclamp_used static key enabling (Xuewen Yan)

  Scheduler topology support: (Juri Lelli)
   - Ignore special tasks when rebuilding domains
   - Add wrappers for sched_domains_mutex
   - Generalize unique visiting of root domains
   - Rebuild root domain accounting after every update
   - Remove partition_and_rebuild_sched_domains
   - Stop exposing partition_sched_domains_locked

  RSEQ: (Michael Jeanson)
   - Update kernel fields in lockstep with CONFIG_DEBUG_RSEQ=y
   - Fix segfault on registration when rseq_cs is non-zero
   - selftests: Add rseq syscall errors test
   - selftests: Ensure the rseq ABI TLS is actually 1024 bytes

  Membarriers:
   - Fix redundant load of membarrier_state (Nysal Jan K.A.)

  Scheduler debugging:
   - Introduce and use preempt_model_str() (Sebastian Andrzej Siewior)
   - Make CONFIG_SCHED_DEBUG unconditional (Ingo Molnar)

  Fixes and cleanups:
   - Always save/restore x86 TSC sched_clock() on suspend/resume
     (Guilherme G. Piccoli)
   - Misc fixes and cleanups (Thorsten Blum, Juri Lelli, Sebastian
     Andrzej Siewior)"

* tag 'sched-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
  cpuidle, sched: Use smp_mb__after_atomic() in current_clr_polling()
  sched/debug: Remove CONFIG_SCHED_DEBUG
  sched/debug: Remove CONFIG_SCHED_DEBUG from self-test config files
  sched/debug, Documentation: Remove (most) CONFIG_SCHED_DEBUG references from documentation
  sched/debug: Make CONFIG_SCHED_DEBUG functionality unconditional
  sched/debug: Make 'const_debug' tunables unconditional __read_mostly
  sched/debug: Change SCHED_WARN_ON() to WARN_ON_ONCE()
  rseq/selftests: Fix namespace collision with rseq UAPI header
  include/{topology,cpuset}: Move dl_rebuild_rd_accounting to cpuset.h
  sched/topology: Stop exposing partition_sched_domains_locked
  cgroup/cpuset: Remove partition_and_rebuild_sched_domains
  sched/topology: Remove redundant dl_clear_root_domain call
  sched/deadline: Rebuild root domain accounting after every update
  sched/deadline: Generalize unique visiting of root domains
  sched/topology: Wrappers for sched_domains_mutex
  sched/deadline: Ignore special tasks when rebuilding domains
  tracing: Use preempt_model_str()
  xtensa: Rely on generic printing of preemption model
  x86: Rely on generic printing of preemption model
  s390: Rely on generic printing of preemption model
  ...
2025-03-24 21:28:12 -07:00
Linus Torvalds 3ba7dfb8da RCU pull request for v6.15
This pull request contains the following branches:
 
 docs.2025.02.04a:
  - Add broken-timing possibility to stallwarn.rst.
  - Improve discussion of this_cpu_ptr(), add raw_cpu_ptr().
  - Document self-propagating callbacks.
  - Point call_srcu() to call_rcu() for detailed memory ordering.
  - Add CONFIG_RCU_LAZY delays to call_rcu() kernel-doc header.
  - Clarify RCU_LAZY and RCU_LAZY_DEFAULT_OFF help text.
  - Remove references to old grace-period-wait primitives.
 
 srcu.2025.02.05a:
  - Introduce srcu_read_{un,}lock_fast(), which is similar to
    srcu_read_{un,}lock_lite(): avoid smp_mb()s in lock and unlock at the
    cost of calling synchronize_rcu() in synchronize_srcu(). Moreover, by
    returning the percpu offset of the counter at srcu_read_lock_fast()
    time, srcu_read_unlock_fast() can save extra pointer dereferencing,
    which makes it faster than srcu_read_{un,}lock_lite().
    srcu_read_{un,}lock_fast() are intended to replace
    rcu_read_{un,}lock_trace() if possible.
 
 torture.2025.02.05a:
  - Add get_torture_init_jiffies() to return the start time of the test.
  - Add a test_boost_holdoff module parameter to allow delaying boosting
    tests when building rcutorture as built-in.
  - Add grace period sequence number logging at the beginning and end of
    failure/close-call results.
  - Switch to hexadecimal for the expedited grace period sequence number
    in the rcu_exp_grace_period trace point.
  - Make cur_ops->format_gp_seqs take buffer length.
  - Move RCU_TORTURE_TEST_{CHK_RDR_STATE,LOG_CPU} to bool.
  - Complain when invalid SRCU reader_flavor is specified.
  - Add FORCE_NEED_SRCU_NMI_SAFE Kconfig for testing, which forces SRCU
    uses atomics even when percpu ops are NMI safe, and use the Kconfig
    for SRCU lockdep testing.
 
 misc.2025.03.04a:
  - Split rcu_report_exp_cpu_mult() mask parameter and use for tracing.
  - Remove READ_ONCE() for rdp->gpwrap access in __note_gp_changes().
  - Fix get_state_synchronize_rcu_full() GP-start detection.
  - Move RCU Tasks self-tests to core_initcall().
  - Print segment lengths in show_rcu_nocb_gp_state().
  - Make RCU watch ct_kernel_exit_state() warning.
  - Flush console log from kernel_power_off().
  - rcutorture: Allow a negative value for nfakewriters.
  - rcu: Update TREE05.boot to test normal synchronize_rcu().
  - rcu: Use _full() API to debug synchronize_rcu().
 
 lazypreempt.2025.03.04a: Make RCU handle PREEMPT_LAZY better:
  - Fix header guard for rcu_all_qs().
  - rcu: Rename PREEMPT_AUTO to PREEMPT_LAZY.
  - Update __cond_resched comment about RCU quiescent states.
  - Handle unstable rdp in rcu_read_unlock_strict().
  - Handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y.
  - osnoise: Provide quiescent states.
  - Adjust rcutorture with possible PREEMPT_RCU=n && PREEMPT_COUNT=y
    combination.
  - Limit PREEMPT_RCU configurations.
  - Make rcutorture senario TREE07 and senario TREE10 use PREEMPT_LAZY=y.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEj5IosQTPz8XU1wRHSXnow7UH+rgFAmfeBLQACgkQSXnow7UH
 +rh11Qf/Rt6IZJ/YT/V9Sd+8hMx4O0BMh779pr9cD6mbAG+FDk2Yeva1m8vIdFOb
 qId6oc8K/ef2JfFjSn0oHMzQP2D3XUyiJWPNbBDHv/D8Os8GZgjzu8dkxVkSbdbY
 OxtvIflbcqFN1JDJfGKZnTEW0/YxGqfnS9b6R7iyyA7SOGQ/WubGOE5qNCqPufc9
 zJiP+qTUFYQzCIiPlEJul39o9KboPogbt3QAAQjWmi3utd77ehJnm/15FvAjyau4
 uhC2cnGfMY535rQaiaQeBQ/IHIowKripCq0JQFvcUNdyArZM3HOI2x79+2II6ft7
 mjHskNODOIJHfW2o1RzQ0yRYAywFIg==
 =J+mH
 -----END PGP SIGNATURE-----

Merge tag 'rcu-next-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux

Pull RCU updates from Boqun Feng:
 "Documentation:
   - Add broken-timing possibility to stallwarn.rst
   - Improve discussion of this_cpu_ptr(), add raw_cpu_ptr()
   - Document self-propagating callbacks
   - Point call_srcu() to call_rcu() for detailed memory ordering
   - Add CONFIG_RCU_LAZY delays to call_rcu() kernel-doc header
   - Clarify RCU_LAZY and RCU_LAZY_DEFAULT_OFF help text
   - Remove references to old grace-period-wait primitives

  srcu:
   - Introduce srcu_read_{un,}lock_fast(), which is similar to
     srcu_read_{un,}lock_lite(): avoid smp_mb()s in lock and unlock
     at the cost of calling synchronize_rcu() in synchronize_srcu()

     Moreover, by returning the percpu offset of the counter at
     srcu_read_lock_fast() time, srcu_read_unlock_fast() can avoid
     extra pointer dereferencing, which makes it faster than
     srcu_read_{un,}lock_lite()

     srcu_read_{un,}lock_fast() are intended to replace
     rcu_read_{un,}lock_trace() if possible

  RCU torture:
   - Add get_torture_init_jiffies() to return the start time of the test
   - Add a test_boost_holdoff module parameter to allow delaying
     boosting tests when building rcutorture as built-in
   - Add grace period sequence number logging at the beginning and end
     of failure/close-call results
   - Switch to hexadecimal for the expedited grace period sequence
     number in the rcu_exp_grace_period trace point
   - Make cur_ops->format_gp_seqs take buffer length
   - Move RCU_TORTURE_TEST_{CHK_RDR_STATE,LOG_CPU} to bool
   - Complain when invalid SRCU reader_flavor is specified
   - Add FORCE_NEED_SRCU_NMI_SAFE Kconfig for testing, which forces SRCU
     uses atomics even when percpu ops are NMI safe, and use the Kconfig
     for SRCU lockdep testing

  Misc:
   - Split rcu_report_exp_cpu_mult() mask parameter and use for tracing
   - Remove READ_ONCE() for rdp->gpwrap access in __note_gp_changes()
   - Fix get_state_synchronize_rcu_full() GP-start detection
   - Move RCU Tasks self-tests to core_initcall()
   - Print segment lengths in show_rcu_nocb_gp_state()
   - Make RCU watch ct_kernel_exit_state() warning
   - Flush console log from kernel_power_off()
   - rcutorture: Allow a negative value for nfakewriters
   - rcu: Update TREE05.boot to test normal synchronize_rcu()
   - rcu: Use _full() API to debug synchronize_rcu()

  Make RCU handle PREEMPT_LAZY better:
   - Fix header guard for rcu_all_qs()
   - rcu: Rename PREEMPT_AUTO to PREEMPT_LAZY
   - Update __cond_resched comment about RCU quiescent states
   - Handle unstable rdp in rcu_read_unlock_strict()
   - Handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y
   - osnoise: Provide quiescent states
   - Adjust rcutorture with possible PREEMPT_RCU=n && PREEMPT_COUNT=y
     combination
   - Limit PREEMPT_RCU configurations
   - Make rcutorture senario TREE07 and senario TREE10 use
     PREEMPT_LAZY=y"

* tag 'rcu-next-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (59 commits)
  rcutorture: Make scenario TREE07 build CONFIG_PREEMPT_LAZY=y
  rcutorture: Make scenario TREE10 build CONFIG_PREEMPT_LAZY=y
  rcu: limit PREEMPT_RCU configurations
  rcutorture: Update ->extendables check for lazy preemption
  rcutorture: Update rcutorture_one_extend_check() for lazy preemption
  osnoise: provide quiescent states
  rcu: Use _full() API to debug synchronize_rcu()
  rcu: Update TREE05.boot to test normal synchronize_rcu()
  rcutorture: Allow a negative value for nfakewriters
  Flush console log from kernel_power_off()
  context_tracking: Make RCU watch ct_kernel_exit_state() warning
  rcu/nocb: Print segment lengths in show_rcu_nocb_gp_state()
  rcu-tasks: Move RCU Tasks self-tests to core_initcall()
  rcu: Fix get_state_synchronize_rcu_full() GP-start detection
  torture: Make SRCU lockdep testing use srcu_read_lock_nmisafe()
  srcu: Add FORCE_NEED_SRCU_NMI_SAFE Kconfig for testing
  rcutorture: Complain when invalid SRCU reader_flavor is specified
  rcutorture: Move RCU_TORTURE_TEST_{CHK_RDR_STATE,LOG_CPU} to bool
  rcutorture: Make cur_ops->format_gp_seqs take buffer length
  rcutorture: Add ftrace-compatible timestamp to GP# failure/close-call output
  ...
2025-03-24 19:41:37 -07:00
Linus Torvalds 418becac37 nolibc changes for 6.15
Changes
 -------
 
 * 32bit s390 support
 * opendir() and friends
 * openat() support
 * sscanf() support
 * various cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTg4lxklFHAidmUs57B+h1jyw5bOAUCZ8w24QAKCRDB+h1jyw5b
 OIRTAQD1KvUV0qYOKU3xqOiUxDe84DyFv4mgu1iZoqmd0po/CQEAs8HU4XWp4Ls0
 tPkcLqlhiVPgwhtipjklW8uy6JcFzAI=
 =+Zma
 -----END PGP SIGNATURE-----

Merge tag 'nolibc-20250308-for-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

Pull nolibc updates from Paul McKenney:
 - 32bit s390 support
 - opendir() and friends
 - openat() support
 - sscanf() support
 - various cleanups

[ Paul has just forwarded the pull request from Thomas Weißschuh, so
  the tag signature is from Thomas, not Paul   - Linus ]

* tag 'nolibc-20250308-for-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (26 commits)
  tools/nolibc: don't use asm/ UAPI headers
  selftests/nolibc: stop testing constructor order
  selftests/nolibc: use O_RDONLY flag instead of 0
  tools/nolibc: drop outdated example from overview comment
  tools/nolibc: process open() vararg as mode_t
  tools/nolibc: always use openat(2) instead of open(2)
  tools/nolibc: add support for openat(2)
  selftests/nolibc: add armthumb configuration
  selftests/nolibc: explicitly enable ARM mode
  Revert "selftests: kselftest: Fix build failure with NOLIBC"
  tools/nolibc: add support for [v]sscanf()
  tools/nolibc: add support for 32-bit s390
  selftests/nolibc: rename s390 to s390x
  selftests/nolibc: only run constructor tests on nolibc
  selftests/nolibc: split up architecture list in run-tests.sh
  tools/nolibc: add support for directory access
  tools/nolibc: add support for sys_llseek()
  selftests/nolibc: always keep test kernel configuration up to date
  selftests/nolibc: execute defconfig before other targets
  selftests/nolibc: drop call to mrproper target
  ...
2025-03-24 17:59:29 -07:00
Linus Torvalds bcb044256d sched_ext: Changes for v6.15
- Add mechanism to count and report internal events. This significantly
   improves visibility on subtle corner conditions.
 
 - The default idle CPU selection logic is revamped and improved in multiple
   ways including being made topology aware.
 
 - sched_ext was disabling ttwu_queue for simplicity, which can be costly
   when hardware topology is more complex. Implement
   SCX_OPS_ALLOWED_QUEUED_WAKEUP so that BPF schedulers can selectively
   enable ttwu_queue.
 
 - tools/sched_ext updates to improve compatibility among others.
 
 - Other misc updates and fixes.
 
 - sched_ext/for-6.14-fixes were pulled a few times to receive prerequisite
   fixes and resolve conflicts.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZ999Sg4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGf/KAQCoMTVOBpQT9gCaCKDOmrVJTwi6boEoV5WnGZzw
 PDr0vwEAq36iz4no6Y5THcN/DCx+52IiS0zuhPy3rBZVo11TMgU=
 =iQ+A
 -----END PGP SIGNATURE-----

Merge tag 'sched_ext-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

Pull sched_ext updates from Tejun Heo:

 - Add mechanism to count and report internal events. This significantly
   improves visibility on subtle corner conditions.

 - The default idle CPU selection logic is revamped and improved in
   multiple ways including being made topology aware.

 - sched_ext was disabling ttwu_queue for simplicity, which can be
   costly when hardware topology is more complex. Implement
   SCX_OPS_ALLOWED_QUEUED_WAKEUP so that BPF schedulers can selectively
   enable ttwu_queue.

 - tools/sched_ext updates to improve compatibility among others.

 - Other misc updates and fixes.

 - sched_ext/for-6.14-fixes were pulled a few times to receive
   prerequisite fixes and resolve conflicts.

* tag 'sched_ext-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (42 commits)
  sched_ext: idle: Refactor scx_select_cpu_dfl()
  sched_ext: idle: Honor idle flags in the built-in idle selection policy
  sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local()
  sched_ext: Add trace point to track sched_ext core events
  sched_ext: Change the event type from u64 to s64
  sched_ext: Documentation: add task lifecycle summary
  tools/sched_ext: Provide a compatible helper for scx_bpf_events()
  selftests/sched_ext: Add NUMA-aware scheduler test
  tools/sched_ext: Provide consistent access to scx flags
  sched_ext: idle: Fix scx_bpf_pick_any_cpu_node() behavior
  sched_ext: idle: Introduce scx_bpf_nr_node_ids()
  sched_ext: idle: Introduce node-aware idle cpu kfunc helpers
  sched_ext: idle: Per-node idle cpumasks
  sched_ext: idle: Introduce SCX_OPS_BUILTIN_IDLE_PER_NODE
  sched_ext: idle: Make idle static keys private
  sched/topology: Introduce for_each_node_numadist() iterator
  mm/numa: Introduce nearest_node_nodemask()
  nodemask: numa: reorganize inclusion path
  nodemask: add nodes_copy()
  tools/sched_ext: Sync with scx repo
  ...
2025-03-24 17:23:48 -07:00
Linus Torvalds 11c2b2e332 seccomp updates for v6.15-rc1
- avoid the lock trip seccomp_filter_release in common case (Mateusz Guzik)
 
 - remove unused 'sd' argument through-out (Oleg Nesterov)
 
 - selftests/seccomp: Add hard-coded __NR_uretprobe for x86_64
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCZ9hJKgAKCRA2KwveOeQk
 u3UDAQCUPSJ+iNGEsh9ErHJZWmg+LQJ999fOVrWWz+onjXFoQgD/cEHyLtSnz1Oa
 RbdGEqkBkRTXgOdkcIW5pJ4lBtfIpQQ=
 =z/8H
 -----END PGP SIGNATURE-----

Merge tag 'seccomp-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull seccomp updates from Kees Cook:

 - avoid the lock trip seccomp_filter_release in common case (Mateusz
   Guzik)

 - remove unused 'sd' argument through-out (Oleg Nesterov)

 - selftests/seccomp: Add hard-coded __NR_uretprobe for x86_64

* tag 'seccomp-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  seccomp: avoid the lock trip seccomp_filter_release in common case
  seccomp: remove the 'sd' argument from __seccomp_filter()
  seccomp: remove the 'sd' argument from __secure_computing()
  seccomp: fix the __secure_computing() stub for !HAVE_ARCH_SECCOMP_FILTER
  seccomp/mips: change syscall_trace_enter() to use secure_computing()
  selftests/seccomp: Add hard-coded __NR_uretprobe for x86_64
2025-03-24 15:34:38 -07:00
Linus Torvalds 06961fbbbd move-lib-kunit for v6.15-rc1
- move lib/ selftests into lib/tests/ (Kees Cook, Gabriela Bittencourt,
   Luis Felipe Hernandez, Lukas Bulwahn, Tamir Duberstein)
 
 - lib/math: Add int_log test suite (Bruno Sobreira França)
 
 - lib/math: Add Kunit test suite for gcd() (Yu-Chun Lin)
 
 - lib/tests/kfifo_kunit.c: add tests for the kfifo structure (Diego Vieira)
 
 - unicode: refactor selftests into KUnit (Gabriela Bittencourt)
 
 - lib/prime_numbers: convert self-test to KUnit (Tamir Duberstein)
 
 - printf: convert self-test to KUnit (Tamir Duberstein)
 
 - scanf: convert self-test to KUnit (Tamir Duberstein)
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCZ9hCvgAKCRA2KwveOeQk
 u1nzAP9F/vQZUPkU9ADuqdcbyyEXlTzNk8R5rC2e1w+uKzJx+QD9EAbeCv9ZLdC0
 KQF0pWVYCJtiSMEhkiMS/bMmpRCgwQ8=
 =VYZG
 -----END PGP SIGNATURE-----

Merge tag 'move-lib-kunit-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull lib kunit selftest move from Kees Cook:
 "This is a one-off tree to coordinate the move of selftests out of lib/
  and into lib/tests/. A separate tree was used for this to keep the
  paths sane with all the work in the same place.

   - move lib/ selftests into lib/tests/ (Kees Cook, Gabriela
     Bittencourt, Luis Felipe Hernandez, Lukas Bulwahn, Tamir
     Duberstein)

   - lib/math: Add int_log test suite (Bruno Sobreira França)

   - lib/math: Add Kunit test suite for gcd() (Yu-Chun Lin)

   - lib/tests/kfifo_kunit.c: add tests for the kfifo structure (Diego
     Vieira)

   - unicode: refactor selftests into KUnit (Gabriela Bittencourt)

   - lib/prime_numbers: convert self-test to KUnit (Tamir Duberstein)

   - printf: convert self-test to KUnit (Tamir Duberstein)

   - scanf: convert self-test to KUnit (Tamir Duberstein)"

* tag 'move-lib-kunit-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (21 commits)
  scanf: break kunit into test cases
  scanf: convert self-test to KUnit
  scanf: remove redundant debug logs
  scanf: implicate test line in failure messages
  printf: implicate test line in failure messages
  printf: break kunit into test cases
  printf: convert self-test to KUnit
  kunit/fortify: Replace "volatile" with OPTIMIZER_HIDE_VAR()
  kunit/fortify: Expand testing of __compiletime_strlen()
  kunit/stackinit: Use fill byte different from Clang i386 pattern
  kunit/overflow: Fix DEFINE_FLEX tests for counted_by
  selftests: remove reference to prime_numbers.sh
  MAINTAINERS: adjust entries in FORTIFY_SOURCE and KERNEL HARDENING
  lib/prime_numbers: convert self-test to KUnit
  lib/math: Add Kunit test suite for gcd()
  unicode: kunit: change tests filename and path
  unicode: kunit: refactor selftest to kunit tests
  lib/tests/kfifo_kunit.c: add tests for the kfifo structure
  lib: Move KUnit tests into tests/ subdirectory
  lib/math: Add int_log test suite
  ...
2025-03-24 15:15:11 -07:00
Amit Cohen 36ed81bcad selftests: vxlan_bridge: Test flood with unresolved FDB entry
Extend flood test to configure FDB entry with unresolved destination IP,
check that packets are not sent twice.

Without the previous patch which handles such scenario in mlxsw, the
tests fail:

$ TESTS='test_flood' ./vxlan_bridge_1d.sh
Running tests with UDP port 4789
TEST: VXLAN: flood                                                  [ OK ]
TEST: VXLAN: flood, unresolved FDB entry                            [FAIL]
        vx2 ns2: Expected to capture 10 packets, got 20.

$ TESTS='test_flood' ./vxlan_bridge_1q.sh
INFO: Running tests with UDP port 4789
TEST: VXLAN: flood vlan 10                                          [ OK ]
TEST: VXLAN: flood vlan 20                                          [ OK ]
TEST: VXLAN: flood vlan 10, unresolved FDB entry                    [FAIL]
        vx10 ns2: Expected to capture 10 packets, got 20.
TEST: VXLAN: flood vlan 20, unresolved FDB entry                    [FAIL]
        vx20 ns2: Expected to capture 10 packets, got 20.

With the previous patch, the tests pass.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/7bc96e317531f3bf06319fb2ea447bd8666f29fa.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24 15:09:31 -07:00
Gal Pressman c61209eeb0 selftests: drv-net: rss_ctx: Don't assume indirection table is present
The test_rss_context_dump() test assumes the indirection table is always
supported, which is not true for all drivers, e.g., virtio_net when
VIRTIO_NET_F_RSS is disabled.

Skip the check if 'indir' is not present.

Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250318112426.386651-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24 12:22:37 -07:00
Peter Seiderer 3099f9e156 selftest: net: update proc_net_pktgen (add more imix_weights test cases)
Add more imix_weights test cases (for incomplete input).

Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250317090401.1240704-2-ps.report@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24 12:01:38 -07:00
Linus Torvalds 130e696aa6 vfs-6.15-rc1.mount.namespace
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90r2wAKCRCRxhvAZXjc
 ouC6AQCk3MoqskN0WeNcaZT23dB7dHbEhf/7YXOFC9MFRMKXqQD9Fbn95+GuIe3U
 nBVPbVyQfDtfXE08ml6gbDJrCsbkkQI=
 =Xm1C
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.15-rc1.mount.namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs mount namespace updates from Christian Brauner:
 "This expands the ability of anonymous mount namespaces:

   - Creating detached mounts from detached mounts

     Currently, detached mounts can only be created from attached
     mounts. This limitaton prevents various use-cases. For example, the
     ability to mount a subdirectory without ever having to make the
     whole filesystem visible first.

     The current permission modelis:

      (1) Check that the caller is privileged over the owning user
          namespace of it's current mount namespace.

      (2) Check that the caller is located in the mount namespace of the
          mount it wants to create a detached copy of.

     While it is not strictly necessary to do it this way it is
     consistently applied in the new mount api. This model will also be
     used when allowing the creation of detached mount from another
     detached mount.

     The (1) requirement can simply be met by performing the same check
     as for the non-detached case, i.e., verify that the caller is
     privileged over its current mount namespace.

     To meet the (2) requirement it must be possible to infer the origin
     mount namespace that the anonymous mount namespace of the detached
     mount was created from.

     The origin mount namespace of an anonymous mount is the mount
     namespace that the mounts that were copied into the anonymous mount
     namespace originate from.

     In order to check the origin mount namespace of an anonymous mount
     namespace the sequence number of the original mount namespace is
     recorded in the anonymous mount namespace.

     With this in place it is possible to perform an equivalent check
     (2') to (2). The origin mount namespace of the anonymous mount
     namespace must be the same as the caller's mount namespace. To
     establish this the sequence number of the caller's mount namespace
     and the origin sequence number of the anonymous mount namespace are
     compared.

     The caller is always located in a non-anonymous mount namespace
     since anonymous mount namespaces cannot be setns()ed into. The
     caller's mount namespace will thus always have a valid sequence
     number.

     The owning namespace of any mount namespace, anonymous or
     non-anonymous, can never change. A mount attached to a
     non-anonymous mount namespace can never change mount namespace.

     If the sequence number of the non-anonymous mount namespace and the
     origin sequence number of the anonymous mount namespace match, the
     owning namespaces must match as well.

     Hence, the capability check on the owning namespace of the caller's
     mount namespace ensures that the caller has the ability to copy the
     mount tree.

   - Allow mount detached mounts on detached mounts

     Currently, detached mounts can only be mounted onto attached
     mounts. This limitation makes it impossible to assemble a new
     private rootfs and move it into place. Instead, a detached tree
     must be created, attached, then mounted open and then either moved
     or detached again. Lift this restriction.

     In order to allow mounting detached mounts onto other detached
     mounts the same permission model used for creating detached mounts
     from detached mounts can be used (cf. above).

     Allowing to mount detached mounts onto detached mounts leaves three
     cases to consider:

      (1) The source mount is an attached mount and the target mount is
          a detached mount. This would be equivalent to moving a mount
          between different mount namespaces. A caller could move an
          attached mount to a detached mount. The detached mount can now
          be freely attached to any mount namespace. This changes the
          current delegatioh model significantly for no good reason. So
          this will fail.

      (2) Anonymous mount namespaces are always attached fully, i.e., it
          is not possible to only attach a subtree of an anoymous mount
          namespace. This simplifies the implementation and reasoning.

          Consequently, if the anonymous mount namespace of the source
          detached mount and the target detached mount are the identical
          the mount request will fail.

      (3) The source mount's anonymous mount namespace is different from
          the target mount's anonymous mount namespace.

          In this case the source anonymous mount namespace of the
          source mount tree must be freed after its mounts have been
          moved to the target anonymous mount namespace. The source
          anonymous mount namespace must be empty afterwards.

     By allowing to mount detached mounts onto detached mounts a caller
     may do the following:

       fd_tree1 = open_tree(-EBADF, "/mnt", OPEN_TREE_CLONE)
       fd_tree2 = open_tree(-EBADF, "/tmp", OPEN_TREE_CLONE)

     fd_tree1 and fd_tree2 refer to two different detached mount trees
     that belong to two different anonymous mount namespace.

     It is important to note that fd_tree1 and fd_tree2 both refer to
     the root of their respective anonymous mount namespaces.

     By allowing to mount detached mounts onto detached mounts the
     caller may now do:

         move_mount(fd_tree1, "", fd_tree2, "",
                    MOVE_MOUNT_F_EMPTY_PATH | MOVE_MOUNT_T_EMPTY_PATH)

     This will cause the detached mount referred to by fd_tree1 to be
     mounted on top of the detached mount referred to by fd_tree2.

     Thus, the detached mount fd_tree1 is moved from its separate
     anonymous mount namespace into fd_tree2's anonymous mount
     namespace.

     It also means that while fd_tree2 continues to refer to the root of
     its respective anonymous mount namespace fd_tree1 doesn't anymore.

     This has the consequence that only fd_tree2 can be moved to another
     anonymous or non-anonymous mount namespace. Moving fd_tree1 will
     now fail as fd_tree1 doesn't refer to the root of an anoymous mount
     namespace anymore.

     Now fd_tree1 and fd_tree2 refer to separate detached mount trees
     referring to the same anonymous mount namespace.

     This is conceptually fine. The new mount api does allow for this to
     happen already via:

       mount -t tmpfs tmpfs /mnt
       mkdir -p /mnt/A
       mount -t tmpfs tmpfs /mnt/A

       fd_tree3 = open_tree(-EBADF, "/mnt", OPEN_TREE_CLONE | AT_RECURSIVE)
       fd_tree4 = open_tree(-EBADF, "/mnt/A", 0)

     Both fd_tree3 and fd_tree4 refer to two different detached mount
     trees but both detached mount trees refer to the same anonymous
     mount namespace. An as with fd_tree1 and fd_tree2, only fd_tree3
     may be moved another mount namespace as fd_tree3 refers to the root
     of the anonymous mount namespace just while fd_tree4 doesn't.

     However, there's an important difference between the
     fd_tree3/fd_tree4 and the fd_tree1/fd_tree2 example.

     Closing fd_tree4 and releasing the respective struct file will have
     no further effect on fd_tree3's detached mount tree.

     However, closing fd_tree3 will cause the mount tree and the
     respective anonymous mount namespace to be destroyed causing the
     detached mount tree of fd_tree4 to be invalid for further mounting.

     By allowing to mount detached mounts on detached mounts as in the
     fd_tree1/fd_tree2 example both struct files will affect each other.

     Both fd_tree1 and fd_tree2 refer to struct files that have
     FMODE_NEED_UNMOUNT set.

     To handle this we use the fact that @fd_tree1 will have a parent
     mount once it has been attached to @fd_tree2.

     When dissolve_on_fput() is called the mount that has been passed in
     will refer to the root of the anonymous mount namespace. If it
     doesn't it would mean that mounts are leaked. So before allowing to
     mount detached mounts onto detached mounts this would be a bug.

     Now that detached mounts can be mounted onto detached mounts it
     just means that the mount has been attached to another anonymous
     mount namespace and thus dissolve_on_fput() must not unmount the
     mount tree or free the anonymous mount namespace as the file
     referring to the root of the namespace hasn't been closed yet.

     If it had been closed yet it would be obvious because the mount
     namespace would be NULL, i.e., the @fd_tree1 would have already
     been unmounted. If @fd_tree1 hasn't been unmounted yet and has a
     parent mount it is safe to skip any cleanup as closing @fd_tree2
     will take care of all cleanup operations.

   - Allow mount propagation for detached mount trees

     In commit ee2e3f5062 ("mount: fix mounting of detached mounts
     onto targets that reside on shared mounts") I fixed a bug where
     propagating the source mount tree of an anonymous mount namespace
     into a target mount tree of a non-anonymous mount namespace could
     be used to trigger an integer overflow in the non-anonymous mount
     namespace causing any new mounts to fail.

     The cause of this was that the propagation algorithm was unable to
     recognize mounts from the source mount tree that were already
     propagated into the target mount tree and then reappeared as
     propagation targets when walking the destination propagation mount
     tree.

     When fixing this I disabled mount propagation into anonymous mount
     namespaces. Make it possible for anonymous mount namespace to
     receive mount propagation events correctly. This is now also a
     correctness issue now that we allow mounting detached mount trees
     onto detached mount trees.

     Mark the source anonymous mount namespace with MNTNS_PROPAGATING
     indicating that all mounts belonging to this mount namespace are
     currently in the process of being propagated and make the
     propagation algorithm discard those if they appear as propagation
     targets"

* tag 'vfs-6.15-rc1.mount.namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (21 commits)
  selftests: test subdirectory mounting
  selftests: add test for detached mount tree propagation
  fs: namespace: fix uninitialized variable use
  mount: handle mount propagation for detached mount trees
  fs: allow creating detached mounts from fsmount() file descriptors
  selftests: seventh test for mounting detached mounts onto detached mounts
  selftests: sixth test for mounting detached mounts onto detached mounts
  selftests: fifth test for mounting detached mounts onto detached mounts
  selftests: fourth test for mounting detached mounts onto detached mounts
  selftests: third test for mounting detached mounts onto detached mounts
  selftests: second test for mounting detached mounts onto detached mounts
  selftests: first test for mounting detached mounts onto detached mounts
  fs: mount detached mounts onto detached mounts
  fs: support getname_maybe_null() in move_mount()
  selftests: create detached mounts from detached mounts
  fs: create detached mounts from detached mounts
  fs: add may_copy_tree()
  fs: add fastpath for dissolve_on_fput()
  fs: add assert for move_mount()
  fs: add mnt_ns_empty() helper
  ...
2025-03-24 11:41:41 -07:00
Linus Torvalds 74adf9e353 vfs-6.15-rc1.nsfs
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90rXwAKCRCRxhvAZXjc
 ogrYAP4kWLzxD2IbBGSs5kBkKdc9qNGMtjrOn5InHm263vTpPwD/VYcOmyc3gScO
 e8hTBES3mYlzBpselh99HnGx5geMtAE=
 =+I5+
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.15-rc1.nsfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs nsfs updates from Christian Brauner:
 "This contains non-urgent fixes for nsfs to validate ioctls before
  performing any relevant operations.

  We alredy did this for a few other filesystems last cycle"

* tag 'vfs-6.15-rc1.nsfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  selftests/nsfs: add ioctl validation tests
  nsfs: validate ioctls
2025-03-24 11:38:12 -07:00
Linus Torvalds 804382d59b vfs-6.15-rc1.overlayfs
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90rUAAKCRCRxhvAZXjc
 opI3AP9ws4S/JXOjxNKoTYmNM2nZ8+r1v8tUxbLIiqdvzx9PygD/V1ZjXtn6lwZr
 OK8d5Y8UnlPZTlBF8D61op3AjnXYzws=
 =KV4p
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.15-rc1.overlayfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs overlayfs updates from Christian Brauner:
 "Currently overlayfs uses the mounter's credentials for its
  override_creds() calls. That provides a consistent permission model.

  This patches allows a caller to instruct overlayfs to use its
  credentials instead. The caller must be located in the same user
  namespace hierarchy as the user namespace the overlayfs instance will
  be mounted in. This provides a consistent and simple security model.

  With this it is possible to e.g., mount an overlayfs instance where
  the mounter must have CAP_SYS_ADMIN but the credentials used for
  override_creds() have dropped CAP_SYS_ADMIN. It also allows the usage
  of custom fs{g,u}id different from the callers and other tweaks"

* tag 'vfs-6.15-rc1.overlayfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  selftests/ovl: add third selftest for "override_creds"
  selftests/ovl: add second selftest for "override_creds"
  selftests/filesystems: add utils.{c,h}
  selftests/ovl: add first selftest for "override_creds"
  ovl: allow to specify override credentials
2025-03-24 10:37:40 -07:00
Linus Torvalds df00ded23a vfs-6.15-rc1.pidfs
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90pqgAKCRCRxhvAZXjc
 oqVsAP9Aq/fMCI14HeXehPCezKQZPu1HTrPPo2clLHXoSnafawEAsA3YfWTT4Heb
 iexzqvAEUOMYOVN66QEc+6AAwtMLrwc=
 =0eYo
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.15-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs pidfs updates from Christian Brauner:

 - Allow retrieving exit information after a process has been reaped
   through pidfds via the new PIDFD_INTO_EXIT extension for the
   PIDFD_GET_INFO ioctl. Various tools need access to information about
   a process/task even after it has already been reaped.

   Pidfd polling allows waiting on either task exit or for a task to
   have been reaped. The contract for PIDFD_INFO_EXIT is simply that
   EPOLLHUP must be observed before exit information can be retrieved,
   i.e., exit information is only provided once the task has been reaped
   and then can be retrieved as long as the pidfd is open.

 - Add PIDFD_SELF_{THREAD,THREAD_GROUP} sentinels allowing userspace to
   forgo allocating a file descriptor for their own process. This is
   useful in scenarios where users want to act on their own process
   through pidfds and is akin to AT_FDCWD.

 - Improve premature thread-group leader and subthread exec behavior
   when polling on pidfds:

   (1) During a multi-threaded exec by a subthread, i.e.,
       non-thread-group leader thread, all other threads in the
       thread-group including the thread-group leader are killed and the
       struct pid of the thread-group leader will be taken over by the
       subthread that called exec. IOW, two tasks change their TIDs.

   (2) A premature thread-group leader exit means that the thread-group
       leader exited before all of the other subthreads in the
       thread-group have exited.

   Both cases lead to inconsistencies for pidfd polling with
   PIDFD_THREAD. Any caller that holds a PIDFD_THREAD pidfd to the
   current thread-group leader may or may not see an exit notification
   on the file descriptor depending on when poll is performed. If the
   poll is performed before the exec of the subthread has concluded an
   exit notification is generated for the old thread-group leader. If
   the poll is performed after the exec of the subthread has concluded
   no exit notification is generated for the old thread-group leader.

   The correct behavior is to simply not generate an exit notification
   on the struct pid of a subhthread exec because the struct pid is
   taken over by the subthread and thus remains alive.

   But this is difficult to handle because a thread-group may exit
   premature as mentioned in (2). In that case an exit notification is
   reliably generated but the subthreads may continue to run for an
   indeterminate amount of time and thus also may exec at some point.

   After this pull no exit notifications will be generated for a
   PIDFD_THREAD pidfd for a thread-group leader until all subthreads
   have been reaped. If a subthread should exec before no exit
   notification will be generated until that task exits or it creates
   subthreads and repeates the cycle.

   This means an exit notification indicates the ability for the father
   to reap the child.

* tag 'vfs-6.15-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits)
  selftests/pidfd: third test for multi-threaded exec polling
  selftests/pidfd: second test for multi-threaded exec polling
  selftests/pidfd: first test for multi-threaded exec polling
  pidfs: improve multi-threaded exec and premature thread-group leader exit polling
  pidfs: ensure that PIDFS_INFO_EXIT is available
  selftests/pidfd: add seventh PIDFD_INFO_EXIT selftest
  selftests/pidfd: add sixth PIDFD_INFO_EXIT selftest
  selftests/pidfd: add fifth PIDFD_INFO_EXIT selftest
  selftests/pidfd: add fourth PIDFD_INFO_EXIT selftest
  selftests/pidfd: add third PIDFD_INFO_EXIT selftest
  selftests/pidfd: add second PIDFD_INFO_EXIT selftest
  selftests/pidfd: add first PIDFD_INFO_EXIT selftest
  selftests/pidfd: expand common pidfd header
  pidfs/selftests: ensure correct headers for ioctl handling
  selftests/pidfd: fix header inclusion
  pidfs: allow to retrieve exit information
  pidfs: record exit code and cgroupid at exit
  pidfs: use private inode slab cache
  pidfs: move setting flags into pidfs_alloc_file()
  pidfd: rely on automatic cleanup in __pidfd_prepare()
  ...
2025-03-24 10:16:37 -07:00
Linus Torvalds fd101da676 vfs-6.15-rc1.mount
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90qAwAKCRCRxhvAZXjc
 on7lAP0akpIsJMWREg9tLwTNTySI1b82uKec0EAgM6T7n/PYhAD/T4zoY8UYU0Pr
 qCxwTXHUVT6bkNhjREBkfqq9OkPP8w8=
 =GxeN
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.15-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs mount updates from Christian Brauner:

 - Mount notifications

   The day has come where we finally provide a new api to listen for
   mount topology changes outside of /proc/<pid>/mountinfo. A mount
   namespace file descriptor can be supplied and registered with
   fanotify to listen for mount topology changes.

   Currently notifications for mount, umount and moving mounts are
   generated. The generated notification record contains the unique
   mount id of the mount.

   The listmount() and statmount() api can be used to query detailed
   information about the mount using the received unique mount id.

   This allows userspace to figure out exactly how the mount topology
   changed without having to generating diffs of /proc/<pid>/mountinfo
   in userspace.

 - Support O_PATH file descriptors with FSCONFIG_SET_FD in the new mount
   api

 - Support detached mounts in overlayfs

   Since last cycle we support specifying overlayfs layers via file
   descriptors. However, we don't allow detached mounts which means
   userspace cannot user file descriptors received via
   open_tree(OPEN_TREE_CLONE) and fsmount() directly. They have to
   attach them to a mount namespace via move_mount() first.

   This is cumbersome and means they have to undo mounts via umount().
   Allow them to directly use detached mounts.

 - Allow to retrieve idmappings with statmount

   Currently it isn't possible to figure out what idmapping has been
   attached to an idmapped mount. Add an extension to statmount() which
   allows to read the idmapping from the mount.

 - Allow creating idmapped mounts from mounts that are already idmapped

   So far it isn't possible to allow the creation of idmapped mounts
   from already idmapped mounts as this has significant lifetime
   implications. Make the creation of idmapped mounts atomic by allow to
   pass struct mount_attr together with the open_tree_attr() system call
   allowing to solve these issues without complicating VFS lookup in any
   way.

   The system call has in general the benefit that creating a detached
   mount and applying mount attributes to it becomes an atomic operation
   for userspace.

 - Add a way to query statmount() for supported options

   Allow userspace to query which mount information can be retrieved
   through statmount().

 - Allow superblock owners to force unmount

* tag 'vfs-6.15-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (21 commits)
  umount: Allow superblock owners to force umount
  selftests: add tests for mount notification
  selinux: add FILE__WATCH_MOUNTNS
  samples/vfs: fix printf format string for size_t
  fs: allow changing idmappings
  fs: add kflags member to struct mount_kattr
  fs: add open_tree_attr()
  fs: add copy_mount_setattr() helper
  fs: add vfs_open_tree() helper
  statmount: add a new supported_mask field
  samples/vfs: add STATMOUNT_MNT_{G,U}IDMAP
  selftests: add tests for using detached mount with overlayfs
  samples/vfs: check whether flag was raised
  statmount: allow to retrieve idmappings
  uidgid: add map_id_range_up()
  fs: allow detached mounts in clone_private_mount()
  selftests/overlayfs: test specifying layers as O_PATH file descriptors
  fs: support O_PATH fds with FSCONFIG_SET_FD
  vfs: add notifications for mount attach and detach
  fanotify: notify on mount attach and detach
  ...
2025-03-24 09:34:10 -07:00
Ming Lei 0f3ebf2d4b selftests: ublk: add stripe target
Add ublk stripe target which can take 1~4 underlying backing files
or block device, with stripe size 4k ~ 512K.

Add two basic tests(write verify & mkfs/mount/umount) over ublk/stripe.

This target is helpful to cover multiple IOs aiming at same
fixed/registered IO kernel buffer.

It is also capable of verifying vectored registered (kernel)buffers
in future for zero copy, so far it isn't supported yet.

Todo: support vectored registered kernel buffer for ublk/zc.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-9-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-22 08:35:08 -06:00
Ming Lei 263846eb43 selftests: ublk: simplify loop io completion
Use the added target io handling helpers for simplifying loop io
completion.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-8-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-22 08:35:08 -06:00
Ming Lei 8cb9b971e2 selftests: ublk: enable zero copy for null target
Enable zero copy for null target so that we can evaluate performance
from zero copy or not.

Also this should be the simplest ublk zero copy implementation, which
can be served as zc example.

Add test for covering 'add -t null -z'.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-7-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-22 08:35:08 -06:00
Ming Lei 8842b72a82 selftests: ublk: prepare for supporting stripe target
- pass 'truct dev_ctx *ctx' to target init function

- add 'private_data' to 'struct ublk_dev' for storing target specific data

- add 'private_data' to 'struct ublk_io' for storing per-IO data

- add 'tgt_ios' to 'struct ublk_io' for counting how many io_uring ios
for handling the current io command

- add helper ublk_get_io() for supporting stripe target

- add two helpers for simplifying target io handling

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-6-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-22 08:35:08 -06:00
Ming Lei 10d962dae2 selftests: ublk: move common code into common.c
Move two functions for initializing & de-initializing backing file
into common.c.

Also move one common helper into kublk.h.

Prepare for supporting ublk-stripe.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-5-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-22 08:35:08 -06:00
Ming Lei 9413c0ca8e selftests: ublk: increase max buffer size to 1MB
Increase max buffer size to 1MB, and 64KB is too small to evaluate
performance with builtin ublk server implementation.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-22 08:35:08 -06:00
Ming Lei f2639ed11e selftests: ublk: add single sqe allocator helper
Unify the sqe allocator helper, and we will use it for supporting
more cases, such as ublk stripe, in which variable sqe allocation
is required.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-22 08:35:08 -06:00
Ming Lei 723977cab4 selftests: ublk: add generic_01 for verifying sequential IO order
block layer, ublk and io_uring might re-order IO in the past

- plug

- queue ublk io command via task work

Add one test for verifying if sequential WRITE IO is dispatched in order.

- null target is taken, so we can just observe io order from
`tracepoint:block:block_rq_complete` which represents the dispatch order

- WRITE IO is taken because READ may come from system-wide utility

Cc: Uday Shankar <ushankar@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-22 08:35:08 -06:00
Kohei Enju 5f3077d7fc selftests/bpf: Add selftests for load-acquire/store-release when register number is invalid
syzbot reported out-of-bounds read in check_atomic_load/store() when the
register number is invalid in this context:
    https://syzkaller.appspot.com/bug?extid=a5964227adc0f904549c

To avoid the issue from now on, let's add tests where the register number
is invalid for load-acquire/store-release.

After discussion with Eduard, I decided to use R15 as invalid register
because the actual slab-out-of-bounds read issue occurs when the register
number is R12 or larger.

Signed-off-by: Kohei Enju <enjuk@amazon.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250322045340.18010-6-enjuk@amazon.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-22 06:19:09 -07:00
Kohei Enju c03bb2fa32 bpf: Fix out-of-bounds read in check_atomic_load/store()
syzbot reported the following splat [0].

In check_atomic_load/store(), register validity is not checked before
atomic_ptr_type_ok(). This causes the out-of-bounds read in is_ctx_reg()
called from atomic_ptr_type_ok() when the register number is MAX_BPF_REG
or greater.

Call check_load_mem()/check_store_reg() before atomic_ptr_type_ok()
to avoid the OOB read.

However, some tests introduced by commit ff3afe5da9 ("selftests/bpf: Add
selftests for load-acquire and store-release instructions") assume
calling atomic_ptr_type_ok() before checking register validity.
Therefore the swapping of order unintentionally changes verifier messages
of these tests.

For example in the test load_acquire_from_pkt_pointer(), expected message
is 'BPF_ATOMIC loads from R2 pkt is not allowed' although actual messages
are different.

  validate_msgs:FAIL:754 expect_msg
  VERIFIER LOG:
  =============
  Global function load_acquire_from_pkt_pointer() doesn't return scalar. Only those are supported.
  0: R1=ctx() R10=fp0
  ; asm volatile ( @ verifier_load_acquire.c:140
  0: (61) r2 = *(u32 *)(r1 +0)          ; R1=ctx() R2_w=pkt(r=0)
  1: (d3) r0 = load_acquire((u8 *)(r2 +0))
  invalid access to packet, off=0 size=1, R2(id=0,off=0,r=0)
  R2 offset is outside of the packet
  processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
  =============
  EXPECTED   SUBSTR: 'BPF_ATOMIC loads from R2 pkt is not allowed'
  #505/19  verifier_load_acquire/load-acquire from pkt pointer:FAIL

This is because instructions in the test don't pass check_load_mem() and
therefore don't enter the atomic_ptr_type_ok() path.
In this case, we have to modify instructions so that they pass the
check_load_mem() and trigger atomic_ptr_type_ok().
Similarly for store-release tests, we need to modify instructions so that
they pass check_store_reg().

Like load_acquire_from_pkt_pointer(), modify instructions in:
  load_acquire_from_sock_pointer()
  store_release_to_ctx_pointer()
  store_release_to_pkt_pointer()

Also in store_release_to_sock_pointer(), check_store_reg() returns error
early and atomic_ptr_type_ok() is not triggered, since write to sock
pointer is not possible in general.
We might be able to remove the test, but for now let's leave it and just
change the expected message.

[0]
 BUG: KASAN: slab-out-of-bounds in is_ctx_reg kernel/bpf/verifier.c:6185 [inline]
 BUG: KASAN: slab-out-of-bounds in atomic_ptr_type_ok+0x3d7/0x550 kernel/bpf/verifier.c:6223
 Read of size 4 at addr ffff888141b0d690 by task syz-executor143/5842

 CPU: 1 UID: 0 PID: 5842 Comm: syz-executor143 Not tainted 6.14.0-rc3-syzkaller-gf28214603dc6 #0
 Call Trace:
  <TASK>
  __dump_stack lib/dump_stack.c:94 [inline]
  dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
  print_address_description mm/kasan/report.c:408 [inline]
  print_report+0x16e/0x5b0 mm/kasan/report.c:521
  kasan_report+0x143/0x180 mm/kasan/report.c:634
  is_ctx_reg kernel/bpf/verifier.c:6185 [inline]
  atomic_ptr_type_ok+0x3d7/0x550 kernel/bpf/verifier.c:6223
  check_atomic_store kernel/bpf/verifier.c:7804 [inline]
  check_atomic kernel/bpf/verifier.c:7841 [inline]
  do_check+0x89dd/0xedd0 kernel/bpf/verifier.c:19334
  do_check_common+0x1678/0x2080 kernel/bpf/verifier.c:22600
  do_check_main kernel/bpf/verifier.c:22691 [inline]
  bpf_check+0x165c8/0x1cca0 kernel/bpf/verifier.c:23821

Reported-by: syzbot+a5964227adc0f904549c@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=a5964227adc0f904549c
Tested-by: syzbot+a5964227adc0f904549c@syzkaller.appspotmail.com
Fixes: e24bbad29a8d ("bpf: Introduce load-acquire and store-release instructions")
Fixes: ff3afe5da9 ("selftests/bpf: Add selftests for load-acquire and store-release instructions")
Signed-off-by: Kohei Enju <enjuk@amazon.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250322045340.18010-5-enjuk@amazon.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-22 06:15:57 -07:00
Ryan Roberts a2c6f9c3ca selftests/mm: speed up split_huge_page_test
create_pagecache_thp_and_fd() was previously writing a file sized at twice
the PMD size by making a per-byte write syscall.  This was quite slow when
the PMD size is 4M, but completely intolerable for 32M (PMD size for
arm64's 16K page size), and 512M (PMD size for arm64's 64K page size).

The byte pattern has a 256 byte period, so let's create a 1K buffer and
fill it with exactly 4 periods.  Then we can write the buffer as many
times as is required to fill the file.  This makes things much more
tolerable.

The test now passes for 16K page size.  It still fails for 64K page size
because MAX_PAGECACHE_ORDER is too small for 512M folio size (I think).

Link: https://lkml.kernel.org/r/20250318174343.243631-3-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: Rafael Aquini <raquini@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21 22:03:16 -07:00
Ryan Roberts 735b3f7e77 selftests/mm: uffd-unit-tests support for hugepages > 2M
uffd-unit-tests uses a memory area with a fixed 32M size.  Then it
calculates the number of pages by dividing by page_size, which itself is
either the base page size or the PMD huge page size depending on the test
config.  For the latter, we end up with nr_pages=1 for arm64 16K base
pages, and nr_pages=0 for 64K base pages.  This doesn't end well.

So let's make the 32M size a floor and also ensure that we have at least 2
pages given the PMD size.  With this change, the tests pass on arm64 64K
base page size configuration.

Link: https://lkml.kernel.org/r/20250318174343.243631-2-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: Rafael Aquini <raquini@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21 22:03:15 -07:00
Brendan Jackman d8a866c766 selftests/mm: add commentary about 9pfs bugs
As discussed here:

https://lore.kernel.org/lkml/Z9RRkL1hom48z3Tt@google.com/

This code could benefit from some more commentary.

To avoid needing to comment the same thing in multiple places (I guess
more of these SKIPs will need to be added over time, for now I am only
like 20% of the way through Project Run run_vmtests.sh Successfully), add
a dummy "skip tests for this specific reason" function that basically just
serves as a hook to hang comments on.

Link: https://lkml.kernel.org/r/20250317-9pfs-comments-v1-1-9ac96043e146@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21 22:03:14 -07:00
Ming Lei ffde32a49a selftests: ublk: fix starting ublk device
Firstly ublk char device node may not be created by udev yet, so wait
a while until it can be opened or timeout.

Secondly delete created ublk device in case of start failure, otherwise
the device becomes zombie.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250321135324.259677-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-21 14:09:24 -06:00
John Stultz e40d3709c0 selftests/timers: Improve skew_consistency by testing with other clockids
Lei Chen reported a bug with CLOCK_MONOTONIC_COARSE having inconsistencies
when NTP is adjusting the clock frequency.

This has gone seemingly undetected for ~15 years, illustrating a clear gap
in our testing.

The skew_consistency test is intended to catch this sort of problem, but
was focused on only evaluating CLOCK_MONOTONIC, and thus missed the problem
on CLOCK_MONOTONIC_COARSE.

So adjust the test to run with all clockids for 60 seconds each instead of
10 minutes with just CLOCK_MONOTONIC.

Reported-by: Lei Chen <lei.chen@smartx.com>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250320200306.1712599-2-jstultz@google.com
Closes: https://lore.kernel.org/lkml/20250310030004.3705801-1-lei.chen@smartx.com/
2025-03-21 19:16:18 +01:00
Breno Leitao 4b73dc83ed selftests: netconsole: Add tests for 'release' feature in sysdata
Expands the self-tests to include the 'release' feature in
sysdata.

Verifies that enabling the 'release' feature appends the
correct data and ensures that disabling it functions as expected.

When enabled, the message should have an item similar to in the
userdata: `release=$(uname -r)`

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250314-netcons_release-v1-5-07979c4b86af@debian.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21 18:59:25 +01:00
Mickaël Salaün 15383a0d63
landlock: Add the errata interface
Some fixes may require user space to check if they are applied on the
running kernel before using a specific feature.  For instance, this
applies when a restriction was previously too restrictive and is now
getting relaxed (e.g. for compatibility reasons).  However, non-visible
changes for legitimate use (e.g. security fixes) do not require an
erratum.

Because fixes are backported down to a specific Landlock ABI, we need a
way to avoid cherry-pick conflicts.  The solution is to only update a
file related to the lower ABI impacted by this issue.  All the ABI files
are then used to create a bitmask of fixes.

The new errata interface is similar to the one used to get the supported
Landlock ABI version, but it returns a bitmask instead because the order
of fixes may not match the order of versions, and not all fixes may
apply to all versions.

The actual errata will come with dedicated commits.  The description is
not actually used in the code but serves as documentation.

Create the landlock_abi_version symbol and use its value to check errata
consistency.

Update test_base's create_ruleset_checks_ordering tests and add errata
tests.

This commit is backportable down to the first version of Landlock.

Fixes: 3532b0b435 ("landlock: Enable user space to infer supported features")
Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250318161443.279194-3-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-03-21 12:12:19 +01:00
Eric Biggers ca17aa6640 crypto: lib/chacha - remove unused arch-specific init support
All implementations of chacha_init_arch() just call
chacha_init_generic(), so it is pointless.  Just delete it, and replace
chacha_init() with what was previously chacha_init_generic().

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-21 17:39:06 +08:00
Ming Lei 96af5af47b selftests: ublk: fix write cache implementation
For loop target, write cache isn't enabled, and each write isn't be
marked as DSYNC too.

Fix it by enabling write cache, meantime fix FLUSH implementation
by not taking LBA range into account, and there isn't such info
for FLUSH command.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250321004758.152572-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-20 20:01:03 -06:00
Ming Lei beb31982ad selftests: ublk: add variable for user to not show test result
Some user decides test result by exit code only, and wouldn't like to be
bothered by the test result.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250320013743.4167489-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-20 17:18:55 -06:00
Ming Lei fe2230d921 selftests: ublk: don't show `modprobe` failure
ublk_drv may be built-in, so don't show modprobe failure, and we
do check `/dev/ublk-control` for skipping test if ublk_drv isn't
enabled.

Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250320013743.4167489-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-20 17:18:55 -06:00
Ming Lei 8764c1a72b selftests: ublk: add one dependency header
Add one dependency helper which can include new uapi definition which
isn't synced from kernel.

This way also helps a lot for downstream test deployment.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250320013743.4167489-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-20 17:18:55 -06:00
Paolo Abeni 6f13bec53a bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQ6NaUOruQGUkvPdG4raS+Z+3y5EwUCZ9NWrwAKCRAraS+Z+3y5
 E2VIAP4iIdMobFDEm04JuXyrOK6HasnkOFkuEgLNNhGRbCCTTwD/X/2A8baSXnjg
 eqYN2DJfrs7OZ1ZUku/ic1VGtJg7Sgk=
 =jTyW
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Martin KaFai Lau says:

====================
pull-request: bpf-next 2025-03-13

The following pull-request contains BPF updates for your *net-next* tree.

We've added 4 non-merge commits during the last 3 day(s) which contain
a total of 2 files changed, 35 insertions(+), 12 deletions(-).

The main changes are:

1) bpf_getsockopt support for TCP_BPF_RTO_MIN and TCP_BPF_DELACK_MAX,
   from Jason Xing

bpf-next-for-netdev

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  selftests/bpf: Add bpf_getsockopt() for TCP_BPF_DELACK_MAX and TCP_BPF_RTO_MIN
  tcp: bpf: Support bpf_getsockopt for TCP_BPF_DELACK_MAX
  tcp: bpf: Support bpf_getsockopt for TCP_BPF_RTO_MIN
  tcp: bpf: Introduce bpf_sol_tcp_getsockopt to support TCP_BPF flags
====================

Link: https://patch.msgid.link/20250313221620.2512684-1-martin.lau@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20 21:48:14 +01:00
Paolo Abeni f491593394 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.14-rc8).

Conflict:

tools/testing/selftests/net/Makefile
  03544faad7 ("selftest: net: add proc_net_pktgen")
  3ed61b8938 ("selftests: net: test for lwtunnel dst ref loops")

tools/testing/selftests/net/config:
  85cb3711ac ("selftests: net: Add test cases for link and peer netns")
  3ed61b8938 ("selftests: net: test for lwtunnel dst ref loops")

Adjacent commits:

tools/testing/selftests/net/Makefile
  c935af429e ("selftests: net: add support for testing SO_RCVMARK and SO_RCVPRIORITY")
  355d940f4d ("Revert "selftests: Add IPv6 link-local address generation tests for GRE devices."")

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20 21:38:01 +01:00
Björn Töpel e16e64f9e0 selftests/bpf: Sanitize pointer prior fclose()
There are scenarios where env.{sub,}test_state->stdout_saved, can be
NULL, e.g. sometimes when the watchdog timeout kicks in, or if the
open_memstream syscall is not available.

Avoid crashing test_progs by adding an explicit NULL check prior the
fclose() call.

Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20250318081648.122523-1-bjorn@kernel.org
2025-03-20 10:35:07 -07:00
Paolo Bonzini 0afd104fb3 KVM/arm64 updates for 6.15
- Nested virtualization support for VGICv3, giving the nested
    hypervisor control of the VGIC hardware when running an L2 VM
 
  - Removal of 'late' nested virtualization feature register masking,
    making the supported feature set directly visible to userspace
 
  - Support for emulating FEAT_PMUv3 on Apple silicon, taking advantage
    of an IMPLEMENTATION DEFINED trap that covers all PMUv3 registers
 
  - Paravirtual interface for discovering the set of CPU implementations
    where a VM may run, addressing a longstanding issue of guest CPU
    errata awareness in big-little systems and cross-implementation VM
    migration
 
  - Userspace control of the registers responsible for identifying a
    particular CPU implementation (MIDR_EL1, REVIDR_EL1, AIDR_EL1),
    allowing VMs to be migrated cross-implementation
 
  - pKVM updates, including support for tracking stage-2 page table
    allocations in the protected hypervisor in the 'SecPageTable' stat
 
  - Fixes to vPMU, ensuring that userspace updates to the vPMU after
    KVM_RUN are reflected into the backing perf events
 -----BEGIN PGP SIGNATURE-----
 
 iI0EABYIADUWIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCZ9s9gBccb2xpdmVyLnVw
 dG9uQGxpbnV4LmRldgAKCRCivnWIJHzdFp6LAQCOQ1Fidp8RT1NdhLLAhW5D4gLe
 MNT619R4qfqu64ZpeQEAidHMAYaGRk5KDNBq6Jn+awcJnwCcMnh2ok0vTOjz3gY=
 =RC6A
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-6.15' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for 6.15

 - Nested virtualization support for VGICv3, giving the nested
   hypervisor control of the VGIC hardware when running an L2 VM

 - Removal of 'late' nested virtualization feature register masking,
   making the supported feature set directly visible to userspace

 - Support for emulating FEAT_PMUv3 on Apple silicon, taking advantage
   of an IMPLEMENTATION DEFINED trap that covers all PMUv3 registers

 - Paravirtual interface for discovering the set of CPU implementations
   where a VM may run, addressing a longstanding issue of guest CPU
   errata awareness in big-little systems and cross-implementation VM
   migration

 - Userspace control of the registers responsible for identifying a
   particular CPU implementation (MIDR_EL1, REVIDR_EL1, AIDR_EL1),
   allowing VMs to be migrated cross-implementation

 - pKVM updates, including support for tracking stage-2 page table
   allocations in the protected hypervisor in the 'SecPageTable' stat

 - Fixes to vPMU, ensuring that userspace updates to the vPMU after
   KVM_RUN are reflected into the backing perf events
2025-03-20 12:54:12 -04:00
Paolo Bonzini c0f99fb4e5 KVM/riscv changes for 6.15
- Disable the kernel perf counter during configure
 - KVM selftests improvements for PMU
 - Fix warning at the time of KVM module removal
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmfbsiQACgkQrUjsVaLH
 LAfGAxAAlODgZMDt+f/YlOOmQPsLohGrWsZjvheRfrf3X5PFNuXd1frpZSbF1PCs
 lk4WZrekJo2VT2tWMEBjFrH6A9slHUzjZs+xIxig1N/Gmxy1clKp/svJVG20WwQI
 a5/5jPH3TNDL3KUphwBF0QIi/Tlp6+SZ5RAag6QsBXKnFy5+o9lioR2T6tFzMoxy
 ry8VD8e395bQXn5tYkk4vsr9vBRzpgq7ZBOMg6zyNTL86UBOVq3QXPNzSE5kenXW
 4zqDmuxbk+GgpUKLXA9aXwLvasGiOr+59uToZd2lHf5ynSdSprQyW11I5ZOFs2DX
 800mLeYtyHc4nGTi6OO3VN9toMhmgijtFLNpBDg4LRvwMfB+sp62Ixsmjs3CBCd+
 GqcbJaMtdgM1SpZ3tlzEQoWpIBlNYs7BuG/6Jen9xKHHvOg7Ty9UimJeMikV+FUm
 srq0+GlgjaniZ/d8arIIm7LVqZLAc6OSzn+A8IcE0KQ9fKBgB0CZrANW+N+b+7ub
 Lkq9hA8tM5Ti0bNszqPyJu3bmhWAOEOwF0CjYL79vEhIYsegCKKtruKOppdM+sl2
 dzk+DVV9Aar+bDu/v/bYV5TfIq39U86Tqxe+OU7EPzch5NdyDvKcpZ9/JIZnf/j5
 tviM3awZfFmgTeBUTAYvaR34KExBOLQLkGb9BqAydYjg+pl9jVE=
 =HP3N
 -----END PGP SIGNATURE-----

Merge tag 'kvm-riscv-6.15-1' of https://github.com/kvm-riscv/linux into HEAD

KVM/riscv changes for 6.15

- Disable the kernel perf counter during configure
- KVM selftests improvements for PMU
- Fix warning at the time of KVM module removal
2025-03-20 12:53:34 -04:00
Linus Torvalds 5fc3193608 Including fixes from can, bluetooth and ipsec.
This contains a last minute revert of a recent GRE patch, mostly
 to allow me stating there are no known regressions outstanding.
 
 Current release - regressions:
 
   - revert "gre: Fix IPv6 link-local address generation."
 
   - eth: ti: am65-cpsw: fix NAPI registration sequence
 
 Previous releases - regressions:
 
   - ipv6: fix memleak of nhc_pcpu_rth_output in fib_check_nh_v6_gw().
 
   - mptcp: fix data stream corruption in the address announcement
 
   - bluetooth: fix connection regression between LE and non-LE adapters
 
   - can:
     - flexcan: only change CAN state when link up in system PM
     - ucan: fix out of bound read in strscpy() source
 
 Previous releases - always broken:
 
   - lwtunnel: fix reentry loops
 
   - ipv6: fix TCP GSO segmentation with NAT
 
   - xfrm: force software GSO only in tunnel mode
 
   - eth: ti: icssg-prueth: add lock to stats
 
 Misc:
 
   - add Andrea Mayer as a maintainer of SRv6
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmfcOhESHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkJUQP/2jpOcA+ogkCfRhcJHHaeX9gCX+buSY4
 RZnMUkSUhW5dcL9mTybx3tzH/0oNlGfLtA016SR+Grq0mIBIYsRWhTW6C8mOgvsZ
 PrHvE9VyATBSxUM9o9bL5WMg/M9TvX7foR63+zGPN2lEk6mmLK/hIBFNjkv9R8Vk
 /VR0ZRKHMsJARsyRve1Sf8DZXndAfROnWhCCmWWxKpnCd4biBL/6n6p00vfxpAls
 /Xnm1PC0NMuRz0hlIr8UN9DkkF1v+LEhp1EFbg5e7i8cJkAJyXcvZBb4rJ8f2Ty7
 4qXK53i+kp+NryNAZ6cNu6OtkD+DtdDUNJ28ElkdnKOc8H787YGFvCTBDPiqe5yu
 CVA6tT1hsuKjEnXdX3545+tTW48XS+6J60ZeVnslfC3fakG03ckOZboofT+LQV1y
 wcv4+73dDrIbSo3X7DBdksFzQi/ICb1VG/GOdGlg8vlokKnH0di/veoBtbyAR7dJ
 2Fv3rpE6e/JQQmXYffto0qrNXOYrEx7Zqmo2QbDS6dTkQ1FiDsxRYegRmVxHLHX9
 0fd48FNFIqEJSM5RwLOQW9X83aObT5p2OiHiA+WrnmkiSiMwK4EdusBm4WtFZ7qc
 bqaUHbnAmg1g7UiPn7cGE3navoasH8xmkS77ZsbyW618ej/hRFGDwWMQSuEMyeLz
 PKrT/OMXIHu9
 =aYLc
 -----END PGP SIGNATURE-----

Merge tag 'net-6.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from can, bluetooth and ipsec.

  This contains a last minute revert of a recent GRE patch, mostly to
  allow me stating there are no known regressions outstanding.

  Current release - regressions:

   - revert "gre: Fix IPv6 link-local address generation."

   - eth: ti: am65-cpsw: fix NAPI registration sequence

  Previous releases - regressions:

   - ipv6: fix memleak of nhc_pcpu_rth_output in fib_check_nh_v6_gw().

   - mptcp: fix data stream corruption in the address announcement

   - bluetooth: fix connection regression between LE and non-LE adapters

   - can:
       - flexcan: only change CAN state when link up in system PM
       - ucan: fix out of bound read in strscpy() source

  Previous releases - always broken:

   - lwtunnel: fix reentry loops

   - ipv6: fix TCP GSO segmentation with NAT

   - xfrm: force software GSO only in tunnel mode

   - eth: ti: icssg-prueth: add lock to stats

  Misc:

   - add Andrea Mayer as a maintainer of SRv6"

* tag 'net-6.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (33 commits)
  MAINTAINERS: Add Andrea Mayer as a maintainer of SRv6
  Revert "gre: Fix IPv6 link-local address generation."
  Revert "selftests: Add IPv6 link-local address generation tests for GRE devices."
  net/neighbor: add missing policy for NDTPA_QUEUE_LENBYTES
  tools headers: Sync uapi/asm-generic/socket.h with the kernel sources
  mptcp: Fix data stream corruption in the address announcement
  selftests: net: test for lwtunnel dst ref loops
  net: ipv6: ioam6: fix lwtunnel_output() loop
  net: lwtunnel: fix recursion loops
  net: ti: icssg-prueth: Add lock to stats
  net: atm: fix use after free in lec_send()
  xsk: fix an integer overflow in xp_create_and_assign_umem()
  net: stmmac: dwc-qos-eth: use devm_kzalloc() for AXI data
  selftests: drv-net: use defer in the ping test
  phy: fix xa_alloc_cyclic() error handling
  dpll: fix xa_alloc_cyclic() error handling
  devlink: fix xa_alloc_cyclic() error handling
  ipv6: Set errno after ip_fib_metrics_init() in ip6_route_info_create().
  ipv6: Fix memleak of nhc_pcpu_rth_output in fib_check_nh_v6_gw().
  net: ipv6: fix TCP GSO segmentation with NAT
  ...
2025-03-20 09:39:15 -07:00
Guillaume Nault 355d940f4d Revert "selftests: Add IPv6 link-local address generation tests for GRE devices."
This reverts commit 6f50175cca.

Commit 183185a18f ("gre: Fix IPv6 link-local address generation.") is
going to be reverted. So let's revert the corresponding kselftest
first.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/259a9e98f7f1be7ce02b53d0b4afb7c18a8ff747.1742418408.git.gnault@redhat.com
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20 15:46:16 +01:00
Christian Brauner 4d5483a42c
selftests/pidfd: third test for multi-threaded exec polling
Ensure that during a multi-threaded exec and premature thread-group
leader exit no exit notification is generated.

Link: https://lore.kernel.org/r/20250320-work-pidfs-thread_group-v4-4-da678ce805bf@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-20 15:32:43 +01:00
Christian Brauner 9b6f723db5
selftests/pidfd: second test for multi-threaded exec polling
Ensure that during a multi-threaded exec and premature thread-group
leader exit no exit notification is generated.

Link: https://lore.kernel.org/r/20250320-work-pidfs-thread_group-v4-3-da678ce805bf@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-20 15:32:43 +01:00
Christian Brauner db7ce91e22
selftests/pidfd: first test for multi-threaded exec polling
Add first test for premature thread-group leader exit.

Link: https://lore.kernel.org/r/20250320-work-pidfs-thread_group-v4-2-da678ce805bf@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-20 15:32:43 +01:00
Justin Iurman 3ed61b8938 selftests: net: test for lwtunnel dst ref loops
As recently specified by commit 0ea09cbf83 ("docs: netdev: add a note
on selftest posting") in net-next, the selftest is therefore shipped in
this series. However, this selftest does not really test this series. It
needs this series to avoid crashing the kernel. What it really tests,
thanks to kmemleak, is what was fixed by the following commits:
- commit c71a192976 ("net: ipv6: fix dst refleaks in rpl, seg6 and
ioam6 lwtunnels")
- commit 92191dd107 ("net: ipv6: fix dst ref loops in rpl, seg6 and
ioam6 lwtunnels")
- commit c64a0727f9 ("net: ipv6: fix dst ref loop on input in seg6
lwt")
- commit 13e55fbaec ("net: ipv6: fix dst ref loop on input in rpl
lwt")
- commit 0e7633d7b9 ("net: ipv6: fix dst ref loop in ila lwtunnel")
- commit 5da15a9c11 ("net: ipv6: fix missing dst ref drop in ila
lwtunnel")

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Link: https://patch.msgid.link/20250314120048.12569-4-justin.iurman@uliege.be
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20 11:25:52 +01:00
Geliang Tang 9cf0128e64 selftests: mptcp: add pm sysctl mapping tests
This patch checks if the newly added net.mptcp.path_manager is mapped
successfully from or to the old net.mptcp.pm_type in userspace_pm.sh.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-12-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20 10:14:49 +01:00
Bastien Curutchet (eBPF Foundation) f8df95e84c selftests/bpf: Migrate test_xdp_vlan.sh into test_progs
test_xdp_vlan.sh isn't used by the BPF CI.

Migrate test_xdp_vlan.sh in prog_tests/xdp_vlan.c.
It uses the same BPF programs located in progs/test_xdp_vlan.c and the
same network topology.
Remove test_xdp_vlan*.sh and their Makefile entries.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250221-xdp_vlan-v1-2-7d29847169af@bootlin.com/
2025-03-19 16:01:33 -07:00
Bastien Curutchet (eBPF Foundation) 0f9ff4cb68 selftests/bpf: test_xdp_vlan: Rename BPF sections
The __load() helper expects BPF sections to be names 'xdp' or 'tc'

Rename BPF sections so they can be loaded with the __load() helper in
upcoming patch.
Rename the BPF functions with their previous section's name.
Update the 'ip link' commands in the script to use the program name
instead of the section name to load the BPF program.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250221-xdp_vlan-v1-1-7d29847169af@bootlin.com
2025-03-19 15:59:27 -07:00
Oliver Upton 4f2774c57a Merge branch 'kvm-arm64/writable-midr' into kvmarm/next
* kvm-arm64/writable-midr:
  : Writable implementation ID registers, courtesy of Sebastian Ott
  :
  : Introduce a new capability that allows userspace to set the
  : ID registers that identify a CPU implementation: MIDR_EL1, REVIDR_EL1,
  : and AIDR_EL1. Also plug a hole in KVM's trap configuration where
  : SMIDR_EL1 was readable at EL1, despite the fact that KVM does not
  : support SME.
  KVM: arm64: Fix documentation for KVM_CAP_ARM_WRITABLE_IMP_ID_REGS
  KVM: arm64: Copy MIDR_EL1 into hyp VM when it is writable
  KVM: arm64: Copy guest CTR_EL0 into hyp VM
  KVM: selftests: arm64: Test writes to MIDR,REVIDR,AIDR
  KVM: arm64: Allow userspace to change the implementation ID registers
  KVM: arm64: Load VPIDR_EL2 with the VM's MIDR_EL1 value
  KVM: arm64: Maintain per-VM copy of implementation ID regs
  KVM: arm64: Set HCR_EL2.TID1 unconditionally

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19 14:54:32 -07:00
Oliver Upton d300b0168e Merge branch 'kvm-arm64/pv-cpuid' into kvmarm/next
* kvm-arm64/pv-cpuid:
  : Paravirtualized implementation ID, courtesy of Shameer Kolothum
  :
  : Big-little has historically been a pain in the ass to virtualize. The
  : implementation ID (MIDR, REVIDR, AIDR) of a vCPU can change at the whim
  : of vCPU scheduling. This can be particularly annoying when the guest
  : needs to know the underlying implementation to mitigate errata.
  :
  : "Hyperscalers" face a similar scheduling problem, where VMs may freely
  : migrate between hosts in a pool of heterogenous hardware. And yes, our
  : server-class friends are equally riddled with errata too.
  :
  : In absence of an architected solution to this wart on the ecosystem,
  : introduce support for paravirtualizing the implementation exposed
  : to a VM, allowing the VMM to describe the pool of implementations that a
  : VM may be exposed to due to scheduling/migration.
  :
  : Userspace is expected to intercept and handle these hypercalls using the
  : SMCCC filter UAPI, should it choose to do so.
  smccc: kvm_guest: Fix kernel builds for 32 bit arm
  KVM: selftests: Add test for KVM_REG_ARM_VENDOR_HYP_BMAP_2
  smccc/kvm_guest: Enable errata based on implementation CPUs
  arm64: Make  _midr_in_range_list() an exported function
  KVM: arm64: Introduce KVM_REG_ARM_VENDOR_HYP_BMAP_2
  KVM: arm64: Specify hypercall ABI for retrieving target implementations
  arm64: Modify _midr_range() functions to read MIDR/REVIDR internally

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19 14:53:16 -07:00
Oliver Upton 13f64f6d21 Merge branch 'kvm-arm64/nv-idregs' into kvmarm/next
* kvm-arm64/nv-idregs:
  : Changes to exposure of NV features, courtesy of Marc Zyngier
  :
  : Apply NV-specific feature restrictions at reset rather than at the point
  : of KVM_RUN. This makes the true feature set visible to userspace, a
  : necessary step towards save/restore support or NV VMs.
  :
  : Add an additional vCPU feature flag for selecting the E2H0 flavor of NV,
  : such that the VHE-ness of the VM can be applied to the feature set.
  KVM: arm64: selftests: Test that TGRAN*_2 fields are writable
  KVM: arm64: Allow userspace to write ID_AA64MMFR0_EL1.TGRAN*_2
  KVM: arm64: Advertise FEAT_ECV when possible
  KVM: arm64: Make ID_AA64MMFR4_EL1.NV_frac writable
  KVM: arm64: Allow userspace to limit NV support to nVHE
  KVM: arm64: Move NV-specific capping to idreg sanitisation
  KVM: arm64: Enforce NV limits on a per-idregs basis
  KVM: arm64: Make ID_REG_LIMIT_FIELD_ENUM() more widely available
  KVM: arm64: Consolidate idreg callbacks
  KVM: arm64: Advertise NV2 in the boot messages
  KVM: arm64: Mark HCR.EL2.{NV*,AT} RES0 when ID_AA64MMFR4_EL1.NV_frac is 0
  KVM: arm64: Mark HCR.EL2.E2H RES0 when ID_AA64MMFR1_EL1.VH is zero
  KVM: arm64: Hide ID_AA64MMFR2_EL1.NV from guest and userspace
  arm64: cpufeature: Handle NV_frac as a synonym of NV2

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19 14:52:26 -07:00
Ingo Molnar 14d281db78 sched/debug: Remove CONFIG_SCHED_DEBUG from self-test config files
We leave most of the defconfigs alone (there's over 70 of them),
but let's remove CONFIG_SCHED_DEBUG from the scheduler self-test
Kconfig files.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/Z9szt3MpQmQ56TRd@gmail.com
2025-03-19 22:23:24 +01:00
Michael Jeanson d047e32b8d rseq/selftests: Fix namespace collision with rseq UAPI header
When the rseq UAPI header is included, 'union rseq' clashes with 'struct
rseq'. It's not the case in the rseq selftests but it does break the KVM
selftests that also include this file.

Rename 'union rseq' to 'union rseq_tls' to fix this.

Fixes: e6644c967d ("rseq/selftests: Ensure the rseq ABI TLS is actually 1024 bytes")
Reported-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/r/20250319202144.1141542-1-mjeanson@efficios.com
2025-03-19 21:26:24 +01:00
Jonathan Lennox 756f88ff9c tc-tests: Update tc police action tests for tc buffer size rounding fixes.
Before tc's recent change to fix rounding errors, several tests which
specified a burst size of "1m" would translate back to being 1048574
bytes (2b less than 1Mb).  sprint_size prints this as "1024Kb".

With the tc fix, the burst size is instead correctly reported as
1048576 bytes (precisely 1Mb), which sprint_size prints as "1Mb".

This updates the expected output in the tests' matchPattern values
to accept either the old or the new output.

Signed-off-by: Jonathan Lennox <jonathan.lennox@8x8.com>
Link: https://patch.msgid.link/20250312174804.313107-1-jonathan.lennox@8x8.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-19 18:40:24 +01:00
Jakub Kicinski acf10a8c0b selftests: drv-net: use defer in the ping test
Make sure the test cleans up after itself. The XDP off statements
at the end of the test may not be reached.

Fixes: 75cc19c8ff ("selftests: drv-net: add xdp cases for ping.py")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250312131040.660386-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-19 18:21:13 +01:00
Kumar Kartikeya Dwivedi 60ba5b3ed7 selftests/bpf: Add tests for rqspinlock
Introduce selftests that trigger AA, ABBA deadlocks, and test the edge
case where the held locks table runs out of entries, since we then
fallback to the timeout as the final line of defense. Also exercise
verifier's AA detection where applicable.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250316040541.108729-26-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-19 08:03:06 -07:00
Paolo Bonzini 783e9cd05c KVM selftests changes for 6.15, part 2
- Fix a variety of flaws, bugs, and false failures/passes dirty_log_test, and
    improve its coverage by collecting all dirty entries on each iteration.
 
  - Fix a few minor bugs related to handling of stats FDs.
 
  - Add infrastructure to make vCPU and VM stats FDs available to tests by
    default (open the FDs during VM/vCPU creation).
 
  - Relax an assertion on the number of HLT exits in the xAPIC IPI test when
    running on a CPU that supports AMD's Idle HLT (which elides interception of
    HLT if a virtual IRQ is pending and unmasked).
 
  - Misc cleanups and fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmfZqZYACgkQOlYIJqCj
 N/2G0Q/5AZAnOPBU/xmIZrntaWSkPX0wYCBzTCFAbyuXTYdWmCMaVJUjj0W4h15U
 rAPxq76moGSHT06EQhw8clrNHbRSr4OZvsJIQMgGK5GBXlxfD+S1bNb/NKijpKm1
 yjI4VIKA1ugpPLbb4KM522UQdAwkBI8KuPbhzWCsz2kzFf3Iy/B6ZgfevTjldZ9f
 KbocBQhCaEOmk0i1UZWMnuJU+rh06y/BNR/7a/vhaZGy4pPm/WZpSezKbti3j7F5
 +/VXJ+fh2T1oPzlycdQ4i9phYAOMJONcNxlzhs0Hp71zkxNykVKyv1zxvCI3ioEn
 nnHFYQyremDfDJhxabOJRYB4ETlhLpX9nOlz2euIy7YeurDs73Qtjg2rixgAyq80
 /FITPTFVlcHxjPr9YC+wtWzZB2qnGNMfA7QujnoG3XlGQJJsvnFTygXfBMj+W0G/
 K8V93lvGTcwtT7x+9MCDhD0wZXfPBXorze8QrDfCWF96G1Y+5gZmYGs38Cu6IWrc
 6q3tHcDmUaOGI1Mag97J1smSWw+ZC1+UuHJSW7DMZldOm/2fXkqrRpbHRXEzRQIa
 vZjBZKoYHpqGMKayQ+GIRql9R6o+ZGQ6OQXgtLl0fxPPA01/hKX1ITZayBnHDl3C
 wm3svDZy6hPWs17kwtLj+cRSMVYItbjOgpOj58gBo16w1NPNYTI=
 =t77Q
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-selftests-6.15' of https://github.com/kvm-x86/linux into HEAD

KVM selftests changes for 6.15, part 2

 - Fix a variety of flaws, bugs, and false failures/passes dirty_log_test, and
   improve its coverage by collecting all dirty entries on each iteration.

 - Fix a few minor bugs related to handling of stats FDs.

 - Add infrastructure to make vCPU and VM stats FDs available to tests by
   default (open the FDs during VM/vCPU creation).

 - Relax an assertion on the number of HLT exits in the xAPIC IPI test when
   running on a CPU that supports AMD's Idle HLT (which elides interception of
   HLT if a virtual IRQ is pending and unmasked).

 - Misc cleanups and fixes.
2025-03-19 09:05:34 -04:00
Paolo Bonzini 9b47f288eb KVM selftests changes for 6.15, part 1
- Misc cleanups and prep work.
 
  - Annotate _no_printf() with "printf" so that pr_debug() statements are
    checked by the compiler for default builds (and pr_info() when QUIET).
 
  - Attempt to whack the last LLC references/misses mole in the Intel PMU
    counters test by adding a data load and doing CLFLUSH{OPT} on the data
    instead of the code being executed.  The theory is that modern Intel CPUs
    have learned new code prefetching tricks that bypass the PMU counters.
 
  - Fix a flaw in the Intel PMU counters test where it asserts that an event is
    counting correctly without actually knowing what the event counts on the
    underlying hardware.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmfZpvcACgkQOlYIJqCj
 N/3a+BAAmv0iagaYSiigAWZ9EM/2+CZDGp0+g7sB95graJFKnrKCXSkpuUr1UIYr
 aDPLHAPBZ8szWNqrJARdSui0QP8cEd2erJkMNRDEQqx/OYKQJ+0W+gmlbleQ6GL6
 7C/zUn0JdC0AhpdfePw6nIPpwIYBxDwHMQSuPUOxw5ZJmVCzQxn43MmPWTfL11Eo
 ARfh6VWmXSvp+5emyjnlGmcT1/a3UlfAGDRekK0gaPYmEtaf6LfKCXc+nX84q0fr
 T2N8nVcmEDg9VUO3JZfexoLcKpFMsTdwg4GbQ4beytYwPx3YELvCP9eFl5yo4lxk
 9YL6uPdXyk0/Cgpv/QyB4YtnaQ4tktPmFLYjEtf8NIMLubouTS6dtYErKq9MOgsP
 7UaOBVY6O7+cG+iOU2yGn1Mct0XxhKKLu3n5ZlGLJ/+8v1PUIegL3wUwpQaycp7O
 xgRd/narmKEmMYFr7D0y7epcUVBlkP17YsFH4w6z9moDjJNVz4Ziqft6ju0N3P1s
 a/8CA5gd2lplp/Yae62f2mgJNjw5kbEYtGO9ybINbhacznsl+3JsU85k1h+7dZrW
 8lmfNJ8+B95UrZ9Wo0mWsv+UKYYFHRFoHjKlevuUTPQ/QxoghvpQ5iU3P+rT8E5n
 858OhTi2vlAhg8a338XdfUOy799F73ecDaIKyPMdO74v7wq1swo=
 =RjiY
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-selftests_6.15-1' of https://github.com/kvm-x86/linux into HEAD

KVM selftests changes for 6.15, part 1

 - Misc cleanups and prep work.

 - Annotate _no_printf() with "printf" so that pr_debug() statements are
   checked by the compiler for default builds (and pr_info() when QUIET).

 - Attempt to whack the last LLC references/misses mole in the Intel PMU
   counters test by adding a data load and doing CLFLUSH{OPT} on the data
   instead of the code being executed.  The theory is that modern Intel CPUs
   have learned new code prefetching tricks that bypass the PMU counters.

 - Fix a flaw in the Intel PMU counters test where it asserts that an event is
   counting correctly without actually knowing what the event counts on the
   underlying hardware.
2025-03-19 09:05:22 -04:00
Paolo Bonzini 4d9a677596 KVM x86 misc changes for 6.15:
- Fix a bug in PIC emulation that caused KVM to emit a spurious KVM_REQ_EVENT.
 
  - Add a helper to consolidate handling of mp_state transitions, and use it to
    clear pv_unhalted whenever a vCPU is made RUNNABLE.
 
  - Defer runtime CPUID updates until KVM emulates a CPUID instruction, to
    coalesce updates when multiple pieces of vCPU state are changing, e.g. as
    part of a nested transition.
 
  - Fix a variety of nested emulation bugs, and add VMX support for synthesizing
    nested VM-Exit on interception (instead of injecting #UD into L2).
 
  - Drop "support" for PV Async #PF with proctected guests without SEND_ALWAYS,
    as KVM can't get the current CPL.
 
  - Misc cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmfZhoYACgkQOlYIJqCj
 N/1oSBAAlhsZzv4m6rHtWACjGsSBlAE5WM7HrpnwjOyMW+Desc0WLL4L6qAlxgT7
 4ZkBvQ4zsiyUIdLv1XARvkBYHTUkcgG8fQSSJQ6grZit5OcFcMgWafvQrJYI6428
 TyzF/x6t0gj8CjDluA6jM/eL5HhWZZDOIg0Wma+pyYpW+7y2tkphXyYyOQPBGwv3
 geRUetbDHHXjf042k/8f1j1vjzrNNvAg3YyNyx1YbdU9XKsn5D+SeUW2eVfYk8G7
 5QsCOGvUYcbbjrR8kbCZKexvoH6Np9J6YKDe4R9R2yDzgs/96qz6xkYTGCVkHA1y
 uursKqRHgbXBxzxa+ban073laT7Qt3S01Gd9bJW3IO7hzG89gl4qfX7fap8T9Yc2
 yeBTYIgInpyx+NCdZ2Z/++BzPagBGfa77gFX/eIkmsVA9LWYi9CI3FSjtr/czvWm
 a4tfMPvTVBjsBQQ7t/lNksrq0O51lbb3iqqv3ToQpDOOqCWuMEU5xcihhPRr5NSZ
 dX4o/jIDhCV8EyXdtASyqMlYBXcuC45ojEZn1elh0QogzYAdSGQ2bIDyxuBtA//k
 kSbi+E4GB64jVfBWUyK2QeLOBnBkH7mh6Cg5UYr1Ln9Sm6l8vrcxhcbnchiWxXMI
 WCK7BJwI2HojBVpEZ04jMkjHvg36uSfjOzmMLT5yPXfFNebsGmA=
 =8SGF
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-misc-6.15' of https://github.com/kvm-x86/linux into HEAD

KVM x86 misc changes for 6.15:

 - Fix a bug in PIC emulation that caused KVM to emit a spurious KVM_REQ_EVENT.

 - Add a helper to consolidate handling of mp_state transitions, and use it to
   clear pv_unhalted whenever a vCPU is made RUNNABLE.

 - Defer runtime CPUID updates until KVM emulates a CPUID instruction, to
   coalesce updates when multiple pieces of vCPU state are changing, e.g. as
   part of a nested transition.

 - Fix a variety of nested emulation bugs, and add VMX support for synthesizing
   nested VM-Exit on interception (instead of injecting #UD into L2).

 - Drop "support" for PV Async #PF with proctected guests without SEND_ALWAYS,
   as KVM can't get the current CPL.

 - Misc cleanups
2025-03-19 09:04:48 -04:00
Clément Léger 7a9827e777
KVM: riscv: selftests: Add Zaamo/Zalrsc extensions to get-reg-list test
The KVM RISC-V allows Zaamo/Zalrsc extensions for Guest/VM so add these
extensions to get-reg-list test.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20240619153913.867263-6-cleger@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
2025-03-19 12:03:46 +00:00
Ingo Molnar 89771319e0 Linux 6.14-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmfXVtUeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGN/sH/i5423Gt/z51gDjA
 s4v5Z7GaBJ9zOGBahn2RWFe72zytTqKrEJmMnGfguirs0atD1DtQj4WAP7iFKP+e
 WyO663X6HF7i5y37ja0Yd4PZc31hwtqzKH8LjBf8f8tTy8UsEVqumdi5A4sS9KTM
 qm4kTyyVEY9D/s7oRY8ywjDlRJtO6nT0aKMp4kAqNEbrNUYbilT/a0hgXcgSmPyB
 uIjmjL2fZfutxGI5LgfbaSHCa1ElmhvTvivOMpaAmZSGCRVHCKGgT0CTNnHyn/7C
 dB145JkRO4ZOUqirCdO4PE/23id3ajq9fcixJGBzAv7c45y+B3JZ1r2kAfKalE8/
 qrOKLys=
 =8r7a
 -----END PGP SIGNATURE-----

Merge tag 'v6.14-rc7' into x86/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-03-19 11:03:06 +01:00
Yafang Shao be16ddeaae selftests/bpf: Add selftest for attaching fexit to __noreturn functions
The reuslt:

  $ tools/testing/selftests/bpf/test_progs --name=fexit_noreturns
  #99/1    fexit_noreturns/noreturns:OK
  #99      fexit_noreturns:OK
  Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20250318114447.75484-3-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-18 19:07:18 -07:00
Nicolin Chen 97717a1f28 iommufd/selftest: Add IOMMU_VEVENTQ_ALLOC test coverage
Trigger vEVENTs by feeding an idev ID and validating the returned output
virt_ids whether they equal to the value that was set to the vDEVICE.

Link: https://patch.msgid.link/r/e829532ec0a3927d61161b7674b20e731ecd495b.1741719725.git.nicolinc@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-18 14:17:48 -03:00
Nicolin Chen 941d0719aa iommufd/selftest: Require vdev_id when attaching to a nested domain
When attaching a device to a vIOMMU-based nested domain, vdev_id must be
present. Add a piece of code hard-requesting it, preparing for a vEVENTQ
support in the following patch. Then, update the TEST_F.

A HWPT-based nested domain will return a NULL new_viommu, thus no such a
vDEVICE requirement.

Link: https://patch.msgid.link/r/4051ca8a819e51cb30de6b4fe9e4d94d956afe3d.1741719725.git.nicolinc@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-18 14:17:48 -03:00
Yunhui Cui 36dec9e448
RISC-V: selftests: Add TEST_ZICBOM into CBO tests
Add test for Zicbom and its block size into CBO tests, when
Zicbom is present, test that cbo.clean/flush may be issued and works.
As the software can't verify the clean/flush functions, we just judged
that cbo.clean/flush isn't executed illegally.

Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
Link: https://lore.kernel.org/r/20250226063206.71216-4-cuiyunhui@bytedance.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
2025-03-18 12:44:17 +00:00
Linus Torvalds 76b6905c11 15 hotfixes. 7 are cc:stable and the remainder address post-6.13 issues
or aren't considered necessary for -stable kernels.
 
 13 are for MM and the other two are for squashfs and procfs.
 
 All are singletons.  Please see the individual changelogs for details.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ9jkDgAKCRDdBJ7gKXxA
 jtgjAP0ZIVQ8wEL/txkrNyMT5JrWCK1XfjsfPNJ0afkcl4Oo/QEA8rnJHT/hX8aY
 Wr+aY2CoCsFEOsW4oDRUPrUYpkUBqA8=
 =AwLJ
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2025-03-17-20-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc hotfixes from Andrew Morton:
 "15 hotfixes. 7 are cc:stable and the remainder address post-6.13
  issues or aren't considered necessary for -stable kernels.

  13 are for MM and the other two are for squashfs and procfs.

  All are singletons. Please see the individual changelogs for details"

* tag 'mm-hotfixes-stable-2025-03-17-20-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/page_alloc: fix memory accept before watermarks gets initialized
  mm: decline to manipulate the refcount on a slab page
  memcg: drain obj stock on cpu hotplug teardown
  mm/huge_memory: drop beyond-EOF folios with the right number of refs
  selftests/mm: run_vmtests.sh: fix half_ufd_size_MB calculation
  mm: fix error handling in __filemap_get_folio() with FGP_NOWAIT
  mm: memcontrol: fix swap counter leak from offline cgroup
  mm/vma: do not register private-anon mappings with khugepaged during mmap
  squashfs: fix invalid pointer dereference in squashfs_cache_delete
  mm/migrate: fix shmem xarray update during migration
  mm/hugetlb: fix surplus pages in dissolve_free_huge_page()
  mm/damon/core: initialize damos->walk_completed in damon_new_scheme()
  mm/damon: respect core layer filters' allowance decision on ops layer
  filemap: move prefaulting out of hot write path
  proc: fix UAF in proc_get_inode()
2025-03-17 22:27:27 -07:00
Cyan Yang f841ad9ca5 selftests/mm/cow: fix the incorrect error handling
Error handling doesn't check the correct return value.  This patch will
fix it.

Link: https://lkml.kernel.org/r/20250312043840.71799-1-cyan.yang@sifive.com
Fixes: f4b5fd6946 ("selftests/vm: anon_cow: THP tests")
Signed-off-by: Cyan Yang <cyan.yang@sifive.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17 22:07:06 -07:00
Zi Yan 80a5c494c8 selftests/mm: add tests for folio_split(), buddy allocator like split
It splits page cache folios to orders from 0 to 8 at different in-folio
offset.

Link: https://lkml.kernel.org/r/20250307174001.242794-9-ziy@nvidia.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Kairui Song <kasong@tencent.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17 22:07:00 -07:00
Zi Yan 3fec86f8aa xarray: add xas_try_split() to split a multi-index entry
Patch series "Buddy allocator like (or non-uniform) folio split", v10.

This patchset adds a new buddy allocator like (or non-uniform) large folio
split from a order-n folio to order-m with m < n.  It reduces

1. the total number of after-split folios from 2^(n-m) to n-m+1;

2. the amount of memory needed for multi-index xarray split from 2^(n/6-m/6) to
   n/6-m/6, assuming XA_CHUNK_SHIFT=6;

3. keep more large folios after a split from all order-m folios to
   order-(n-1) to order-m folios.

For example, to split an order-9 to order-0, folio split generates 10 (or
11 for anonymous memory) folios instead of 512, allocates 1 xa_node
instead of 8, and leaves 1 order-8, 1 order-7, ..., 1 order-1 and 2
order-0 folios (or 4 order-0 for anonymous memory) instead of 512 order-0
folios.

Instead of duplicating existing split_huge_page*() code, __folio_split()
is introduced as the shared backend code for both
split_huge_page_to_list_to_order() and folio_split().  __folio_split() can
support both uniform split and buddy allocator like (or non-uniform)
split.  All existing split_huge_page*() users can be gradually converted
to use folio_split() if possible.  In this patchset, I converted
truncate_inode_partial_folio() to use folio_split().

xfstests quick group passed for both tmpfs and xfs.  I also
semi-replicated Hugh's test[12] and ran it without any issue for almost 24
hours.


This patch (of 8):

A preparation patch for non-uniform folio split, which always split a
folio into half iteratively, and minimal xarray entry split.

Currently, xas_split_alloc() and xas_split() always split all slots from a
multi-index entry.  They cost the same number of xa_node as the
to-be-split slots.  For example, to split an order-9 entry, which takes
2^(9-6)=8 slots, assuming XA_CHUNK_SHIFT is 6 (!CONFIG_BASE_SMALL), 8
xa_node are needed.  Instead xas_try_split() is intended to be used
iteratively to split the order-9 entry into 2 order-8 entries, then split
one order-8 entry, based on the given index, to 2 order-7 entries, ...,
and split one order-1 entry to 2 order-0 entries.  When splitting the
order-6 entry and a new xa_node is needed, xas_try_split() will try to
allocate one if possible.  As a result, xas_try_split() would only need 1
xa_node instead of 8.

When a new xa_node is needed during the split, xas_try_split() can try to
allocate one but no more.  -ENOMEM will be return if a node cannot be
allocated.  -EINVAL will be return if a sibling node is split or cascade
split happens, where two or more new nodes are needed, and these are not
supported by xas_try_split().

xas_split_alloc() and xas_split() split an order-9 to order-0:

         ---------------------------------
         |   |   |   |   |   |   |   |   |
         | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
         |   |   |   |   |   |   |   |   |
         ---------------------------------
           |   |                   |   |
     -------   ---               ---   -------
     |           |     ...       |           |
     V           V               V           V
----------- -----------     ----------- -----------
| xa_node | | xa_node | ... | xa_node | | xa_node |
----------- -----------     ----------- -----------

xas_try_split() splits an order-9 to order-0:
   ---------------------------------
   |   |   |   |   |   |   |   |   |
   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
   |   |   |   |   |   |   |   |   |
   ---------------------------------
     |
     |
     V
-----------
| xa_node |
-----------

Link: https://lkml.kernel.org/r/20250307174001.242794-1-ziy@nvidia.com
Link: https://lkml.kernel.org/r/20250307174001.242794-2-ziy@nvidia.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Kairui Song <kasong@tencent.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17 22:06:59 -07:00
Mykyta Yatsenko a024843d92 selftests/bpf: Test freplace from user namespace
Add selftests to verify that it is possible to load freplace program
from user namespace if BPF token is initialized by bpf_object__prepare
before calling bpf_program__set_attach_target.
Negative test is added as well.

Modified type of the priv_prog to xdp, as kprobe did not work on aarch64
and s390x.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20250317174039.161275-5-mykyta.yatsenko5@gmail.com
2025-03-17 13:45:12 -07:00
Wei Yang ccaf3efcee lib/interval_tree: add test case for span iteration
Verify interval_tree_span_iter_xxx() helpers works as expected.

Link: https://lkml.kernel.org/r/20250310074938.26756-6-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michel Lespinasse <michel@lespinasse.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17 12:17:01 -07:00
Wei Yang 16b1936ae6 lib/rbtree: add random seed
Current test use pseudo rand function with fixed seed, which means the
test data is the same pattern each time.

Add random seed parameter to randomize the test.

Link: https://lkml.kernel.org/r/20250310074938.26756-4-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michel Lespinasse <michel@lespinasse.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17 12:17:00 -07:00
Wei Yang 4164e1525d lib/rbtree: enable userland test suite for rbtree related data structure
Patch series "lib/interval_tree: add some test cases and cleanup", v2.

Since rbtree/augmented tree/interval tree share similar data structure,
besides new cases for interval tree, this patch set also does cleanup for
others.


This patch (of 7):

Currently we have some tests for rbtree related data structure, e.g. 
rbtree, augmented rbtree, interval tree, in lib/ as kernel module.

To facilitate the test and debug for those fundamental data structure,
this patch enable those tests in userland.

Link: https://lkml.kernel.org/r/20250310074938.26756-1-richard.weiyang@gmail.com
Link: https://lkml.kernel.org/r/20250310074938.26756-2-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michel Lespinasse <michel@lespinasse.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17 12:17:00 -07:00
Dave Jiang 1729808c54 cxl/test: Add Set Feature support to cxl_test
Add emulation to support Set Feature mailbox command to cxl_test.
The only feature supported is the device patrol scrub feature. The
set feature allows activation of patrol scrub for the cxl_test
emulated device. The command does not support partial data transfer
even though the spec allows it. This restriction is to reduce complexity
of the emulation given the patrol scrub feature is very minimal.

Link: https://patch.msgid.link/r/20250307205648.1021626-8-dave.jiang@intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Li Ming <ming.li@zohomail.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-17 14:41:37 -03:00
Dave Jiang e77e9c1079 cxl/test: Add Get Feature support to cxl_test
Add emulation of Get Feature command to cxl_test. The feature for
device patrol scrub is returned by the emulation code. This is
the only feature currently supported by cxl_test. It returns
the information for the device patrol scrub feature.

Link: https://patch.msgid.link/r/20250307205648.1021626-7-dave.jiang@intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Li Ming <ming.li@zohomail.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-17 14:41:37 -03:00
Dave Jiang 858ce2f56b cxl: Add FWCTL support to CXL
Add fwctl support code to allow sending of CXL feature commands from
userspace through as ioctls via FWCTL. Provide initial setup bits. The
CXL PCI probe function will call devm_cxl_setup_fwctl() after the
cxl_memdev has been enumerated in order to setup FWCTL char device under
the cxl_memdev like the existing memdev char device for issuing CXL raw
mailbox commands from userspace via ioctls.

Link: https://patch.msgid.link/r/20250307205648.1021626-2-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Li Ming <ming.li@zohomail.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-17 14:41:36 -03:00
Dave Jiang 3b5d43245f Merge branch 'for-6.15/features' into cxl-for-next
Add CXL Features support. Setup code for enabling in kernel usage of CXL
Features. Expecting EDAC/RAS to utilize CXL Features in kernel for
things such as memory sparing. Also prepartion for enabling of CXL FWCTL
support to issue allowed Features from user space.
2025-03-17 09:22:59 -07:00
Lorenzo Stoakes f3b92176f4 tools/selftests: add guard region test for /proc/$pid/pagemap
Add a test to the guard region self tests to assert that the
/proc/$pid/pagemap information now made availabile to the user correctly
identifies and reports guard regions.

As a part of this change, update vm_util.h to add the new bit (note there
is no header file in the kernel where this is exposed, the user is
expected to provide their own mask) and utilise the helper functions there
for pagemap functionality.

[lorenzo.stoakes@oracle.com: fixup define name]
  Link: https://lkml.kernel.org/r/32e83941-e6f5-42ee-9292-a44c16463cf1@lucifer.local
Link: https://lkml.kernel.org/r/164feb0a43ae72650e6b20c3910213f469566311.1740139449.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:41 -07:00
Brendan Jackman 1ddae9d67e selftests/mm/mlock: print error on failure
It's not really possible to start diagnosing this without knowing the
actual error.

Also update the mlock2 helper to behave like libc would by setting errno
and returning -1.

Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-12-dec210a658f5@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:40 -07:00
Brendan Jackman 5d2146a335 selftests/mm: skip mlock tests if nobody user can't read it
If running from a directory that can't be read by unprivileged users,
executing on-fault-test via the nobody user will fail.

The kselftest build does give the file the correct permissions, but after
being installed it might be in a directory without global execute
permissions.

Since the script can't safely fix that, just skip if it happens.  Note
that the stderr of the `ls` command is unfiltered meaning the user sees a
"permission denied" error that can help inform them why the test was
skipped.

Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-11-dec210a658f5@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:40 -07:00
Brendan Jackman f896c6de83 selftests/mm: ensure uffd-wp-mremap gets pages of each size
This test allocates a page of every available size and doesn't have any
SKIP logic if the allocation fails.  So, ensure it's available and skip
the test if we can't do so.

Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-10-dec210a658f5@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:39 -07:00
Brendan Jackman e9269b2cc4 selftests/mm: drop unnecessary sudo usage
This script must be run as root anyway (see all the writing to privileged
files in /proc etc).

Remove the unnecessary use of sudo to avoid breaking on single-user
systems that don't have sudo.  This also avoids confusing readers.

Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-9-dec210a658f5@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:39 -07:00
Brendan Jackman 32b42970e8 selftests/mm: skip gup_longterm tests on weird filesystems
Some filesystems don't support ftruncate()ing unlinked files.  They return
ENOENT.  In that case, skip the test.

Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-8-dec210a658f5@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:39 -07:00
Brendan Jackman 571a4b62ed selftests/mm: skip map_populate on weird filesystems
It seems that 9pfs does not allow truncating unlinked files, Mark Brown
has noted that NFS may also behave this way.

It doesn't seem quite right to call this a "bug" but it's probably a
special enough case that it makes sense for the test to just SKIP if it
happens.

Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-7-dec210a658f5@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:39 -07:00
Brendan Jackman bf6d575e24 selftests/mm: don't fail uffd-stress if too many CPUs
This calculation divides a fixed parameter by an environment-dependent
parameter i.e.  the number of CPUs.

The simple way to avoid machine-specific failures here is to just put a
cap on the max value of the latter.

Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-6-dec210a658f5@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Suggested-by: Mateusz Guzik <mjguzik@gmail.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:38 -07:00
Brendan Jackman db0f1c138f selftests/mm: print some details when uffd-stress gets bad params
So this can be debugged more easily.

Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-5-dec210a658f5@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:38 -07:00
Brendan Jackman f3b5535abc selftests/mm/uffd: rename nr_cpus -> nr_parallel
A later commit will bound this variable so it no longer necessarily
matches the number of CPUs.  Rename it appropriately.

Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-4-dec210a658f5@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:38 -07:00
Brendan Jackman f4b3e6c7f1 selftests/mm: skip uffd-wp-mremap if userfaultfd not available
It's obvious that this should fail in that case, but still, save the
reader the effort of figuring out that they've run into this by just
SKIPping

Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-3-dec210a658f5@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:38 -07:00
Brendan Jackman 0046dbed80 selftests/mm: skip uffd-stress if userfaultfd not available
It's pretty obvious that the test wouldn't work if you don't have the
feature enabled.  But, it's still useful to SKIP instead of failing so the
reader can immediately tell that this is the reason why.

Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-2-dec210a658f5@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:37 -07:00
Brendan Jackman 800ddf3cd7 selftests/mm: report errno when things fail in gup_longterm
Patch series "selftests/mm: Some cleanups from trying to run them", v4.

I never had much luck running mm selftests so I spent a few hours digging
into why.

Looks like most of the reason is missing SKIP checks, so this series is
just adding a bunch of those that I found.  I did not do anything like all
of them, just the ones I spotted in gup_longterm, gup_test, mmap,
userfaultfd and memfd_secret.

It's a bit unfortunate to have to skip those tests when ftruncate() fails,
but I don't have time to dig deep enough into it to actually make them
pass.  I have observed the issue on 9pfs and heard rumours that NFS has a
similar problem.

I'm now able to run these test groups successfully:

- mmap
- gup_test
- compaction
- migration
- page_frag
- userfaultfd
- mlock

I've never gone past "Waiting for hugetlb memory to get depleted", in the
hugetlb tests.  I don't know if they are stuck or if they would eventually
work if I was patient enough (testing on a 1G machine).  I have not
investigated further.

I had some issues with mlock tests failing due to -ENOSRCH from mlock2(),
I can no longer reproduce that though, things work OK now.

Of the remaining tests there may be others that work fine, but there's no
convenient way to survey the whole output of run_vmtests.sh so I'm just
going test by test here.

In my spare moments I am slowly chipping away at a setup to run these
tests continuously in a reasonably hermetic QEMU environment via
virtme-ng:

5fad4b9c59/README.md

Hopefully that will eventually offer a way to provide a "canned"
environment where the tests are known to work, which can be fairly easily
reproduced by any developer.


This patch (of 12):

Just reporting failure doesn't tell you what went wrong.  This can fail in
different ways so report errno to help the reader get started debugging.

Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-0-dec210a658f5@google.com
Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-1-dec210a658f5@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:37 -07:00
Ujwal Kundur af3b45aac5 selftests/mm: fix spelling
Fix misspelling flagged by codespell.

Link: https://lkml.kernel.org/r/20250215081803.1793-1-ujwal.kundur@gmail.com
Signed-off-by: Ujwal Kundur <ujwal.kundur@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:23 -07:00
Suren Baghdasaryan 3104138517 mm: make vma cache SLAB_TYPESAFE_BY_RCU
To enable SLAB_TYPESAFE_BY_RCU for vma cache we need to ensure that
object reuse before RCU grace period is over will be detected by
lock_vma_under_rcu().

Current checks are sufficient as long as vma is detached before it is
freed.  The only place this is not currently happening is in exit_mmap(). 
Add the missing vma_mark_detached() in exit_mmap().

Another issue which might trick lock_vma_under_rcu() during vma reuse is
vm_area_dup(), which copies the entire content of the vma into a new one,
overriding new vma's vm_refcnt and temporarily making it appear as
attached.  This might trick a racing lock_vma_under_rcu() to operate on a
reused vma if it found the vma before it got reused.  To prevent this
situation, we should ensure that vm_refcnt stays at detached state (0)
when it is copied and advances to attached state only after it is added
into the vma tree.  Introduce vm_area_init_from() which preserves new
vma's vm_refcnt and use it in vm_area_dup().  Since all vmas are in
detached state with no current readers when they are freed,

lock_vma_under_rcu() will not be able to take vm_refcnt after vma got
detached even if vma is reused. vma_mark_attached() in modified to
include a release fence to ensure all stores to the vma happen before
vm_refcnt gets initialized.

Finally, make vm_area_cachep SLAB_TYPESAFE_BY_RCU. This will facilitate
vm_area_struct reuse and will minimize the number of call_rcu() calls.

[surenb@google.com: remove atomic_set_release() usage in tools/]
  Link: https://lkml.kernel.org/r/20250217054351.2973666-1-surenb@google.com
Link: https://lkml.kernel.org/r/20250213224655.1680278-18-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Shivank Garg <shivankg@amd.com>
  Link: https://lkml.kernel.org/r/5e19ec93-8307-47c2-bb13-3ddf7150624e@amd.com
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:21 -07:00
Suren Baghdasaryan 6bef4c2f97 mm: move lesser used vma_area_struct members into the last cacheline
Move several vma_area_struct members which are rarely or never used during
page fault handling into the last cacheline to better pack vm_area_struct.
As a result vm_area_struct will fit into 3 as opposed to 4 cachelines. 
New typical vm_area_struct layout:

struct vm_area_struct {
    union {
        struct {
            long unsigned int vm_start;              /*     0     8 */
            long unsigned int vm_end;                /*     8     8 */
        };                                           /*     0    16 */
        freeptr_t          vm_freeptr;               /*     0     8 */
    };                                               /*     0    16 */
    struct mm_struct *         vm_mm;                /*    16     8 */
    pgprot_t                   vm_page_prot;         /*    24     8 */
    union {
        const vm_flags_t   vm_flags;                 /*    32     8 */
        vm_flags_t         __vm_flags;               /*    32     8 */
    };                                               /*    32     8 */
    unsigned int               vm_lock_seq;          /*    40     4 */

    /* XXX 4 bytes hole, try to pack */

    struct list_head           anon_vma_chain;       /*    48    16 */
    /* --- cacheline 1 boundary (64 bytes) --- */
    struct anon_vma *          anon_vma;             /*    64     8 */
    const struct vm_operations_struct  * vm_ops;     /*    72     8 */
    long unsigned int          vm_pgoff;             /*    80     8 */
    struct file *              vm_file;              /*    88     8 */
    void *                     vm_private_data;      /*    96     8 */
    atomic_long_t              swap_readahead_info;  /*   104     8 */
    struct mempolicy *         vm_policy;            /*   112     8 */
    struct vma_numab_state *   numab_state;          /*   120     8 */
    /* --- cacheline 2 boundary (128 bytes) --- */
    refcount_t          vm_refcnt (__aligned__(64)); /*   128     4 */

    /* XXX 4 bytes hole, try to pack */

    struct {
        struct rb_node     rb (__aligned__(8));      /*   136    24 */
        long unsigned int  rb_subtree_last;          /*   160     8 */
    } __attribute__((__aligned__(8))) shared;        /*   136    32 */
    struct anon_vma_name *     anon_name;            /*   168     8 */
    struct vm_userfaultfd_ctx  vm_userfaultfd_ctx;   /*   176     8 */

    /* size: 192, cachelines: 3, members: 18 */
    /* sum members: 176, holes: 2, sum holes: 8 */
    /* padding: 8 */
    /* forced alignments: 2, forced holes: 1, sum forced holes: 4 */
} __attribute__((__aligned__(64)));

Memory consumption per 1000 VMAs becomes 48 pages:

    slabinfo after vm_area_struct changes:
     <name>           ... <objsize> <objperslab> <pagesperslab> : ...
     vm_area_struct   ...    192   42    2 : ...

Link: https://lkml.kernel.org/r/20250213224655.1680278-14-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Tested-by: Shivank Garg <shivankg@amd.com>
  Link: https://lkml.kernel.org/r/5e19ec93-8307-47c2-bb13-3ddf7150624e@amd.com
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:20 -07:00
Suren Baghdasaryan f35ab95ca0 mm: replace vm_lock and detached flag with a reference count
rw_semaphore is a sizable structure of 40 bytes and consumes considerable
space for each vm_area_struct.  However vma_lock has two important
specifics which can be used to replace rw_semaphore with a simpler
structure:

1. Readers never wait.  They try to take the vma_lock and fall back to
   mmap_lock if that fails.

2. Only one writer at a time will ever try to write-lock a vma_lock
   because writers first take mmap_lock in write mode.  Because of these
   requirements, full rw_semaphore functionality is not needed and we can
   replace rw_semaphore and the vma->detached flag with a refcount
   (vm_refcnt).

When vma is in detached state, vm_refcnt is 0 and only a call to
vma_mark_attached() can take it out of this state.  Note that unlike
before, now we enforce both vma_mark_attached() and vma_mark_detached() to
be done only after vma has been write-locked.  vma_mark_attached() changes
vm_refcnt to 1 to indicate that it has been attached to the vma tree. 
When a reader takes read lock, it increments vm_refcnt, unless the top
usable bit of vm_refcnt (0x40000000) is set, indicating presence of a
writer.  When writer takes write lock, it sets the top usable bit to
indicate its presence.  If there are readers, writer will wait using newly
introduced mm->vma_writer_wait.  Since all writers take mmap_lock in write
mode first, there can be only one writer at a time.  The last reader to
release the lock will signal the writer to wake up.  refcount might
overflow if there are many competing readers, in which case read-locking
will fail.  Readers are expected to handle such failures.

In summary:
1. all readers increment the vm_refcnt;
2. writer sets top usable (writer) bit of vm_refcnt;
3. readers cannot increment the vm_refcnt if the writer bit is set;
4. in the presence of readers, writer must wait for the vm_refcnt to drop
to 1 (plus the VMA_LOCK_OFFSET writer bit), indicating an attached vma
with no readers;
5. vm_refcnt overflow is handled by the readers.

While this vm_lock replacement does not yet result in a smaller
vm_area_struct (it stays at 256 bytes due to cacheline alignment), it
allows for further size optimization by structure member regrouping to
bring the size of vm_area_struct below 192 bytes.

[surenb@google.com: fix a crash due to vma_end_read() that should have been removed]
  Link: https://lkml.kernel.org/r/20250220200208.323769-1-surenb@google.com
Link: https://lkml.kernel.org/r/20250213224655.1680278-13-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Shivank Garg <shivankg@amd.com>
  Link: https://lkml.kernel.org/r/5e19ec93-8307-47c2-bb13-3ddf7150624e@amd.com
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:20 -07:00
Suren Baghdasaryan 55e50223bf mm: introduce vma_iter_store_attached() to use with attached vmas
vma_iter_store() functions can be used both when adding a new vma and when
updating an existing one.  However for existing ones we do not need to
mark them attached as they are already marked that way.  With
vma->detached being a separate flag, double-marking a vmas as attached or
detached is not an issue because the flag will simply be overwritten with
the same value.  However once we fold this flag into the refcount later in
this series, re-attaching or re-detaching a vma becomes an issue since
these operations will be incrementing/decrementing a refcount.

Introduce vma_iter_store_new() and vma_iter_store_overwrite() to replace
vma_iter_store() and avoid re-attaching a vma during vma update.  Add
assertions in vma_mark_attached()/vma_mark_detached() to catch invalid
usage.  Update vma tests to check for vma detached state correctness.

Link: https://lkml.kernel.org/r/20250213224655.1680278-5-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Tested-by: Shivank Garg <shivankg@amd.com>
  Link: https://lkml.kernel.org/r/5e19ec93-8307-47c2-bb13-3ddf7150624e@amd.com
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:18 -07:00
Suren Baghdasaryan 8ef95d8f15 mm: mark vma as detached until it's added into vma tree
Current implementation does not set detached flag when a VMA is first
allocated.  This does not represent the real state of the VMA, which is
detached until it is added into mm's VMA tree.  Fix this by marking new
VMAs as detached and resetting detached flag only after VMA is added into
a tree.

Introduce vma_mark_attached() to make the API more readable and to
simplify possible future cleanup when vma->vm_mm might be used to indicate
detached vma and vma_mark_attached() will need an additional mm parameter.

Link: https://lkml.kernel.org/r/20250213224655.1680278-4-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Tested-by: Shivank Garg <shivankg@amd.com>
  Link: https://lkml.kernel.org/r/5e19ec93-8307-47c2-bb13-3ddf7150624e@amd.com
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:17 -07:00
Suren Baghdasaryan 7b6218ae12 mm: move per-vma lock into vm_area_struct
Back when per-vma locks were introduces, vm_lock was moved out of
vm_area_struct in [1] because of the performance regression caused by
false cacheline sharing.  Recent investigation [2] revealed that the
regressions is limited to a rather old Broadwell microarchitecture and
even there it can be mitigated by disabling adjacent cacheline
prefetching, see [3].

Splitting single logical structure into multiple ones leads to more
complicated management, extra pointer dereferences and overall less
maintainable code.  When that split-away part is a lock, it complicates
things even further.  With no performance benefits, there are no reasons
for this split.  Merging the vm_lock back into vm_area_struct also allows
vm_area_struct to use SLAB_TYPESAFE_BY_RCU later in this patchset.  Move
vm_lock back into vm_area_struct, aligning it at the cacheline boundary
and changing the cache to be cacheline-aligned as well.  With kernel
compiled using defconfig, this causes VMA memory consumption to grow from
160 (vm_area_struct) + 40 (vm_lock) bytes to 256 bytes:

    slabinfo before:
     <name>           ... <objsize> <objperslab> <pagesperslab> : ...
     vma_lock         ...     40  102    1 : ...
     vm_area_struct   ...    160   51    2 : ...

    slabinfo after moving vm_lock:
     <name>           ... <objsize> <objperslab> <pagesperslab> : ...
     vm_area_struct   ...    256   32    2 : ...

Aggregate VMA memory consumption per 1000 VMAs grows from 50 to 64 pages,
which is 5.5MB per 100000 VMAs.  Note that the size of this structure is
dependent on the kernel configuration and typically the original size is
higher than 160 bytes.  Therefore these calculations are close to the
worst case scenario.  A more realistic vm_area_struct usage before this
change is:

     <name>           ... <objsize> <objperslab> <pagesperslab> : ...
     vma_lock         ...     40  102    1 : ...
     vm_area_struct   ...    176   46    2 : ...

Aggregate VMA memory consumption per 1000 VMAs grows from 54 to 64 pages,
which is 3.9MB per 100000 VMAs.  This memory consumption growth can be
addressed later by optimizing the vm_lock.

[1] https://lore.kernel.org/all/20230227173632.3292573-34-surenb@google.com/
[2] https://lore.kernel.org/all/ZsQyI%2F087V34JoIt@xsang-OptiPlex-9020/
[3] https://lore.kernel.org/all/CAJuCfpEisU8Lfe96AYJDZ+OM4NoPmnw9bP53cT_kbfP_pR+-2g@mail.gmail.com/

Link: https://lkml.kernel.org/r/20250213224655.1680278-3-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Tested-by: Shivank Garg <shivankg@amd.com>
  Link: https://lkml.kernel.org/r/5e19ec93-8307-47c2-bb13-3ddf7150624e@amd.com
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:17 -07:00
Lorenzo Stoakes 0b6d4853d1 tools/selftests: add file/shmem-backed mapping guard region tests
Extend the guard region self tests to explicitly assert that guard regions
work correctly for functionality specific to file-backed and shmem
mappings.

In addition to testing all of the existing guard region functionality that
is currently tested against anonymous mappings against file-backed and
shmem mappings (except those which are exclusive to anonymous mapping), we
now also:

* Test that MADV_SEQUENTIAL does not cause unexpected readahead behaviour.
* Test that MAP_PRIVATE behaves as expected with guard regions installed in
  both a shared and private mapping of an fd.
* Test that a read-only file can correctly establish guard regions.
* Test a probable fault-around case does not interfere with guard regions
  (or vice-versa).
* Test that truncation does not eliminate guard regions.
* Test that hole punching functions as expected in the presence of guard
  regions.
* Test that a read-only mapping of a memfd write sealed mapping can have
  guard regions established within it and function correctly without
  violation of the seal.
* Test that guard regions installed into a mapping of the anonymous zero
  page function correctly.

Link: https://lkml.kernel.org/r/90c16bec5fcaafcd1700dfa3e9988c3e1aa9ac1d.1739469950.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:15 -07:00
Lorenzo Stoakes 272f37d3e9 tools/selftests: expand all guard region tests to file-backed
Extend the guard region tests to allow for test fixture variants for anon,
shmem, and local file files.

This allows us to assert that each of the expected behaviours of anonymous
memory also applies correctly to file-backed (both shmem and an a file
created locally in the current working directory) and thus asserts the
same correctness guarantees as all the remaining tests do.

The fixture teardown is now performed in the parent process rather than
child forked ones, meaning cleanup is always performed, including
unlinking any generated temporary files.

Additionally the variant fixture data type now contains an enum value
indicating the type of backing store and the mmap() invocation is
abstracted to allow for the mapping of whichever backing store the variant
is testing.

We adjust tests as necessary to account for the fact they may now
reference files rather than anonymous memory.

Link: https://lkml.kernel.org/r/ab42228d2bd9b8aa18e9faebcd5c88732a7e5820.1739469950.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:15 -07:00
Lorenzo Stoakes ce1c0824fc selftests/mm: rename guard-pages to guard-regions
The feature formerly referred to as guard pages is more correctly referred
to as 'guard regions', as in fact no pages are ever allocated in the
process of installing the regions.

To avoid confusion, rename the tests accordingly.

[lorenzo.stoakes@oracle.com: fix guard regions invocation]
  Link: https://lkml.kernel.org/r/13426c71-d069-4407-9340-b227ff8b8736@lucifer.local
Link: https://lkml.kernel.org/r/1c3cd04a3f69b5756b94bda701ac88325a9be18b.1739469950.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:15 -07:00
Mark Brown 85968b6a20 selftests/mm: allow tests to run with no huge pages support
Currently the mm selftests refuse to run if huge pages are not available
in the current system but this is an optional feature and not all the
tests actually require them.  Change the test during startup to be
non-fatal and skip or omit tests which actually rely on having huge pages,
allowing the other tests to be run.

The gup_test does support using madvise() to configure huge pages but it
ignores the error code so we just let it run.

Link: https://lkml.kernel.org/r/20250212-kselftest-mm-no-hugepages-v1-2-44702f538522@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Nico Pache <npache@redhat.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:14 -07:00
Eric Salem 3fae696393 selftests: mm: fix typo
Fix misspelling.

Link: https://lkml.kernel.org/r/77e0e915-36c3-4c95-84b8-0b73aaa17951@gmail.com
Signed-off-by: Eric Salem <ericsalem@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:11 -07:00
Mark Brown 023fff71d8 selftests/mm: fix thuge-gen test name uniqueness
The thuge-gen test_mmap() and test_shmget() tests are repeatedly run for a
variety of sizes but always report the result of their test with the same
name, meaning that automated sysetms running the tests are unable to
distinguish between the various tests.  Add the supplied sizes to the
logged test names to distinguish between runs.

My test automation was getting pretty confused about what was going on
- the test names are a pretty important external interface.

Link: https://lkml.kernel.org/r/20250204-kselftest-mm-fix-dups-v1-1-6afe417ef4bb@kernel.org
Fixes: b38bd9b2c4 ("selftests/mm: thuge-gen: conform to TAP format output")
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:03 -07:00
Lorenzo Stoakes c372473a54 mm: completely abstract unnecessary adj_start calculation
The adj_start calculation has been a constant source of confusion in the
VMA merge code.

There are two cases to consider, one where we adjust the start of the
vmg->middle VMA (i.e.  the vmg->__adjust_middle_start merge flag is set),
in which case adj_start is calculated as:

(1) adj_start = vmg->end - vmg->middle->vm_start

And the case where we adjust the start of the vmg->next VMA (i.e.  the
vmg->__adjust_next_start merge flag is set), in which case adj_start is
calculated as:

(2) adj_start = -(vmg->middle->vm_end - vmg->end)

We apply (1) thusly:

vmg->middle->vm_start =
	vmg->middle->vm_start + vmg->end - vmg->middle->vm_start

Which simplifies to:

vmg->middle->vm_start = vmg->end

Similarly, we apply (2) as:

vmg->next->vm_start =
	vmg->next->vm_start + -(vmg->middle->vm_end - vmg->end)

Noting that for these VMAs to be mergeable vmg->middle->vm_end ==
vmg->next->vm_start and so this simplifies to:

vmg->next->vm_start =
	vmg->next->vm_start + -(vmg->next->vm_start - vmg->end)

Which simplifies to:

vmg->next->vm_start = vmg->end

Therefore in each case, we simply need to adjust the start of the VMA to
vmg->end (!) and can do away with this adj_start calculation.  The only
caveat is that we must ensure we update the vm_pgoff field correctly.

We therefore abstract this entire calculation to a new function
vmg_adjust_set_range() which performs this calculation and sets the
adjusted VMA's new range using the general vma_set_range() function.

We also must update vma_adjust_trans_huge() which expects the
now-abstracted adj_start parameter.  It turns out this is wholly
unnecessary.

In vma_adjust_trans_huge() the relevant code is:

	if (adjust_next > 0) {
		struct vm_area_struct *next = find_vma(vma->vm_mm, vma->vm_end);
		unsigned long nstart = next->vm_start;
		nstart += adjust_next;
		split_huge_pmd_if_needed(next, nstart);
	}

The only case where this is relevant is when vmg->__adjust_middle_start is
specified (in which case adj_next would have been positive), i.e.  the one
in which the vma specified is vmg->prev and this the sought 'next' VMA
would be vmg->middle.

We can therefore eliminate the find_vma() invocation altogether and simply
provide the vmg->middle VMA in this instance, or NULL otherwise.

Again we have an adj_next offset calculation:

next->vm_start + vmg->end - vmg->middle->vm_start

Where next == vmg->middle this simplifies to vmg->end as previously
demonstrated.

Therefore nstart is equal to vmg->end, which is already passed to
vma_adjust_trans_huge() via the 'end' parameter and so this code (rather
delightfully) simplifies to:

	if (next)
		split_huge_pmd_if_needed(next, end);

With these changes in place, it becomes silly for commit_merge() to return
vmg->target, as it is always the same and threaded through vmg, so we
finally change commit_merge() to return an error value once again.

This patch has no change in functional behaviour.

Link: https://lkml.kernel.org/r/7bce2cd4b5afb56211822835d145471280c3dccc.1738326519.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:02 -07:00
Lorenzo Stoakes fe3e9cf0d7 mm: eliminate adj_start parameter from commit_merge()
Introduce internal vmg->__adjust_middle_start and vmg->__adjust_next_start
merge flags, enabling us to indicate to commit_merge() that we are
performing a merge which either spans only part of vmg->middle, or part of
vmg->next respectively.

In the former instance, we change the start of vmg->middle to match the
attributes of vmg->prev, without spanning all of vmg->middle.

This implies that vmg->prev->vm_end and vmg->middle->vm_start are both
increased to form the new merged VMA (vmg->prev) and the new subsequent
VMA (vmg->middle).

In the latter case, we change the end of vmg->middle to match the
attributes of vmg->next, without spanning all of vmg->next.

This implies that vmg->middle->vm_end and vmg->next->vm_start are both
decreased to form the new merged VMA (vmg->next) and the new prior VMA
(vmg->middle).

Since we now have a stable set of prev, middle, next VMAs threaded through
vmg and with these flags set know what is happening, we can perform the
calculation in commit_merge() instead.

This allows us to drop the confusing adj_start parameter and instead pass
semantic information to commit_merge().

In the latter case the -(middle->vm_end - start) calculation becomes
-(middle->vm-end - vmg->end), however this is correct as vmg->end is set
to the start parameter.

This is because in this case (rather confusingly), we manipulate
vmg->middle, but ultimately return vmg->next, whose range will be
correctly specified.  At this point vmg->start, end is the new range for
the prior VMA rather than the merged one.

This patch has no change in functional behaviour.

Link: https://lkml.kernel.org/r/bcec0cd980b373a5eb02236cb033034ce1effe42.1738326519.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:02 -07:00
Lorenzo Stoakes 6ab2d9c7c6 mm: further refactor commit_merge()
The current VMA merge mechanism contains a number of confusing mechanisms
around removal of VMAs on merge and the shrinking of the VMA adjacent to
vma->target in the case of merges which result in a partial merge with
that adjacent VMA.

Since we now have a STABLE set of VMAs - prev, middle, next - we are now
able to have the caller of commit_merge() explicitly tell us which VMAs
need deleting, using newly introduced internal VMA merge flags.

Doing so allows us to embed this state within the VMG and remove the
confusing remove, remove2 parameters from commit_merge().

We additionally are able to eliminate the highly confusing and misleading
'expanded' parameter - a parameter that in reality refers to whether or
not the return VMA is the target one or the one immediately adjacent.

We can infer which is the case from whether or not the adj_start parameter
is negative.  This also allows us to simplify further logic around
iterator configuration and VMA iterator stores.

Doing so means we can also eliminate the adjust parameter, as we are able
to infer which VMA ought to be adjusted from adj_start - a positive value
implies we adjust the start of 'middle', a negative one implies we adjust
the start of 'next'.

We are then able to have commit_merge() explicitly return the target VMA,
or NULL on inability to pre-allocate memory.  Errors were previously
filtered so behaviour does not change.

We additionally move from the slightly odd use of a bitwise-flag enum
vmg->merge_flags field to vmg bitfields.

This patch has no change in functional behaviour.

Link: https://lkml.kernel.org/r/7bf2ed24af68aac18672b7acebbd9102f48c5b03.1738326519.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:02 -07:00
Lorenzo Stoakes 3a75ccba04 mm: simplify vma merge structure and expand comments
Patch series "mm: further simplify VMA merge operation", v3.

While significant efforts have been made to improve the VMA merge
operation, there remains remnants of the bad (or rather confusing) old
days, which make the code difficult to understand, more bug prone and thus
harder to modify.

This series attempts to significantly improve matters in a number of
respects - with a focus on simplifying the commit_merge() function which
actually actions the merge operation - and importantly, adjusting the two
most confusing merge cases - those in which we 'adjust' the VMA
immediately adjacent to the one being merged.

One source of confusion are the VMAs being threaded through the operation
themselves - vmg->prev, vmg->vma and vmg->next.

At the start of the operation, vmg->vma is either NULL if a new VMA is
propose to be added, or if not then a pointer to an existing VMA being
modified, and prev/next are (perhaps not present) VMAs sat immediately
before and after the range specified in vmg->start, end, respectively.

However, during the VMA merge operation, we change vmg->start, end and
pgoff to span the newly merged range and vmg->vma to either be:

a.  The ultimately returned VMA (in most cases) or b.  A VMA which we will
manipulate, but ultimately instead return vmg->next.

Case b.  especially here is confusing for somebody reading this code, but
the fact we update this state, along with vmg->start, end, pgoff only
makes matters worse.

We simplify things by replacing vmg->vma with vmg->middle and never
changing it - this is always either NULL (for a new VMA) or the VMA being
modified between vmg->prev and vmg->next.

We further simplify by placing the merged VMA in a new vmg->target field -
whether case b.  above is the case or not.  The reader of the code can now
simply rely on vmg->middle being the middle VMA and vmg->target being the
ultimately merged VMA.

We additionally tackle the confusing cases where we 'adjust' VMAs other
than the one we ultimately return as the merged VMA (this includes case b.
above).  These are:

(1)
	    merge
	<----------->
	|------||--------|    |------------|---|
	| prev || middle | -> |   target   | m |
	|------||--------|    |------------|---|

In which case middle must be adjusted so middle->vm_start is increased as
well as performing the merge.

(2) (equivalent to case b. above)

            <------------->
	|---------||------|    |---|-------------|
	|  middle || next | -> | m |   target    |
	|---------||------|    |---|-------------|

In which case next must be adjusted so next->vm_start is decreased as well
as performing the merge.

This cases have previously been performed by calculating and passing
around a dubious and confusing 'adj_start' parameter along side a pointer
to an 'adjust' VMA indicating which VMA requires additional adjustment
(middle in case 1 and next in case 2).

With the VMG structure in place we are able to avoid this by simply
setting a merge flag to describe each case:

(1) Sets the vmg->__adjust_middle_start flag
(2) Sets the vmg->__adjust_next_start flag

By doing so it turns out we can vastly simplify the logic and calculate
what is required to perform the operation.

Taken together the refactorings make it far easier to understand what is
being done even in these more confusing cases, make the code far more
maintainable, debuggable, and testable, providing more internal state
indicating what is happening in the merge operation.

The changes have no functional net impact on the merge operation and
everything should still behave as it did before.


This patch (of 5):

The merge code, while much improved, still has a number of points of
confusion.  As part of a broader series cleaning this up to make this more
maintainable, we start by addressing some confusion around
vma_merge_struct fields.

So far, the caller either provides no vmg->vma (a new VMA) or supplies the
existing VMA which is being altered, setting vmg->start,end,pgoff to the
proposed VMA dimensions.

vmg->vma is then updated, as are vmg->start,end,pgoff as the merge process
proceeds and the appropriate merge strategy is determined.

This is rather confusing, as vmg->vma starts off as the 'middle' VMA
between vmg->prev,next, but becomes the 'target' VMA, except in one
specific edge case (merge next, shrink middle).

Int his patch we introduce vmg->middle to describe the VMA that is between
vmg->prev and vmg->next, and does NOT change during the merge operation.

We replace vmg->vma with vmg->target, and use this only during the merge
operation itself.

Aside from the merge right, shrink middle case, this becomes the VMA that
forms the basis of the VMA that is returned.  This edge case can be
addressed in a future commit.

We also add a number of comments to explain what is going on.

Finally, we adjust the ASCII diagrams showing each merge case in
vma_merge_existing_range() to be clearer - the arrow range previously
showed the vmg->start, end spanned area, but it is clearer to change this
to show the final merged VMA.

This patch has no change in functional behaviour.

Link: https://lkml.kernel.org/r/cover.1738326519.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/4dfe60f1419d55e5d0516f56349695d73a57184c.1738326519.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:01 -07:00
Zi Yan ad4c9bb541 selftests/mm: test splitting file-backed THP to any lower order
Now split_huge_page*() supports shmem THP split to any lower order.  Test
it.

The test now reads file content out after split to check if the split
corrupts the file data.

Link: https://lkml.kernel.org/r/20250122161928.1240637-3-ziy@nvidia.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:05:56 -07:00
Zi Yan 035a112e5f selftests/mm: make file-backed THP split work by writing PMD size data
Commit acd7ccb284 ("mm: shmem: add large folio support for tmpfs")
changes huge=always to allocate THP/mTHP based on write size and
split_huge_page_test does not write PMD size data, so file-back THP is not
created during the test.  Fix it by writing PMD size data.

Link: https://lkml.kernel.org/r/20250122161928.1240637-1-ziy@nvidia.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:05:56 -07:00
Rafael Aquini 67a2f86846 selftests/mm: run_vmtests.sh: fix half_ufd_size_MB calculation
We noticed that uffd-stress test was always failing to run when invoked
for the hugetlb profiles on x86_64 systems with a processor count of 64 or
bigger:

  ...
  # ------------------------------------
  # running ./uffd-stress hugetlb 128 32
  # ------------------------------------
  # ERROR: invalid MiB (errno=9, @uffd-stress.c:459)
  ...
  # [FAIL]
  not ok 3 uffd-stress hugetlb 128 32 # exit=1
  ...

The problem boils down to how run_vmtests.sh (mis)calculates the size of
the region it feeds to uffd-stress.  The latter expects to see an amount
of MiB while the former is just giving out the number of free hugepages
halved down.  This measurement discrepancy ends up violating uffd-stress'
assertion on number of hugetlb pages allocated per CPU, causing it to bail
out with the error above.

This commit fixes that issue by adjusting run_vmtests.sh's
half_ufd_size_MB calculation so it properly renders the region size in
MiB, as expected, while maintaining all of its original constraints in
place.

Link: https://lkml.kernel.org/r/20250218192251.53243-1-aquini@redhat.com
Fixes: 2e47a445d7 ("selftests/mm: run_vmtests.sh: fix hugetlb mem size calculation")
Signed-off-by: Rafael Aquini <raquini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 17:40:25 -07:00
Rae Moar 2e0cf2b32f kunit: tool: add test to check parsing late test plan
Add test to check for the infinite loop caused by the inability
to parse a late test plan.

The test parses the following output:
 TAP version 13
 ok 4 test4
 1..4

Link: https://lore.kernel.org/r/20250313192714.1380005-1-rmoar@google.com
Signed-off-by: Rae Moar <rmoar@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <shuah@kernel.org>
2025-03-15 18:13:43 -06:00
Rae Moar 1d4c06d519 kunit: tool: Fix bug in parsing test plan
A bug was identified where the KTAP below caused an infinite loop:

 TAP version 13
 ok 4 test_case
 1..4

The infinite loop was caused by the parser not parsing a test plan
if following a test result line.

Fix this bug by parsing test plan line to avoid the infinite loop.

Link: https://lore.kernel.org/r/20250313192714.1380005-1-rmoar@google.com
Signed-off-by: Rae Moar <rmoar@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <shuah@kernel.org>
2025-03-15 18:13:38 -06:00
Saket Kumar Bhaskar 8c10109a97 selftests/bpf: Fix sockopt selftest failure on powerpc
The SO_RCVLOWAT option is defined as 18 in the selftest header,
which matches the generic definition. However, on powerpc,
SO_RCVLOWAT is defined as 16. This discrepancy causes
sol_socket_sockopt() to fail with the default switch case on powerpc.

This commit fixes by defining SO_RCVLOWAT as 16 for powerpc.

Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Link: https://lore.kernel.org/bpf/20250311084647.3686544-1-skb99@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:49:24 -07:00
Viktor Malik de07b18289 selftests/bpf: Fix string read in strncmp benchmark
The strncmp benchmark uses the bpf_strncmp helper and a hand-written
loop to compare two strings. The values of the strings are filled from
userspace. One of the strings is non-const (in .bss) while the other is
const (in .rodata) since that is the requirement of bpf_strncmp.

The problem is that in the hand-written loop, Clang optimizes the reads
from the const string to always return 0 which breaks the benchmark.

Use barrier_var to prevent the optimization.

The effect can be seen on the strncmp-no-helper variant.

Before this change:

    # ./bench strncmp-no-helper
    Setting up benchmark 'strncmp-no-helper'...
    Benchmark 'strncmp-no-helper' started.
    Iter   0 (112.309us): hits    0.000M/s (  0.000M/prod), drops    0.000M/s, total operations    0.000M/s
    Iter   1 (-23.238us): hits    0.000M/s (  0.000M/prod), drops    0.000M/s, total operations    0.000M/s
    Iter   2 ( 58.994us): hits    0.000M/s (  0.000M/prod), drops    0.000M/s, total operations    0.000M/s
    Iter   3 (-30.466us): hits    0.000M/s (  0.000M/prod), drops    0.000M/s, total operations    0.000M/s
    Iter   4 ( 29.996us): hits    0.000M/s (  0.000M/prod), drops    0.000M/s, total operations    0.000M/s
    Iter   5 ( 16.949us): hits    0.000M/s (  0.000M/prod), drops    0.000M/s, total operations    0.000M/s
    Iter   6 (-60.035us): hits    0.000M/s (  0.000M/prod), drops    0.000M/s, total operations    0.000M/s
    Summary: hits    0.000 ± 0.000M/s (  0.000M/prod), drops    0.000 ± 0.000M/s, total operations    0.000 ± 0.000M/s

After this change:

    # ./bench strncmp-no-helper
    Setting up benchmark 'strncmp-no-helper'...
    Benchmark 'strncmp-no-helper' started.
    Iter   0 ( 77.711us): hits    5.534M/s (  5.534M/prod), drops    0.000M/s, total operations    5.534M/s
    Iter   1 ( 11.215us): hits    6.006M/s (  6.006M/prod), drops    0.000M/s, total operations    6.006M/s
    Iter   2 (-14.253us): hits    5.931M/s (  5.931M/prod), drops    0.000M/s, total operations    5.931M/s
    Iter   3 ( 59.087us): hits    6.005M/s (  6.005M/prod), drops    0.000M/s, total operations    6.005M/s
    Iter   4 (-21.379us): hits    6.010M/s (  6.010M/prod), drops    0.000M/s, total operations    6.010M/s
    Iter   5 (-20.310us): hits    5.861M/s (  5.861M/prod), drops    0.000M/s, total operations    5.861M/s
    Iter   6 ( 53.937us): hits    6.004M/s (  6.004M/prod), drops    0.000M/s, total operations    6.004M/s
    Summary: hits    5.969 ± 0.061M/s (  5.969M/prod), drops    0.000 ± 0.000M/s, total operations    5.969 ± 0.061M/s

Fixes: 9c42652f8b ("selftests/bpf: Add benchmark for bpf_strncmp() helper")
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20250313122852.1365202-1-vmalik@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:49:24 -07:00
Kumar Kartikeya Dwivedi 1f375aef6c selftests/bpf: Fix arena_spin_lock compilation on PowerPC
Venkat reported a compilation error for BPF selftests on PowerPC [0].
The crux of the error is the following message:
  In file included from progs/arena_spin_lock.c:7:
  /root/bpf-next/tools/testing/selftests/bpf/bpf_arena_spin_lock.h:122:8:
  error: member reference base type '__attribute__((address_space(1)))
  u32' (aka '__attribute__((address_space(1))) unsigned int') is not a
  structure or union
     122 |         old = atomic_read(&lock->val);

This is because PowerPC overrides the qspinlock type changing the
lock->val member's type from atomic_t to u32.

To remedy this, import the asm-generic version in the arena spin lock
header, name it __qspinlock (since it's aliased to arena_spinlock_t, the
actual name hardly matters), and adjust the selftest to not depend on
the type in vmlinux.h.

  [0]: https://lore.kernel.org/bpf/7bc80a3b-d708-4735-aa3b-6a8c21720f9d@linux.ibm.com

Fixes: 88d706ba7c ("selftests/bpf: Introduce arena spin lock")
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Link: https://lore.kernel.org/bpf/20250311154244.3775505-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:59 -07:00
Blaise Boscaccy 7987f1627e selftests/bpf: Add a kernel flag test for LSM bpf hook
This test exercises the kernel flag added to security_bpf by
effectively blocking light-skeletons from loading while allowing
normal skeletons to function as-is. Since this should work with any
arbitrary BPF program, an existing program from LSKELS_EXTRA was
used as a test payload.

Signed-off-by: Blaise Boscaccy <bboscaccy@linux.microsoft.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250310221737.821889-3-bboscaccy@linux.microsoft.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:58 -07:00
Blaise Boscaccy 082f1db02c security: Propagate caller information in bpf hooks
Certain bpf syscall subcommands are available for usage from both
userspace and the kernel. LSM modules or eBPF gatekeeper programs may
need to take a different course of action depending on whether or not
a BPF syscall originated from the kernel or userspace.

Additionally, some of the bpf_attr struct fields contain pointers to
arbitrary memory. Currently the functionality to determine whether or
not a pointer refers to kernel memory or userspace memory is exposed
to the bpf verifier, but that information is missing from various LSM
hooks.

Here we augment the LSM hooks to provide this data, by simply passing
a boolean flag indicating whether or not the call originated in the
kernel, in any hook that contains a bpf_attr struct that corresponds
to a subcommand that may be called from the kernel.

Signed-off-by: Blaise Boscaccy <bboscaccy@linux.microsoft.com>
Acked-by: Song Liu <song@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250310221737.821889-2-bboscaccy@linux.microsoft.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:58 -07:00
Chen Ni a03d375330 selftests/bpf: Convert comma to semicolon
Replace comma between expressions with semicolons.

Using a ',' in place of a ';' can have unintended side effects.
Although that is not the case here, it is seems best to use ';'
unless ',' is intended.

Found by inspection.
No functional change intended.
Compile tested only.

Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Anton Protopopov <aspsk@isovalent.com>
Link: https://lore.kernel.org/bpf/20250310032045.651068-1-nichen@iscas.ac.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:58 -07:00
Anton Protopopov caa4237a79 selftests/bpf: Fix selection of static vs. dynamic LLVM
The Makefile uses the exit code of the `llvm-config --link-static --libs`
command to choose between statically-linked and dynamically-linked LLVMs.
The stdout and stderr of that command are redirected to /dev/null.
To redirect the output the "&>" construction is used, which might not be
supported by /bin/sh, which is executed by make for $(shell ...) commands.
On such systems the test will fail even if static LLVM is actually
supported. Replace "&>" by ">/dev/null 2>&1" to fix this.

Fixes: 2a9d30fac8 ("selftests/bpf: Support dynamically linking LLVM if static is not available")
Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/bpf/20250310145112.1261241-1-aspsk@isovalent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:58 -07:00
Emil Tsalapatis c06707ff07 selftests: bpf: fix duplicate selftests in cpumask_success.
The BPF cpumask selftests are currently run twice in
test_progs/cpumask.c, once by traversing cpumask_success_testcases, and
once by invoking RUN_TESTS(cpumask_success). Remove the invocation of
RUN_TESTS to properly run the selftests only once.

Now that the tests are run only through cpumask_success_testscases, add
to it the missing test_refcount_null_tracking testcase. Also remove the
__success annotation from it, since it is now loaded and invoked by the
runner.

Signed-off-by: Emil Tsalapatis (Meta) <emil@etsalapatis.com>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250309230427.26603-5-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:58 -07:00
Emil Tsalapatis 918ba2636d selftests: bpf: add bpf_cpumask_populate selftests
Add selftests for the bpf_cpumask_populate helper that sets a
bpf_cpumask to a bit pattern provided by a BPF program.

Signed-off-by: Emil Tsalapatis (Meta) <emil@etsalapatis.com>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250309230427.26603-3-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:57 -07:00
Bastien Curutchet (eBPF Foundation) 1041b8bc9f selftests/bpf: lwt_seg6local: Move test to test_progs
test_lwt_seg6local.sh isn't used by the BPF CI.

Add a new file in the test_progs framework to migrate the tests done by
test_lwt_seg6local.sh. It uses the same network topology and the same BPF
programs located in progs/test_lwt_seg6local.c.
Use the network helpers instead of `nc` to exchange the final packet.

Remove test_lwt_seg6local.sh and its Makefile entry.

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20250307-seg6local-v1-2-990fff8f180d@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:57 -07:00
Bastien Curutchet (eBPF Foundation) b1d85ff517 selftests/bpf: lwt_seg6local: Remove unused routes
Some routes in fb00:: are initialized during setup, even though they
aren't needed by the test as the UDP packets will travel through the
lightweight tunnels.

Remove these unnecessary routes.

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20250307-seg6local-v1-1-990fff8f180d@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:57 -07:00
Feng Yang 339c1f8ea1 selftests/bpf: Fix cap_enable_effective() return code
The caller of cap_enable_effective() expects negative error code.
Fix it.

Before:
  failed to restore CAP_SYS_ADMIN: -1, Unknown error -1

After:
  failed to restore CAP_SYS_ADMIN: -3, No such process
  failed to restore CAP_SYS_ADMIN: -22, Invalid argument

Signed-off-by: Feng Yang <yangfeng@kylinos.cn>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250305022234.44932-1-yangfeng59949@163.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:57 -07:00
Amery Hung 8bda5b787d selftests/bpf: Fix dangling stdout seen by traffic monitor thread
Traffic monitor thread may see dangling stdout as the main thread closes
and reassigns stdout without protection. This happens when the main thread
finishes one subtest and moves to another one in the same netns_new()
scope.

The issue can be reproduced by running test_progs repeatedly with traffic
monitor enabled:

for ((i=1;i<=100;i++)); do
   ./test_progs -a flow_dissector_skb* -m '*'
done

For restoring stdout in crash_handler(), since it does not really care
about closing stdout, simlpy flush stdout and restore it to the original
one.

Then, Fix the issue by consolidating stdio_restore_cleanup() and
stdio_restore(), and protecting the use/close/assignment of stdout with
a lock. The locking in the main thread is always performed regradless of
whether traffic monitor is running or not for simplicity. It won't have
any side-effect.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://patch.msgid.link/20250305182057.2802606-3-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:57 -07:00
Amery Hung 34a25aabcd selftests/bpf: Allow assigning traffic monitor print function
Allow users to change traffic monitor's print function. If not provided,
traffic monitor will print to stdout by default.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250305182057.2802606-2-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:57 -07:00
Amery Hung aeb5bbb025 selftests/bpf: Clean up call sites of stdio_restore()
reset_affinity() and save_ns() are only called in run_one_test(). There is
no need to call stdio_restore() in reset_affinity() and save_ns() if
stdio_restore() is moved right after a test finishes in run_one_test().

Also remove an unnecessary check of env.stdout_saved in crash_handler()
by moving env.stdout_saved assignment to the beginning of main().

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://patch.msgid.link/20250305182057.2802606-1-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:57 -07:00
Bastien Curutchet (eBPF Foundation) f5e288943e selftests/bpf: Move test_lwt_ip_encap to test_progs
test_lwt_ip_encap.sh isn't used by the BPF CI.

Add a new file in the test_progs framework to migrate the tests done by
test_lwt_ip_encap.sh. It uses the same network topology and the same BPF
programs located in progs/test_lwt_ip_encap.c.
Rework the GSO part to avoid using nc and dd.

Remove test_lwt_ip_encap.sh and its Makefile entry.

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250304-lwt_ip-v1-1-8fdeb9e79a56@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:57 -07:00
Kumar Kartikeya Dwivedi 2dfc8186d6 selftests/bpf: Add tests for arena spin lock
Add some basic selftests for qspinlock built over BPF arena using
cond_break_label macro.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250306035431.2186189-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:56 -07:00
Kumar Kartikeya Dwivedi 88d706ba7c selftests/bpf: Introduce arena spin lock
Implement queued spin lock algorithm as BPF program for lock words
living in BPF arena.

The algorithm is copied from kernel/locking/qspinlock.c and adapted for
BPF use.

We first implement abstract helpers for portable atomics and
acquire/release load instructions, by relying on X86_64 presence to
elide expensive barriers and rely on implementation details of the JIT,
and fall back to slow but correct implementations elsewhere. When
support for acquire/release load/stores lands, we can improve this
state.

Then, the qspinlock algorithm is adapted to remove dependence on
multi-word atomics due to lack of support in BPF ISA. For instance,
xchg_tail cannot use 16-bit xchg, and needs to be a implemented as a
32-bit try_cmpxchg loop.

Loops which are seemingly infinite from verifier PoV are annotated with
cond_break_label macro to return an error. Only 1024 NR_CPUs are
supported.

Note that the slow path is a global function, hence the verifier doesn't
know the return value's precision. The recommended way of usage is to
always test against zero for success, and not ret < 0 for error, as the
verifier would assume ret > 0 has not been accounted for. Add comments
in the function documentation about this quirk.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250306035431.2186189-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:56 -07:00
Kumar Kartikeya Dwivedi 4b7ede0be3 selftests/bpf: Introduce cond_break_label
Add a new cond_break_label macro that jumps to the specified label when
the cond_break termination check fires, and allows us to better handle
the uncontrolled termination of the loop.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250306035431.2186189-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:56 -07:00
Eduard Zingerman 871ef8d50e bpf: correct use/def for may_goto instruction
may_goto instruction does not use any registers,
but in compute_insn_live_regs() it was treated as a regular
conditional jump of kind BPF_K with r0 as source register.
Thus unnecessarily marking r0 as used.

Fixes: 14c8552db6 ("bpf: simple DFA-based live registers analysis")
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250305085436.2731464-1-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:30 -07:00
Eduard Zingerman 2ea8f6a1cd selftests/bpf: test cases for compute_live_registers()
Cover instructions from each kind:
- assignment
- arithmetic
- store/load
- endian conversion
- atomics
- branches, conditional branches, may_goto, calls
- LD_ABS/LD_IND
- address_space_cast

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250304195024.2478889-6-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:30 -07:00
Eduard Zingerman 14c8552db6 bpf: simple DFA-based live registers analysis
Compute may-live registers before each instruction in the program.
The register is live before the instruction I if it is read by I or
some instruction S following I during program execution and is not
overwritten between I and S.

This information would be used in the next patch as a hint in
func_states_equal().

Use a simple algorithm described in [1] to compute this information:
- define the following:
  - I.use : a set of all registers read by instruction I;
  - I.def : a set of all registers written by instruction I;
  - I.in  : a set of all registers that may be alive before I execution;
  - I.out : a set of all registers that may be alive after I execution;
  - I.successors : a set of instructions S that might immediately
                   follow I for some program execution;
- associate separate empty sets 'I.in' and 'I.out' with each instruction;
- visit each instruction in a postorder and update corresponding
  'I.in' and 'I.out' sets as follows:

      I.out = U [S.in for S in I.successors]
      I.in  = (I.out / I.def) U I.use

  (where U stands for set union, / stands for set difference)
- repeat the computation while I.{in,out} changes for any instruction.

On implementation side keep things as simple, as possible:
- check_cfg() already marks instructions EXPLORED in post-order,
  modify it to save the index of each EXPLORED instruction in a vector;
- represent I.{in,out,use,def} as bitmasks;
- don't split the program into basic blocks and don't maintain the
  work queue, instead:
  - do fixed-point computation by visiting each instruction;
  - maintain a simple 'changed' flag if I.{in,out} for any instruction
    change;
  Measurements show that even such simplistic implementation does not
  add measurable verification time overhead (for selftests, at-least).

Note on check_cfg() ex_insn_beg/ex_done change:
To avoid out of bounds access to env->cfg.insn_postorder array,
it should be guaranteed that instruction transitions to EXPLORED state
only once. Previously this was not the fact for incorrect programs
with direct calls to exception callbacks.

The 'align' selftest needs adjustment to skip computed insn/live
registers printout. Otherwise it matches lines from the live registers
printout.

[1] https://en.wikipedia.org/wiki/Live-variable_analysis

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250304195024.2478889-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:29 -07:00
Peilin Ye ff3afe5da9 selftests/bpf: Add selftests for load-acquire and store-release instructions
Add several ./test_progs tests:

  - arena_atomics/load_acquire
  - arena_atomics/store_release
  - verifier_load_acquire/*
  - verifier_store_release/*
  - verifier_precision/bpf_load_acquire
  - verifier_precision/bpf_store_release

The last two tests are added to check if backtrack_insn() handles the
new instructions correctly.

Additionally, the last test also makes sure that the verifier
"remembers" the value (in src_reg) we store-release into e.g. a stack
slot.  For example, if we take a look at the test program:

    #0:  r1 = 8;
      /* store_release((u64 *)(r10 - 8), r1); */
    #1:  .8byte %[store_release];
    #2:  r1 = *(u64 *)(r10 - 8);
    #3:  r2 = r10;
    #4:  r2 += r1;
    #5:  r0 = 0;
    #6:  exit;

At #1, if the verifier doesn't remember that we wrote 8 to the stack,
then later at #4 we would be adding an unbounded scalar value to the
stack pointer, which would cause the program to be rejected:

  VERIFIER LOG:
  =============
...
  math between fp pointer and register with unbounded min value is not allowed

For easier CI integration, instead of using built-ins like
__atomic_{load,store}_n() which depend on the new
__BPF_FEATURE_LOAD_ACQ_STORE_REL pre-defined macro, manually craft
load-acquire/store-release instructions using __imm_insn(), as suggested
by Eduard.

All new tests depend on:

  (1) Clang major version >= 18, and
  (2) ENABLE_ATOMICS_TESTS is defined (currently implies -mcpu=v3 or
      v4), and
  (3) JIT supports load-acquire/store-release (currently arm64 and
      x86-64)

In .../progs/arena_atomics.c:

  /* 8-byte-aligned */
  __u8 __arena_global load_acquire8_value = 0x12;
  /* 1-byte hole */
  __u16 __arena_global load_acquire16_value = 0x1234;

That 1-byte hole in the .addr_space.1 ELF section caused clang-17 to
crash:

  fatal error: error in backend: unable to write nop sequence of 1 bytes

To work around such llvm-17 CI job failures, conditionally define
__arena_global variables as 64-bit if __clang_major__ < 18, to make sure
.addr_space.1 has no holes.  Ideally we should avoid compiling this file
using clang-17 at all (arena tests depend on
__BPF_FEATURE_ADDR_SPACE_CAST, and are skipped for llvm-17 anyway), but
that is a separate topic.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Peilin Ye <yepeilin@google.com>
Link: https://lore.kernel.org/r/1b46c6feaf0f1b6984d9ec80e500cc7383e9da1a.1741049567.git.yepeilin@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:29 -07:00
Kumar Kartikeya Dwivedi 2fb761823e bpf, x86: Add x86 JIT support for timed may_goto
Implement the arch_bpf_timed_may_goto function using inline assembly to
have control over which registers are spilled, and use our special
protocol of using BPF_REG_AX as an argument into the function, and as
the return value when going back.

Emit call depth accounting for the call made from this stub, and ensure
we don't have naked returns (when rethunk mitigations are enabled) by
falling back to the RET macro (instead of retq). After popping all saved
registers, the return address into the BPF program should be on top of
the stack.

Since the JIT support is now enabled, ensure selftests which are
checking the produced may_goto sequences do not break by adjusting them.
Make sure we still test the old may_goto sequence on other
architectures, while testing the new sequence on x86_64.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250304003239.2390751-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:28 -07:00
Mykyta Yatsenko 6419d08b6c selftests/bpf: Add tests for bpf_object__prepare
Add selftests, checking that running bpf_object__prepare successfully
creates maps before load step.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250303135752.158343-5-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:28 -07:00
Bastien Curutchet (eBPF Foundation) a54e700696 selftests/bpf: test_tunnel: Remove test_tunnel.sh
All tests from test_tunnel.sh have been migrated into test test_progs.
The last test remaining in the script is the test_ipip() that is already
covered in the test_prog framework by the NONE case of test_ipip_tunnel().

Remove the test_tunnel.sh script and its Makefile entry

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250303-tunnels-v2-10-8329f38f0678@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:27 -07:00
Bastien Curutchet (eBPF Foundation) 05cd60ab57 selftests/bpf: test_tunnel: Move ip6tnl tunnel tests to test_progs
ip6tnl tunnels are tested in the test_tunnel.sh but not in the test_progs
framework.

Add a new test in test_progs to test ip6tnl tunnels. It uses the same
network topology and the same BPF programs than the script.
Remove test_ipip6() and test_ip6ip6() from the script.

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250303-tunnels-v2-9-8329f38f0678@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:27 -07:00
Bastien Curutchet (eBPF Foundation) 260f2da62d selftests/bpf: test_tunnel: Move ip6geneve tunnel test to test_progs
ip6geneve tunnels are tested in the test_tunnel.sh but not in the
test_progs framework.

Add a new test in test_progs to test ip6geneve tunnels. It uses the same
network topology and the same BPF programs than the script.
Remove test_ip6geneve() from the script.

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250303-tunnels-v2-8-8329f38f0678@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:27 -07:00
Bastien Curutchet (eBPF Foundation) bd477738e6 selftests/bpf: test_tunnel: Move geneve tunnel test to test_progs
geneve tunnels are tested in the test_tunnel.sh but not in the test_progs
framework.

Add a new test in test_progs to test geneve tunnels. It uses the same
network topology and the same BPF programs than the script.
Remove test_geneve() from the script.

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250303-tunnels-v2-7-8329f38f0678@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:27 -07:00
Bastien Curutchet (eBPF Foundation) ea60b6a524 selftests/bpf: test_tunnel: Move ip6erspan tunnel test to test_progs
ip6erspan tunnels are tested in the test_tunnel.sh but not in the
test_progs framework.

Add a new test in test_progs to test ip6erspan tunnels. It uses the same
network topology and the same BPF programs than the script.
Remove test_ip6erspan() from the script.

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250303-tunnels-v2-6-8329f38f0678@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:27 -07:00
Bastien Curutchet (eBPF Foundation) cadb08a4d3 selftests/bpf: test_tunnel: Move erspan tunnel tests to test_progs
erspan tunnels are tested in the test_tunnel.sh but not in the test_progs
framework.

Add a new test in test_progs to test erspan tunnels. It uses the same
network topology and the same BPF programs than the script.
Remove test_erspan() from the script.

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250303-tunnels-v2-5-8329f38f0678@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:27 -07:00
Bastien Curutchet (eBPF Foundation) 856818b28f selftests/bpf: test_tunnel: Move ip6gre tunnel test to test_progs
ip6gre tunnels are tested in the test_tunnel.sh but not in the test_progs
framework.

Add a new test in test_progs to test ip6gre tunnels. It uses the same
network topology and the same BPF programs than the script. Disable the
IPv6 DAD feature because it can take lot of time and cause some tests to
fail depending on the environment they're run on.
Remove test_ip6gre() and test_ip6gretap() from the script.

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250303-tunnels-v2-4-8329f38f0678@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:27 -07:00
Bastien Curutchet (eBPF Foundation) 257dfd1c6b selftests/bpf: test_tunnel: Move gre tunnel test to test_progs
gre tunnels are tested in the test_tunnel.sh but not in the test_progs
framework.

Add a new test in test_progs to test gre tunnels. It uses the same
network topology and the same BPF programs than the script.
Remove test_gre() and test_gre_no_tunnel_key() from the script.

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250303-tunnels-v2-3-8329f38f0678@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:27 -07:00
Bastien Curutchet (eBPF Foundation) fcb39996a2 selftests/bpf: test_tunnel: Add ping helpers
All tests use more or less the same ping commands as final validation.
Also test_ping()'s return value is checked with ASSERT_OK() while this
check is already done by the SYS() macro inside test_ping().

Create helpers around test_ping() and use them in the tests to avoid code
duplication.
Remove the unnecessary ASSERT_OK() from the tests.

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250303-tunnels-v2-2-8329f38f0678@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:26 -07:00
Bastien Curutchet (eBPF Foundation) 5d6aa606c1 selftests/bpf: test_tunnel: Add generic_attach* helpers
A fair amount of code duplication is present among tests to attach BPF
programs.

Create generic_attach* helpers that attach BPF programs to a given
interface.
Use ASSERT_OK_FD() instead of ASSERT_GE() to check fd's validity.
Use these helpers in all the available tests.

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250303-tunnels-v2-1-8329f38f0678@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:26 -07:00
Eduard Zingerman 346e4ca462 veristat: Report program type guess results to sdterr
In order not to pollute CSV output, e.g.:

  $ ./veristat -o csv exceptions_ext.bpf.o > test.csv
  Using guessed program type 'sched_cls' for exceptions_ext.bpf.o/extension...
  Using guessed program type 'sched_cls' for exceptions_ext.bpf.o/throwing_extension...

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Mykyta Yatsenko <mykyta.yatsenko5@gmail.com>
Link: https://lore.kernel.org/bpf/20250301000147.1583999-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:26 -07:00
Eduard Zingerman 2d95b3f582 veristat: Strerror expects positive number (errno)
Before:

  ./veristat -G @foobar iters.bpf.o
  Failed to open presets in 'foobar': Unknown error -2
  ...

After:

  ./veristat -G @foobar iters.bpf.o
  Failed to open presets in 'foobar': No such file or directory
  ...

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Mykyta Yatsenko <mykyta.yatsenko5@gmail.com>
Link: https://lore.kernel.org/bpf/20250301000147.1583999-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:26 -07:00
Eduard Zingerman c0d078da7a veristat: @files-list.txt notation for object files list
Allow reading object file list from file.
E.g. the following command:

  ./veristat @list.txt

Is equivalent to the following invocation:

  ./veristat line-1 line-2 ... line-N

Where line-i corresponds to lines from list.txt.
Lines starting with '#' are ignored.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Mykyta Yatsenko <mykyta.yatsenko5@gmail.com>
Link: https://lore.kernel.org/bpf/20250301000147.1583999-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:26 -07:00
Kumar Kartikeya Dwivedi 72ed076abf selftests/bpf: Add tests for extending sleepable global subprogs
Add tests for freplace behavior with the combination of sleepable
and non-sleepable global subprogs. The changes_pkt_data selftest
did all the hardwork, so simply rename it and include new support
for more summarization tests for might_sleep bit.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250301151846.1552362-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:25 -07:00
Kumar Kartikeya Dwivedi b2bb703434 selftests/bpf: Test sleepable global subprogs in atomic contexts
Add tests for rejecting sleepable and accepting non-sleepable global
function calls in atomic contexts. For spin locks, we still reject
all global function calls. Once resilient spin locks land, we will
carefully lift in cases where we deem it safe.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250301151846.1552362-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:25 -07:00
Alexis Lothoré (eBPF Foundation) 93cf4e537e bpf/selftests: test_select_reuseport_kern: Remove unused header
test_select_reuseport_kern.c is currently including <stdlib.h>, but it
does not use any definition from there.

Remove stdlib.h inclusion from test_select_reuseport_kern.c

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250227-remove_wrong_header-v1-1-bc94eb4e2f73@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:25 -07:00
Yonghong Song 2222aa1c89 selftests/bpf: Add selftests allowing cgroup prog pre-ordering
Add a few selftests with cgroup prog pre-ordering.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250224230121.283601-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:25 -07:00
Jiayuan Chen 7f260af1f2 selftests/bpf: Fixes for test_maps test
BPF CI has failed 3 times in the last 24 hours. Add retry for ENOMEM.
It's similar to the optimization plan:
commit 2f553b032c ("selftsets/bpf: Retry map update for non-preallocated per-cpu map")

Failed CI:
https://github.com/kernel-patches/bpf/actions/runs/13549227497/job/37868926343
https://github.com/kernel-patches/bpf/actions/runs/13548089029/job/37865812030
https://github.com/kernel-patches/bpf/actions/runs/13553536268/job/37883329296

selftests/bpf: Fixes for test_maps test
Fork 100 tasks to 'test_update_delete'
Fork 100 tasks to 'test_update_delete'
Fork 100 tasks to 'test_update_delete'
Fork 100 tasks to 'test_update_delete'
......
test_task_storage_map_stress_lookup:PASS
test_maps: OK, 0 SKIPPED

Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Link: https://lore.kernel.org/r/20250227142646.59711-4-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:25 -07:00
Jiayuan Chen 93a279b65a selftests/bpf: Allow auto port binding for bpf nf
Allow auto port binding for bpf nf test to avoid binding conflict.

./test_progs -a bpf_nf
24/1    bpf_nf/xdp-ct:OK
24/2    bpf_nf/tc-bpf-ct:OK
24/3    bpf_nf/alloc_release:OK
24/4    bpf_nf/insert_insert:OK
24/5    bpf_nf/lookup_insert:OK
24/6    bpf_nf/set_timeout_after_insert:OK
24/7    bpf_nf/set_status_after_insert:OK
24/8    bpf_nf/change_timeout_after_alloc:OK
24/9    bpf_nf/change_status_after_alloc:OK
24/10   bpf_nf/write_not_allowlisted_field:OK
24      bpf_nf:OK
Summary: 1/10 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Link: https://lore.kernel.org/r/20250227142646.59711-3-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:24 -07:00
Jiayuan Chen acf0d6f681 selftests/bpf: Allow auto port binding for cgroup connect
Allow auto port binding for cgroup connect test to avoid binding conflict.

Result:
./test_progs -a cgroup_v1v2
59      cgroup_v1v2:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Link: https://lore.kernel.org/r/20250227142646.59711-2-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:24 -07:00
Mykyta Yatsenko 064e9aacfd selftests/bpf: Add tests for bpf_dynptr_copy
Add XDP setup type for dynptr tests, enabling testing for
non-contiguous buffer.
Add 2 tests:
 - test_dynptr_copy - verify correctness for the fast (contiguous
 buffer) code path.
 - test_dynptr_copy_xdp - verifies code paths that handle
 non-contiguous buffer.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250226183201.332713-4-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:20 -07:00
Alison Schofield 114a89b433 cxl/test: Define a CFMWS capable of a 3 way HB interleave
The CXL unit test cxl-xor-region.sh is skipping a 1+1+1 region
interleave test case because the window is not defined.

Additionally, upcoming expansion of 3 way HB interleave test cases
(like 2+2+2) require the same window.

Replace an unused CFMWS with a 3-way capable CFMWS in the set of
CFMWS's loaded when interleave_arithmetic=1.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Li Zhijian <lizhijian@fujitsu.com>
Link: https://patch.msgid.link/20250226221931.2352061-1-alison.schofield@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-03-14 16:27:40 -07:00
Dave Jiang 763e15d047 Merge branch 'for-6.15/extended-linear-cache' into cxl-for-next2
Add support for Extended Linear Cache for CXL. Add enumeration support
of the cache. Add MCE notification of the aliased memory address.
2025-03-14 16:22:34 -07:00
Dave Jiang d781a45270 Merge branch 'for-6.15/dirty-shutdown' into cxl-for-next2
Add support for Global Persistent Flush (GPF) and dirty shutdown
accounting.
2025-03-14 16:11:42 -07:00
Davidlohr Bueso 6eb52f63ea tools/testing/cxl: Set Shutdown State support
Add support to emulate the CXL Set Shutdown State operation.

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Li Ming <ming.li@zohomail.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://patch.msgid.link/20250220220235.276831-5-dave@stgolabs.net
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-03-14 15:55:27 -07:00
Yuquan Wang 2da9ad027e cxl/pmem: debug invalid serial number data
In a nvdimm interleave-set each device with an invalid or zero
serial number may cause pmem region initialization to fail, but in
cxl case such device could still set cookies of nd_interleave_set
and create a nvdimm pmem region.

This adds the validation of serial number in cxl pmem region creation.
The event of no serial number would cause to fail to set the cookie
and pmem region.

For cxl-test to work properly, always +1 on mock device's serial
number.

Signed-off-by: Yuquan Wang <wangyuquan1236@phytium.com.cn>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://patch.msgid.link/20250219040029.515451-2-wangyuquan1236@phytium.com.cn
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-03-14 15:01:04 -07:00
Dave Jiang 9387c6aec0 Merge branch 'for-6.15/fw-first-error-logging' into cxl-for-next2
Add logging support for CXL CPER endpoint and port protocol errors.
Including the 2 patches that was completed later.

Link: https://lore.kernel.org/linux-cxl/20250123084421.127697-1-Smita.KoralahalliChannabasappa@amd.com/
Link: https://lore.kernel.org/linux-cxl/20250310223839.31342-1-Smita.KoralahalliChannabasappa@amd.com/
2025-03-14 14:27:17 -07:00
Smita Koralahalli 36f257e3b0 acpi/ghes, cxl/pci: Process CXL CPER Protocol Errors
When PCIe AER is in FW-First, OS should process CXL Protocol errors from
CPER records. Introduce support for handling and logging CXL Protocol
errors.

The defined trace events cxl_aer_uncorrectable_error and
cxl_aer_correctable_error trace native CXL AER endpoint errors. Reuse them
to trace FW-First Protocol errors.

Since the CXL code is required to be called from process context and
GHES is in interrupt context, use workqueues for processing.

Similar to CXL CPER event handling, use kfifo to handle errors as it
simplifies queue processing by providing lock free fifo operations.

Add the ability for the CXL sub-system to register a workqueue to
process CXL CPER protocol errors.

[DJ: return cxl_cper_register_prot_err_work() directly in cxl_ras_init()]

Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Li Ming <ming.li@zohomail.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://patch.msgid.link/20250310223839.31342-2-Smita.KoralahalliChannabasappa@amd.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-03-14 14:21:45 -07:00
Tamir Duberstein 97c1f302f2 scanf: convert self-test to KUnit
Convert the scanf() self-test to a KUnit test.

In the interest of keeping the patch reasonably-sized this doesn't
refactor the tests into proper parameterized tests - it's all one big
test case.

Reviewed-by: David Gow <davidgow@google.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Link: https://lore.kernel.org/r/20250307-scanf-kunit-convert-v9-3-b98820fa39ff@gmail.com
Signed-off-by: Kees Cook <kees@kernel.org>
2025-03-14 13:55:37 -07:00
Niklas Cassel 24a42582b0
selftests: pci_endpoint: Use IRQ_TYPE_* defines from UAPI header
In order to improve readability, use the IRQ_TYPE_* defines from the UAPI
header rather than using raw values.

No functional change.

Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Link: https://lore.kernel.org/r/20250310111016.859445-12-cassel@kernel.org
2025-03-14 16:12:04 +00:00
Matthieu Baerts (NGI0) a1e36ec363 selftests: drv-net: fix merge conflicts resolution
After the recent merge between net-next and net, I got some conflicts on
my side because the merge resolution was different from Stephen's one
[1] I applied on my side in the MPTCP tree.

It looks like the code that is now in net-next is using the old way to
retrieve the local and remote addresses. This patch is now using the new
way, like what was in Stephen's email [1].

Also, in get_interface_info(), there were no conflicts in this area,
because that was new code from 'net', but a small adaptation was needed
there as well to get the remote address.

Fixes: 941defcea7 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
Link: https://lore.kernel.org/20250311115758.17a1d414@canb.auug.org.au [1]
Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250314-net-next-drv-net-ping-fix-merge-v1-1-0d5c19daf707@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-14 10:24:14 +01:00
Paolo Abeni 941defcea7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.14-rc6).

Conflicts:

tools/testing/selftests/drivers/net/ping.py
  75cc19c8ff ("selftests: drv-net: add xdp cases for ping.py")
  de94e86974 ("selftests: drv-net: store addresses in dict indexed by ipver")
https://lore.kernel.org/netdev/20250311115758.17a1d414@canb.auug.org.au/

net/core/devmem.c
  a70f891e0f ("net: devmem: do not WARN conditionally after netdev_rx_queue_restart()")
  1d22d3060b ("net: drop rtnl_lock for queue_mgmt operations")
https://lore.kernel.org/netdev/20250313114929.43744df1@canb.auug.org.au/

Adjacent changes:

tools/testing/selftests/net/Makefile
  6f50175cca ("selftests: Add IPv6 link-local address generation tests for GRE devices.")
  2e5584e0f9 ("selftests/net: expand cmsg_ipv6.sh with ipv4")

drivers/net/ethernet/broadcom/bnxt/bnxt.c
  661958552e ("eth: bnxt: do not use BNXT_VNIC_NTUPLE unconditionally in queue restart logic")
  fe96d717d3 ("bnxt_en: Extend queue stop/start for TX rings")

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13 23:08:11 +01:00
Jason Xing a1e0783e10 selftests/bpf: Add bpf_getsockopt() for TCP_BPF_DELACK_MAX and TCP_BPF_RTO_MIN
Add selftests for TCP_BPF_DELACK_MAX and TCP_BPF_RTO_MIN BPF socket
cases.

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250312153523.9860-5-kerneljasonxing@gmail.com
2025-03-13 14:42:53 -07:00
Linus Torvalds 4003c9e787 Including fixes from netfilter, bluetooth and wireless.
No known regressions outstanding.
 
 Current release - regressions:
 
   - wifi: nl80211: fix assoc link handling
 
   - eth: lan78xx: sanitize return values of register read/write functions
 
 Current release - new code bugs:
 
   - ethtool: tsinfo: fix dump command
 
   - bluetooth: btusb: configure altsetting for HCI_USER_CHANNEL
 
   - eth: mlx5: DR, use the right action structs for STEv3
 
 Previous releases - regressions:
 
   - netfilter: nf_tables: make destruction work queue pernet
 
   - gre: fix IPv6 link-local address generation.
 
   - wifi: iwlwifi: fix TSO preparation
 
   - bluetooth: revert "bluetooth: hci_core: fix sleeping function called from invalid context"
 
   - ovs: revert "openvswitch: switch to per-action label counting in conntrack"
 
   - eth: ice: fix switchdev slow-path in LAG
 
   - eth: bonding: fix incorrect MAC address setting to receive NS messages
 
 Previous releases - always broken:
 
   - core: prevent TX of unreadable skbs
 
   - sched: prevent creation of classes with TC_H_ROOT
 
   - netfilter: nft_exthdr: fix offset with ipv4_find_option()
 
   - wifi: cfg80211: cancel wiphy_work before freeing wiphy
 
   - mctp: copy headers if cloned
 
   - phy: nxp-c45-tja11xx: add errata for TJA112XA/B
 
   - eth: bnxt: fix kernel panic in the bnxt_get_queue_stats{rx | tx}
 
   - eth: mlx5: bridge, fix the crash caused by LAG state check
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmfS+yMSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOknd8P/3XdIaDXrwVl9BSi37qToRs40i+6qPep
 5MZN+wakJkuHQqooADF/UarqHWEtQwFOTJalgsPbrUpsespA0lhmzSRHsk+2Xy//
 8o8RuZ2VngN1/UVBaoXb0QOY28CgtDs0SdW6/jST/dEp3bHvW+ZTDR2aaivKAWtJ
 gdWIWR+Ctw1lpetKJer3loPXjPd2doDedSlf64CiHw7dRFTwiE2AcrTRkYdd18Ml
 HJjiBkqKMnUTxqWxOU3GJxdl7Qru7CvAfn3XQ0PjCO9hADPJb/uWZDRENauk3/jE
 6lJc+EP5AoG3Ce8MLzHr/aayZR3YtyhJ2TaWmyPbe6BJq7tPH84ydKMZnF7UnOd5
 Jp4T6rykkJOYnUdiYGiOPHcTxQxUugKUrL3rNvGJep+x3JAi/DJJkHNWEx+EBB55
 3YbDTOUiY1bCGOoGJIZRwNzw52O0+7qINvCtAN21sMu4pKr/nYMzO8L9ldFExllj
 EuOcxr813V09NyoJs+wWcy9FdlVmcPwCsT4Pylrdb+mA1XohPzMwibH3yHk4XCZ6
 Aijz5Bb78i4vg+kZzNZmVY11J5yL0/eC8DefMgUe3+JKwyM2TiSn5hHK6tYbTLtM
 yAjbTkeu/FD6alSRAnIAL+YBHg4ri3fXXjU4vLZA5n3wWIjfqpL7JAFgnK57Bd8Z
 1X5k+52VjQzj
 =lIrc
 -----END PGP SIGNATURE-----

Merge tag 'net-6.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from netfilter, bluetooth and wireless.

  No known regressions outstanding.

  Current release - regressions:

   - wifi: nl80211: fix assoc link handling

   - eth: lan78xx: sanitize return values of register read/write
     functions

  Current release - new code bugs:

   - ethtool: tsinfo: fix dump command

   - bluetooth: btusb: configure altsetting for HCI_USER_CHANNEL

   - eth: mlx5: DR, use the right action structs for STEv3

  Previous releases - regressions:

   - netfilter: nf_tables: make destruction work queue pernet

   - gre: fix IPv6 link-local address generation.

   - wifi: iwlwifi: fix TSO preparation

   - bluetooth: revert "bluetooth: hci_core: fix sleeping function
     called from invalid context"

   - ovs: revert "openvswitch: switch to per-action label counting in
     conntrack"

   - eth:
       - ice: fix switchdev slow-path in LAG
       - bonding: fix incorrect MAC address setting to receive NS
         messages

  Previous releases - always broken:

   - core: prevent TX of unreadable skbs

   - sched: prevent creation of classes with TC_H_ROOT

   - netfilter: nft_exthdr: fix offset with ipv4_find_option()

   - wifi: cfg80211: cancel wiphy_work before freeing wiphy

   - mctp: copy headers if cloned

   - phy: nxp-c45-tja11xx: add errata for TJA112XA/B

   - eth:
       - bnxt: fix kernel panic in the bnxt_get_queue_stats{rx | tx}
       - mlx5: bridge, fix the crash caused by LAG state check"

* tag 'net-6.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
  net: mana: cleanup mana struct after debugfs_remove()
  net/mlx5e: Prevent bridge link show failure for non-eswitch-allowed devices
  net/mlx5: Bridge, fix the crash caused by LAG state check
  net/mlx5: Lag, Check shared fdb before creating MultiPort E-Switch
  net/mlx5: Fix incorrect IRQ pool usage when releasing IRQs
  net/mlx5: HWS, Rightsize bwc matcher priority
  net/mlx5: DR, use the right action structs for STEv3
  Revert "openvswitch: switch to per-action label counting in conntrack"
  net: openvswitch: remove misbehaving actions length check
  selftests: Add IPv6 link-local address generation tests for GRE devices.
  gre: Fix IPv6 link-local address generation.
  netfilter: nft_exthdr: fix offset with ipv4_find_option()
  selftests/tc-testing: Add a test case for DRR class with TC_H_ROOT
  net_sched: Prevent creation of classes with TC_H_ROOT
  ipvs: prevent integer overflow in do_ip_vs_get_ctl()
  selftests: netfilter: skip br_netfilter queue tests if kernel is tainted
  netfilter: nf_conncount: Fully initialize struct nf_conncount_tuple in insert_tree()
  wifi: mac80211: fix MPDU length parsing for EHT 5/6 GHz
  qlcnic: fix memory leak issues in qlcnic_sriov_common.c
  rtase: Fix improper release of ring list entries in rtase_sw_reset
  ...
2025-03-13 07:58:48 -10:00
Tamir Duberstein 7a79e7daa8 printf: convert self-test to KUnit
Convert the printf() self-test to a KUnit test.

In the interest of keeping the patch reasonably-sized this doesn't
refactor the tests into proper parameterized tests - it's all one big
test case.

Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20250307-printf-kunit-convert-v6-1-4d85c361c241@gmail.com
Signed-off-by: Kees Cook <kees@kernel.org>
2025-03-13 10:26:33 -07:00
Paolo Abeni 2409fa66e2 netfilter pull request 25-03-13
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEjF9xRqF1emXiQiqU1w0aZmrPKyEFAmfSqyUACgkQ1w0aZmrP
 KyGocBAAu0+rdAcW1wwSYk53tRN663WpbbYg/AchJ9ilLH2IXdCzzbeuekRzvSmS
 /gS2WNANmckL314300RV5vTd9+PCbANjh1248O4S8ZQnhmomoiB3xD4QM9fu8HKZ
 RE+koHmnHESl+TiiUzgw31OtZEdSpjn7zUzm+qAznBeDXfyBE1YWQ9q8LALi0+3m
 RW/waYykwRktzHCANBYNF0oqt6siBrHnI9KHgXIfQCEK0mc3A/s8MfO1IJqC177Y
 It/+4a6qsaGZZsSwGCrQ/40CO3rItzF7jefpeAjv/di224EQ+p2sWZgEiLPCMtvk
 6WSfYhKNiDnKGAe1u9hCG2RdaSQ8Og7BsBWFeB56gZbK7gPoFY2VCM/hATsBTYG8
 bn8xQKFoA9y7dYp8AP6a33OlzV+/qOfJa2kXQ90yjS4yWnMY+q59j1D564z/WhZ9
 gVZjrcUDSsv3ETtp7vk7s+0SoI3EvSLb/umcjHZR6oeLkNFWpQ4r9NqY0GX6+zDC
 89YR2w/RbO9WMRb52OAqY3HdKmOflwEIN5mDWCAB4sIPUgCpoDzvIjC2xLdr6lUW
 ydoisH2B87luG0p7zKP7iGE1KIp8wm+zO+XciOrcjyie5YNMHyCFRbzGAsd47CLB
 UjNrrlAS0tU65Nb+JzsrZOYye2cghE+KC8fKHvpwjk54lsaPHl0=
 =OTtx
 -----END PGP SIGNATURE-----

Merge tag 'nf-25-03-13' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter/IPVS fixes for net:

1) Missing initialization of cpu and jiffies32 fields in conncount,
   from Kohei Enju.

2) Skip several tests in case kernel is tainted, otherwise tests bogusly
   report failure too as they also check for tainted kernel,
   from Florian Westphal.

3) Fix a hyphothetical integer overflow in do_ip_vs_get_ctl() leading
   to bogus error logs, from Dan Carpenter.

4) Fix incorrect offset in ipv4 option match in nft_exthdr, from
   Alexey Kashavkin.

netfilter pull request 25-03-13

* tag 'nf-25-03-13' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nft_exthdr: fix offset with ipv4_find_option()
  ipvs: prevent integer overflow in do_ip_vs_get_ctl()
  selftests: netfilter: skip br_netfilter queue tests if kernel is tainted
  netfilter: nf_conncount: Fully initialize struct nf_conncount_tuple in insert_tree()
====================

Link: https://patch.msgid.link/20250313095636.2186-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13 15:07:39 +01:00
Thomas Gleixner 8e63360d86 selftests/timers/posix-timers: Add a test for exact allocation mode
The exact timer ID allocation mode is used by CRIU to restore timers with a
given ID. Add a test case for it.

It's skipped on older kernels when the prctl() fails.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/all/8734fl2tkx.ffs@tglx
2025-03-13 12:07:18 +01:00
Guillaume Nault 6f50175cca selftests: Add IPv6 link-local address generation tests for GRE devices.
GRE devices have their special code for IPv6 link-local address
generation that has been the source of several regressions in the past.

Add selftest to check that all gre, ip6gre, gretap and ip6gretap get an
IPv6 link-link local address in accordance with the
net.ipv6.conf.<dev>.addr_gen_mode sysctl.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/2d6772af8e1da9016b2180ec3f8d9ee99f470c77.1741375285.git.gnault@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13 10:17:42 +01:00
Sebastian Ott edfd826b8b KVM: arm64: selftests: Test that TGRAN*_2 fields are writable
Userspace can write to these fields for non-NV guests; add test that do
just that.

Signed-off-by: Sebastian Ott <sebott@redhat.com>
Link: https://lore.kernel.org/kvmarm/20250306184013.30008-1-sebott@redhat.com/
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-12 13:40:53 -07:00
Jakub Kicinski 188107b2c4 selftests: net: bump GRO timeout for gro/setup_veth
Commit 51bef03e1a ("selftests/net: deflake GRO tests") recently
switched to NAPI suspension, and lowered the timeout from 1ms to 100us.
This started causing flakes in netdev-run CI. Let's bump it to 200us.
In a quick test of a debug kernel I see failures with 100us, with 200us
in 5 runs I see 2 completely clean runs and 3 with a single retry
(GRO test will retry up to 5 times).

Reviewed-by: Kevin Krakauer <krakauer@google.com>
Link: https://patch.msgid.link/20250310110821.385621-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-12 13:20:04 -07:00
Cong Wang bb7737de5f selftests/tc-testing: Add a test case for DRR class with TC_H_ROOT
Integrate the reproduer from Mingi to TDC.

All test results:

1..4
ok 1 0385 - Create DRR with default setting
ok 2 2375 - Delete DRR with handle
ok 3 3092 - Show DRR class
ok 4 4009 - Reject creation of DRR class with classid TC_H_ROOT

Cc: Mingi Cho <mincho@theori.io>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250306232355.93864-3-xiyou.wangcong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-12 12:52:14 -07:00
Florian Westphal c21b02fd9c selftests: netfilter: skip br_netfilter queue tests if kernel is tainted
These scripts fail if the kernel is tainted which leads to wrong test
failure reports in CI environments when an unrelated test triggers some
splat.

Check taint state at start of script and SKIP if its already dodgy.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-03-12 15:28:34 +01:00
Michael Jeanson e6644c967d rseq/selftests: Ensure the rseq ABI TLS is actually 1024 bytes
Adding the aligned(1024) attribute to the definition of __rseq_abi did
not increase its size to 1024, for this attribute to impact the size of
__rseq_abi it would need to be added to the declaration of 'struct
rseq_abi'. We only want to increase the size of the TLS allocation to
ensure registration will succeed with future extended ABI. Use a union
with a dummy member to ensure we allocate 1024 bytes.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/r/20250311192222.323453-1-mjeanson@efficios.com
2025-03-12 13:19:47 +01:00
Madhavan Srinivasan 73276cee1a selftest/powerpc/mm/pkey: fix build-break introduced by commit 00894c3fc9
Build break was reported in the powerpc mailing list for next-20250218 with below errors

make[1]: Nothing to be done for 'all'.
BUILD_TARGET=/root/venkat/linux-next/tools/testing/selftests/powerpc/mm; mkdir -p $BUILD_TARGET; make OUTPUT=$BUILD_TARGET -k -C mm all
  CC       pkey_exec_prot
In file included from pkey_exec_prot.c:18:
/root/venkat/linux-next/tools/testing/selftests/powerpc/include/pkeys.h: In function ‘pkeys_unsupported’:
/root/venkat/linux-next/tools/testing/selftests/powerpc/include/pkeys.h:96:34: error: ‘PKEY_UNRESTRICTED’ undeclared (first use in this function)
   96 |         pkey = sys_pkey_alloc(0, PKEY_UNRESTRICTED);
      |                                  ^~~~~~~~~~~~~~~~~

https://lore.kernel.org/all/20250113170619.484698-2-yury.khrustalev@arm.com/ patchset
has been queued to arm64/for-next/pkey_unrestricted which is causing a build break
in the selftest/powerpc builds.

Commit 6d61527d93 ("mm/pkey: Add PKEY_UNRESTRICTED macro") added a macro
PKEY_UNRESTRICTED to handle implicit literal value of 0x0 (which is "unrestricted").
Add the same to selftest/powerpc/pkeys.h to fix the reported build break.

Fixes: 00894c3fc9 ("selftests/powerpc: Use PKEY_UNRESTRICTED macro")
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Closes: https://lore.kernel.org/lkml/3267ea6e-5a1a-4752-96ef-8351c912d386@linux.ibm.com/T/
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://lore.kernel.org/r/20250311084129.39308-1-maddy@linux.ibm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-03-11 15:16:10 +00:00
Hangbin Liu 9318dc2357 selftests: bonding: fix incorrect mac address
The correct mac address for NS target 2001:db8::254 is 33:33:ff:00:02:54,
not 33:33:00:00:02:54. The same with client maddress.

Fixes: 86fb6173d1 ("selftests: bonding: add ns multicast group testing")
Acked-by: Jay Vosburgh <jv@jvosburgh.net>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250306023923.38777-3-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-11 13:19:27 +01:00
Miklos Szeredi e1c24b52ad
selftests: add tests for mount notification
Provide coverage for all mnt_notify_add() instances.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://lore.kernel.org/r/20250307204046.322691-1-mszeredi@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-11 12:04:02 +01:00
Ming Lei 390174c91d selftests: ublk: improve test usability
Add UBLK_TEST_QUIET, so we can print test result(PASS/SKIP/FAIL) only.

Also always run from test script's current directory, then the same test
script can be started from other work directory.

This way helps a lot to reuse this test source code and scripts for
other projects(liburing, blktests, ...)

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250303124324.3563605-12-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-10 16:24:42 -06:00
Ming Lei af83ccc7db selftests: ublk: add stress test for covering IO vs. killing ublk server
Add stress_test_01 for running IO vs. killing ublk server, so io_uring exit &
cancel code path can be covered, same with ublk's cancel code path.

Especially IO buffer lifetime is one big thing for ublk zero copy, the added
test can verify if this area works as expected.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250303124324.3563605-11-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-10 16:24:42 -06:00
Ming Lei c60ac48eab selftests: ublk: add one stress test for covering IO vs. removing device
Add stress_test_01 for running IO vs. removing device for verifying that
ublk device removal can work as expected when heavy IO workloads are in
progress.

null, loop and loop/zc are covered in this tests.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250303124324.3563605-10-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-10 16:24:42 -06:00
Ming Lei 87a9265213 selftests: ublk: load/unload ublk_drv when preparing & cleaning up tests
Load ublk_drv module in _prep_test(), and unload it in _cleanup_test(),
so that test can always be done in consistent state.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250303124324.3563605-9-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-10 16:24:42 -06:00
Ming Lei c2cb669a86 selftests: ublk: move zero copy feature check into _add_ublk_dev()
Move zero copy feature check into _add_ublk_dev() since we will have
more tests which requires to cover zero copy.

Then one check function of _check_add_dev() has to be added for dealing
with cleanup since '_add_ublk_dev()' is run in sub-shell, and we can't
exit from it to terminal shell.

Meantime always return error code from _add_ublk_dev().

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250303124324.3563605-8-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-10 16:24:42 -06:00
Ming Lei c83b089a70 selftests: ublk: don't pass ${dev_id} to _cleanup_test()
More devices can be created in single tests, so simply remove all
ublk devices in _cleanup_test(), meantime remove the ${dev_id} argument
of _cleanup_test().

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250303124324.3563605-7-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-10 16:24:42 -06:00
Ming Lei 632051ffbd selftests: ublk: support shellcheck and fix all warning
Add shellcheck, meantime fixes all warnings.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250303124324.3563605-6-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-10 16:24:42 -06:00
Ming Lei 5b2db7a8c7 selftests: ublk: fix parsing '-a' argument
The argument of '-a' doesn't follow any value, so fix it by putting it
with '-z' together.

Fixes: bedc9cbc5f ("selftests: ublk: add ublk zero copy test")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20250303124324.3563605-5-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-10 16:24:18 -06:00
Taehee Yoo 75cc19c8ff selftests: drv-net: add xdp cases for ping.py
ping.py has 3 cases, test_v4, test_v6 and test_tcp.
But these cases are not executed on the XDP environment.
So, it adds XDP environment, existing tests(test_v4, test_v6, and
test_tcp) are executed too on the below XDP environment.
So, it adds XDP cases.
1. xdp-generic + single-buffer
2. xdp-generic + multi-buffer
3. xdp-native + single-buffer
4. xdp-native + multi-buffer
5. xdp-offload

It also makes test_{v4 | v6 | tcp} sending large size packets. this may
help to check whether multi-buffer is working or not.

Note that the physical interface may be down and then up when xdp is
attached or detached.
This takes some period to activate traffic. So sleep(10) is
added if the test interface is the physical interface.
netdevsim and veth type interfaces skip sleep.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Link: https://patch.msgid.link/20250309134219.91670-9-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-10 13:31:12 -07:00
Willem de Bruijn 0922cb68ed selftests/net: expand cmsg_ip with MSG_MORE
UDP send with MSG_MORE takes a slightly different path than the
lockless fast path.

For completeness, add coverage to this case too.

Pass MSG_MORE on the initial sendmsg, then follow up with a zero byte
write to unplug the cork.

Unrelated: also add two missing endlines in usage().

Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250307033620.411611-4-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-10 13:13:04 -07:00
Greg Kroah-Hartman 993a47bd7b Linux 6.14-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmfOKBUeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG1aQH/iC+Oyij4VxAjBek
 BOXIT/p6CwlIXb8ObiWWcRjDPizlcxb3RaV8J2RO+IqaQ2wltxpFANq2G7Re2FPm
 SNcEpIURAOVcxHGedcfFA91srO5F4FzNTO8LVp7MIbcgMYy3pdk+dbZmi6A691R+
 t9pb74m+MAnF1o/MUx7pUlhAT/4ymuuR0F7WCSg4h0Xwe5m0nlJY89kJBC7PCjyd
 n3mdhsz3rDSLmt/z/T7HGD89r8sYSvm9cOKtL3ELgGTrm7boQV8ii9Y9w04DI8PQ
 JmIernugcCxmhH36mVUAHgJf2+/T388xFUh/D5+skeUOUZpaJZG866rnb32WpsHc
 eWLFUeg=
 =Wypt
 -----END PGP SIGNATURE-----

Merge 6.14-rc6 into driver-core-next

We need the driver core fix in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-10 17:37:25 +01:00
Ming Lei 2ecdcdfee5 selftests: ublk: add --foreground command line
Add --foreground command for helping to debug.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20250303124324.3563605-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-10 09:17:13 -06:00
Ming Lei 9d80f48c5e selftests: ublk: fix build failure
Fixes the following build failure:

ublk//file_backed.c: In function ‘backing_file_tgt_init’:
ublk//file_backed.c:28:42: error: ‘O_DIRECT’ undeclared (first use in this function); did you mean ‘O_DIRECTORY’?
   28 |                 fd = open(file, O_RDWR | O_DIRECT);
      |                                          ^~~~~~~~
      |                                          O_DIRECTORY

when trying to reuse this same utility for liburing test.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20250303124324.3563605-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-10 09:17:13 -06:00
Ming Lei 9894e0eaae selftests: ublk: make ublk_stop_io_daemon() more reliable
Improve ublk_stop_io_daemon() in the following ways:

- don't wait if ->ublksrv_pid becomes -1, which means that the disk
has been stopped

- don't wait if ublk char device doesn't exist any more, so we can
avoid to rely on inoitfy for wait until the char device is closed

And this way may reduce time of delete command a lot.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20250303124324.3563605-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-10 09:17:13 -06:00
Linus Torvalds a382b06d29 KVM/arm64 fixes for 6.14, take #4
* Fix a couple of bugs affecting pKVM's PSCI relay implementation
   when running in the hVHE mode, resulting in the host being entered
   with the MMU in an unknown state, and EL2 being in the wrong mode.
 
 x86:
 
 * Set RFLAGS.IF in C code on SVM to get VMRUN out of the STI shadow.
 
 * Ensure DEBUGCTL is context switched on AMD to avoid running the guest with
   the host's value, which can lead to unexpected bus lock #DBs.
 
 * Suppress DEBUGCTL.BTF on AMD (to match Intel), as KVM doesn't properly
   emulate BTF.  KVM's lack of context switching has meant BTF has always been
   broken to some extent.
 
 * Always save DR masks for SNP vCPUs if DebugSwap is *supported*, as the guest
   can enable DebugSwap without KVM's knowledge.
 
 * Fix a bug in mmu_stress_tests where a vCPU could finish the "writes to RO
   memory" phase without actually generating a write-protection fault.
 
 * Fix a printf() goof in the SEV smoke test that causes build failures with
   -Werror.
 
 * Explicitly zero EAX and EBX in CPUID.0x8000_0022 output when PERFMON_V2
   isn't supported by KVM.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmfNSeUUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNKngf/cLgQAT9AF4nFqcwh5b5uucKHVJ8W
 uTiGlWqLAf2UN53L63eZ/7vKQWGQYkOTFvormR14Jam6IYtytsZw1xLBH4fGtUyB
 qVjk0EPzaKGqn3LrgyneQNCXdyxJv7EBVBgoOKH0pvOksoW2E5ZizhhtRFtL7nCE
 Yk8FQKpP0mIBk04RMsvzJVEFKIb4OZgJadWo0gryg1oF2aAv7mxQjyqUWsBDsb3q
 99c0ElSBfV39FeT8xeok4k7S5jbBWii2KiaH72ZsNiBu0rYmEuLwIoygCNNWL9Wu
 FPdQ+r//YrzfCJSXwGPfdUaRaF4p2642S6oiXQuusNNUmhK6/MRo3mZo8A==
 =XQHm
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "arm64:

   - Fix a couple of bugs affecting pKVM's PSCI relay implementation
     when running in the hVHE mode, resulting in the host being entered
     with the MMU in an unknown state, and EL2 being in the wrong mode

  x86:

   - Set RFLAGS.IF in C code on SVM to get VMRUN out of the STI shadow

   - Ensure DEBUGCTL is context switched on AMD to avoid running the
     guest with the host's value, which can lead to unexpected bus lock
     #DBs

   - Suppress DEBUGCTL.BTF on AMD (to match Intel), as KVM doesn't
     properly emulate BTF. KVM's lack of context switching has meant BTF
     has always been broken to some extent

   - Always save DR masks for SNP vCPUs if DebugSwap is *supported*, as
     the guest can enable DebugSwap without KVM's knowledge

   - Fix a bug in mmu_stress_tests where a vCPU could finish the "writes
     to RO memory" phase without actually generating a write-protection
     fault

   - Fix a printf() goof in the SEV smoke test that causes build
     failures with -Werror

   - Explicitly zero EAX and EBX in CPUID.0x8000_0022 output when
     PERFMON_V2 isn't supported by KVM"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Explicitly zero EAX and EBX when PERFMON_V2 isn't supported by KVM
  KVM: selftests: Fix printf() format goof in SEV smoke test
  KVM: selftests: Ensure all vCPUs hit -EFAULT during initial RO stage
  KVM: SVM: Don't rely on DebugSwap to restore host DR0..DR3
  KVM: SVM: Save host DR masks on CPUs with DebugSwap
  KVM: arm64: Initialize SCTLR_EL1 in __kvm_hyp_init_cpu()
  KVM: arm64: Initialize HCR_EL2.E2H early
  KVM: x86: Snapshot the host's DEBUGCTL after disabling IRQs
  KVM: SVM: Manually context switch DEBUGCTL if LBR virtualization is disabled
  KVM: x86: Snapshot the host's DEBUGCTL in common x86
  KVM: SVM: Suppress DEBUGCTL.BTF on AMD
  KVM: SVM: Drop DEBUGCTL[5:2] from guest's effective value
  KVM: selftests: Assert that STI blocking isn't set after event injection
  KVM: SVM: Set RFLAGS.IF=1 in C code, to get VMRUN out of the STI shadow
2025-03-09 09:04:08 -10:00
Paolo Bonzini ea9bd29a9c KVM x86 fixes for 6.14-rcN #2
- Set RFLAGS.IF in C code on SVM to get VMRUN out of the STI shadow.
 
  - Ensure DEBUGCTL is context switched on AMD to avoid running the guest with
    the host's value, which can lead to unexpected bus lock #DBs.
 
  - Suppress DEBUGCTL.BTF on AMD (to match Intel), as KVM doesn't properly
    emulate BTF.  KVM's lack of context switching has meant BTF has always been
    broken to some extent.
 
  - Always save DR masks for SNP vCPUs if DebugSwap is *supported*, as the guest
    can enable DebugSwap without KVM's knowledge.
 
  - Fix a bug in mmu_stress_tests where a vCPU could finish the "writes to RO
    memory" phase without actually generating a write-protection fault.
 
  - Fix a printf() goof in the SEV smoke test that causes build failures with
    -Werror.
 
  - Explicitly zero EAX and EBX in CPUID.0x8000_0022 output when PERFMON_V2
    isn't supported by KVM.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmfLlhUACgkQOlYIJqCj
 N/0x7w/+MhqJdHbshL7Gzw+rcXwCROiCkqsxFP+YoTXte8uaHS5CEfcMYjE8SuGp
 KBpgLo4Lj1dVTXiCjemlY5sn6CDiuSs74X8A88ksuu5hVsFByJUgyWU9iw8J/crZ
 B2vj8huhqa8OCEPe5JujWfnfyAkKE5tUA4GFi73vhHMcftTNj+ftxT33/Pfg7y7M
 xOvWFWS6ZshKrouRKzI7ZFEYLwp0lr4U3dzO5rCRAd5J4MSBWRx6Dx2um5dyEYKJ
 xgwl4ylM4S/+78u1+0nQnToM0UWHJ3e7x8nze6UXYTZIrBr/lSeKlbhOPnEWJcJB
 Eemnur9ORI2BRPUReqBKluCZsSK+E5B/HPCVt5cxtuRIuUOD+kW17LPgnPyE4Sso
 eVt+XAvQc7EjrpWDSHr3ZQZZM89l9zHhuSAQ0npO6y71s0FzEVZQoDamNmOLAPjH
 Qg+qhBV2l6pyfqhqiLzADasYLOl57cJsfiMjM331ALLqAn57jzd+B8c4hdB2Xg4s
 KPuy8w8uBaY9zpd9YDBLLr7JJVs35KexNZMjT2vqBYXcScyLgmAuSQXy3hub6Mzn
 gI5ZXIKG8eO9v2jejfClI6/OEdtEwgSGEVwuBKB16pMrIxqpguMTMTWLVRn5G+oo
 qA8anmKaac62GaB66JE/Wjy069OPIGYnHSU2nal0Tej6kG0xv6E=
 =as6u
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-fixes-6.14-rcN.2' of https://github.com/kvm-x86/linux into HEAD

KVM x86 fixes for 6.14-rcN #2

 - Set RFLAGS.IF in C code on SVM to get VMRUN out of the STI shadow.

 - Ensure DEBUGCTL is context switched on AMD to avoid running the guest with
   the host's value, which can lead to unexpected bus lock #DBs.

 - Suppress DEBUGCTL.BTF on AMD (to match Intel), as KVM doesn't properly
   emulate BTF.  KVM's lack of context switching has meant BTF has always been
   broken to some extent.

 - Always save DR masks for SNP vCPUs if DebugSwap is *supported*, as the guest
   can enable DebugSwap without KVM's knowledge.

 - Fix a bug in mmu_stress_tests where a vCPU could finish the "writes to RO
   memory" phase without actually generating a write-protection fault.

 - Fix a printf() goof in the SEV smoke test that causes build failures with
   -Werror.

 - Explicitly zero EAX and EBX in CPUID.0x8000_0022 output when PERFMON_V2
   isn't supported by KVM.
2025-03-09 03:44:06 -04:00
Linus Torvalds 1110ce6a1e 33 hotfixes. 24 are cc:stable and the remainder address post-6.13 issues
or aren't considered necessary for -stable kernels.
 
 26 are for MM and 7 are for non-MM.
 
 - "mm: memory_failure: unmap poisoned folio during migrate properly"
   from Ma Wupeng fixes a couple of two year old bugs involving the
   migration of hwpoisoned folios.
 
 - "selftests/damon: three fixes for false results" from SeongJae Park
   fixes three one year old bugs in the SAMON selftest code.
 
 The remainder are singletons and doubletons.  Please see the individual
 changelogs for details.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ8zgnAAKCRDdBJ7gKXxA
 jmTzAP9LsTIpkPkRXDpwxPR/Si5KPwOkE6sGj4ETEqbX3vUvcAEA/Lp7oafc7Vqr
 XxlC1VFush1ZK29Tecxzvnapl2/VSAs=
 =zBvY
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2025-03-08-16-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "33 hotfixes. 24 are cc:stable and the remainder address post-6.13
  issues or aren't considered necessary for -stable kernels.

  26 are for MM and 7 are for non-MM.

   - "mm: memory_failure: unmap poisoned folio during migrate properly"
     from Ma Wupeng fixes a couple of two year old bugs involving the
     migration of hwpoisoned folios.

   - "selftests/damon: three fixes for false results" from SeongJae Park
     fixes three one year old bugs in the SAMON selftest code.

  The remainder are singletons and doubletons. Please see the individual
  changelogs for details"

* tag 'mm-hotfixes-stable-2025-03-08-16-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (33 commits)
  mm/page_alloc: fix uninitialized variable
  rapidio: add check for rio_add_net() in rio_scan_alloc_net()
  rapidio: fix an API misues when rio_add_net() fails
  MAINTAINERS: .mailmap: update Sumit Garg's email address
  Revert "mm/page_alloc.c: don't show protection in zone's ->lowmem_reserve[] for empty zone"
  mm: fix finish_fault() handling for large folios
  mm: don't skip arch_sync_kernel_mappings() in error paths
  mm: shmem: remove unnecessary warning in shmem_writepage()
  userfaultfd: fix PTE unmapping stack-allocated PTE copies
  userfaultfd: do not block on locking a large folio with raised refcount
  mm: zswap: use ATOMIC_LONG_INIT to initialize zswap_stored_pages
  mm: shmem: fix potential data corruption during shmem swapin
  mm: fix kernel BUG when userfaultfd_move encounters swapcache
  selftests/damon/damon_nr_regions: sort collected regiosn before checking with min/max boundaries
  selftests/damon/damon_nr_regions: set ops update for merge results check to 100ms
  selftests/damon/damos_quota: make real expectation of quota exceeds
  include/linux/log2.h: mark is_power_of_2() with __always_inline
  NFS: fix nfs_release_folio() to not deadlock via kcompactd writeback
  mm, swap: avoid BUG_ON in relocate_cluster()
  mm: swap: use correct step in loop to wait all clusters in wait_for_allocation()
  ...
2025-03-08 14:34:06 -10:00
Kunihiko Hayashi a28d2f2398
selftests: pci_endpoint: Add GET_IRQTYPE checks to each interrupt test
Add GET_IRQTYPE API checks to each interrupt test.

While at it, change pci_ep_ioctl() to get the appropriate return
value from ioctl().

Suggested-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Link: https://lore.kernel.org/r/20250225110252.28866-2-hayashi.kunihiko@socionext.com
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-08 14:36:00 +00:00
Niklas Cassel af1451b673
selftests: pci_endpoint: Skip disabled BARs
Currently BARs that have been disabled by the endpoint controller driver
will result in a test FAIL.

Returning FAIL for a BAR that is disabled seems overly pessimistic.

There are EPC that disables one or more BARs intentionally.

One reason for this is that there are certain EPCs that are hardwired to
expose internal PCIe controller registers over a certain BAR, so the EPC
driver disables such a BAR, such that the host will not overwrite random
registers during testing.

Such a BAR will be disabled by the EPC driver's init function, and the
BAR will be marked as BAR_RESERVED, such that it will be unavailable to
endpoint function drivers.

Let's return FAIL only for BARs that are actually enabled and failed the
test, and let's return skip for BARs that are not even enabled.

Signed-off-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250123120147.3603409-4-cassel@kernel.org
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-08 14:35:57 +00:00
Jakub Kicinski 93b1e05517 bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQ6NaUOruQGUkvPdG4raS+Z+3y5EwUCZ8qHFwAKCRAraS+Z+3y5
 E6QRAQCI/irYxpT6nBgT3ejyO6CIHjbHjdNP5KRkE8JT61zmJgD8Diet4dwwOxLK
 c1Pq5Jnl9KLpEPG99TcjtH3c++N3fww=
 =ltml
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Martin KaFai Lau says:

====================
pull-request: bpf-next 2025-03-06

We've added 6 non-merge commits during the last 13 day(s) which contain
a total of 6 files changed, 230 insertions(+), 56 deletions(-).

The main changes are:

1) Add XDP metadata support for tun driver, from Marcus Wichelmann.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  selftests/bpf: Fix file descriptor assertion in open_tuntap helper
  selftests/bpf: Add test for XDP metadata support in tun driver
  selftests/bpf: Refactor xdp_context_functional test and bpf program
  selftests/bpf: Move open_tuntap to network helpers
  net: tun: Enable transfer of XDP metadata to skb
  net: tun: Enable XDP metadata support
====================

Link: https://patch.msgid.link/20250307055335.441298-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-07 19:08:49 -08:00
Willem de Bruijn 7ae495a537 selftests/net: add proc_net_pktgen to .gitignore
Ensure git doesn't pick up this new target.

Fixes: 03544faad7 ("selftest: net: add proc_net_pktgen")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250307031356.368350-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-07 19:08:14 -08:00
Linus Torvalds 2a520073e7 s390 updates for 6.14-rc6
- Fix return address recovery of traced function in ftrace to ensure
   reliable stack unwinding
 
 - Fix compiler warnings and runtime crashes of vDSO selftests on s390 by
   introducing a dedicated GNU hash bucket pointer with correct 32-bit
   entry size
 
 - Fix test_monitor_call() inline asm, which misses CC clobber, by
   switching to an instruction that doesn't modify CC
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmfLedEACgkQjYWKoQLX
 FBhXMQf/bIaJEOV/Ww5PWY+AqtHPpQjhP2cvOfSRoh2py38Gp1sy/6ddpn8lcl3R
 skyeW4XHLpARYGg6NNOAOgFr43c7wQYrr+7uSW0zska+hOlSOi5TtycEOgu7o+VE
 Nig5XRuBPc01CBokFc+gbvMTjp2fTVlzTQ2/F/B7py62j2c21Y2/n7yZ4hhfQolM
 oiWlRDXzCRHM/sfa3FlML6sSgfIcHhhMSZLpdmRDhir9vBm1VpeGHa+FM5V8Pzsk
 KsiuoH7+ylbw+swVn2V/p3fqfS7JoopYdng8h40eLRkeHaTi052+NmMwv8gYAnCk
 BxfC1re5KlZ6kgWh0CDLUoYQHdwHxw==
 =KCjH
 -----END PGP SIGNATURE-----

Merge tag 's390-6.14-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Vasily Gorbik:

 - Fix return address recovery of traced function in ftrace to ensure
   reliable stack unwinding

 - Fix compiler warnings and runtime crashes of vDSO selftests on s390
   by introducing a dedicated GNU hash bucket pointer with correct
   32-bit entry size

 - Fix test_monitor_call() inline asm, which misses CC clobber, by
   switching to an instruction that doesn't modify CC

* tag 's390-6.14-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/ftrace: Fix return address recovery of traced function
  selftests/vDSO: Fix GNU hash table entry size for s390x
  s390/traps: Fix test_monitor_call() inline assembly
2025-03-07 16:21:02 -10:00
Thomas Weißschuh a782d45c86 selftests/nolibc: stop testing constructor order
The execution order of constructors in undefined and depends on the
toolchain.  While recent toolchains seems to have a stable order, it
doesn't work for older ones and may also change at any time.

Stop validating the order and instead only validate that all
constructors are executed.

Reported-by: Willy Tarreau <w@1wt.eu>
Closes: https://lore.kernel.org/lkml/20250301110735.GA18621@1wt.eu/
Link: https://lore.kernel.org/r/20250306-nolibc-constructor-order-v1-1-68fd161cc5ec@weissschuh.net
Acked-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-03-07 07:34:12 +01:00
Jakub Kicinski 1cc3462159 selftests: openvswitch: don't hardcode the drop reason subsys
WiFi removed one of their subsys entries from drop reasons, in
commit 286e696770 ("wifi: mac80211: Drop cooked monitor support")
SKB_DROP_REASON_SUBSYS_OPENVSWITCH is now 2 not 3.
The drop reasons are not uAPI, read the correct value
from debug info.

We need to enable vmlinux BTF, otherwise pahole needs
a few GB of memory to decode the enum name.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Acked-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/20250304180615.945945-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06 16:45:43 -08:00
Jakub Kicinski 56a586961b selftests: net: bpf_offload: add 'libbpf_global' to ignored maps
After installing pahole on the CI image we have a new map created
by libbpf. Ignore it otherwise we see:

  Exception: Time out waiting for map counts to stabilize want 2, have 3

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250304233204.1139251-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06 15:41:37 -08:00
Jakub Kicinski f9d2f5ddd4 selftests: net: fix error message in bpf_offload
We hit a following exception on timeout, nmaps is never set:

    Test bpftool bound info reporting (own ns)...
    Traceback (most recent call last):
      File "/home/virtme/testing-1/tools/testing/selftests/net/./bpf_offload.py", line 1128, in <module>
        check_dev_info(False, "")
      File "/home/virtme/testing-1/tools/testing/selftests/net/./bpf_offload.py", line 583, in check_dev_info
        maps = bpftool_map_list_wait(expected=2, ns=ns)
      File "/home/virtme/testing-1/tools/testing/selftests/net/./bpf_offload.py", line 215, in bpftool_map_list_wait
        raise Exception("Time out waiting for map counts to stabilize want %d, have %d" % (expected, nmaps))
    NameError: name 'nmaps' is not defined

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250304233204.1139251-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06 15:41:37 -08:00
Louis Taylor 6e406202a4 selftests/nolibc: use O_RDONLY flag instead of 0
This doesn't matter much, but is what the standard says.

Signed-off-by: Louis Taylor <louis@kragniz.eu>
Link: https://lore.kernel.org/r/20250306184147.208723-5-louis@kragniz.eu
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-03-06 22:30:21 +01:00
Louis Taylor b2edaad7f5 tools/nolibc: add support for openat(2)
openat is useful to avoid needing to construct relative paths, so expose
a wrapper for using it directly.

Signed-off-by: Louis Taylor <louis@kragniz.eu>
Link: https://lore.kernel.org/r/20250306184147.208723-1-louis@kragniz.eu
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-03-06 22:30:20 +01:00
Ingo Molnar 82354fce16 Merge branch 'sched/urgent' into sched/core, to pick up dependent commits
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-03-06 22:26:36 +01:00
Jakub Kicinski 2525e16a2b Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.14-rc6).

Conflicts:

net/ethtool/cabletest.c
  2bcf4772e4 ("net: ethtool: try to protect all callback with netdev instance lock")
  637399bf7e ("net: ethtool: netlink: Allow NULL nlattrs when getting a phy_device")

No Adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06 13:03:35 -08:00
Marcus Wichelmann 49306d5bfc selftests/bpf: Fix file descriptor assertion in open_tuntap helper
The open_tuntap helper function uses open() to get a file descriptor for
/dev/net/tun.

The open(2) manpage writes this about its return value:

  On success, open(), openat(), and creat() return the new file
  descriptor (a nonnegative integer).  On error, -1 is returned and
  errno is set to indicate the error.

This means that the fd > 0 assertion in the open_tuntap helper is
incorrect and should rather check for fd >= 0.

When running the BPF selftests locally, this incorrect assertion was not
an issue, but the BPF kernel-patches CI failed because of this:

  open_tuntap:FAIL:open(/dev/net/tun) unexpected open(/dev/net/tun):
  actual 0 <= expected 0

Signed-off-by: Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250305213438.3863922-7-marcus.wichelmann@hetzner-cloud.de
2025-03-06 12:31:08 -08:00
Marcus Wichelmann 73eeecc3cd selftests/bpf: Add test for XDP metadata support in tun driver
Add a selftest that creates a tap device, attaches XDP and TC programs,
writes a packet with a test payload into the tap device and checks the
test result. This test ensures that the XDP metadata support in the tun
driver is enabled and that the metadata size is correctly passed to the
skb.

See the previous commit ("selftests/bpf: refactor xdp_context_functional
test and bpf program") for details about the test design.

The test runs in its own network namespace. This provides some extra
safety against conflicting interface names.

Signed-off-by: Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250305213438.3863922-6-marcus.wichelmann@hetzner-cloud.de
2025-03-06 12:31:08 -08:00
Marcus Wichelmann b46aa22b66 selftests/bpf: Refactor xdp_context_functional test and bpf program
The existing XDP metadata test works by creating a veth pair and
attaching XDP & TC programs that drop the packet when the condition of
the test isn't fulfilled. The test then pings through the veth pair and
succeeds when the ping comes through.

While this test works great for a veth pair, it is hard to replicate for
tap devices to test the XDP metadata support of them. A similar test for
the tun driver would either involve logic to reply to the ping request,
or would have to capture the packet to check if it was dropped or not.

To make the testing of other drivers easier while still maximizing code
reuse, this commit refactors the existing xdp_context_functional test to
use a test_result map. Instead of conditionally passing or dropping the
packet, the TC program is changed to copy the received metadata into the
value of that single-entry array map. Tests can then verify that the map
value matches the expectation.

This testing logic is easy to adapt to other network drivers as the only
remaining requirement is that there is some way to send a custom
Ethernet packet through it that triggers the XDP & TC programs.

The Ethernet header of that custom packet is all-zero, because it is not
required to be valid for the test to work. The zero ethertype also helps
to filter out packets that are not related to the test and would
otherwise interfere with it.

The payload of the Ethernet packet is used as the test data that is
expected to be passed as metadata from the XDP to the TC program and
written to the map. It has a fixed size of 32 bytes which is a
reasonable size that should be supported by both drivers. Additional
packet headers are not necessary for the test and were therefore skipped
to keep the testing code short.

This new testing methodology no longer requires the veth interfaces to
have IP addresses assigned, therefore these were removed.

Signed-off-by: Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250305213438.3863922-5-marcus.wichelmann@hetzner-cloud.de
2025-03-06 12:31:08 -08:00
Marcus Wichelmann d5ca409c86 selftests/bpf: Move open_tuntap to network helpers
To test the XDP metadata functionality of the tun driver, it's necessary
to create a new tap device first. A helper function for this already
exists in lwt_helpers.h. Move it to the common network helpers header,
so it can be reused in other tests.

Signed-off-by: Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250305213438.3863922-4-marcus.wichelmann@hetzner-cloud.de
2025-03-06 12:31:08 -08:00
SeongJae Park 582ccf78f6 selftests/damon/damon_nr_regions: sort collected regiosn before checking with min/max boundaries
damon_nr_regions.py starts DAMON, periodically collect number of regions
in snapshots, and see if it is in the requested range.  The check code
assumes the numbers are sorted on the collection list, but there is no
such guarantee.  Hence this can result in false positive test success. 
Sort the list before doing the check.

Link: https://lkml.kernel.org/r/20250225222333.505646-4-sj@kernel.org
Fixes: 781497347d ("selftests/damon: implement test for min/max_nr_regions")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-05 21:36:16 -08:00
SeongJae Park 695469c07a selftests/damon/damon_nr_regions: set ops update for merge results check to 100ms
damon_nr_regions.py updates max_nr_regions to a number smaller than
expected number of real regions and confirms DAMON respect the harsh
limit.  To give time for DAMON to make changes for the regions, 3
aggregation intervals (300 milliseconds) are given.

The internal mechanism works with not only the max_nr_regions, but also
sz_limit, though.  It avoids merging region if that casn make region of
size larger than sz_limit.  In the test, sz_limit is set too small to
achive the new max_nr_regions, unless it is updated for the new
min_nr_regions.  But the update is done only once per operations set
update interval, which is one second by default.

Hence, the test randomly incurs false positive failures.  Fix it by
setting the ops interval same to aggregation interval, to make sure
sz_limit is updated by the time of the check.

Link: https://lkml.kernel.org/r/20250225222333.505646-3-sj@kernel.org
Fixes: 8bf890c816 ("selftests/damon/damon_nr_regions: test online-tuned max_nr_regions")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-05 21:36:16 -08:00
SeongJae Park 1c684d77df selftests/damon/damos_quota: make real expectation of quota exceeds
Patch series "selftests/damon: three fixes for false results".

Fix three DAMON selftest bugs that cause two and one false positive
failures and successes.


This patch (of 3):

damos_quota.py assumes the quota will always exceeded.  But whether quota
will be exceeded or not depend on the monitoring results.  Actually the
monitored workload has chaning access pattern and hence sometimes the
quota may not really be exceeded.  As a result, false positive test
failures happen.  Expect how much time the quota will be exceeded by
checking the monitoring results, and use it instead of the naive
assumption.

Link: https://lkml.kernel.org/r/20250225222333.505646-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20250225222333.505646-2-sj@kernel.org
Fixes: 51f58c9da1 ("selftests/damon: add a test for DAMOS quota")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-05 21:36:16 -08:00
SeongJae Park 349db086a6 selftests/damon/damos_quota_goal: handle minimum quota that cannot be further reduced
damos_quota_goal.py selftest see if DAMOS quota goals tuning feature
increases or reduces the effective size quota for given score as expected.
The tuning feature sets the minimum quota size as one byte, so if the
effective size quota is already one, we cannot expect it further be
reduced.  However the test is not aware of the edge case, and fails since
it shown no expected change of the effective quota.  Handle the case by
updating the failure logic for no change to see if it was the case, and
simply skips to next test input.

Link: https://lkml.kernel.org/r/20250217182304.45215-1-sj@kernel.org
Fixes: f1c07c0a16 ("selftests/damon: add a test for DAMOS quota goal")
Signed-off-by: SeongJae Park <sj@kernel.org>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202502171423.b28a918d-lkp@intel.com
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>	[6.10.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-05 21:36:12 -08:00
John Hubbard 0a7565ee6e Revert "selftests/mm: remove local __NR_* definitions"
This reverts commit a5c6bc5900.

The general approach described in commit e076eaca59 ("selftests: break
the dependency upon local header files") was taken one step too far here:
it should not have been extended to include the syscall numbers.  This is
because doing so would require per-arch support in tools/include/uapi, and
no such support exists.

This revert fixes two separate reports of test failures, from Dave
Hansen[1], and Li Wang[2].  An excerpt of Dave's report:

Before this commit (a5c6bc5900) things are
fine.  But after, I get:

	running PKEY tests for unsupported CPU/OS

An excerpt of Li's report:

    I just found that mlock2_() return a wrong value in mlock2-test

[1] https://lore.kernel.org/dc585017-6740-4cab-a536-b12b37a7582d@intel.com
[2] https://lore.kernel.org/CAEemH2eW=UMu9+turT2jRie7+6ewUazXmA6kL+VBo3cGDGU6RA@mail.gmail.com

Link: https://lkml.kernel.org/r/20250214033850.235171-1-jhubbard@nvidia.com
Fixes: a5c6bc5900 ("selftests/mm: remove local __NR_* definitions")
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Li Wang <liwang@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Andrei Vagin <avagin@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-05 21:36:12 -08:00
Atish Patra ee4e778c58 KVM: riscv: selftests: Allow number of interrupts to be configurable
It is helpful to vary the number of the LCOFI interrupts generated
by the overflow test. Allow additional argument for overflow test
to accommodate that. It can be easily cross-validated with
/proc/interrupts output in the host.

Signed-off-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250303-kvm_pmu_improve-v2-4-41d177e45929@rivosinc.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-03-06 09:57:07 +05:30
Atish Patra 4b506adfea KVM: riscv: selftests: Change command line option
The PMU test commandline option takes an argument to disable a
certain test. The initial assumption behind this was a common use case
is just to run all the test most of the time. However, running a single
test seems more useful instead. Especially, the overflow test has been
helpful to validate PMU virtualizaiton interrupt changes.

Switching the command line option to run a single test instead
of disabling a single test also allows to provide additional
test specific arguments to the test. The default without any options
remains unchanged which continues to run all the tests.

Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20250303-kvm_pmu_improve-v2-3-41d177e45929@rivosinc.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-03-06 09:57:07 +05:30
Atish Patra 1f6bbe1255 KVM: riscv: selftests: Do not start the counter in the overflow handler
There is no need to start the counter in the overflow handler as we
intend to trigger precise number of LCOFI interrupts through these
tests. The overflow irq handler has already stopped the counter. As
a result, the stop call from the test function may return already
stopped error which is fine as well.

Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20250303-kvm_pmu_improve-v2-2-41d177e45929@rivosinc.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-03-06 09:57:07 +05:30
Catalin Marinas 306219d59b kselftest/arm64: mte: Skip the hugetlb tests if MTE not supported on such mappings
While the kselftest was added at the same time with the kernel support
for MTE on hugetlb mappings, the tests may be run on older kernels. Skip
the tests if PROT_MTE is not supported on MAP_HUGETLB mappings.

Fixes: 27879e8cb6 ("selftests: arm64: add hugetlb mte tests")
Cc: Yang Shi <yang@os.amperecomputing.com>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Yang Shi <yang@os.amperecomputing.com>
Link: https://lore.kernel.org/r/20250221093331.2184245-3-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-03-05 18:54:02 +00:00
Catalin Marinas 7ae95109c6 kselftest/arm64: mte: Use the correct naming for tag check modes in check_hugetlb_options.c
The architecture doesn't define precise/imprecise MTE tag check modes,
only synchronous and asynchronous. Use the correct naming and also
ensure they match the MTE_{ASYNC,SYNC}_ERR type.

Fixes: 27879e8cb6 ("selftests: arm64: add hugetlb mte tests")
Cc: Yang Shi <yang@os.amperecomputing.com>
Reviewed-by: Yang Shi <yang@os.amperecomputing.com>
Link: https://lore.kernel.org/r/20250221093331.2184245-2-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-03-05 18:54:02 +00:00
Wojtek Wasko 76868642e4 testptp: Add option to open PHC in readonly mode
PTP Hardware Clocks no longer require WRITE permission to perform
readonly operations, such as listing device capabilities or listening to
EXTTS events once they have been enabled by a process with WRITE
permissions.

Add '-r' option to testptp to open the PHC in readonly mode instead of
the default read-write mode. Skip enabling EXTTS if readonly mode is
requested.

Acked-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Wojtek Wasko <wwasko@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2025-03-05 12:43:54 +00:00
Christian Brauner 56f235da15
selftests/pidfd: add seventh PIDFD_INFO_EXIT selftest
Add a selftest for PIDFD_INFO_EXIT behavior.

Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-16-c8c3d8361705@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05 13:26:21 +01:00
Christian Brauner 1c4f2dbe42
selftests/pidfd: add sixth PIDFD_INFO_EXIT selftest
Add a selftest for PIDFD_INFO_EXIT behavior.

Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-15-c8c3d8361705@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05 13:26:21 +01:00
Christian Brauner 2adf6ca638
selftests/pidfd: add fifth PIDFD_INFO_EXIT selftest
Add a selftest for PIDFD_INFO_EXIT behavior.

Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-14-c8c3d8361705@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05 13:26:21 +01:00
Christian Brauner 2e94e4c649
selftests/pidfd: add fourth PIDFD_INFO_EXIT selftest
Add a selftest for PIDFD_INFO_EXIT behavior.

Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-13-c8c3d8361705@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05 13:26:21 +01:00
Christian Brauner a79975f05e
selftests/pidfd: add third PIDFD_INFO_EXIT selftest
Add a selftest for PIDFD_INFO_EXIT behavior.

Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-12-c8c3d8361705@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05 13:26:21 +01:00
Christian Brauner 86c1dfdd52
selftests/pidfd: add second PIDFD_INFO_EXIT selftest
Add a selftest for PIDFD_INFO_EXIT behavior.

Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-11-c8c3d8361705@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05 13:26:21 +01:00
Christian Brauner 853ab1ff2c
selftests/pidfd: add first PIDFD_INFO_EXIT selftest
Add a selftest for PIDFD_INFO_EXIT behavior.

Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-10-c8c3d8361705@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05 13:26:21 +01:00
Christian Brauner ddf5315526
selftests/pidfd: expand common pidfd header
Move more infrastructure to the pidfd header.

Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-9-c8c3d8361705@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05 13:26:20 +01:00
Christian Brauner 18938f7180
pidfs/selftests: ensure correct headers for ioctl handling
Ensure that necessary ioctl infrastructure is available.

Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-8-c8c3d8361705@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05 13:26:20 +01:00
Christian Brauner 17457a47d2
selftests/pidfd: fix header inclusion
Ensure that necessary defines are present.

Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-7-c8c3d8361705@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05 13:26:20 +01:00
Boqun Feng 467c890f2d Merge branches 'docs.2025.02.04a', 'lazypreempt.2025.03.04a', 'misc.2025.03.04a', 'srcu.2025.02.05a' and 'torture.2025.02.05a' 2025-03-04 18:47:32 -08:00
Paul E. McKenney 8d608f0801 rcutorture: Make scenario TREE07 build CONFIG_PREEMPT_LAZY=y
This commit tests lazy preemption by causing the TREE07 rcutorture
scenario to build its kernel with CONFIG_PREEMPT_LAZY=y.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-03-04 18:46:47 -08:00
Paul E. McKenney 118559a994 rcutorture: Make scenario TREE10 build CONFIG_PREEMPT_LAZY=y
This commit tests lazy preemption by causing the TREE10 rcutorture
scenario to build its kernel with CONFIG_PREEMPT_LAZY=y.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-03-04 18:46:47 -08:00
Uladzislau Rezki (Sony) a6cea3954e rcu: Update TREE05.boot to test normal synchronize_rcu()
Add extra parameters for rcutorture module. One is the "nfakewriters"
which is set -1. There will be created number of test-kthreads which
correspond to number of CPUs in a test system. Those threads randomly
invoke synchronize_rcu() call.

Apart of that "rcu_normal" is set to 1, because it is specifically for
a normal synchronize_rcu() testing, also a newly added parameter which
is "rcu_normal_wake_from_gp" is set to 1 also. That prevents interaction
with other callbacks in a system.

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Link: https://lore.kernel.org/r/20250227131613.52683-2-urezki@gmail.com
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-03-04 18:44:29 -08:00
Jakub Kicinski ea4342739d selftests: drv-net: use env.rpath in the HDS test
Commit 29b036be1b ("selftests: drv-net: test XDP, HDS auto and
the ioctl path") added a new test case in the net tree, now that
this code has made its way to net-next convert it to use the env.rpath()
helper instead of manually computing the relative path.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250228212956.25399-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04 17:14:23 -08:00
Gang Yan ba24001665 selftests: mptcp: add a test for mptcp_diag_dump_one
This patch introduces a new 'chk_diag' test in diag.sh. It retrieves
the token for a specified MPTCP socket (msk) using the 'ss' command and
then accesses the 'mptcp_diag_dump_one' in kernel via ./mptcp_diag
to verify if the correct token is returned.

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/524
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250228-net-next-mptcp-coverage-small-opti-v1-2-f933c4275676@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04 16:57:38 -08:00
Gang Yan 00f5e338cf selftests: mptcp: Add a tool to get specific msk_info
This patch enables the retrieval of the mptcp_info structure corresponding
to a specified MPTCP socket (msk). When multiple MPTCP connections are
present, specific information can be obtained for a given connection
through the 'mptcp_diag_dump_one' by using the 'token' associated with
the msk.

Signed-off-by: Gang Yan <yangang@kylinos.cn>
Co-developed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250228-net-next-mptcp-coverage-small-opti-v1-1-f933c4275676@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04 16:57:37 -08:00
Yi Lai df6f8c4d72 selftests/pcie_bwctrl: Add 'set_pcie_speed.sh' to TEST_PROGS
The test shell script "set_pcie_speed.sh" is not installed in INSTALL_PATH.
Attempting to execute set_pcie_cooling_state.sh shows warning:

  ./set_pcie_cooling_state.sh: line 119: ./set_pcie_speed.sh: No such file or directory

Add "set_pcie_speed.sh" to TEST_PROGS.

Link: https://lore.kernel.org/r/Z8FfK8rN30lKzvVV@ly-workstation
Fixes: 838f12c3d5 ("selftests/pcie_bwctrl: Create selftests")
Signed-off-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-04 16:49:19 -06:00
Thomas Weißschuh a22ee38d2e selftests/vDSO: Fix GNU hash table entry size for s390x
Commit 14be4e6f35 ("selftests: vDSO: fix ELF hash table entry size for s390x")
changed the type of the ELF hash table entries to 64bit on s390x.
However the *GNU* hash tables entries are always 32bit.
The "bucket" pointer is shared between both hash algorithms.
On s390, this caused the GNU hash algorithm to access its 32-bit entries as if they
were 64-bit, triggering compiler warnings (assignment between "Elf64_Xword *" and
"Elf64_Word *") and runtime crashes.

Introduce a new dedicated "gnu_bucket" pointer which is used by the GNU hash.

Fixes: e0746bde6f ("selftests/vDSO: support DT_GNU_HASH")
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250217-selftests-vdso-s390-gnu-hash-v2-1-f6c2532ffe2a@linutronix.de
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2025-03-04 17:15:18 +01:00
Bharadwaj Raju 82ef781f24 selftests/ftrace: add 'poll' binary to gitignore
When building this test, a binary file 'poll' is
generated and should be gitignore'd.

Link: https://lore.kernel.org/r/20250210160138.4745-1-bharadwaj.raju777@gmail.com
Signed-off-by: Bharadwaj Raju <bharadwaj.raju777@gmail.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <shuah@kernel.org>
2025-03-04 08:51:17 -07:00
Breno Leitao d7a2522426 netconsole: selftest: add task name append testing
Add test coverage for the netconsole task name feature to the existing
sysdata selftest script. This extends the test infrastructure to verify
that task names are correctly appended when enabled and absent when
disabled.

The test validates that:
  - Task names appear in the expected format "taskname=<name>"
  - Task names are included when the feature is enabled
  - Task names are excluded when the feature is disabled
  - The feature works correctly alongside other sysdata fields like CPU

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04 15:28:28 +01:00
Yi Liu 1062d81086 iommufd: Disallow allocating nested parent domain with fault ID
Allocating a domain with a fault ID indicates that the domain is faultable.
However, there is a gap for the nested parent domain to support PRI. Some
hardware lacks the capability to distinguish whether PRI occurs at stage 1
or stage 2. This limitation may require software-based page table walking
to resolve. Since no in-tree IOMMU driver currently supports this
functionality, it is disallowed. For more details, refer to the related
discussion at [1].

[1] https://lore.kernel.org/linux-iommu/bd1655c6-8b2f-4cfa-adb1-badc00d01811@intel.com/

Link: https://patch.msgid.link/r/20250226104012.82079-1-yi.l.liu@intel.com
Suggested-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-04 09:34:54 -04:00
Nicolas Dichtel 0c493da863 net: rename netns_local to netns_immutable
The name 'netns_local' is confusing. A following commit will export it via
netlink, so let's use a more explicit name.

Reported-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04 12:44:48 +01:00
Ingo Molnar cfdaa618de Merge branch 'x86/cpu' into x86/asm, to pick up dependent commits
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-03-04 11:19:21 +01:00
Ingo Molnar 1b4c36f9b1 Merge branch 'x86/urgent' into x86/cpu, to pick up dependent commits
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-03-04 11:15:26 +01:00
Peter Seiderer 03544faad7 selftest: net: add proc_net_pktgen
Add some test for /proc/net/pktgen/... interface.

- enable 'CONFIG_NET_PKTGEN=m' in tools/testing/selftests/net/config

Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04 10:57:58 +01:00
Christian Brauner e5c10b19b3
selftests: test subdirectory mounting
This tests mounting a subdirectory without ever having to expose the
filesystem to a non-anonymous mount namespace.

Link: https://lore.kernel.org/r/20250225-work-mount-propagation-v1-3-e6e3724500eb@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-04 09:29:55 +01:00
Christian Brauner 12e40cbb0e
selftests: add test for detached mount tree propagation
Test that detached mount trees receive propagation events.

Link: https://lore.kernel.org/r/20250225-work-mount-propagation-v1-2-e6e3724500eb@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-04 09:29:55 +01:00
Christian Brauner 7bece690a9
selftests: seventh test for mounting detached mounts onto detached mounts
Add a test to verify that detached mounts behave correctly.

Link: https://lore.kernel.org/r/20250221-brauner-open_tree-v1-16-dbcfcb98c676@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-04 09:29:54 +01:00
Christian Brauner 0e6eef95df
selftests: sixth test for mounting detached mounts onto detached mounts
Add a test to verify that detached mounts behave correctly.

Link: https://lore.kernel.org/r/20250221-brauner-open_tree-v1-15-dbcfcb98c676@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-04 09:29:54 +01:00
Christian Brauner d346ae7a64
selftests: fifth test for mounting detached mounts onto detached mounts
Add a test to verify that detached mounts behave correctly.

Link: https://lore.kernel.org/r/20250221-brauner-open_tree-v1-14-dbcfcb98c676@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-04 09:29:53 +01:00
Christian Brauner 1768f9bd53
selftests: fourth test for mounting detached mounts onto detached mounts
Add a test to verify that detached mounts behave correctly.

Link: https://lore.kernel.org/r/20250221-brauner-open_tree-v1-13-dbcfcb98c676@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-04 09:29:53 +01:00
Christian Brauner 4fd0aea1ea
selftests: third test for mounting detached mounts onto detached mounts
Add a test to verify that detached mounts behave correctly.

Link: https://lore.kernel.org/r/20250221-brauner-open_tree-v1-12-dbcfcb98c676@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-04 09:29:53 +01:00
Christian Brauner 89ddfd4af1
selftests: second test for mounting detached mounts onto detached mounts
Add a test to verify that detached mounts behave correctly.

Link: https://lore.kernel.org/r/20250221-brauner-open_tree-v1-11-dbcfcb98c676@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-04 09:29:53 +01:00
Christian Brauner d83a55b826
selftests: first test for mounting detached mounts onto detached mounts
Add a test to verify that detached mounts behave correctly.

Link: https://lore.kernel.org/r/20250221-brauner-open_tree-v1-10-dbcfcb98c676@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-04 09:29:53 +01:00
Christian Brauner 3f1724dd56
selftests: create detached mounts from detached mounts
Link: https://lore.kernel.org/r/20250221-brauner-open_tree-v1-7-dbcfcb98c676@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-04 09:29:53 +01:00
Jakub Kicinski d110dbf149 selftests: net: report output format as TAP 13 in Python tests
The Python lib based tests report that they are producing
"KTAP version 1", but really we aren't making use of any
KTAP features, like subtests. Our output is plain TAP.

Report TAP 13 instead of KTAP 1, this is what mptcp tests do,
and what NIPA knows how to parse best. For HW testing we need
precise subtest result tracking.

Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250228180007.83325-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03 15:03:19 -08:00
Thomas Weißschuh 8770a9183f selftests: vDSO: vdso_standalone_test_x86: Switch to nolibc
vdso_standalone_test_x86 provides its own ASM syscall wrappers and
_start() implementation. The in-tree nolibc library already provides
this functionality for multiple architectures. By making use of nolibc,
the standalone testcase can be built from the exact same codebase as the
non-standalone version.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/all/20250226-parse_vdso-nolibc-v2-16-28e14e031ed8@linutronix.de
2025-03-03 20:00:13 +01:00
Thomas Weißschuh 4f65df6a58 selftests: vDSO: vdso_test_gettimeofday: Make compatible with nolibc
nolibc does not provide sys/time.h and sys/auxv.h,
instead their definitions are available unconditionally.

Guard the includes so they are not attempted on nolibc.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/all/20250226-parse_vdso-nolibc-v2-15-28e14e031ed8@linutronix.de
2025-03-03 20:00:13 +01:00
Thomas Weißschuh 97a8814124 selftests: vDSO: vdso_test_gettimeofday: Clean up includes
Some unnecessary headers are included, remove them.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/all/20250226-parse_vdso-nolibc-v2-14-28e14e031ed8@linutronix.de
2025-03-03 20:00:13 +01:00
Thomas Weißschuh 032e871686 selftests: vDSO: parse_vdso: Test __SIZEOF_LONG__ instead of ULONG_MAX
According to limits.h(2) ULONG_MAX is only guaranteed to expand to an
expression, not a symbolic constant which can be evaluated by the
preprocessor.

Specifically the definition of ULONG_MAX from nolibc can not be evaluated
by the preprocessor. To provide compatibility with nolibc, check with
__SIZEOF_LONG__ instead, with is provided directly by the preprocessor
and therefore always a symbolic constant.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/all/20250226-parse_vdso-nolibc-v2-13-28e14e031ed8@linutronix.de
2025-03-03 20:00:13 +01:00
Thomas Weißschuh c9fbaa8795 selftests: vDSO: parse_vdso: Use UAPI headers instead of libc headers
To allow the usage of parse_vdso.c together with a limited libc like
nolibc, use the kernels own elf.h and auxvec.h headers.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/all/20250226-parse_vdso-nolibc-v2-12-28e14e031ed8@linutronix.de
2025-03-03 20:00:12 +01:00
Thomas Weißschuh 09dcec6470 selftests: vDSO: parse_vdso: Drop vdso_init_from_auxv()
There are no users left.

This also removes the usage of ElfXX_auxv_t, which is not formally
standardized.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/all/20250226-parse_vdso-nolibc-v2-11-28e14e031ed8@linutronix.de
2025-03-03 20:00:12 +01:00
Thomas Weißschuh 05c204acf5 selftests: vDSO: vdso_standalone_test_x86: Use vdso_init_form_sysinfo_ehdr
vdso_standalone_test_x86 is the only user of vdso_init_from_auxv().
Instead of combining the parsing the aux vector with the parsing of the
vDSO, split them apart into getauxval() and the regular
vdso_init_from_sysinfo_ehdr().

The implementation of getauxval() is taken from
tools/include/nolibc/stdlib.h.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/all/20250226-parse_vdso-nolibc-v2-10-28e14e031ed8@linutronix.de
2025-03-03 20:00:12 +01:00
Thomas Weißschuh 1a59f5d315 selftests: Add headers target
Some selftests need access to a full UAPI headers tree, for example when
building with nolibc which heavily relies on UAPI headers.
A reference to such a tree is available in the KHDR_INCLUDES variable,
but there is currently no way to populate such a tree automatically.

Provide a target that the tests can depend on to get access to usable
UAPI headers.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/all/20250226-parse_vdso-nolibc-v2-8-28e14e031ed8@linutronix.de
2025-03-03 20:00:12 +01:00
Tejun Heo 8a9b1585e2 sched_ext: Merge branch 'for-6.14-fixes' into for-6.15
Pull for-6.14-fixes to receive:

  9360dfe4cb ("sched_ext: Validate prev_cpu in scx_bpf_select_cpu_dfl()")

which conflicts with:

  337d1b354a ("sched_ext: Move built-in idle CPU selection policy to a separate file")

Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-03 08:02:22 -10:00
Sean Christopherson 3b2d3db368 KVM: selftests: Fix printf() format goof in SEV smoke test
Print out the index of mismatching XSAVE bytes using unsigned decimal
format.  Some versions of clang complain about trying to print an integer
as an unsigned char.

  x86/sev_smoke_test.c:55:51: error: format specifies type 'unsigned char'
                                     but the argument has type 'int' [-Werror,-Wformat]

Fixes: 8c53183dba ("selftests: kvm: add test for transferring FPU state into VMSA")
Link: https://lore.kernel.org/r/20250228233852.3855676-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-03-03 07:45:34 -08:00
Sean Christopherson d88ed5fb7c KVM: selftests: Ensure all vCPUs hit -EFAULT during initial RO stage
During the initial mprotect(RO) stage of mmu_stress_test, keep vCPUs
spinning until all vCPUs have hit -EFAULT, i.e. until all vCPUs have tried
to write to a read-only page.  If a vCPU manages to complete an entire
iteration of the loop without hitting a read-only page, *and* the vCPU
observes mprotect_ro_done before starting a second iteration, then the
vCPU will prematurely fall through to GUEST_SYNC(3) (on x86 and arm64) and
get out of sequence.

Replace the "do-while (!r)" loop around the associated _vcpu_run() with
a single invocation, as barring a KVM bug, the vCPU is guaranteed to hit
-EFAULT, and retrying on success is super confusion, hides KVM bugs, and
complicates this fix.  The do-while loop was semi-unintentionally added
specifically to fudge around a KVM x86 bug, and said bug is unhittable
without modifying the test to force x86 down the !(x86||arm64) path.

On x86, if forced emulation is enabled, vcpu_arch_put_guest() may trigger
emulation of the store to memory.  Due a (very, very) longstanding bug in
KVM x86's emulator, emulate writes to guest memory that fail during
__kvm_write_guest_page() unconditionally return KVM_EXIT_MMIO.  While that
is desirable in the !memslot case, it's wrong in this case as the failure
happens due to __copy_to_user() hitting a read-only page, not an emulated
MMIO region.

But as above, x86 only uses vcpu_arch_put_guest() if the __x86_64__ guards
are clobbered to force x86 down the common path, and of course the
unexpected MMIO is a KVM bug, i.e. *should* cause a test failure.

Fixes: b6c304aec6 ("KVM: selftests: Verify KVM correctly handles mprotect(PROT_READ)")
Reported-by: Yan Zhao <yan.y.zhao@intel.com>
Closes: https://lore.kernel.org/all/20250208105318.16861-1-yan.y.zhao@intel.com
Debugged-by: Yan Zhao <yan.y.zhao@intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Link: https://lore.kernel.org/r/20250228230804.3845860-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-03-03 07:37:28 -08:00
Mirsad Todorovac 40fc756101 selftests/x86/syscall: Fix coccinelle WARNING recommending the use of ARRAY_SIZE()
Coccinelle gives WARNING recommending the use of ARRAY_SIZE() macro definition
to improve the code readability:

  ./tools/testing/selftests/x86/syscall_numbering.c:316:35-36: WARNING: Use ARRAY_SIZE

Fixes: 15c82d98a0 ("selftests/x86/syscall: Update and extend syscall_numbering_64")
Signed-off-by: Mirsad Todorovac <mtodorovac69@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Link: https://lore.kernel.org/r/20241101111523.1293193-2-mtodorovac69@gmail.com
2025-03-03 12:38:49 +01:00
Thomas Weißschuh cb839e0cc8 selftests/nolibc: add armthumb configuration
While nolibc does support ARM Thumb instructions,
that support was not tested specifically.

Add a new test configuration for it.

Tested-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250301-nolibc-armthumb-v1-2-d1f04abb5f6d@weissschuh.net
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-03-02 13:25:20 +01:00
Thomas Weißschuh f8bedb30d6 selftests/nolibc: explicitly enable ARM mode
The default could also be -mthumb.

Explicitly use -marm to keep everything predictable.

Tested-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250301-nolibc-armthumb-v1-1-d1f04abb5f6d@weissschuh.net
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-03-02 13:25:20 +01:00
Linus Torvalds 5c44ddaf7d Tracing fixes for v6.14:
- Fix crash from bad histogram entry
 
   An error path in the histogram creation could leave an entry
   in a link list that gets freed. Then when a new entry is added
   it can cause a u-a-f bug. This is fixed by restructuring the code
   so that the histogram is consistent on failure and everything is
   cleaned up appropriately.
 
 - Fix fprobe self test
 
   The fprobe self test relies on no function being attached by ftrace.
   BPF programs can attach to functions via ftrace and systemd now
   does so. This causes those functions to appear in the enabled_functions
   list which holds all functions attached by ftrace. The selftest also
   uses that file to see if functions are being connected correctly.
   It counts the functions in the file, but if there's already functions
   in the file, it fails. Instead, add the number of functions in the file
   at the start of the test to all the calculations during the test.
 
 - Fix potential division by zero of the function profiler stddev
 
   The calculated divisor that calculates the standard deviation of
   the function times can overflow. If the overflow happens to land
   on zero, that can cause a division by zero. Check for zero from
   the calculation before doing the division.
 
   TODO: Catch when it ever overflows and report it accordingly.
         For now, just prevent the system from crashing.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ8HqYBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qpoXAP90gvO2LfjItjZVjBYudr4GOzcsjAAK
 cZ2vL2LJp3hT4QD+Kud2YaZqzrV8tvFFBikO7FvEV3zZpnw48895pIgcoww=
 =NLe0
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix crash from bad histogram entry

   An error path in the histogram creation could leave an entry in a
   link list that gets freed. Then when a new entry is added it can
   cause a u-a-f bug. This is fixed by restructuring the code so that
   the histogram is consistent on failure and everything is cleaned up
   appropriately.

 - Fix fprobe self test

   The fprobe self test relies on no function being attached by ftrace.
   BPF programs can attach to functions via ftrace and systemd now does
   so. This causes those functions to appear in the enabled_functions
   list which holds all functions attached by ftrace. The selftest also
   uses that file to see if functions are being connected correctly. It
   counts the functions in the file, but if there's already functions in
   the file, it fails. Instead, add the number of functions in the file
   at the start of the test to all the calculations during the test.

 - Fix potential division by zero of the function profiler stddev

   The calculated divisor that calculates the standard deviation of the
   function times can overflow. If the overflow happens to land on zero,
   that can cause a division by zero. Check for zero from the
   calculation before doing the division.

   TODO: Catch when it ever overflows and report it accordingly. For
   now, just prevent the system from crashing.

* tag 'trace-v6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ftrace: Avoid potential division by zero in function_stat_show()
  selftests/ftrace: Let fprobe test consider already enabled functions
  tracing: Fix bad hist from corrupting named_triggers list
2025-02-28 15:43:32 -08:00
Sean Christopherson 62838fa5ea KVM: selftests: Relax assertion on HLT exits if CPU supports Idle HLT
If the CPU supports Idle HLT, which elides HLT VM-Exits if the vCPU has an
unmasked pending IRQ or NMI, relax the xAPIC IPI test's assertion on the
number of HLT exits to only require that the number of exits is less than
or equal to the number of HLT instructions that were executed.  I.e. don't
fail the test if Idle HLT does what it's supposed to do.

Note, unfortunately there's no way to determine if *KVM* supports Idle HLT,
as this_cpu_has() checks raw CPU support, and kvm_cpu_has() checks what can
be exposed to L1, i.e. the latter would check if KVM supports nested Idle
HLT.  But, since the assert is purely bonus coverage, checking for CPU
support is good enough.

Cc: Manali Shukla <Manali.Shukla@amd.com>
Tested-by: Manali Shukla <Manali.Shukla@amd.com>
Link: https://lore.kernel.org/r/20250226231809.3183093-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-28 15:42:28 -08:00
Sean Christopherson f3513a335e KVM: selftests: Assert that STI blocking isn't set after event injection
Add an L1 (guest) assert to the nested exceptions test to verify that KVM
doesn't put VMRUN in an STI shadow (AMD CPUs bleed the shadow into the
guest's int_state if a #VMEXIT occurs before VMRUN fully completes).

Add a similar assert to the VMX side as well, because why not.

Reviewed-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20250224165442.2338294-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-28 09:15:23 -08:00
Colin Ian King 75418e222e KVM: selftests: Fix spelling mistake "UFFDIO_CONINUE" -> "UFFDIO_CONTINUE"
There is a spelling mistake in a PER_PAGE_DEBUG debug message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20250227220819.656780-1-colin.i.king@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-28 09:12:52 -08:00
Ming Lei bedc9cbc5f selftests: ublk: add ublk zero copy test
Enable zero copy on file backed target, meantime add one fio test for
covering write verify, another test for mkfs/mount/umount.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250228161919.2869102-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-28 09:28:08 -07:00
Ming Lei 5d95bfb535 selftests: ublk: add file backed ublk
Add file backed ublk target code, meantime add one fio test for
covering write verify, another test for mkfs/mount/umount.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250228161919.2869102-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-28 09:28:08 -07:00
Ming Lei 6aecda00b7 selftests: ublk: add kernel selftests for ublk
Both ublk driver and userspace heavily depends on io_uring subsystem,
and tools/testing/selftests/ should be the best place for holding this
cross-subsystem tests.

Add basic read/write IO test over this ublk null disk, and make sure ublk
working.

More tests will be added.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250228161919.2869102-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-28 09:28:08 -07:00
Kevin Krakauer 51bef03e1a selftests/net: deflake GRO tests
GRO tests are timing dependent and can easily flake. This is partially
mitigated in gro.sh by giving each subtest 3 chances to pass. However,
this still flakes on some machines. Reduce the flakiness by:

- Bumping retries to 6.
- Setting napi_defer_hard_irqs to 1 to reduce the chance that GRO is
  flushed prematurely. This also lets us reduce the gro_flush_timeout
  from 1ms to 100us.

Tested: Ran `gro.sh -t large` 1000 times. There were no failures with
this change. Ran inside strace to increase flakiness.

Signed-off-by: Kevin Krakauer <krakauer@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250226192725.621969-4-krakauer@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-27 18:38:23 -08:00
Kevin Krakauer 41cda57284 selftests/net: only print passing message in GRO tests when tests pass
gro.c:main no longer erroneously claims a test passes when running as a
sender.

Tested: Ran `gro.sh -t large` to verify the sender no longer prints a
status.

Signed-off-by: Kevin Krakauer <krakauer@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250226192725.621969-3-krakauer@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-27 18:38:23 -08:00
Kevin Krakauer 784e6abd99 selftests/net: have `gro.sh -t` return a correct exit code
Modify gro.sh to return a useful exit code when the -t flag is used. It
formerly returned 0 no matter what.

Tested: Ran `gro.sh -t large` and verified that test failures return 1.
Signed-off-by: Kevin Krakauer <krakauer@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250226192725.621969-2-krakauer@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-27 18:38:23 -08:00
Heiko Carstens 3908b6baf2 selftests/ftrace: Let fprobe test consider already enabled functions
The fprobe test fails on Fedora 41 since the fprobe test assumption that
the number of enabled_functions is zero before the test starts is not
necessarily true. Some user space tools, like systemd, add BPF programs
that attach to functions. Those will show up in the enabled_functions table
and must be taken into account by the fprobe test.

Therefore count the number of lines of enabled_functions before tests
start, and use that as base when comparing expected results.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/20250226142703.910860-1-hca@linux.ibm.com
Fixes: e85c5e9792 ("selftests/ftrace: Update fprobe test to check enabled_functions file")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-02-27 21:02:09 -05:00
Jakub Kicinski 357660d759 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.14-rc5).

Conflicts:

drivers/net/ethernet/cadence/macb_main.c
  fa52f15c74 ("net: cadence: macb: Synchronize stats calculations")
  75696dd0fd ("net: cadence: macb: Convert to get_stats64")
https://lore.kernel.org/20250224125848.68ee63e5@canb.auug.org.au

Adjacent changes:

drivers/net/ethernet/intel/ice/ice_sriov.c
  79990cf5e7 ("ice: Fix deinitializing VF in error path")
  a203163274 ("ice: simplify VF MSI-X managing")

net/ipv4/tcp.c
  18912c5206 ("tcp: devmem: don't write truncated dmabuf CMSGs to userspace")
  297d389e9e ("net: prefix devmem specific helpers")

net/mptcp/subflow.c
  8668860b0a ("mptcp: reset when MPTCP opts are dropped after join")
  c3349a22c2 ("mptcp: consolidate subflow cleanup")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-27 10:20:58 -08:00
Linus Torvalds 1e15510b71 Including fixes from bluetooth. We didn't get netfilter or wireless PRs
this week, so next week's PR is probably going to be bigger. A healthy
 dose of fixes for bugs introduced in the current release nonetheless.
 
 Current release - regressions:
 
  - Bluetooth: always allow SCO packets for user channel
 
  - af_unix: fix memory leak in unix_dgram_sendmsg()
 
  - rxrpc:
    - remove redundant peer->mtu_lock causing lockdep splats
    - fix spinlock flavor issues with the peer record hash
 
  - eth: iavf: fix circular lock dependency with netdev_lock
 
  - net: use rtnl_net_dev_lock() in register_netdevice_notifier_dev_net()
    RDMA driver register notifier after the device
 
 Current release - new code bugs:
 
  - ethtool: fix ioctl confusing drivers about desired HDS user config
 
  - eth: ixgbe: fix media cage present detection for E610 device
 
 Previous releases - regressions:
 
  - loopback: avoid sending IP packets without an Ethernet header
 
  - mptcp: reset connection when MPTCP opts are dropped after join
 
 Previous releases - always broken:
 
  - net: better track kernel sockets lifetime
 
  - ipv6: fix dst ref loop on input in seg6 and rpl lw tunnels
 
  - phy: qca807x: use right value from DTS for DAC_DSP_BIAS_CURRENT
 
  - eth: enetc: number of error handling fixes
 
  - dsa: rtl8366rb: reshuffle the code to fix config / build issue
    with LED support
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmfAj8MACgkQMUZtbf5S
 IrtoTRAAj0XNWXGWZdOuVub0xhtjsPLoZktux4AzsELqaynextkJW6w9pG5qVrWu
 UZt3a3bC7u6+JoTgb+GQVhyjuuVjv6NOSuLK3FS+NePW8ijhLP5oTg6eD0MQS60Z
 wa9yQx3yL1Kvb6b80Go/3WgRX9V6Rx8zlROAl/gOlZ9NKB0rSVqnueZGPjGZJf1a
 ayyXsmzRykshbr5Ic0e+b74hFP3DGxVgHjIob1C4kk/Q+WOfQKnm3C3fnZ/R2QcS
 7B7kSk9WokvNwk3hJc7ZtFxJbrQKSSuRI8nCD93hBjTn76yJjlPicJ9b6HJoGhE/
 Pwt7fBnDCCA00x6ejD3OrurR+/80PbPtyvNtgMMTD49wSwxQpQ6YpTMInnodCzAV
 NvIhkkXBprI0kiTT4dDpNoeFMKD3i07etKpvMfEoDzZR7vgUsj6aClSmuxILeU9a
 crFC4Vp5SgyU1/lUPDiG4dfbd8s4hfM4bZ+d0zAtth3/rQA7/EA6dLqbRXXWX7h5
 Gl6egKWPsSl+WUgFjpBjYfhqrQsc06hxaCh0SQYH6SnS3i+PlMU2uRJYZMLQ66rX
 QsSQOyqCEHwd1qnrLedg9rCniv+DzOJf+qh+H0eY9WhuOay+8T52OHLxpRjSHxBo
 SCP+qQxSX0qhH5DtUiOV50Fwg19UhJJyWd0COfv5SIGm/I1dUOY=
 =+Ci7
 -----END PGP SIGNATURE-----

Merge tag 'net-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bluetooth.

  We didn't get netfilter or wireless PRs this week, so next week's PR
  is probably going to be bigger. A healthy dose of fixes for bugs
  introduced in the current release nonetheless.

  Current release - regressions:

   - Bluetooth: always allow SCO packets for user channel

   - af_unix: fix memory leak in unix_dgram_sendmsg()

   - rxrpc:
       - remove redundant peer->mtu_lock causing lockdep splats
       - fix spinlock flavor issues with the peer record hash

   - eth: iavf: fix circular lock dependency with netdev_lock

   - net: use rtnl_net_dev_lock() in
     register_netdevice_notifier_dev_net() RDMA driver register notifier
     after the device

  Current release - new code bugs:

   - ethtool: fix ioctl confusing drivers about desired HDS user config

   - eth: ixgbe: fix media cage present detection for E610 device

  Previous releases - regressions:

   - loopback: avoid sending IP packets without an Ethernet header

   - mptcp: reset connection when MPTCP opts are dropped after join

  Previous releases - always broken:

   - net: better track kernel sockets lifetime

   - ipv6: fix dst ref loop on input in seg6 and rpl lw tunnels

   - phy: qca807x: use right value from DTS for DAC_DSP_BIAS_CURRENT

   - eth: enetc: number of error handling fixes

   - dsa: rtl8366rb: reshuffle the code to fix config / build issue with
     LED support"

* tag 'net-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits)
  net: ti: icss-iep: Reject perout generation request
  idpf: fix checksums set in idpf_rx_rsc()
  selftests: drv-net: Check if combined-count exists
  net: ipv6: fix dst ref loop on input in rpl lwt
  net: ipv6: fix dst ref loop on input in seg6 lwt
  usbnet: gl620a: fix endpoint checking in genelink_bind()
  net/mlx5: IRQ, Fix null string in debug print
  net/mlx5: Restore missing trace event when enabling vport QoS
  net/mlx5: Fix vport QoS cleanup on error
  net: mvpp2: cls: Fixed Non IP flow, with vlan tag flow defination.
  af_unix: Fix memory leak in unix_dgram_sendmsg()
  net: Handle napi_schedule() calls from non-interrupt
  net: Clear old fragment checksum value in napi_reuse_skb
  gve: unlink old napi when stopping a queue using queue API
  net: Use rtnl_net_dev_lock() in register_netdevice_notifier_dev_net().
  tcp: Defer ts_recent changes until req is owned
  net: enetc: fix the off-by-one issue in enetc_map_tx_tso_buffs()
  net: enetc: remove the mm_lock from the ENETC v4 driver
  net: enetc: add missing enetc4_link_deinit()
  net: enetc: update UDP checksum when updating originTimestamp field
  ...
2025-02-27 09:32:42 -08:00
Joe Damato 1cbddbddee selftests: drv-net: Check if combined-count exists
Some drivers, like tg3, do not set combined-count:

$ ethtool -l enp4s0f1
Channel parameters for enp4s0f1:
Pre-set maximums:
RX:		4
TX:		4
Other:		n/a
Combined:	n/a
Current hardware settings:
RX:		4
TX:		1
Other:		n/a
Combined:	n/a

In the case where combined-count is not set, the ethtool netlink code
in the kernel elides the value and the code in the test:

  netnl.channels_get(...)

With a tg3 device, the returned dictionary looks like:

{'header': {'dev-index': 3, 'dev-name': 'enp4s0f1'},
 'rx-max': 4,
 'rx-count': 4,
 'tx-max': 4,
 'tx-count': 1}

Note that the key 'combined-count' is missing. As a result of this
missing key the test raises an exception:

 # Exception|     if channels['combined-count'] == 0:
 # Exception|        ~~~~~~~~^^^^^^^^^^^^^^^^^^
 # Exception| KeyError: 'combined-count'

Change the test to check if 'combined-count' is a key in the dictionary
first and if not assume that this means the driver has separate RX and
TX queues.

With this change, the test now passes successfully on tg3 and mlx5
(which does have a 'combined-count').

Fixes: 1cf2704242 ("net: selftest: add test for netdev netlink queue-get API")
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: David Wei <dw@davidwei.uk>
Link: https://patch.msgid.link/20250226181957.212189-1-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-27 07:30:32 -08:00
Colin Ian King bd64e9d6aa selftests/x86/xstate: Fix spelling mistake "hader" -> "header"
There is a spelling mistake in a sig_print() message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250227091533.599213-1-colin.i.king@gmail.com
2025-02-27 10:49:31 +01:00
Bharadwaj Raju 29fa7d7934 selftests/sysctl: fix wording of help messages
Change wording of test number recommendation to "the recommended number
of times".

Signed-off-by: Bharadwaj Raju <bharadwaj.raju777@gmail.com>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-02-27 10:02:12 +01:00
Jakub Kicinski 185646a8a0 selftests: drv-net: add tests for napi IRQ affinity notifiers
Add tests to check that the napi retained the IRQ after down/up,
multiple changes in the number of rx queues and after
attaching/releasing XDP program.

Tested on ice and idpf:

   # NETIF=<iface> tools/testing/selftests/drivers/net/hw/irq.py
    KTAP version 1
    1..4
    ok 1 irq.check_irqs_reported
    ok 2 irq.check_reconfig_queues
    ok 3 irq.check_reconfig_xdp
    ok 4 irq.check_down
    # Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0

Tested-by: Ahmed Zaki <ahmed.zaki@intel.com>
Link: https://patch.msgid.link/20250224232228.990783-7-ahmed.zaki@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-26 19:51:38 -08:00
Willem de Bruijn 2e5584e0f9 selftests/net: expand cmsg_ipv6.sh with ipv4
Expand IPV6_TCLASS to also cover IP_TOS.
Expand IPV6_HOPLIMIT to also cover IP_TTL.

Expand csmg_sender.c to allow setting IPv4 setsockopts.
Also rename struct v6 to cmsg to match its expanded scope.
Don't bother updating all occurrences of tclass and hoplimit.

Rename cmsg_ipv6.sh to cmsg_ip.sh to match the expanded scope.

Be careful around the subtle API difference between TCLASS and TOS.
IP_TOS includes ECN bits. Add a test to verify that these are masked
when making routing decisions.

Diff is more concise with --word-diff

Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250225022431.2083926-3-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-26 18:59:00 -08:00
Willem de Bruijn c1d6d629ab selftests/net: prepare cmsg_ipv6.sh for ipv4
Move IPV6_TCLASS and IPV6_HOPLIMIT into loops, to be able to use them
for IP_TOS and IP_TTL in a follow-on patch.

Indentation in this file is a mix of four spaces and tabs for double
indents. To minimize code churn, maintain that pattern.

Very small diff if viewing with -w.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250225022431.2083926-2-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-26 18:59:00 -08:00
Shameer Kolothum f69656656f KVM: selftests: Add test for KVM_REG_ARM_VENDOR_HYP_BMAP_2
One difference here with other pseudo-firmware bitmap registers
is that the default/reset value for the supported hypercall
function-ids is 0 at present. Hence, modify the test accordingly.

Reviewed-by: Sebastian Ott <sebott@redhat.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Link: https://lore.kernel.org/r/20250221140229.12588-7-shameerali.kolothum.thodi@huawei.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-26 13:30:37 -08:00
Dave Jiang 516e5bd0b6 cxl: Add mce notifier to emit aliased address for extended linear cache
Below is a setup with extended linear cache configuration with an example
layout of memory region shown below presented as a single memory region
consists of 256G memory where there's 128G of DRAM and 128G of CXL memory.
The kernel sees a region of total 256G of system memory.

              128G DRAM                          128G CXL memory
|-----------------------------------|-------------------------------------|

Data resides in either DRAM or far memory (FM) with no replication. Hot
data is swapped into DRAM by the hardware behind the scenes. When error is
detected in one location, it is possible that error also resides in the
aliased location. Therefore when a memory location that is flagged by MCE
is part of the special region, the aliased memory location needs to be
offlined as well.

Add an mce notify callback to identify if the MCE address location is part
of an extended linear cache region and handle accordingly.

Added symbol export to set_mce_nospec() in x86 code in order to call
set_mce_nospec() from the CXL MCE notify callback.

Link: https://lore.kernel.org/linux-cxl/668333b17e4b2_5639294fd@dwillia2-xfh.jf.intel.com.notmuch/
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Li Ming <ming.li@zohomail.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Link: https://patch.msgid.link/20250226162224.3633792-5-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-02-26 14:13:49 -07:00
Thomas Weißschuh 3bd53b2fa5 Revert "selftests: kselftest: Fix build failure with NOLIBC"
This reverts commit 16767502aa.

Nolibc gained support for uname(2) and sscanf(3) which are the
dependencies of ksft_min_kernel_version().

So re-enable support for ksft_min_kernel_version() under nolibc.

Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250209-nolibc-scanf-v2-2-c29dea32f1cd@weissschuh.net
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-02-26 22:13:48 +01:00
Thomas Weißschuh 22edf1f8d4 tools/nolibc: add support for [v]sscanf()
These functions are used often, also in selftests.
sscanf() itself is also used by kselftest.h itself.

The implementation is limited and only supports numeric arguments.

Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250209-nolibc-scanf-v2-1-c29dea32f1cd@weissschuh.net
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-02-26 22:13:48 +01:00
Heiko Carstens dc4b165855 selftests/ftrace: Use readelf to find entry point in uprobe test
The uprobe events test fails on s390, but also on x86 (Fedora 41). The
problem appears to be that there is an assumption that adding a uprobe to
the beginning of the executable mapping of /bin/sh is sufficient to trigger
a uprobe event when /bin/sh is executed.

This assumption is not necessarily true. Therefore use "readelf -h" to find
the entry point address of /bin/sh and use this address when adding the
uprobe event.

This adds a dependency to readelf which is not always installed. Therefore
add a check and exit with exit_unresolved if it is not installed.

Link: https://lore.kernel.org/r/20250220130102.2079179-1-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-02-26 13:53:58 -07:00
Dave Jiang 0ec9849b63 acpi/hmat / cxl: Add extended linear cache support for CXL
The current cxl region size only indicates the size of the CXL memory
region without accounting for the extended linear cache size. Retrieve the
cache size from HMAT and append that to the cxl region size for the cxl
region range that matches the SRAT range that has extended linear cache
enabled.

The SRAT defines the whole memory range that includes the extended linear
cache and the CXL memory region. The new HMAT ECN/ECR to the Memory Side
Cache Information Structure defines the size of the extended linear cache
size and matches to the SRAT Memory Affinity Structure by the memory
proxmity domain. Add a helper to match the cxl range to the SRAT memory
range in order to retrieve the cache size.

There are several places that checks the cxl region range against the
decoder range. Use new helper to check between the two ranges and address
the new cache size.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Li Ming <ming.li@zohomail.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Link: https://patch.msgid.link/20250226162224.3633792-3-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-02-26 13:45:22 -07:00
Linus Torvalds c0d35086a2 Landlock fix for v6.14-rc5
-----BEGIN PGP SIGNATURE-----
 
 iIYEABYKAC4WIQSVyBthFV4iTW/VU1/l49DojIL20gUCZ785FhAcbWljQGRpZ2lr
 b2QubmV0AAoJEOXj0OiMgvbSILQBAMFwpFClzjVeWyLFNd/gaTlPWeedvnag+yZu
 CK9q39jOAP9tj1unQBFpsI7jsTk6ZxPVxb4DymPIirZMb/5FuxUnDQ==
 =QrSM
 -----END PGP SIGNATURE-----

Merge tag 'landlock-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux

Pull landlock fixes from Mickaël Salaün:
 "Fixes to TCP socket identification, documentation, and tests"

* tag 'landlock-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
  selftests/landlock: Add binaries to .gitignore
  selftests/landlock: Test that MPTCP actions are not restricted
  selftests/landlock: Test TCP accesses with protocol=IPPROTO_TCP
  landlock: Fix non-TCP sockets restriction
  landlock: Minor typo and grammar fixes in IPC scoping documentation
  landlock: Fix grammar error
  selftests/landlock: Enable the new CONFIG_AF_UNIX_OOB
2025-02-26 11:55:44 -08:00
Andrea Righi 5ae5161820 selftests/sched_ext: Add NUMA-aware scheduler test
Add a selftest to validate the behavior of the NUMA-aware scheduler
functionalities, including idle CPU selection within nodes, per-node
DSQs and CPU to node mapping.

Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-02-26 08:49:02 -10:00
Mykyta Yatsenko 3d1033caf0 selftests/bpf: Introduce veristat test
Introducing test for veristat, part of test_progs.
Test cases cover functionality of setting global variables in BPF
program.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20250225163101.121043-3-mykyta.yatsenko5@gmail.com
2025-02-26 10:45:01 -08:00
Mykyta Yatsenko e3c9abd0d1 selftests/bpf: Implement setting global variables in veristat
To better verify some complex BPF programs we'd like to preset global
variables.
This patch introduces CLI argument `--set-global-vars` or `-G` to
veristat, that allows presetting values to global variables defined
in BPF program. For example:

prog.c:
```
enum Enum { ELEMENT1 = 0, ELEMENT2 = 5 };
const volatile __s64 a = 5;
const volatile __u8 b = 5;
const volatile enum Enum c = ELEMENT2;
const volatile bool d = false;

char arr[4] = {0};

SEC("tp_btf/sched_switch")
int BPF_PROG(...)
{
	bpf_printk("%c\n", arr[a]);
	bpf_printk("%c\n", arr[b]);
	bpf_printk("%c\n", arr[c]);
	bpf_printk("%c\n", arr[d]);
	return 0;
}
```
By default verification of the program fails:
```
./veristat prog.bpf.o
```
By presetting global variables, we can make verification pass:
```
./veristat wq.bpf.o  -G "a = 0" -G "b = 1" -G "c = 2" -G "d = 3"
```

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20250225163101.121043-2-mykyta.yatsenko5@gmail.com
2025-02-26 10:45:00 -08:00
Ihor Solodrai 0ba0ef012e selftests/bpf: Test bpf_usdt_arg_size() function
Update usdt tests to also check for correct behavior of
bpf_usdt_arg_size().

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20250224235756.2612606-2-ihor.solodrai@linux.dev
2025-02-26 08:59:44 -08:00
Dave Jiang 44818d387e cxl/test: Add Get Supported Features mailbox command support
Add cxl-test emulation of Get Supported Features mailbox command.
Currently only adding a test feature with feature identifier of
all f's for testing.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Li Ming <ming.li@zohomail.com>
Link: https://patch.msgid.link/20250220194438.2281088-4-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-02-26 08:51:31 -07:00
Dave Jiang f0e6a2329b cxl: Add Get Supported Features command for kernel usage
CXL spec r3.2 8.2.9.6.1 Get Supported Features (Opcode 0500h)
The command retrieve the list of supported device-specific features
(identified by UUID) and general information about each Feature.

The driver will retrieve the Feature entries in order to make checks and
provide information for the Get Feature and Set Feature command. One of
the main piece of information retrieved are the effects a Set Feature
command would have for a particular feature. The retrieved Feature
entries are stored in the cxl_mailbox context.

The setup of Features is initiated via devm_cxl_setup_features() during the
pci probe function before the cxl_memdev is enumerated.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Li Ming <ming.li@zohomail.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Tested-by: Shiju Jose <shiju.jose@huawei.com>
Link: https://patch.msgid.link/20250220194438.2281088-3-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-02-26 08:51:27 -07:00
Mahe Tardy 9138048bb5 selftests/bpf: add cgroup_skb netns cookie tests
Add netns cookie test that verifies the helper is now supported and work
in the context of cgroup_skb programs.

Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
Link: https://lore.kernel.org/r/20250225125031.258740-2-mahe.tardy@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-26 07:35:51 -08:00
Chang S. Bae bfc98dbcb3 selftests/x86/avx: Add AVX tests
Add xstate testing specifically for those vector register states,
validating kernel's context switching and ensuring ABI compliance.
Use the established xstate testing framework.

Alternatively, this invocation could be placed directly in
xstate.c::main(). However, the current test file naming convention, which
clearly specifies the tested area, seems reasonable. Adding avx.c
considerably aligns with that convention.

The test output should be like this for ZMM_Hi256 as an example:

  $ avx_64
  ...
  [RUN]   AVX-512 ZMM_Hi256: check context switches, 10 iterations, 5 threads.
  [OK]    No incorrect case was found.
  [RUN]   AVX-512 ZMM_Hi256: inject xstate via ptrace().
  [OK]    'xfeatures' in SW reserved area was correctly written
  [OK]    xstate was correctly updated.
  [RUN]   AVX-512 ZMM_Hi256: load xstate and raise SIGUSR1
  [OK]    'magic1' is valid
  [OK]    'xfeatures' in SW reserved area is valid
  [OK]    'xfeatures' in XSAVE header is valid
  [OK]    xstate delivery was successful
  [OK]    'magic2' is valid
  [RUN]   AVX-512 ZMM_Hi256: load new xstate from sighandler and check it after sigreturn
  [OK]    xstate was restored correctly

But systems without AVX-512 will look like:
  ...
  The kernel does not support feature number: 5
  The kernel does not support feature number: 6
  The kernel does not support feature number: 7

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250226010731.2456-10-chang.seok.bae@intel.com
2025-02-26 13:05:30 +01:00
Chang S. Bae fa826c1f2c selftests/x86/xstate: Clarify supported xstates
The established xstate test code is designed to be generic, but certain
xstates require special handling and cannot be tested without additional
adjustments.

Clarify which xstates are currently supported, and enforce testing only
for them.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250226010731.2456-9-chang.seok.bae@intel.com
2025-02-26 13:05:30 +01:00
Chang S. Bae 10d8a204c5 selftests/x86/xstate: Consolidate test invocations into a single entry
Currently, each of the three xstate tests runs as a separate invocation,
requiring the xstate number to be passed and state information to be
reconstructed repeatedly. This approach arose from their individual and
isolated development, but now it makes sense to unify them.

Introduce a wrapper function that first verifies feature availability
from the kernel and constructs the necessary state information once. The
wrapper then sequentially invokes all tests to ensure consistent
execution.

Update the AMX test to use this unified invocation.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250226010731.2456-8-chang.seok.bae@intel.com
2025-02-26 13:05:29 +01:00
Chang S. Bae e075d9fa16 selftests/x86/xstate: Introduce signal ABI test
With the refactored test cases, another xstate exposure to userspace is
through signal delivery. While amx.c includes signal-related scenarios,
its primary focus is on xstate permission management, which is largely
specific to dynamic states.

The remaining gap is testing xstate preservation and restoration across
signal delivery. The kernel defines an ABI for presenting xstate in the
signal frame, closely resembling the hardware XSAVE format, where xstate
modification is also possible.

Introduce a new test case to verify xstate preservation across signal
delivery and return, that is ensuring ABI compatibility by:

 - Loading xstate before raising a signal.

 - Verifying correct exposure in the signal frame

 - Modifying xstate in the signal frame before returning.

 - Checking the state restoration upon signal return.

Integrate this test into the AMX test suite as an initial usage site.

  Expected output:
  $ amx_64
  ...
  [RUN]   AMX Tile data: load xstate and raise SIGUSR1
  [OK]    'magic1' is valid
  [OK]    'xfeatures' in SW reserved area is valid
  [OK]    'xfeatures' in XSAVE header is valid
  [OK]    xstate delivery was successful
  [OK]    'magic2' is valid
  [RUN]   AMX Tile data: load new xstate from sighandler and check it after sigreturn
  [OK]    xstate was restored correctly

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250226010731.2456-7-chang.seok.bae@intel.com
2025-02-26 13:05:29 +01:00
Chang S. Bae 7cb2fbe419 selftests/x86/xstate: Refactor ptrace ABI test
Following the refactoring of the context switching test, the ptrace test is
another component reusable for other xstate features. As part of this
restructuring, add a missing check to validate the
user_xstateregs->xstate_fx_sw field in the ABI.

Also, replace err() and fatal_error() with ksft_exit_fail_msg() for
consistency in error handling.

  Expected output:
  $ amx_64
  ...
  [RUN]   AMX Tile data: inject xstate via ptrace().
  [OK]    'xfeatures' in SW reserved area was correctly written
  [OK]    xstate was correctly updated.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250226010731.2456-6-chang.seok.bae@intel.com
2025-02-26 13:05:29 +01:00
Chang S. Bae 40f6852ef2 selftests/x86/xstate: Refactor context switching test
The existing context switching and ptrace tests in amx.c are not specific
to dynamic states, making them reusable for general xstate testing.

As a first step, move the context switching test to xstate.c. Refactor
the test code to allow specifying which xstate component being tested.

To decouple the test from dynamic states, remove the permission request
code. In fact, The permission request inside the test wrapper was
redundant.

Additionally, replace fatal_error() with ksft_exit_fail_msg() for
consistency in error handling.

  Expected output:
  $ amx_64
  ...
  [RUN]   AMX Tile data: check context switches, 10 iterations, 5 threads.
  [OK]    No incorrect case was found.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250226010731.2456-5-chang.seok.bae@intel.com
2025-02-26 13:05:29 +01:00
Chang S. Bae 3fcb4d6146 selftests/x86/xstate: Enumerate and name xstate components
After moving essential helpers from amx.c, the code remains neutral
regarding which xstate components it handles. However, explicitly listing
known components helps users identify which features are ready for
testing.

Enumerate xstate components to facilitate identification. Extend struct
xstate_info to include a name field, providing a human-readable
identifier.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250226010731.2456-4-chang.seok.bae@intel.com
2025-02-26 13:05:28 +01:00
Chang S. Bae 0f6d91a327 selftests/x86/xstate: Refactor XSAVE helpers for general use
The AMX test introduced several XSAVE-related helper functions, but so
far, it has been the only user of them. These helpers can be generalized
for broader test of multiple xstate features.

Move most XSAVE-related code into xsave.h, making it shareable. The
restructuring includes:

 * Establishing low-level XSAVE helpers for saving and restoring register
  states, as well as handling XSAVE buffers.

 * Generalizing state data manipuldations: set_rand_data()

 * Introducing a generic feature query helper: get_xstate_info()

While doing so, remove unused defines in amx.c.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250226010731.2456-3-chang.seok.bae@intel.com
2025-02-26 13:05:28 +01:00
Chang S. Bae dbd6b649e7 selftests/x86: Consolidate redundant signal helper functions
The x86 selftests frequently register and clean up signal handlers, but
the sethandler() and clearhandler() functions have been redundantly
copied across multiple .c files.

Move these functions to helpers.h to enable reuse across tests,
eliminating around 250 lines of duplicate code.

Converge the error handling by using ksft_exit_fail_msg(), which is
functionally equivalent with err() within the selftest framework.

This change is a prerequisite for the upcoming xstate selftest, which
requires signal handling for registering and cleaning up handlers.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250226010731.2456-2-chang.seok.bae@intel.com
2025-02-26 13:05:28 +01:00
Sebastian Ott a88c7c2244 KVM: selftests: arm64: Test writes to MIDR,REVIDR,AIDR
Assert that MIDR_EL1, REVIDR_EL1, AIDR_EL1 are writable from userspace,
that the changed values are visible to guests, and that they are
preserved across a vCPU reset.

Signed-off-by: Sebastian Ott <sebott@redhat.com>
Link: https://lore.kernel.org/r/20250225005401.679536-6-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-26 01:32:23 -08:00
Amery Hung 4e4136c644 selftests/bpf: Test gen_pro/epilogue that generate kfuncs
Test gen_prologue and gen_epilogue that generate kfuncs that have not
been seen in the main program.

The main bpf program and return value checks are identical to
pro_epilogue.c introduced in commit 47e69431b5 ("selftests/bpf: Test
gen_prologue and gen_epilogue"). However, now when bpf_testmod_st_ops
detects a program name with prefix "test_kfunc_", it generates slightly
different prologue and epilogue: They still add 1000 to args->a in
prologue, add 10000 to args->a and set r0 to 2 * args->a in epilogue,
but involve kfuncs.

At high level, the alternative version of prologue and epilogue look
like this:

  cgrp = bpf_cgroup_from_id(0);
  if (cgrp)
          bpf_cgroup_release(cgrp);
  else
          /* Perform what original bpf_testmod_st_ops prologue or
           * epilogue does
           */

Since 0 is never a valid cgroup id, the original prologue or epilogue
logic will be performed. As a result, the __retval check should expect
the exact same return value.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20250225233545.285481-2-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-25 19:04:43 -08:00
Gal Pressman da87cabaf8 selftests: drv-net-hw: Add a test for symmetric RSS hash
Add a selftest that verifies symmetric RSS hash is working as intended.
The test runs iterations of traffic, swapping the src/dst UDP ports, and
verifies that the same RX queue is receiving the traffic in both cases.

Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250224174416.499070-5-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-25 18:31:05 -08:00
Gal Pressman 0163250039 selftests: drv-net: Make rand_port() get a port more reliably
Instead of guessing a port and checking whether it's available, get an
available port from the OS.

Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250224174416.499070-4-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-25 18:31:05 -08:00
Hangbin Liu 0f58804080 selftests/net: ensure mptcp is enabled in netns
Some distributions may not enable MPTCP by default. All other MPTCP tests
source mptcp_lib.sh to ensure MPTCP is enabled before testing. However,
the ip_local_port_range test is the only one that does not include this
step.

Let's also ensure MPTCP is enabled in netns for ip_local_port_range so
that it passes on all distributions.

Suggested-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250224094013.13159-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-25 18:13:28 -08:00
Linus Torvalds 9f5270d758 perf tools fixes for v6.14: 2nd batch
- Fix tools/ quiet build Makefile infrastructure that was broken when
   working on tools/perf/ without testing on other tools/ living
   utilities.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZ74OJwAKCRCyPKLppCJ+
 J30mAPsHCA8A+CNq/5yW2VhFLV1GgCSL5oWqxXRn7QjhSrCQBQEAot2u4O5zXs7M
 sg+mPlYiS1oT+zmvTLlXrN+bVyWP9A4=
 =jH1N
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-fixes-for-v6.14-2-2025-02-25' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix tools/ quiet build Makefile infrastructure that was broken when
   working on tools/perf/ without testing on other tools/ living
   utilities.

* tag 'perf-tools-fixes-for-v6.14-2-2025-02-25' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
  tools: Remove redundant quiet setup
  tools: Unify top-level quiet infrastructure
2025-02-25 13:32:32 -08:00
liuye c4f23a9d6e selftests/x86/lam: Fix minor memory in do_uring()
Exception branch returns without freeing 'fi'.

Signed-off-by: liuye <liuye@kylinos.cn>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250114082650.113105-1-liuye@kylinos.cn
2025-02-25 15:03:06 +01:00
Linus Torvalds 2a1944bff5 RISC-V Fixes for 6.14-rc5
* A fix for cacheinfo DT probing to avoid reading non-boolean properties
   as booleans.
 * A fix for cpufeature to use bitmap_equal() instead of memcmp(), so
   unused bits are ignored.
 * Fixes for cmpxchg and futex cmpxchg that properly encode the sign
   extension requirements on inline asm, which results in spurious
   successes.  This manifests in at least inode_set_ctime_current, but is
   likely just a disaster waiting to happen.
 * A fix for the rseq selftests, which was using an invalid constraint.
 * A pair of fixes for signal frame size handling:
     * We were reserving space for an extra empty extension context
       header on systems with extended signal context, thus resulting in
       unnecessarily large allocations.
     * We weren't properly checking for available extensions before
       calculating the signal stack size, which resulted in undersized
       stack allocations on some systems (at least those with T-Head
       custom vectors).
 
 Also, we've added Alex as a reviewer.  He's been helping out a ton
 lately, thanks!
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEAM520YNJYN/OiG3470yhUCzLq0EFAme8rsQTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRDvTKFQLMurQVumD/9EZz6+ejftG1u7M/YzFfIa6ZOqYtoi
 7aaecvsKNhVu0zvLU2vDAj+lTeokutQKAI9hByoVDVBzCllW/AYnpentQXTbHbl4
 3pyIeOPKMXEDyru8heQ/u7h2qhq1AN54btPHx9UdIc5/vyrQIF2Ec4jo1GojJmyj
 qNHSyeTOB8nhBHYzM2RYAw6r0s27UX5wfL09Cm97b7KJigJ6GGPsI+KjruOqVfST
 i7wqbH1ufQoXrU8DhICA5wT/Q7f2WqJ3W2IyP186EJOAXhEG9xzOMQzVwCm2KI/I
 Rf5mhE5MoS9+ZmyhpATRc8Dwr1iZjIh7nKIFJh4/IGcSCuwsnWEFRvihmqCnphl3
 /fFLstBflFAEKadmc7LgRJDTZYAjVAe/kSrl3v9ArVBU7d07/dAUOXcgjJy6fygx
 7yrMnT3Ew4J160iDvFIalUdjEgYsKbwNygT/D6XlcXuR/KoWA+HYQ7S4eTgbqpiz
 366JAhZQ9xyQxTlca13sdkz+2eBjU/SAX8O9/WYpDp6J5uqOFtyQ52a2qEzQFznF
 MRtFe2YERlM8L0vqJhNFEDefo7FUuZIyOU8evhJA+L7X8b3FpsEx5xUzamXdeGsQ
 0sEbsBPhNToVF0DWn8HtdpMYYX95xujj83QneMawaGjpS7cpY93OVUNFAJ/a/uuJ
 bGLv+1HIdW0LoQ==
 =+DfW
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

 - A fix for cacheinfo DT probing to avoid reading non-boolean
   properties as booleans.

 - A fix for cpufeature to use bitmap_equal() instead of memcmp(), so
   unused bits are ignored.

 - Fixes for cmpxchg and futex cmpxchg that properly encode the sign
   extension requirements on inline asm, which results in spurious
   successes. This manifests in at least inode_set_ctime_current, but is
   likely just a disaster waiting to happen.

 - A fix for the rseq selftests, which was using an invalid constraint.

 - A pair of fixes for signal frame size handling:

     - We were reserving space for an extra empty extension context
       header on systems with extended signal context, thus resulting in
       unnecessarily large allocations.

     - We weren't properly checking for available extensions before
       calculating the signal stack size, which resulted in undersized
       stack allocations on some systems (at least those with T-Head
       custom vectors).

Also, we've added Alex as a reviewer.  He's been helping out a ton
lately, thanks!

* tag 'riscv-for-linus-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  MAINTAINERS: Add myself as a riscv reviewer
  riscv: signal: fix signal_minsigstksz
  riscv: signal: fix signal frame size
  rseq/selftests: Fix riscv rseq_offset_deref_addv inline asm
  riscv/futex: sign extend compare value in atomic cmpxchg
  riscv/atomic: Do proper sign extension also for unsigned in arch_cmpxchg
  riscv: cpufeature: use bitmap_equal() instead of memcmp()
  riscv: cacheinfo: Use of_property_present() for non-boolean properties
2025-02-24 16:40:32 -08:00
Martin K. Petersen adc4fb9c81 Merge patch series "Initial support for RK3576 UFS controller"
Shawn Lin <shawn.lin@rock-chips.com> says:

This patchset adds initial UFS controller supprt for RK3576 SoC.
Patch 1 is the dt-bindings. Patch 2-4 deal with rpm and spm support
in advanced suggested by Ulf. Patch 5 exports two new APIs for host
driver. Patch 6 and 7 are the host driver and dtsi support.

Link: https://lore.kernel.org/r/1738736156-119203-1-git-send-email-shawn.lin@rock-chips.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-02-24 19:24:14 -05:00
Yiqian Xun e402c70856 selftests/user_events: Fix failures caused by test code
In parse_abi function,the dyn_test fails because the
enable_file isn’t closed after successfully registering an event.
By adding wait_for_delete(), the dyn_test now passes as expected.

Link: https://lore.kernel.org/r/20250221033555.326716-1-realxxyq@163.com
Signed-off-by: Yiqian Xun <xunyiqian@kylinos.cn>
Acked-by: Beau Belgrave <beaub@linux.microsoft.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-02-24 16:37:17 -07:00
Jakub Kicinski 29b036be1b selftests: drv-net: test XDP, HDS auto and the ioctl path
Test XDP and HDS interaction. While at it add a test for using the IOCTL,
as that turned out to be the real culprit.

Testing bnxt:

  # NETIF=eth0 ./ksft-net-drv/drivers/net/hds.py
  KTAP version 1
  1..12
  ok 1 hds.get_hds
  ok 2 hds.get_hds_thresh
  ok 3 hds.set_hds_disable # SKIP disabling of HDS not supported by the device
  ok 4 hds.set_hds_enable
  ok 5 hds.set_hds_thresh_zero
  ok 6 hds.set_hds_thresh_max
  ok 7 hds.set_hds_thresh_gt
  ok 8 hds.set_xdp
  ok 9 hds.enabled_set_xdp
  ok 10 hds.ioctl
  ok 11 hds.ioctl_set_xdp
  ok 12 hds.ioctl_enabled_set_xdp
  # Totals: pass:11 fail:0 xfail:0 xpass:0 skip:1 error:0

and netdevsim:

  # ./ksft-net-drv/drivers/net/hds.py
  KTAP version 1
  1..12
  ok 1 hds.get_hds
  ok 2 hds.get_hds_thresh
  ok 3 hds.set_hds_disable
  ok 4 hds.set_hds_enable
  ok 5 hds.set_hds_thresh_zero
  ok 6 hds.set_hds_thresh_max
  ok 7 hds.set_hds_thresh_gt
  ok 8 hds.set_xdp
  ok 9 hds.enabled_set_xdp
  ok 10 hds.ioctl
  ok 11 hds.ioctl_set_xdp
  ok 12 hds.ioctl_enabled_set_xdp
  # Totals: pass:12 fail:0 xfail:0 xpass:0 skip:0 error:0

Netdevsim needs a sane default for tx/rx ring size.

ethtool 6.11 is needed for the --disable-netlink option.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Tested-by: Taehee Yoo <ap420073@gmail.com>
Link: https://patch.msgid.link/20250221025141.1132944-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24 14:16:37 -08:00
David Wei 89baa22d75 io_uring/zcrx: add selftest case for recvzc with read limit
Add a selftest case to iou-zcrx where the sender sends 4x4K = 16K and
the receiver does 4x4K recvzc requests. Validate that the requests
complete successfully and that the data is not corrupted.

Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://lore.kernel.org/r/20250224041319.2389785-3-dw@davidwei.uk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-24 12:56:13 -07:00
Sean Christopherson 2428865bf0 KVM: selftests: Add a nested (forced) emulation intercept test for x86
Add a rudimentary test for validating KVM's handling of L1 hypervisor
intercepts during instruction emulation on behalf of L2.  To minimize
complexity and avoid overlap with other tests, only validate KVM's
handling of instructions that L1 wants to intercept, i.e. that generate a
nested VM-Exit.  Full testing of emulation on behalf of L2 is better
achieved by running existing (forced) emulation tests in a VM, (although
on VMX, getting L0 to emulate on #UD requires modifying either L1 KVM to
not intercept #UD, or modifying L0 KVM to prioritize L0's exception
intercepts over L1's intercepts, as is done by KVM for SVM).

Since emulation should never be successful, i.e. L2 always exits to L1,
dynamically generate the L2 code stream instead of adding a helper for
each instruction.  Doing so requires hand coding instruction opcodes, but
makes it significantly easier for the test to compute the expected "next
RIP" and instruction length.

Link: https://lore.kernel.org/r/20250201015518.689704-12-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-24 09:01:07 -08:00
Linus Torvalds b8c8c1414f tracing fixes for v6.14:
Function graph accounting fixes:
 
 - Fix the manage ops hashes
 
   The function graph registers a "manager ops" and "sub-ops" to ftrace.
   The manager ops does not have any callback but calls the sub-ops
   callbacks. The manage ops hashes (what is used to tell ftrace what
   functions to attach to) is built on the sub-ops it manages.
 
   There was an error in the way it built the hash. An empty hash means to
   attach to all functions. When the manager ops had one sub-ops it properly
   copied its hash. But when the manager ops had more than one sub-ops, it
   went into a loop to make a set of all functions it needed to add to the
   hash. If any of the subops hashes was empty, that would mean to attach
   to all functions. The error was that the first iteration of the loop
   passed in an empty hash to start with in order to add the other hashes.
   That starting hash was mistaken as to attach to all functions. This made
   the manage ops attach to all functions whenever it had two or more
   sub-ops, even if each sub-op was attached to only a single function.
 
 - Do not add duplicate entries to the manager ops hash
 
   If two or more subops hashes trace the same function, an entry for that
   function will be added to the manager ops for each subops. This causes
   waste and extra overhead.
 
 Fprobe accounting fixes:
 
 - Remove last function from fprobe hash
 
   Fprobes has a ftrace hash to manage which functions an fprobe is attached
   to. It also has a counter of how many fprobes are attached. When the last
   fprobe is removed, it unregisters the fprobe from ftrace but does not
   remove the functions the last fprobe was attached to from the hash. This
   leaves the old functions attached. When a new fprobe is added, the fprobe
   infrastructure attaches to not only the functions of the new fprobe, but
   also to the functions of the last fprobe.
 
 - Fix accounting of the fprobe counter
 
   When a fprobe is added, it updates a counter. If the counter goes from
   zero to one, it attaches its ops to ftrace. When an fprobe is removed, the
   counter is decremented. If the counter goes from 1 to zero, it removes the
   fprobes ops from ftrace. There was an issue where if two fprobes trace the
   same function, the addition of each fprobe would increment the counter.
   But when removing the first of the fprobes, it would notice that another
   fprobe is still attached to one of its functions no it does not remove
   the functions from the ftrace ops. But it also did not decrement the
   counter. When the last fprobe is removed, the counter is still one. This
   leaves the fprobes callback still registered with ftrace and it being
   called by the functions defined by the fprobes ops hash.  Worse yet,
   because all the functions from the fprobe ops hash have been removed, that
   tells ftrace that it wants to trace all functions. Thus, this puts the
   state of the system where every function is calling the fprobe callback
   handler (which does nothing as there are no registered fprobes), but this
   causes a good 13% slow down of the entire system.
 
 Other updates:
 
 - Add a selftest to test the above issues to prevent regressions.
 
 - Fix preempt count accounting in function tracing
 
   Better recursion protection was added to function tracing which added
   another layer of preempt disable. As the preempt_count gets traced in the
   event, it needs to subtract the amount of preempt disabling the tracer
   does to record what the preempt_count was when the trace was triggered.
 
 - Fix memory leak in output of set_event
 
   A variable is passed by the seq_file functions in the location that is
   set by the return of the next() function. The start() function allocates
   it and the stop() function frees it. But when the last item is found, the
   next() returns NULL which leaks the data that was allocated in start().
   The m->private is used for something else, so have next() free the data
   when it returns NULL, as stop() will then just receive NULL in that case.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ7j6ARQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6quFxAQDrO8tjYbhLqg/LMOQyzwn/EF3Jx9ub
 87961mA0rKTkYwEAhPNzTZ6GwKyKc4ny/R338KgNY69wWnOK6k/BTxCRmwk=
 =TOah
 -----END PGP SIGNATURE-----

Merge tag 'ftrace-v6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Function graph accounting fixes:

   - Fix the manage ops hashes

     The function graph registers a "manager ops" and "sub-ops" to
     ftrace. The manager ops does not have any callback but calls the
     sub-ops callbacks. The manage ops hashes (what is used to tell
     ftrace what functions to attach to) is built on the sub-ops it
     manages.

     There was an error in the way it built the hash. An empty hash
     means to attach to all functions. When the manager ops had one
     sub-ops it properly copied its hash. But when the manager ops had
     more than one sub-ops, it went into a loop to make a set of all
     functions it needed to add to the hash. If any of the subops hashes
     was empty, that would mean to attach to all functions. The error
     was that the first iteration of the loop passed in an empty hash to
     start with in order to add the other hashes. That starting hash was
     mistaken as to attach to all functions. This made the manage ops
     attach to all functions whenever it had two or more sub-ops, even
     if each sub-op was attached to only a single function.

   - Do not add duplicate entries to the manager ops hash

     If two or more subops hashes trace the same function, an entry for
     that function will be added to the manager ops for each subops.
     This causes waste and extra overhead.

  Fprobe accounting fixes:

   - Remove last function from fprobe hash

     Fprobes has a ftrace hash to manage which functions an fprobe is
     attached to. It also has a counter of how many fprobes are
     attached. When the last fprobe is removed, it unregisters the
     fprobe from ftrace but does not remove the functions the last
     fprobe was attached to from the hash. This leaves the old functions
     attached. When a new fprobe is added, the fprobe infrastructure
     attaches to not only the functions of the new fprobe, but also to
     the functions of the last fprobe.

   - Fix accounting of the fprobe counter

     When a fprobe is added, it updates a counter. If the counter goes
     from zero to one, it attaches its ops to ftrace. When an fprobe is
     removed, the counter is decremented. If the counter goes from 1 to
     zero, it removes the fprobes ops from ftrace.

     There was an issue where if two fprobes trace the same function,
     the addition of each fprobe would increment the counter. But when
     removing the first of the fprobes, it would notice that another
     fprobe is still attached to one of its functions no it does not
     remove the functions from the ftrace ops.

     But it also did not decrement the counter, so when the last fprobe
     is removed, the counter is still one. This leaves the fprobes
     callback still registered with ftrace and it being called by the
     functions defined by the fprobes ops hash. Worse yet, because all
     the functions from the fprobe ops hash have been removed, that
     tells ftrace that it wants to trace all functions.

     Thus, this puts the state of the system where every function is
     calling the fprobe callback handler (which does nothing as there
     are no registered fprobes), but this causes a good 13% slow down of
     the entire system.

  Other updates:

   - Add a selftest to test the above issues to prevent regressions.

   - Fix preempt count accounting in function tracing

     Better recursion protection was added to function tracing which
     added another layer of preempt disable. As the preempt_count gets
     traced in the event, it needs to subtract the amount of preempt
     disabling the tracer does to record what the preempt_count was when
     the trace was triggered.

   - Fix memory leak in output of set_event

     A variable is passed by the seq_file functions in the location that
     is set by the return of the next() function. The start() function
     allocates it and the stop() function frees it. But when the last
     item is found, the next() returns NULL which leaks the data that
     was allocated in start(). The m->private is used for something
     else, so have next() free the data when it returns NULL, as stop()
     will then just receive NULL in that case"

* tag 'ftrace-v6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Fix memory leak when reading set_event file
  ftrace: Correct preemption accounting for function tracing.
  selftests/ftrace: Update fprobe test to check enabled_functions file
  fprobe: Fix accounting of when to unregister from function graph
  fprobe: Always unregister fgraph function from ops
  ftrace: Do not add duplicate entries in subops manager ops
  ftrace: Fix accounting of adding subops to a manager ops
2025-02-22 09:03:54 -08:00
Tamir Duberstein ba6be7ba2d selftests: remove reference to prime_numbers.sh
Remove a leftover shell script reference from commit 313b38a6ec
("lib/prime_numbers: convert self-test to KUnit").

Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202502171110.708d965a-lkp@intel.com
Fixes: 313b38a6ec ("lib/prime_numbers: convert self-test to KUnit")
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Link: https://lore.kernel.org/r/20250217-fix-prime-numbers-v1-1-eb0ca7235e60@gmail.com
Signed-off-by: Kees Cook <kees@kernel.org>
2025-02-22 06:44:45 -08:00
Michael Jeanson 3c27b40830 selftests/rseq: Add rseq syscall errors test
This test adds coverage of expected errors during rseq registration and
unregistration, it disables glibc integration and will thus always
exercise the rseq syscall explictly.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250121213402.1754762-1-mjeanson@efficios.com
2025-02-22 14:13:45 +01:00
Maciej Wieczor-Retman 782b819827 selftests/lam: Test get_user() LAM pointer handling
Recent change in how get_user() handles pointers:

  https://lore.kernel.org/all/20241024013214.129639-1-torvalds@linux-foundation.org/

has a specific case for LAM. It assigns a different bitmask that's
later used to check whether a pointer comes from userland in get_user().

Add test case to LAM that utilizes a ioctl (FIOASYNC) syscall which uses
get_user() in its implementation. Execute the syscall with differently
tagged pointers to verify that valid user pointers are passing through
and invalid kernel/non-canonical pointers are not.

Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/1624d9d1b9502517053a056652d50dc5d26884ac.1737990375.git.maciej.wieczor-retman@intel.com
2025-02-22 13:17:07 +01:00
Maciej Wieczor-Retman 51f909dcd1 selftests/lam: Skip test if LAM is disabled
Until LASS is merged into the kernel:

  https://lore.kernel.org/all/20241028160917.1380714-1-alexander.shishkin@linux.intel.com/

LAM is left disabled in the config file. Running the LAM selftest with
disabled LAM only results in unhelpful output.

Use one of LAM syscalls() to determine whether the kernel was compiled
with LAM support (CONFIG_ADDRESS_MASKING) or not. Skip running the tests
in the latter case.

Merge CPUID checking function with the one mentioned above to achieve a
single function that shows LAM's availability from both CPU and the
kernel.

Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/251d0f45f6a768030115e8d04bc85458910cb0dc.1737990375.git.maciej.wieczor-retman@intel.com
2025-02-22 13:17:07 +01:00
Maciej Wieczor-Retman ec8f5b4659 selftests/lam: Move cpu_has_la57() to use cpuinfo flag
In current form cpu_has_la57() reports platform's support for LA57
through reading the output of cpuid. A much more useful information is
whether 5-level paging is actually enabled on the running system.

Check whether 5-level paging is enabled by trying to map a page in the
high linear address space.

Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/8b1ca51b13e6d94b5a42b6930d81b692cbb0bcbb.1737990375.git.maciej.wieczor-retman@intel.com
2025-02-22 13:17:07 +01:00
Hangbin Liu 465b210fdc selftests: fib_nexthops: do not mark skipped tests as failed
The current test marks all unexpected return values as failed and sets ret
to 1. If a test is skipped, the entire test also returns 1, incorrectly
indicating failure.

To fix this, add a skipped variable and set ret to 4 if it was previously
0. Otherwise, keep ret set to 1.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250220085326.1512814-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 16:23:29 -08:00
Ido Schimmel e818d1d1a6 selftests: fib_rule_tests: Add DSCP mask match tests
Add tests for FIB rules that match on DSCP with a mask. Test both good
and bad flows and both the input and output paths.

 # ./fib_rule_tests.sh
 IPv6 FIB rule tests
 [...]
    TEST: rule6 check: dscp redirect to table                           [ OK ]
    TEST: rule6 check: dscp no redirect to table                        [ OK ]
    TEST: rule6 del by pref: dscp redirect to table                     [ OK ]
    TEST: rule6 check: iif dscp redirect to table                       [ OK ]
    TEST: rule6 check: iif dscp no redirect to table                    [ OK ]
    TEST: rule6 del by pref: iif dscp redirect to table                 [ OK ]
    TEST: rule6 check: dscp masked redirect to table                    [ OK ]
    TEST: rule6 check: dscp masked no redirect to table                 [ OK ]
    TEST: rule6 del by pref: dscp masked redirect to table              [ OK ]
    TEST: rule6 check: iif dscp masked redirect to table                [ OK ]
    TEST: rule6 check: iif dscp masked no redirect to table             [ OK ]
    TEST: rule6 del by pref: iif dscp masked redirect to table          [ OK ]
 [...]

 Tests passed: 316
 Tests failed:   0

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20250220080525.831924-7-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 16:08:49 -08:00
Jakub Kicinski e87700965a bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQ6NaUOruQGUkvPdG4raS+Z+3y5EwUCZ7ffOQAKCRAraS+Z+3y5
 EzVHAP9h/QkeYoOZW9gul08I8vFiZsFe/lbOSLJWxeVfxb9JhgD/cMqby3qAxQK6
 lsdNQ9jYG2232Wym89ag7fvTBK15Wg4=
 =gkN2
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Martin KaFai Lau says:

====================
pull-request: bpf-next 2025-02-20

We've added 19 non-merge commits during the last 8 day(s) which contain
a total of 35 files changed, 1126 insertions(+), 53 deletions(-).

The main changes are:

1) Add TCP_RTO_MAX_MS support to bpf_set/getsockopt, from Jason Xing

2) Add network TX timestamping support to BPF sock_ops, from Jason Xing

3) Add TX metadata Launch Time support, from Song Yoong Siang

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  igc: Add launch time support to XDP ZC
  igc: Refactor empty frame insertion for launch time support
  net: stmmac: Add launch time support to XDP ZC
  selftests/bpf: Add launch time request to xdp_hw_metadata
  xsk: Add launch time hardware offload support to XDP Tx metadata
  selftests/bpf: Add simple bpf tests in the tx path for timestamping feature
  bpf: Support selective sampling for bpf timestamping
  bpf: Add BPF_SOCK_OPS_TSTAMP_SENDMSG_CB callback
  bpf: Add BPF_SOCK_OPS_TSTAMP_ACK_CB callback
  bpf: Add BPF_SOCK_OPS_TSTAMP_SND_HW_CB callback
  bpf: Add BPF_SOCK_OPS_TSTAMP_SND_SW_CB callback
  bpf: Add BPF_SOCK_OPS_TSTAMP_SCHED_CB callback
  net-timestamp: Prepare for isolating two modes of SO_TIMESTAMPING
  bpf: Disable unsafe helpers in TX timestamping callbacks
  bpf: Prevent unsafe access to the sock fields in the BPF timestamping callback
  bpf: Prepare the sock_ops ctx and call bpf prog for TX timestamping
  bpf: Add networking timestamping support to bpf_get/setsockopt()
  selftests/bpf: Add rto max for bpf_setsockopt test
  bpf: Support TCP_RTO_MAX_MS for bpf_setsockopt
====================

Link: https://patch.msgid.link/20250221022104.386462-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:59:47 -08:00
Xiao Liang 85cb3711ac selftests: net: Add test cases for link and peer netns
- Add test for creating link in another netns when a link of the same
   name and ifindex exists in current netns.
 - Add test to verify that link is created in target netns directly -
   no link new/del events should be generated in link netns or current
   netns.
 - Add test cases to verify that link-netns is set as expected for
   various drivers and combination of namespace-related parameters.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Link: https://patch.msgid.link/20250219125039.18024-14-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:28:03 -08:00
Xiao Liang 0303294162 selftests: net: Add python context manager for netns entering
Change netns of current thread and switch back on context exit.
For example:

    with NetNSEnter("ns1"):
        ip("link add dummy0 type dummy")

The command be executed in netns "ns1".

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Link: https://patch.msgid.link/20250219125039.18024-13-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:28:03 -08:00
Steven Rostedt e85c5e9792 selftests/ftrace: Update fprobe test to check enabled_functions file
A few bugs were found in the fprobe accounting logic along with it using
the function graph infrastructure. Update the fprobe selftest to catch
those bugs in case they or something similar shows up in the future.

The test now checks the enabled_functions file which shows all the
functions attached to ftrace or fgraph. When enabling a fprobe, make sure
that its corresponding function is also added to that file. Also add two
more fprobes to enable to make sure that the fprobe logic works properly
with multiple probes.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/20250220202055.733001756@goodmis.org
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Tested-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-02-21 09:36:12 -05:00
Chandra Pratap ae9ebda1bc selftests: fix spelling/grammar errors in sysctl/sysctl.sh
Fix the grammatical/spelling errors in sysctl/sysctl.sh.
This fixes all errors pointed out by codespell in the file.

Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-02-21 09:27:04 +01:00
Amery Hung 63817c7711 selftests/bpf: Test struct_ops program with __ref arg calling bpf_tail_call
Test if the verifier rejects struct_ops program with __ref argument
calling bpf_tail_call().

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250220221532.1079331-2-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-20 18:44:35 -08:00
Alexei Starovoitov bd4319b6c2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf bpf-6.14-rc4
Cross-merge bpf fixes after downstream PR (bpf-6.14-rc4).

Minor conflict:
  kernel/bpf/btf.c
Adjacent changes:
  kernel/bpf/arena.c
  kernel/bpf/btf.c
  kernel/bpf/syscall.c
  kernel/bpf/verifier.c
  mm/memory.c

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-20 18:13:57 -08:00
Jakub Kicinski 932a9249f7 selftests: drv-net: rename queues check_xdp to check_xsk
The test is for AF_XDP, we refer to AF_XDP as XSK.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Tested-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250219234956.520599-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:58:25 -08:00
Jakub Kicinski 4fde839846 selftests: drv-net: improve the use of ksft helpers in XSK queue test
Avoid exceptions when xsk attr is not present, and add a proper ksft
helper for "not in" condition.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Tested-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250219234956.520599-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:58:25 -08:00
Jakub Kicinski 7147713799 selftests: drv-net: add a way to wait for a local process
We use wait_port_listen() extensively to wait for a process
we spawned to be ready. Not all processes will open listening
sockets. Add a method of explicitly waiting for a child to
be ready. Pass a FD to the spawned process and wait for it
to write a message to us. FD number is passed via KSFT_READY_FD
env variable.

Similarly use KSFT_WAIT_FD to let the child process for a sign
that we are done and child should exit. Sending a signal to
a child with shell=True can get tricky.

Make use of this method in the queues test to make it less flaky.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Acked-by: Joe Damato <jdamato@fastly.com>
Tested-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250219234956.520599-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:58:25 -08:00
Jakub Kicinski d3726ab45c selftests: drv-net: probe for AF_XDP sockets more explicitly
Separate the support check from socket binding for easier refactoring.
Use: ./helper - - just to probe if we can open the socket.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Tested-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250219234956.520599-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:58:25 -08:00
Jakub Kicinski bab59dcf71 selftests: drv-net: add missing new line in xdp_helper
Kurt and Joe report missing new line at the end of Usage.

Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Tested-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250219234956.520599-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:58:25 -08:00
Jakub Kicinski dabd31baa3 selftests: drv-net: use cfg.rpath() in netlink xsk attr test
The cfg.rpath() helper was been recently added to make formatting
paths for helper binaries easier.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Tested-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250219234956.520599-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:58:25 -08:00
Jakub Kicinski 846742f7e3 selftests: drv-net: add a warning for bkg + shell + terminate
Joe Damato reports that some shells will fork before running
the command when python does "sh -c $cmd", while bash on my
machine does an exec of $cmd directly.

This will have implications for our ability to terminate
the child process on various configurations of bash and
other shells. Warn about using

	bkg(... shell=True, termininate=True)

most background commands can hopefully exit cleanly (exit_wait).

Link: https://lore.kernel.org/Z7Yld21sv_Ip3gQx@LQ3V64L9R2
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Acked-by: Joe Damato <jdamato@fastly.com>
Tested-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250219234956.520599-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 17:57:29 -08:00
Linus Torvalds 319fc77f8f BPF fixes:
- Fix a soft-lockup in BPF arena_map_free on 64k page size
   kernels (Alan Maguire)
 
 - Fix a missing allocation failure check in BPF verifier's
   acquire_lock_state (Kumar Kartikeya Dwivedi)
 
 - Fix a NULL-pointer dereference in trace_kfree_skb by adding
   kfree_skb to the raw_tp_null_args set (Kuniyuki Iwashima)
 
 - Fix a deadlock when freeing BPF cgroup storage (Abel Wu)
 
 - Fix a syzbot-reported deadlock when holding BPF map's
   freeze_mutex (Andrii Nakryiko)
 
 - Fix a use-after-free issue in bpf_test_init when
   eth_skb_pkt_type is accessing skb data not containing an
   Ethernet header (Shigeru Yoshida)
 
 - Fix skipping non-existing keys in generic_map_lookup_batch
   (Yan Zhai)
 
 - Several BPF sockmap fixes to address incorrect TCP copied_seq
   calculations, which prevented correct data reads from recv(2)
   in user space (Jiayuan Chen)
 
 - Two fixes for BPF map lookup nullness elision (Daniel Xu)
 
 - Fix a NULL-pointer dereference from vmlinux BTF lookup in
   bpf_sk_storage_tracing_allowed (Jared Kangas)
 
 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
 -----BEGIN PGP SIGNATURE-----
 
 iIsEABYKADMWIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZ7evlRUcZGFuaWVsQGlv
 Z2VhcmJveC5uZXQACgkQ2yufC7HISIPwHgD/dTvM00Ck4Q73fPivyT7tcqxeXJlD
 D6ggzWl/SG9LAbwA/2/cSgAM9Jm1g7ddvn/S9QaDYOs5GmFl6urq6krs+tYD
 =FCs9
 -----END PGP SIGNATURE-----

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull BPF fixes from Daniel Borkmann:

 - Fix a soft-lockup in BPF arena_map_free on 64k page size kernels
   (Alan Maguire)

 - Fix a missing allocation failure check in BPF verifier's
   acquire_lock_state (Kumar Kartikeya Dwivedi)

 - Fix a NULL-pointer dereference in trace_kfree_skb by adding kfree_skb
   to the raw_tp_null_args set (Kuniyuki Iwashima)

 - Fix a deadlock when freeing BPF cgroup storage (Abel Wu)

 - Fix a syzbot-reported deadlock when holding BPF map's freeze_mutex
   (Andrii Nakryiko)

 - Fix a use-after-free issue in bpf_test_init when eth_skb_pkt_type is
   accessing skb data not containing an Ethernet header (Shigeru
   Yoshida)

 - Fix skipping non-existing keys in generic_map_lookup_batch (Yan Zhai)

 - Several BPF sockmap fixes to address incorrect TCP copied_seq
   calculations, which prevented correct data reads from recv(2) in user
   space (Jiayuan Chen)

 - Two fixes for BPF map lookup nullness elision (Daniel Xu)

 - Fix a NULL-pointer dereference from vmlinux BTF lookup in
   bpf_sk_storage_tracing_allowed (Jared Kangas)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests: bpf: test batch lookup on array of maps with holes
  bpf: skip non exist keys in generic_map_lookup_batch
  bpf: Handle allocation failure in acquire_lock_state
  bpf: verifier: Disambiguate get_constant_map_key() errors
  bpf: selftests: Test constant key extraction on irrelevant maps
  bpf: verifier: Do not extract constant map keys for irrelevant maps
  bpf: Fix softlockup in arena_map_free on 64k page kernel
  net: Add rx_skb of kfree_skb to raw_tp_null_args[].
  bpf: Fix deadlock when freeing cgroup storage
  selftests/bpf: Add strparser test for bpf
  selftests/bpf: Fix invalid flag of recv()
  bpf: Disable non stream socket for strparser
  bpf: Fix wrong copied_seq calculation
  strparser: Add read_sock callback
  bpf: avoid holding freeze_mutex during mmap operation
  bpf: unify VM_WRITE vs VM_MAYWRITE use in BPF map mmaping logic
  selftests/bpf: Adjust data size to have ETH_HLEN
  bpf, test_run: Fix use-after-free issue in eth_skb_pkt_type()
  bpf: Remove unnecessary BTF lookups in bpf_sk_storage_tracing_allowed
2025-02-20 15:37:17 -08:00
Song Yoong Siang 6164847e54 selftests/bpf: Add launch time request to xdp_hw_metadata
Add launch time hardware offload request to xdp_hw_metadata. Users can
configure the delta of launch time relative to HW RX-time using the "-l"
argument. By default, the delta is set to 0 ns, which means the launch time
is disabled. By setting the delta to a non-zero value, the launch time
hardware offload feature will be enabled and requested. Additionally, users
can configure the Tx Queue to be enabled with the launch time hardware
offload using the "-L" argument. By default, Tx Queue 0 will be used.

Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250216093430.957880-3-yoong.siang.song@intel.com
2025-02-20 15:13:45 -08:00
Jason Xing f4924aec58 selftests/bpf: Add simple bpf tests in the tx path for timestamping feature
BPF program calculates a couple of latency deltas between each tx
timestamping callbacks. It can be used in the real world to diagnose
the kernel behaviour in the tx path.

Check the safety issues by accessing a few bpf calls in
bpf_test_access_bpf_calls() which are implemented in the patch 3 and 4.

Check if the bpf timestamping can co-exist with socket timestamping.

There remains a few realistic things[1][2] to highlight:
1. in general a packet may pass through multiple qdiscs. For instance
with bonding or tunnel virtual devices in the egress path.
2. packets may be resent, in which case an ACK might precede a repeat
SCHED and SND.
3. erroneous or malicious peers may also just never send an ACK.

[1]: https://lore.kernel.org/all/67a389af981b0_14e0832949d@willemb.c.googlers.com.notmuch/
[2]: https://lore.kernel.org/all/c329a0c1-239b-4ca1-91f2-cb30b8dd2f6a@linux.dev/

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250220072940.99994-13-kerneljasonxing@gmail.com
2025-02-20 14:30:07 -08:00
Thomas Weißschuh 9c812b01f1 tools/nolibc: add support for 32-bit s390
32-bit s390 is very close to the existing 64-bit implementation.

Some special handling is necessary as there is neither LLVM nor
QEMU support. Also the kernel itself can not build natively for 32-bit
s390, so instead the test program is executed with a 64-bit kernel.

Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250206-nolibc-s390-v2-2-991ad97e3d58@weissschuh.net
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-02-20 22:06:32 +01:00
Thomas Weißschuh 3d1e67c615 selftests/nolibc: rename s390 to s390x
Support for 32-bit s390 is about to be added.
As "s39032" would look horrible, use the another naming scheme.
32-bit s390 is "s390" and 64-bit s390 is "s390x",
similar to how it is handled in various toolchain components.

Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250206-nolibc-s390-v2-1-991ad97e3d58@weissschuh.net
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-02-20 22:06:18 +01:00
Thomas Weißschuh 00ddf4cc97 selftests/nolibc: only run constructor tests on nolibc
The nolibc testsuite can be run against other libcs to test for
interoperability. Some aspects of the constructor execution are not
standardized and musl does not provide all tested feature, for one it
does not provide arguments to the constructors, anymore?

Skip the constructor tests on non-nolibc configurations.

Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250212-nolibc-test-constructor-v1-1-c963875b3da4@weissschuh.net
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2025-02-20 22:04:12 +01:00
Steven Rostedt e35896f236 selftests/tracing: Allow some more tests to run in instances
The tests:

  trigger-action-hist-xfail.tc
  trigger-onchange-action-hist.tc
  trigger-snapshot-action-hist.tc
  trigger-hist-expressions.tc

can all run in an instance. Test them in an instance as well.

Link: https://lore.kernel.org/r/20250220185846.451234966@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-02-20 13:15:14 -07:00
Steven Rostedt a58cc70af2 selftests/ftrace: Clean up triggers after setting them
The triggers set in trigger-onchange-action-hist.tc and
trigger-snapshot-action-hist.tc are not cleaned up at the end. These tests
can also be done in instances and without cleaning up the triggers, the
instances can not be removed as they are still "busy".

Link: https://lore.kernel.org/r/20250220185846.291817731@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-02-20 13:15:07 -07:00
Steven Rostedt 4a3134b114 selftests/tracing: Test only toplevel README file not the instances
For the tests that have both a README attribute as well as the instance
flag to run the tests as an instance, the instance version will always
exit with UNSUPPORTED. That's because the instance directory does not
contain a README file. Currently, the tests check for a README file in the
directory that the test runs in and if there's a requirement for something
to be present in the README file, it will not find it, as the instance
directory doesn't have it.

Have the tests check if the current directory is an instance directory,
and if it is, check two directories above the current directory for the
README file:

  /sys/kernel/tracing/README
  /sys/kernel/tracing/instances/foo/../../README

Link: https://lore.kernel.org/r/20250220185846.130216270@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-02-20 13:15:01 -07:00
Jakub Kicinski 5d6ba5ab85 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.14-rc4).

No conflicts or adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 10:37:30 -08:00
Linus Torvalds 27eddbf344 Smaller than usual with no fixes from any subtree.
Current release - regressions:
 
   - core: fix race of rtnl_net_lock(dev_net(dev)).
 
 Previous releases - regressions:
 
   - core: remove the single page frag cache for good
 
   - flow_dissector: fix handling of mixed port and port-range keys
 
   - sched: cls_api: fix error handling causing NULL dereference
 
   - tcp:
     - adjust rcvq_space after updating scaling ratio
     - drop secpath at the same time as we currently drop dst
 
   - eth: gtp: suppress list corruption splat in gtp_net_exit_batch_rtnl().
 
 Previous releases - always broken:
 
   - vsock:
     - fix variables initialization during resuming
     - for connectible sockets allow only connected
 
   - eth: geneve: fix use-after-free in geneve_find_dev().
 
   - eth: ibmvnic: don't reference skb after sending to VIOS
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAme3Dc0SHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkQToQAKKTNLnFkYGFevxOukoZtU1l4ZOiaavg
 FlSsCVQrXfOvq5tIbkT7X86F3jYBNElNHvSDAFNQu5ZayEjqJHljSQE17rVVhepX
 Gvo9Jfk/ERwwd9vD78+KSvGMzSYAiIFoZmgLjaNyiFH53gMku8K41NRQAI2EeUzG
 G7lK/XYHvp/N7Ikt6ZMTFFW5NH7Wt2oeKZ8DbNBhK9+9lCet4s6PWzmGYrYkGUyy
 0iuYh09tKCGrBH0BpHMIGokqttVawjdg4tCJUMXurVoOWus0dKXuIdpKQuCdm8jO
 lV89xotE7Icw9CJTFQQZNn2C94J3RL4xtqo4fOuO1ztll+ma/nbPp9lJgy2P1WRI
 DqWJart1YuzC4Ef1NQiNeqRjxKoLDHbuu2A1zdAHMyHxZsKaQWowhhjylQLYsU3N
 d0WHlo43VQf95wx9cqdsGmEopk5YPLqBS1Wz/uL/m7y/+dmK4nKCcmqhIQ1QG23L
 SOhuOmHDs3mDkXusxr7NULCwGzjy3rPSExGrM68kiDEUX7cWFvVGmrRHf2bBiv7A
 gJ6UZ3J4au477adHjj2lBn3n1iIQDSN04gjPYKjunSEaMI2nqr55dN/aAnKFTjtm
 fMPXMWOHDfkjtX0iV69/345dXyzL5r4JHNwtTwoLx1QRMm6WoznJveHI6qXQ1RGb
 HdG8y9A9ZfLn
 =pEjU
 -----END PGP SIGNATURE-----

Merge tag 'net-6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Smaller than usual with no fixes from any subtree.

  Current release - regressions:

   - core: fix race of rtnl_net_lock(dev_net(dev))

  Previous releases - regressions:

   - core: remove the single page frag cache for good

   - flow_dissector: fix handling of mixed port and port-range keys

   - sched: cls_api: fix error handling causing NULL dereference

   - tcp:
       - adjust rcvq_space after updating scaling ratio
       - drop secpath at the same time as we currently drop dst

   - eth: gtp: suppress list corruption splat in gtp_net_exit_batch_rtnl().

  Previous releases - always broken:

   - vsock:
       - fix variables initialization during resuming
       - for connectible sockets allow only connected

   - eth:
       - geneve: fix use-after-free in geneve_find_dev()
       - ibmvnic: don't reference skb after sending to VIOS"

* tag 'net-6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits)
  Revert "net: skb: introduce and use a single page frag cache"
  net: allow small head cache usage with large MAX_SKB_FRAGS values
  nfp: bpf: Add check for nfp_app_ctrl_msg_alloc()
  tcp: drop secpath at the same time as we currently drop dst
  net: axienet: Set mac_managed_pm
  arp: switch to dev_getbyhwaddr() in arp_req_set_public()
  net: Add non-RCU dev_getbyhwaddr() helper
  sctp: Fix undefined behavior in left shift operation
  selftests/bpf: Add a specific dst port matching
  flow_dissector: Fix port range key handling in BPF conversion
  selftests/net/forwarding: Add a test case for tc-flower of mixed port and port-range
  flow_dissector: Fix handling of mixed port and port-range keys
  geneve: Suppress list corruption splat in geneve_destroy_tunnels().
  gtp: Suppress list corruption splat in gtp_net_exit_batch_rtnl().
  dev: Use rtnl_net_dev_lock() in unregister_netdev().
  net: Fix dev_net(dev) race in unregister_netdevice_notifier_dev_net().
  net: Add net_passive_inc() and net_passive_dec().
  net: pse-pd: pd692x0: Fix power limit retrieval
  MAINTAINERS: trim the GVE entry
  gve: set xdp redirect target only when it is available
  ...
2025-02-20 10:19:54 -08:00
Christian Brauner 540dcf0f44
selftests/nsfs: add ioctl validation tests
Add simple tests to validate that non-nsfs ioctls are rejected.

Link: https://lore.kernel.org/r/20250219-work-nsfs-v1-2-21128d73c5e8@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-20 09:13:52 +01:00
Jakub Kicinski 0d0f4174f6 selftests: drv-net: add a simple TSO test
Add a simple test for TSO. Send a few MB of data and check device
stats to verify that the device was performing segmentation.
Do the same thing over a few tunnel types.

Injecting GSO packets directly would give us more ability to test
corner cases, but perhaps starting simple is good enough?

  # ./ksft-net-drv/drivers/net/hw/tso.py
  # Detected qstat for LSO wire-packets
  KTAP version 1
  1..14
  ok 1 tso.ipv4 # SKIP Test requires IPv4 connectivity
  ok 2 tso.vxlan4_ipv4 # SKIP Test requires IPv4 connectivity
  ok 3 tso.vxlan6_ipv4 # SKIP Test requires IPv4 connectivity
  ok 4 tso.vxlan_csum4_ipv4 # SKIP Test requires IPv4 connectivity
  ok 5 tso.vxlan_csum6_ipv4 # SKIP Test requires IPv4 connectivity
  ok 6 tso.gre4_ipv4 # SKIP Test requires IPv4 connectivity
  ok 7 tso.gre6_ipv4 # SKIP Test requires IPv4 connectivity
  ok 8 tso.ipv6
  ok 9 tso.vxlan4_ipv6
  ok 10 tso.vxlan6_ipv6
  ok 11 tso.vxlan_csum4_ipv6
  ok 12 tso.vxlan_csum6_ipv6
  # Testing with mangleid enabled
  ok 13 tso.gre4_ipv6
  ok 14 tso.gre6_ipv6
  # Totals: pass:7 fail:0 xfail:0 xpass:0 skip:7 error:0

Note that the test currently depends on the driver reporting
the LSO count via qstat, which appears to be relatively rare
(virtio, cisco/enic, sfc/efc; but virtio needs host support).

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250218225426.77726-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-19 19:08:50 -08:00
Jakub Kicinski de94e86974 selftests: drv-net: store addresses in dict indexed by ipver
Looks like more and more tests want to iterate over IP version,
run the same test over ipv4 and ipv6. The current naming of
members in the env class makes it a bit awkward, we have
separate members for ipv4 and ipv6 parameters.

Store the parameters inside dicts, so that tests can easily
index them with ip version.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250218225426.77726-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-19 19:08:50 -08:00
Jakub Kicinski 2aefca8e1f selftests: drv-net: get detailed interface info
We already record output of ip link for NETIF in env for easy access.
Record the detailed version. TSO test will want to know the max tso size.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20250218225426.77726-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-19 19:08:50 -08:00
Jakub Kicinski 2217bcb491 selftests: drv-net: resolve remote interface name
Find out and record in env the name of the interface which remote host
will use for the IP address provided via config.

Interface name is useful for mausezahn and for setting up tunnels.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20250218225426.77726-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-19 19:08:49 -08:00
Suchit 23dcacff2d selftests: net: Fix minor typos in MPTCP and psock tests
Fixes minor spelling errors:
- `simult_flows.sh`: "al testcases" -> "all testcases"
- `psock_tpacket.c`: "accross" -> "across"

Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250218165923.20740-1-suchitkarunakaran@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-19 19:02:48 -08:00
Cong Wang 15de6ba95d selftests/bpf: Add a specific dst port matching
After this patch:

 #102/1   flow_dissector_classification/ipv4:OK
 #102/2   flow_dissector_classification/ipv4_continue_dissect:OK
 #102/3   flow_dissector_classification/ipip:OK
 #102/4   flow_dissector_classification/gre:OK
 #102/5   flow_dissector_classification/port_range:OK
 #102/6   flow_dissector_classification/ipv6:OK
 #102     flow_dissector_classification:OK
 Summary: 1/6 PASSED, 0 SKIPPED, 0 FAILED

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Link: https://patch.msgid.link/20250218043210.732959-5-xiyou.wangcong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-19 18:54:59 -08:00
Cong Wang dfc1580f96 selftests/net/forwarding: Add a test case for tc-flower of mixed port and port-range
After this patch:

 # ./tc_flower_port_range.sh
 TEST: Port range matching - IPv4 UDP                                [ OK ]
 TEST: Port range matching - IPv4 TCP                                [ OK ]
 TEST: Port range matching - IPv6 UDP                                [ OK ]
 TEST: Port range matching - IPv6 TCP                                [ OK ]
 TEST: Port range matching - IPv4 UDP Drop                           [ OK ]

Cc: Qiang Zhang <dtzq01@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250218043210.732959-3-xiyou.wangcong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-19 18:54:59 -08:00
Ido Schimmel f5d783c088 selftests: fib_rule_tests: Add port mask match tests
Add tests for FIB rules that match on source and destination ports with
a mask. Test both good and bad flows.

 # ./fib_rule_tests.sh
 IPv6 FIB rule tests
 [...]
    TEST: rule6 check: sport and dport redirect to table                [ OK ]
    TEST: rule6 check: sport and dport no redirect to table             [ OK ]
    TEST: rule6 del by pref: sport and dport redirect to table          [ OK ]
    TEST: rule6 check: sport and dport range redirect to table          [ OK ]
    TEST: rule6 check: sport and dport range no redirect to table       [ OK ]
    TEST: rule6 del by pref: sport and dport range redirect to table    [ OK ]
    TEST: rule6 check: sport and dport masked redirect to table         [ OK ]
    TEST: rule6 check: sport and dport masked no redirect to table      [ OK ]
    TEST: rule6 del by pref: sport and dport masked redirect to table   [ OK ]
 [...]

 Tests passed: 292
 Tests failed:   0

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250217134109.311176-9-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-19 18:43:38 -08:00
Ido Schimmel 94694aa641 selftests: fib_rule_tests: Add port range match tests
Currently, only matching on specific ports is tested. Add port range
testing to make sure this use case does not regress.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250217134109.311176-8-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-19 18:43:38 -08:00
Jordan Rome 7042882abc selftests/bpf: Add tests for bpf_copy_from_user_task_str
This adds tests for both the happy path and the
error path (with and without the BPF_F_PAD_ZEROS flag).

Signed-off-by: Jordan Rome <linux@jordanrome.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250213152125.1837400-3-linux@jordanrome.com
2025-02-19 17:01:36 -08:00
Alexis Lothoré (eBPF Foundation) ac13c50872 selftests/bpf: Enable kprobe_multi tests for ARM64
The kprobe_multi feature was disabled on ARM64 due to the lack of fprobe
support.

The fprobe rewrite on function_graph has been recently merged and thus
brought support for fprobes on arm64.  This then enables kprobe_multi
support on arm64, and so the corresponding tests can now be run on this
architecture.

Remove the tests depending on kprobe_multi from DENYLIST.aarch64 to
allow those to run in CI. CONFIG_FPROBE is already correctly set in
tools/testing/selftests/bpf/config

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250219-enable_kprobe_multi_tests-v1-1-faeec99240c8@bootlin.com
2025-02-19 14:57:05 -08:00
Jason Xing 7a93ba8048 selftests/bpf: Add rto max for bpf_setsockopt test
Test the TCP_RTO_MAX_MS optname in the existing setget_sockopt test.

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250219081333.56378-3-kerneljasonxing@gmail.com
2025-02-19 12:30:52 -08:00
Bastien Curutchet (eBPF Foundation) 157feaaf18 selftests/bpf: ns_current_pid_tgid: Use test_progs's ns_ feature
Two subtests use the test_in_netns() function to run the test in a
dedicated network namespace. This can now be done directly through the
test_progs framework with a test name starting with 'ns_'.

Replace the use of test_in_netns() by test_ns_* calls.

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20250219-b4-tc_links-v2-4-14504db136b7@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-19 09:46:02 -08:00
Bastien Curutchet (eBPF Foundation) 207cd7578a selftests/bpf: tc_links/tc_opts: Unserialize tests
Tests are serialized because they all use the loopback interface.

Replace the 'serial_test_' prefixes with 'test_ns_' to benefit from the
new test_prog feature which creates a dedicated namespace for each test,
allowing them to run in parallel.

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20250219-b4-tc_links-v2-3-14504db136b7@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-19 09:46:02 -08:00
Bastien Curutchet (eBPF Foundation) c047e0e0e4 selftests/bpf: Optionally open a dedicated namespace to run test in it
Some tests are serialized to prevent interference with others.

Open a dedicated network namespace when a test name starts with 'ns_' to
allow more test parallelization. Use the test name as namespace name to
avoid conflict between namespaces.

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20250219-b4-tc_links-v2-2-14504db136b7@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-19 09:46:02 -08:00
Bastien Curutchet (eBPF Foundation) 4a06c5251a selftests/bpf: ns_current_pid_tgid: Rename the test function
Next patch will add a new feature to test_prog to run tests in a
dedicated namespace if the test name starts with 'ns_'. Here the test
name already starts with 'ns_' and creates some namespaces which would
conflict with the new feature.

Rename the test to avoid this conflict.

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20250219-b4-tc_links-v2-1-14504db136b7@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-19 09:46:02 -08:00
Christian Brauner a1579f6bf6
selftests/ovl: add third selftest for "override_creds"
Add a simple test to verify that the new "override_creds" option works.

Link: https://lore.kernel.org/r/20250219-work-overlayfs-v3-4-46af55e4ceda@kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-19 14:32:12 +01:00
Christian Brauner 6e5ed6587e
selftests/ovl: add second selftest for "override_creds"
Add a simple test to verify that the new "override_creds" option works.

Link: https://lore.kernel.org/r/20250219-work-overlayfs-v3-3-46af55e4ceda@kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-19 14:32:12 +01:00
Christian Brauner c68946ee7e
selftests/filesystems: add utils.{c,h}
Add a new set of helpers that will be used in follow-up patches.

Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-19 14:32:12 +01:00
Christian Brauner 96f0943259
selftests/ovl: add first selftest for "override_creds"
Add a simple test to verify that the new "override_creds" option works.

Link: https://lore.kernel.org/r/20250219-work-overlayfs-v3-2-46af55e4ceda@kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-19 14:32:12 +01:00
Eduard Zingerman 6361cd26e4 selftests/bpf: check states pruning for deeply nested iterator
A test case with ridiculously deep bpf_for() nesting and
a conditional update of a stack location.

Consider the innermost loop structure:

	1: bpf_for(o, 0, 10)
	2:	if (unlikely(bpf_get_prandom_u32()))
	3:		buf[0] = 42;
	4: <exit>

Assuming that verifier.c:clean_live_states() operates w/o change from
the previous patch (e.g. as on current master) verification would
proceed as follows:
- at (1) state {buf[0]=?,o=drained}:
  - checkpoint
  - push visit to (2) for later
- at (4) {buf[0]=?,o=drained}
- pop (2) {buf[0]=?,o=active}, push visit to (3) for later
- at (1) {buf[0]=?,o=active}
  - checkpoint
  - push visit to (2) for later
- at (4) {buf[0]=?,o=drained}
- pop (2) {buf[0]=?,o=active}, push visit to (3) for later
- at (1) {buf[0]=?,o=active}:
  - checkpoint reached, checkpoint's branch count becomes 0
  - checkpoint is processed by clean_live_states() and
    becomes {o=active}
- pop (3) {buf[0]=42,o=active}
- at (1), {buf[0]=42,o=active}
  - checkpoint
  - push visit to (2) for later
- at (4) {buf[0]=42,o=drained}
- pop (2) {buf[0]=42,o=active}, push visit to (3) for later
- at (1) {buf[0]=42,o=active}, checkpoint reached
- pop (3) {buf[0]=42,o=active}
- at (1) {buf[0]=42,o=active}:
  - checkpoint reached, checkpoint's branch count becomes 0
  - checkpoint is processed by clean_live_states() and
    becomes {o=active}
- ...

Note how clean_live_states() converted the checkpoint
{buf[0]=42,o=active} to {o=active} and it can no longer be matched
against {buf[0]=<any>,o=active}, because iterator based states
are compared using stacksafe(... RANGE_WITHIN), that requires
stack slots to have same types. At the same time there are
still states {buf[0]=42,o=active} pushed to DFS stack.

This behaviour becomes exacerbated with multiple nesting levels,
here are veristat results:
- nesting level 1: 69 insns
- nesting level 2: 258 insns
- nesting level 3: 900 insns
- nesting level 4: 4754 insns
- nesting level 5: 35944 insns
- nesting level 6: 312558 insns
- nesting level 7: 1M limit

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250215110411.3236773-5-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-18 19:22:59 -08:00
Eduard Zingerman 6da35da1a3 selftests/bpf: test correct loop_entry update in copy_verifier_state
A somewhat cumbersome test case sensitive to correct copying of
bpf_verifier_state->loop_entry fields in
verifier.c:copy_verifier_state().
W/o the fix from a previous commit the program is accepted as safe.

     1:  /* poison block */
     2:  if (random() != 24) {       // assume false branch is placed first
     3:    i = iter_new();
     4:    while (iter_next(i));
     5:    iter_destroy(i);
     6:    return;
     7:  }
     8:
     9:  /* dfs_depth block */
    10:  for (i = 10; i > 0; i--);
    11:
    12:  /* main block */
    13:  i = iter_new();             // fp[-16]
    14:  b = -24;                    // r8
    15:  for (;;) {
    16:    if (iter_next(i))
    17:      break;
    18:    if (random() == 77) {     // assume false branch is placed first
    19:      *(u64 *)(r10 + b) = 7;  // this is not safe when b == -25
    20:      iter_destroy(i);
    21:      return;
    22:    }
    23:    if (random() == 42) {     // assume false branch is placed first
    24:      b = -25;
    25:    }
    26:  }
    27:  iter_destroy(i);

The goal of this example is to:
(a) poison env->cur_state->loop_entry with a state S,
    such that S->branches == 0;
(b) set state S as a loop_entry for all checkpoints in
    /* main block */, thus forcing NOT_EXACT states comparisons;
(c) exploit incorrect loop_entry set for checkpoint at line 18
    by first creating a checkpoint with b == -24 and then
    pruning the state with b == -25 using that checkpoint.

The /* poison block */ is responsible for goal (a).
It forces verifier to first validate some unrelated iterator based
loop, which leads to an update_loop_entry() call in is_state_visited(),
which places checkpoint created at line 4 as env->cur_state->loop_entry.
Starting from line 8, the branch count for that checkpoint is 0.

The /* dfs_depth block */ is responsible for goal (b).
It abuses the fact that update_loop_entry(cur, hdr) only updates
cur->loop_entry when hdr->dfs_depth <= cur->dfs_depth.
After line 12 every state has dfs_depth bigger then dfs_depth of
poisoned env->cur_state->loop_entry. Thus the above condition is never
true for lines 12-27.

The /* main block */ is responsible for goal (c).
Verification proceeds as follows:
- checkpoint {b=-24,i=active} created at line 16;
- jump 18->23 is verified first, jump to 19 pushed to stack;
- jump 23->26 is verified first, jump to 24 pushed to stack;
- checkpoint {b=-24,i=active} created at line 15;
- current state is pruned by checkpoint created at line 16,
  this sets branches count for checkpoint at line 15 to 0;
- jump to 24 is popped from stack;
- line 16 is reached in state {b=-25,i=active};
- this is pruned by a previous checkpoint {b=-24,i=active}:
  - checkpoint's loop_entry is poisoned and has branch count of 0,
    hence states are compared using NOT_EXACT rules;
  - b is not marked precise yet.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250215110411.3236773-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-18 19:22:58 -08:00
Chandra Mohan Sundar f29e41454b selftests: net: Fix few spelling mistakes
Fix few spelling mistakes in net selftests

Signed-off-by: Chandra Mohan Sundar <chandru.dav@gmail.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20250217141520.81033-1-chandru.dav@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18 18:10:31 -08:00
Yan Zhai d66b773917 selftests: bpf: test batch lookup on array of maps with holes
Iterating through array of maps may encounter non existing keys. The
batch operation should not fail on when this happens.

Signed-off-by: Yan Zhai <yan@cloudflare.com>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/9007237b9606dc2ee44465a4447fe46e13f3bea6.1739171594.git.yan@cloudflare.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-18 17:27:37 -08:00
Bastien Curutchet (eBPF Foundation) e06f5bfd93 selftests/bpf: Remove test_xdp_redirect_multi.sh
The tests done by test_xdp_redirect_multi.sh are now fully covered by
the CI through test_xdp_veth.c.

Remove test_xdp_redirect_multi.sh
Remove xdp_redirect_multi.c that was used by the script to load and
attach the BPF programs.
Remove their entries in the Makefile

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250212-redirect-multi-v5-6-fd0d39fca6e6@bootlin.com
2025-02-18 13:56:34 -08:00
Bastien Curutchet (eBPF Foundation) a93bfd824d selftests/bpf: test_xdp_veth: Add XDP program on egress test
XDP programs loaded on egress is tested by test_xdp_redirect_multi.sh
but not by the test_progs framework.

Add a test case in test_xdp_veth.c to test the XDP program on egress.
Use the same BPF program than test_xdp_redirect_multi.sh that replaces
the source MAC address by one provided through a BPF map.
Use a BPF program that stores the source MAC of received packets in a
map to check the test results.

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250212-redirect-multi-v5-5-fd0d39fca6e6@bootlin.com
2025-02-18 13:56:34 -08:00
Bastien Curutchet (eBPF Foundation) 1e7e634542 selftests/bpf: test_xdp_veth: Add XDP broadcast redirection tests
XDP redirections with BPF_F_BROADCAST and BPF_F_EXCLUDE_INGRESS flags
are tested by test_xdp_redirect_multi.sh but not within the test_progs
framework.

Add a broadcast test case in test_xdp_veth.c to test them.
Use the same BPF programs than the one used by
test_xdp_redirect_multi.sh.
Use a BPF map to select the broadcast flags.
Use a BPF map with an entry per veth to check whether packets are
received or not

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250212-redirect-multi-v5-4-fd0d39fca6e6@bootlin.com
2025-02-18 13:56:34 -08:00