This allows the user to set cflags when building tests that use this
shared build infrastructure.
For example, it enables building with -Werror so that patch-check scripts
will fail:
make -C tools/testing/vma -j EXTRA_CFLAGS=-Werror
Link: https://lkml.kernel.org/r/20250828-b4-vma-no-atomic-h-v2-3-02d146a58ed2@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: Pedro Falcato <pfalcato@suse.de>
Cc: Jann Horn <jannh@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
There is an arch/ tree under tools. This contains some useful stuff, to
make that available, add it to the -I flags. This requires $(SRCARCH),
which is provided by Makefile.arch, so include that..
There still aren't that many headers so also just smush all of them into
SHARED_DEPS instead of starting to do any header dependency hocus pocus.
Link: https://lkml.kernel.org/r/20250828-b4-vma-no-atomic-h-v2-2-02d146a58ed2@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: Pedro Falcato <pfalcato@suse.de>
Cc: Jann Horn <jannh@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The test requires at least 2 * (bytes/page_size) hugetlb memory, since we
require identical number of hugepages for src and dst location. Fix this.
Along with the above, as explained in patch "selftests/mm/uffd-stress:
Make test operate on less hugetlb memory", the racy nature of the test
requires that we have some extra number of hugepages left beyond what is
required. Therefore, stricten this constraint.
Link: https://lkml.kernel.org/r/20250909061531.57272-3-dev.jain@arm.com
Fixes: 5a6aa60d18 ("selftests/mm: skip uffd hugetlb tests with insufficient hugepages")
Signed-off-by: Dev Jain <dev.jain@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "selftests/mm: uffd-stress fixes", v2.
This patchset ensures that the number of hugepages is correctly set in the
system so that the uffd-stress test does not fail due to the racy nature
of the test. Patch 1 changes the hugepage constraint in the
run_vmtests.sh script, whereas patch 2 changes the constraint in the test
itself.
This patch (of 2):
We observed uffd-stress selftest failure on arm64 and intermittent
failures on x86 too:
running ./uffd-stress hugetlb-private 128 32
bounces: 17, mode: rnd read, ERROR: UFFDIO_COPY error: -12 (errno=12, @uffd-common.c:617) [FAIL]
not ok 18 uffd-stress hugetlb-private 128 32 # exit=1
For this particular case, the number of free hugepages from run_vmtests.sh
will be 128, and the test will allocate 64 hugepages in the source
location. The stress() function will start spawning threads which will
operate on the destination location, triggering uffd-operations like
UFFDIO_COPY from src to dst, which means that we will require 64 more
hugepages for the dst location.
Let us observe the locking_thread() function. It will lock the mutex kept
at dst, triggering uffd-copy. Suppose that 127 (64 for src and 63 for
dst) hugepages have been reserved. In case of BOUNCE_RANDOM, it may
happen that two threads trying to lock the mutex at dst, try to do so at
the same hugepage number. If one thread succeeds in reserving the last
hugepage, then the other thread may fail in alloc_hugetlb_folio(),
returning -ENOMEM. I can confirm that this is indeed the case by this
hacky patch:
:--- a/mm/hugetlb.c
; +++ b/mm/hugetlb.c
; @@ -6929,6 +6929,11 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
;
; folio = alloc_hugetlb_folio(dst_vma, dst_addr, false);
; if (IS_ERR(folio)) {
; + pte_t *actual_pte = hugetlb_walk(dst_vma, dst_addr, PMD_SIZE);
; + if (actual_pte) {
; + ret = -EEXIST;
; + goto out;
; + }
; ret = -ENOMEM;
; goto out;
; }
This code path gets triggered indicating that the PMD at which one thread
is trying to map a hugepage, gets filled by a racing thread.
Therefore, instead of using freepgs to compute the amount of memory, use
freepgs - (min(32, nr_cpus) - 1), so that the test still has some extra
hugepages to use. The adjustment is a function of min(32, nr_cpus) - the
value of nr_parallel in the test - because in the worst case, nr_parallel
number of threads will try to map a hugepage on the same PMD, one will win
the allocation race, and the other nr_parallel - 1 threads will fail, so
we need extra nr_parallel - 1 hugepages to satisfy this request. Note
that, in case the adjusted value underflows, there is a check for the
number of free hugepages in the test itself, which will fail:
get_free_hugepages() < bytes / page_size A negative value will be passed
on to bytes which is of type size_t, thus the RHS will become a large
value and the check will fail, so we are safe.
Link: https://lkml.kernel.org/r/20250909061531.57272-1-dev.jain@arm.com
Link: https://lkml.kernel.org/r/20250909061531.57272-2-dev.jain@arm.com
Signed-off-by: Dev Jain <dev.jain@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
As per Documentation/process/deprecated.rst, dynamic size calculations
should not be performed in memory allocator arguments due to possible
overflows.
Replace malloc with calloc to avoid open-ended arithmetic and prevent
possible overflows.
Link: https://lkml.kernel.org/r/20250825170643.63174-1-viswanathiyyappan@gmail.com
Signed-off-by: I Viswanath <viswanathiyyappan@gmail.com>
Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed by: Donet Tom <donettom@linux.ibm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Several selftests subdirectories duplicated the define __maybe_unused,
leading to redundant code. Move to kselftest.h header and remove other
definitions.
This addresses the duplication noted in the proc-pid-vm warning fix
Link: https://lkml.kernel.org/r/20250821101159.2238-1-reddybalavignesh9979@gmail.com
Signed-off-by: Bala-Vignesh-Reddy <reddybalavignesh9979@gmail.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Link:https://lore.kernel.org/lkml/20250820143954.33d95635e504e94df01930d0@linux-foundation.org/
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Mickal Salan <mic@digikod.net> [landlock]
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fix simple typos in function name and console message.
Link: https://lkml.kernel.org/r/20250823170208.184149-1-allyheev@gmail.com
Signed-off-by: ally heev <allyheev@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
As David suggested, currently we don't have a high level test case to
verify the behavior of rmap. This patch introduce the verification on
rmap by migration.
The general idea is if migrate one shared page between processes, this
would be reflected in all related processes. Otherwise, we have problem
in rmap.
Currently it covers following four scenarios:
* anonymous page
* shmem page
* pagecache page
* ksm page
Link: https://lkml.kernel.org/r/20250819080047.10063-3-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "test that rmap behaves as expected", v4.
As David suggested, currently we don't have a high level test case to
verify the behavior of rmap. This patch set introduce the verification
on rmap by migration.
Patch 1 is a preparation to move ksm related operations into vm_util.
Patch 2 is the new test case for rmap.
Currently it covers following four scenarios:
* anonymous page
* shmem page
* pagecache page
* ksm page
This patch (of 2):
There are some general ksm operations could be used by other related
test cases. Put them into vm_util for common use.
This is a preparation patch for later use.
Link: https://lkml.kernel.org/r/20250819080047.10063-1-richard.weiyang@gmail.com
Link: https://lkml.kernel.org/r/20250819080047.10063-2-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Instead of just checking the existence of PMD folios before and after folio
split tests, use check_folio_orders() to check after-split folio orders.
The split ranges in split_thp_in_pagecache_to_order_at() are changed to
[addr, addr + pagesize) for every pmd_pagesize. It prevents folios within
the range being split multiple times due to debugfs split function always
perform splits with a pagesize step for a given range.
The following tests are not changed:
1. split_pte_mapped_thp: the test already uses kpageflags to check;
2. split_file_backed_thp: no vaddr available.
Link: https://lkml.kernel.org/r/20250818184622.1521620-6-ziy@nvidia.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: wang lian <lianux.mm@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The helper gathers a folio order statistics of folios within a virtual
address range and checks it against a given order list. It aims to provide
a more precise folio order check instead of just checking the existence of
PMD folios.
The helper will be used the upcoming commit.
Link: https://lkml.kernel.org/r/20250818184622.1521620-5-ziy@nvidia.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: wang lian <lianux.mm@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
and rename it to is_backed_by_folio().
is_backed_by_folio() checks if the given vaddr is backed a folio with
a given order. It does so by:
1. getting the pfn of the vaddr;
2. checking kpageflags of the pfn;
if order is greater than 0:
3. checking kpageflags of the head pfn;
4. checking kpageflags of all tail pfns.
pmd_order is added to split_huge_page_test.c and replaces max_order.
[ziy@nvidia.com: reduce code duplication, per David]
Link: https://lkml.kernel.org/r/F54782D6-65A3-4D35-AE03-8ADE636EE258@nvidia.com
Link: https://lkml.kernel.org/r/20250818184622.1521620-4-ziy@nvidia.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: wang lian <lianux.mm@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
All functions are only used within the file.
Link: https://lkml.kernel.org/r/20250818184622.1521620-3-ziy@nvidia.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: wang lian <lianux.mm@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The test will set the global system THP setting to never, madvise or
always depending on the fixture variant and the 2M setting to inherit
before it starts (and reset to original at teardown). The fixture setup
will also test if PR_SET_THP_DISABLE prctl call can be made with
PR_THP_DISABLE_EXCEPT_ADVISED and skip if it fails.
This tests if the process can:
- successfully get the policy to disable THPs expect for madvise.
- get hugepages only on MADV_HUGE and MADV_COLLAPSE if the global policy
is madvise/always and only with MADV_COLLAPSE if the global policy is
never.
- successfully reset the policy of the process.
- after reset, only get hugepages with:
- MADV_COLLAPSE when policy is set to never.
- MADV_HUGE and MADV_COLLAPSE when policy is set to madvise.
- always when policy is set to "always".
- never get a THP with MADV_NOHUGEPAGE.
- repeat the above tests in a forked process to make sure the policy is
carried across forks.
Test results:
./prctl_thp_disable
TAP version 13
1..12
ok 1 prctl_thp_disable_completely.never.nofork
ok 2 prctl_thp_disable_completely.never.fork
ok 3 prctl_thp_disable_completely.madvise.nofork
ok 4 prctl_thp_disable_completely.madvise.fork
ok 5 prctl_thp_disable_completely.always.nofork
ok 6 prctl_thp_disable_completely.always.fork
ok 7 prctl_thp_disable_except_madvise.never.nofork
ok 8 prctl_thp_disable_except_madvise.never.fork
ok 9 prctl_thp_disable_except_madvise.madvise.nofork
ok 10 prctl_thp_disable_except_madvise.madvise.fork
ok 11 prctl_thp_disable_except_madvise.always.nofork
ok 12 prctl_thp_disable_except_madvise.always.fork
[usamaarif642@gmail.com: return after executing test in child process]
Link: https://lkml.kernel.org/r/3dca2de4-9a6a-4efe-a86c-83f9509831fc@gmail.com
Link: https://lkml.kernel.org/r/20250815135549.130506-8-usamaarif642@gmail.com
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yafang <laoar.shao@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The test will set the global system THP setting to never, madvise or
always depending on the fixture variant and the 2M setting to inherit
before it starts (and reset to original at teardown). The fixture setup
will also test if PR_SET_THP_DISABLE prctl call can be made to disable all
THPs and skip if it fails.
This tests if the process can:
- successfully get the policy to disable THPs completely.
- never get a hugepage when the THPs are completely disabled
with the prctl, including with MADV_HUGE and MADV_COLLAPSE.
- successfully reset the policy of the process.
- after reset, only get hugepages with:
- MADV_COLLAPSE when policy is set to never.
- MADV_HUGE and MADV_COLLAPSE when policy is set to madvise.
- always when policy is set to "always".
- never get a THP with MADV_NOHUGEPAGE.
- repeat the above tests in a forked process to make sure
the policy is carried across forks.
[usamaarif642@gmail.com: return after executing test in child process]
Link: https://lkml.kernel.org/r/2d0ea708-ecba-4021-b6ca-e93f1413d60a@gmail.com
[usamaarif642@gmail.com: include linux/mman.h for prctl_thp_disable]
Link: https://lkml.kernel.org/r/20250910204609.1720498-1-usamaarif642@gmail.com
Link: https://lore.kernel.org/all/c8249725-e91d-4c51-b9bb-40305e61e20d@sirena.org.uk/
Link: https://lkml.kernel.org/r/20250815135549.130506-7-usamaarif642@gmail.com
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yafang <laoar.shao@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The function already has 2 uses and will have a 3rd one in prctl
selftests. The pagesize argument is added into the function, as it's not
a global variable anymore. No functional change intended with this patch.
Link: https://lkml.kernel.org/r/20250815135549.130506-6-usamaarif642@gmail.com
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yafang <laoar.shao@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Gracefully skip test if userfaultfd is not supported (ENOSYS) or not
permitted (EPERM), instead of failing. This avoids misleading failures
with clear skip messages.
--------------
Before Patch
--------------
~ running ./hugepage-mremap
...
~ Bail out! userfaultfd: Function not implemented
~ Planned tests != run tests (1 != 0)
~ Totals: pass:0 fail:0 xfail:0 xpass:0 skip:0 error:0
~ [FAIL]
not ok 4 hugepage-mremap # exit=1
--------------
After Patch
--------------
~ running ./hugepage-mremap
...
~ ok 2 # SKIP userfaultfd is not supported/not enabled.
~ 1 skipped test(s) detected.
~ Totals: pass:0 fail:0 xfail:0 xpass:0 skip:1 error:0
~ [SKIP]
ok 4 hugepage-mremap # SKIP
Link: https://lkml.kernel.org/r/20250816040113.760010-8-aboorvad@linux.ibm.com
Co-developed-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Make thuge-gen skip instead of fail when it can't run due to system
settings. If shmmax is too small or no 1G huge pages are available, the
test now prints a warning and is marked as skipped.
-------------------
Before Patch:
-------------------
~ running ./thuge-gen
~ Bail out! Please do echo 262144 > /proc/sys/kernel/shmmax
~ Totals: pass:0 fail:0 xfail:0 xpass:0 skip:0 error:0
~ [FAIL]
not ok 28 thuge-gen ~ exit=1
-------------------
After Patch:
-------------------
~ running ./thuge-gen
~ ~ WARNING: shmmax is too small to run this test.
~ ~ Please run the following command to increase shmmax:
~ ~ echo 262144 > /proc/sys/kernel/shmmax
~ 1..0 ~ SKIP Test skipped due to insufficient shmmax value.
~ [SKIP]
ok 29 thuge-gen ~ SKIP
Link: https://lkml.kernel.org/r/20250816040113.760010-7-aboorvad@linux.ibm.com
Co-developed-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In ksm_functional_tests, test_child_ksm() returned negative values to
indicate errors. However, when passed to exit(), these were interpreted
as large unsigned values (e.g, -2 became 254), leading to incorrect
handling in the parent process. As a result, some tests appeared to be
skipped or silently failed.
This patch changes test_child_ksm() to return positive error codes (1, 2,
3) and updates test_child_ksm_err() to interpret them correctly.
Additionally, test_prctl_fork_exec() now uses exit(4) after a failed
execv() to clearly signal exec failures. This ensures the parent
accurately detects and reports child process failures.
--------------
Before patch:
--------------
- [RUN] test_unmerge
ok 1 Pages were unmerged
...
- [RUN] test_prctl_fork
- No pages got merged
- [RUN] test_prctl_fork_exec
ok 7 PR_SET_MEMORY_MERGE value is inherited
...
Bail out! 1 out of 8 tests failed
- Planned tests != run tests (9 != 8)
- Totals: pass:7 fail:1 xfail:0 xpass:0 skip:0 error:0
--------------
After patch:
--------------
- [RUN] test_unmerge
ok 1 Pages were unmerged
...
- [RUN] test_prctl_fork
- No pages got merged
not ok 7 Merge in child failed
- [RUN] test_prctl_fork_exec
ok 8 PR_SET_MEMORY_MERGE value is inherited
...
Bail out! 2 out of 9 tests failed
- Totals: pass:7 fail:2 xfail:0 xpass:0 skip:0 error:0
Link: https://lkml.kernel.org/r/20250816040113.760010-6-aboorvad@linux.ibm.com
Fixes: 6c47de3be3 ("selftest/mm: ksm_functional_tests: extend test case for ksm fork/exec")
Co-developed-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The split_huge_page_test fails on systems with a 64KB base page size.
This is because the order of a 2MB huge page is different:
On 64KB systems, the order is 5.
On 4KB systems, it's 9.
The test currently assumes a maximum huge page order of 9, which is only
valid for 4KB base page systems. On systems with 64KB pages, attempting
to split huge pages beyond their actual order (5) causes the test to fail.
In this patch, we calculate the huge page order based on the system's base
page size. With this change, the tests now run successfully on both 64KB
and 4KB page size systems.
Link: https://lkml.kernel.org/r/20250816040113.760010-5-aboorvad@linux.ibm.com
Fixes: fa6c02315f ("mm: huge_memory: a new debugfs interface for splitting THP tests")
Co-developed-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This patch fixes 2 issues.
1) After fork() in test_prctl_fork, the child process uses the file
descriptors from the parent process to read ksm_stat and
ksm_merging_pages. This results in incorrect values being read (parent
process ksm_stat and ksm_merging_pages will be read in child), causing
the test to fail.
This patch calls init_global_file_handles() in the child process to
ensure that the current process's file descriptors are used to read
ksm_stat and ksm_merging_pages.
2) All tests currently call ksm_merge to trigger page merging. To
ensure the system remains in a consistent state for subsequent tests,
it is better to call ksm_unmerge during the test cleanup phase
In the test_prctl_fork test, after a fork(), reading
ksm_merging_pages in the child process returns a non-zero value because
a previous test performed a merge, and the child's memory state is
inherited from the parent.
Although the child process calls ksm_unmerge, the ksm_merging_pages
counter in the parent is reset to zero, while the child's counter
remains unchanged. This discrepancy causes the test to fail.
To avoid this issue, each test should call ksm_unmerge during
cleanup to ensure the counter is reset and the system is in a clean
state for subsequent tests.
execv argument is an array of pointers to null-terminated strings. In
this patch we also added NULL in the execv argument.
Link: https://lkml.kernel.org/r/20250816040113.760010-4-aboorvad@linux.ibm.com
Fixes: 6c47de3be3 ("selftest/mm: ksm_functional_tests: extend test case for ksm fork/exec")
Co-developed-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
PowerPC64 supports a 4PB virtual address space, but this test was
previously limited to 512TB. This patch extends the coverage up to the
full 4PB VA range on PowerPC64.
Memory from 0 to 128TB is allocated without an address hint, while
allocations from 128TB to 4PB use a hint address.
Link: https://lkml.kernel.org/r/20250816040113.760010-3-aboorvad@linux.ibm.com
Co-developed-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "selftests/mm: Fix false positives and skip unsupported
tests", v4.
This patch series addresses false positives in the generic mm selftests
and skips tests that cannot run correctly due to missing features or
system limitations.
This patch (of 7):
In main(), the high address is stored in hptr, but for mark_range(), the
address passed is ptr, not hptr. Fixed this by changing ptr[i] to hptr[i]
in mark_range() function call.
Link: https://lkml.kernel.org/r/20250816040113.760010-1-aboorvad@linux.ibm.com
Link: https://lkml.kernel.org/r/20250816040113.760010-2-aboorvad@linux.ibm.com
Fixes: b2a79f6213 ("selftests/mm: virtual_address_range: unmap chunks after validation")
Co-developed-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Since the time.h header file is not actually needed in this code, we can
safely remove its inclusion.
Link: https://lkml.kernel.org/r/20250814125417.659937-1-lienze@kylinos.cn
Signed-off-by: Enze Li <lienze@kylinos.cn>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
_common.sh was recently introduced but is not installed and then triggers
an error when trying to run the damon selftests:
selftests: damon: sysfs.sh
./sysfs.sh: line 4: _common.sh: No such file or directory
Install this file to avoid this error.
Link: https://lkml.kernel.org/r/20250812-alex-fixes_manual-v1-1-c4e99b1f80e4@rivosinc.com
Fixes: 511914506d ("selftests/damon: introduce _common.sh to host shared function")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Sang-Heon Jeon <ekffu200098@gmail.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Tested-by: Enze Li <lienze@kylinos.cn>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
There are spelling mistakes in perror messages. Fix these.
Link: https://lkml.kernel.org/r/20250813081333.1978096-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Now we have updated all users of mm->flags to use the bitmap accessors,
repalce it with the bitmap version entirely.
We are then able to move to having 64 bits of mm->flags on both 32-bit and
64-bit architectures.
We also update the VMA userland tests to ensure that everything remains
functional there.
No functional changes intended, other than there now being 64 bits of
available mm_struct flags.
Link: https://lkml.kernel.org/r/e1f6654e016d36c43959764b01355736c5cbcdf8.1755012943.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Namhyung kim <namhyung@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Currently it hard codes the number of hugepage to check for
check_huge_anon(), but it would be more reasonable to do the check based
on a number passed in.
Pass in the hugepage number and do the check based on it.
Link: https://lkml.kernel.org/r/20250809194209.30484-1-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed by: Donet Tom <donettom@linux.ibm.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: wang lian <lianux.mm@gmail.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Use nolibc include directory rather than include a cumulative nolibc.h on
the compiler command line and replace use of 'sudo cpio' with
usr/gen_init_cpio.
While on it fix spelling of KHO_FINALIZE
Link: https://lkml.kernel.org/r/20250811082510.4154080-4-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Suggested-by: Thomas Weißschuh <linux@weissschuh.net>
Cc: Alexander Graf <graf@amazon.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Changyuan Lyu <changyuanl@google.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Pratyush Yadav <pratyush@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
... and hide it behind a kconfig option. There is really no need for any
!xen code to perform this check.
The naming is a bit off: we want to find the "normal" page when a PTE was
marked "special". So it's really not "finding a special" page.
Improve the documentation, and add a comment in the code where XEN ends up
performing the pte_mkspecial() through a hypercall. More details can be
found in commit 923b2919e2 ("xen/gntdev: mark userspace PTEs as special
on x86 PV guests").
Link: https://lkml.kernel.org/r/20250811112631.759341-12-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Juegren Gross <jgross@suse.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series " execute PROCMAP_QUERY ioctl under per-vma lock", v4.
With /proc/pid/maps now being read under per-vma lock protection we can
reuse parts of that code to execute PROCMAP_QUERY ioctl also without
taking mmap_lock. The change is designed to reduce mmap_lock contention
and prevent PROCMAP_QUERY ioctl calls from blocking address space updates.
This patchset was split out of the original patchset [1] that introduced
per-vma lock usage for /proc/pid/maps reading. It contains PROCMAP_QUERY
tests, code refactoring patch to simplify the main change and the actual
transition to per-vma lock.
This patch (of 3):
Extend /proc/pid/maps tearing tests to verify PROCMAP_QUERY ioctl operation
correctness while the vma is being concurrently modified.
Link: https://lkml.kernel.org/r/20250808152850.2580887-1-surenb@google.com
Link: https://lkml.kernel.org/r/20250808152850.2580887-2-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Tested-by: SeongJae Park <sj@kernel.org>
Acked-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Cc: T.J. Mercier <tjmercier@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Ye Bin <yebin10@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Replace typeof() with __auto_type in the swap() macro in uffd-stress.c.
__auto_type was introduced in GCC 4.9 and reduces the compile time for all
compilers. No functional changes intended.
Link: https://lkml.kernel.org/r/20250730142301.6754-1-pranav.tyagi03@gmail.com
Signed-off-by: Pranav Tyagi <pranav.tyagi03@gmail.com>
Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Enable these tests to be run on other pfnmap'ed memory like NVIDIA's EGM.
Add '--' as a separator to pass in file path. This allows passing of cmd
line arguments to kselftest_harness. Use '/dev/mem' as default filename.
Existing test passes:
pfnmap
TAP version 13
1..6
# Starting 6 tests from 1 test cases.
# PASSED: 6 / 6 tests passed.
# Totals: pass:6 fail:0 xfail:0 xpass:0 skip:0 error:0
Pass params to kselftest_harness:
pfnmap -r pfnmap:mremap_fixed
TAP version 13
1..1
# Starting 1 tests from 1 test cases.
# RUN pfnmap.mremap_fixed ...
# OK pfnmap.mremap_fixed
ok 1 pfnmap.mremap_fixed
# PASSED: 1 / 1 tests passed.
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
Pass non-existent file name as input:
pfnmap -- /dev/blah
TAP version 13
1..6
# Starting 6 tests from 1 test cases.
# RUN pfnmap.madvise_disallowed ...
# SKIP Cannot open '/dev/blah'
Pass non pfnmap'ed file as input:
pfnmap -r pfnmap.madvise_disallowed -- randfile.txt
TAP version 13
1..1
# Starting 1 tests from 1 test cases.
# RUN pfnmap.madvise_disallowed ...
# SKIP Invalid file: 'randfile.txt'. Not pfnmap'ed
Link: https://lkml.kernel.org/r/20250805013629.47629-1-sudarsanm@google.com
Signed-off-by: Sudarsan Mahendran <sudarsanm@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add "extern" to the glibc-defined weak rseq symbols to convert the rseq
selftest's usage from weak symbol definitions to weak symbol _references_.
Effectively re-defining the glibc symbols wreaks havoc when building with
-fno-common, e.g. generates segfaults when running multi-threaded programs,
as dynamically linked applications end up with multiple versions of the
symbols.
Building with -fcommon, which until recently has the been the default for
GCC and clang, papers over the bug by allowing the linker to resolve the
weak/tentative definition to glibc's "real" definition.
Note, the symbol itself (or rather its address), not the value of the
symbol, is set to 0/NULL for unresolved weak symbol references, as the
symbol doesn't exist and thus can't have a value. Check for a NULL rseq
size pointer to handle the scenario where the test is statically linked
against a libc that doesn't support rseq in any capacity.
Fixes: 3bcbc20942 ("selftests/rseq: Play nice with binaries statically linked against glibc 2.35+")
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: stable@vger.kernel.org
Closes: https://lore.kernel.org/all/87frdoybk4.ffs@tglx
-----BEGIN PGP SIGNATURE-----
iQJBBAABCAArFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmjC0WsNHGZ3QHN0cmxl
bi5kZQAKCRBwkajZrV/2AOC2D/982GOJGI79ArhsM9FzEiJDp2QLxMnz7cw0t1Vg
pU469JYA/He6VHMbhiqCBopSAskl36kMMGLbflPe+Zc1DNjconFD8wrDRBlUI+EI
lR1dERMUyafuRMuuvKaE7A0ePyR7AD1JFf1znpo6BrRDMX8mbumcIA/L7/T1Rz2F
0b9Phbj77EdKruRgD4eXCuDJibOr5j6FymNvDGR/bm3SovOsOM4gi2dQ3Y3B85ZK
aUDDrss5acypBfu36w+340eMtr1GJKoaCKoK9ScnO58peCSo9C0b+o6xwUBPnHdw
sQOhl6fErOfN+agGir357jLsBUb1H7Tg9decwv9qHp3jESei3kgAyoBF4SqH+48S
Bz1E2M0A0Hxve3/I/Pp1Hz6X5r+V+gDtx71z4+acoZg5YjDMAKFJAfgiK6sFKqZP
73QsvnU2omu2C5r8TAbryv01C5xmkbe3YyU8j4taZOBGk4YkPd5WjJELbTcVM4If
dxDE6vQ9+GrrlBl9aidrMNxIhJuS/qjVIb0tv7oZPVql995AgPJG3F3gSO78OPBj
V4dN9p/wPN9lRl5tj/odWiOfsdQiY7fEkqdXlO79sACQCzXC5O6zAwcnK37s80mQ
QFSt3oiPFcacJJyHHYeYISy13HXlxwUgdxWC6T/sXMU5Jpic+3bWFxC2YgRcIsqZ
ZE1JbA==
=FXmW
-----END PGP SIGNATURE-----
Merge tag 'nf-next-25-09-11' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Florian Westphal says:
====================
netfilter: updates for net-next
1) Don't respond to ICMP_UNREACH errors with another ICMP_UNREACH
error.
2) Support fetching the current bridge ethernet address.
This allows a more flexible approach to packet redirection
on bridges without need to use hardcoded addresses. From
Fernando Fernandez Mancera.
3) Zap a few no-longer needed conditionals from ipvs packet path
and convert to READ/WRITE_ONCE to avoid KCSAN warnings.
From Zhang Tengfei.
4) Remove a no-longer-used macro argument in ipset, from Zhen Ni.
* tag 'nf-next-25-09-11' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nf_reject: don't reply to icmp error messages
ipvs: Use READ_ONCE/WRITE_ONCE for ipvs->enable
netfilter: nft_meta_bridge: introduce NFT_META_BRI_IIFHWADDR support
netfilter: ipset: Remove unused htable_bits in macro ahash_region
selftest:net: fixed spelling mistakes
====================
Link: https://patch.msgid.link/20250911143819.14753-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Usually the autodefer helpers in lib.sh are expected to be run in context
where success is the expected outcome. However when using them for feature
detection, failure can legitimately occur. But the failed command still
schedules a cleanup, which will likely fail again.
Instead, only schedule deferred cleanup when the positive command succeeds.
This way of organizing the cleanup has the added benefit that now the
return code from these functions reflects whether the command passed.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/af10a5bb82ea11ead978cf903550089e006d7e70.1757004393.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The fact that all cleanup (ideally) goes through the defer framework makes
debugging of these commands a bit tricky. However, this also gives us a
nice point to place a hook along the lines of PAUSE_ON_FAIL. When the
environment variable DEFER_PAUSE_ON_FAIL is set, and a cleanup command
results in non-zero exit status, show a bit of debuginfo and give the user
an opportunity to interrupt the execution altogether.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/2a07d24568ede6c42e4701657fa0b738e490fe59.1757004393.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently the way deferred commands are stored and invoked causes any
whitespace to act as an argument separator when the command is executed.
To make it possible to use spaces in deferred commands, store the commands
quoted, and then eval the string prior to execution.
Fixes: a6e263f125 ("selftests: net: lib: Introduce deferred commands")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/6c2523139a6f99103889c9c9fedcdc66a75441f4.1757004393.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This is required on recent kernels, where it is now off by default.
While we're here, fix some stray =m's that were supposed to be =y.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20250910013644.4153708-5-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It's no longer user-selectable (and the default was already "y"), so
let's just drop it.
It was never really relevant to the wireguard selftests either way.
Cc: Shuah Khan <shuah@kernel.org>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20250910013644.4153708-4-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Constrained test environment; duplicate address detection is not needed
and causes races so disable it.
Signed-off-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250910025828.38900-1-dsahern@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cross-merge networking fixes after downstream PR (net-6.17-rc6).
Conflicts:
net/netfilter/nft_set_pipapo.c
net/netfilter/nft_set_pipapo_avx2.c
c4eaca2e10 ("netfilter: nft_set_pipapo: don't check genbit from packetpath lookups")
84c1da7b38 ("netfilter: nft_set_pipapo: use avx2 algorithm for insertions too")
Only trivial adjacent changes (in a doc and a Makefile).
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add selftests for testing the reporting of arena page faults through BPF
streams. Two new bpf programs are added that read and write to an
unmapped arena address and the fault reporting is verified in the
userspace through streams.
The added bpf programs need to access the user_vm_start in struct
bpf_arena, this is done by casting &arena to struct bpf_arena *, but
barrier_var() is used on this ptr before accessing ptr->user_vm_start;
to stop GCC from issuing an out-of-bound access due to the cast from
smaller map struct to larger "struct bpf_arena"
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250911145808.58042-7-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Start using __stderr directly in the bpf programs to test the reporting
of may_goto timeout detection and spin_lock dead lock detection.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250911145808.58042-6-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add __stderr and __stdout to validate the output of BPF streams for bpf
selftests. Similar to __xlated, __jited, etc., __stderr/out can be used
in the BPF progs to compare a string (regex supported) to the output in
the bpf streams.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250911145808.58042-5-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Extend the fastops test coverage to DIV and IDIV, specifically to provide
coverage for #DE (divide error) exceptions, as #DE is the only exception
that can occur in KVM's fastops path, i.e. that requires exception fixup.
Link: https://lore.kernel.org/r/20250909202835.333554-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add a fastop() macro along with macros to define its required constraints,
and use the macros to dedup the innermost guts of the fastop testcases.
No functional change intended.
Link: https://lore.kernel.org/r/20250909202835.333554-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Extend the fastops test to cover instructions that operate on 8-bit data.
Support for 8-bit instructions was omitted from the original commit purely
due to complications with BT not having a r/m8 variant. To keep the
RFLAGS.CF behavior deterministic and not heavily biased to '0' or '1',
continue using BT, but cast and load the to-be-tested value into a
dedicated 32-bit constraint.
Supporting 8-bit operations will allow using guest_test_fastops() as-is to
provide full coverage for DIV and IDIV. For divide operations, covering
all operand sizes _is_ interesting, because KVM needs provide exception
fixup for each size (failure to handle a #DE could panic the host).
Link: https://lore.kernel.org/all/aIF7ZhWZxlkcpm4y@google.com
Link: https://lore.kernel.org/r/20250909202835.333554-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add support for handling #DE (divide error) exceptions in KVM selftests
so that the fastops test can verify KVM correctly handles #DE when
emulating DIV or IDIV on behalf of the guest. Morph #DE to 0xff (i.e.
to -1) as a mostly-arbitrary vector to indicate #DE, so that '0' (the
real #DE vector) can still be used to indicate "no exception".
Link: https://lore.kernel.org/r/20250909202835.333554-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
We have an IPv6 routing regression with the relevant fix still
a WiP. This v2 includes a last-minute revert to avoid more
problems.
Current release - new code bugs:
- wifi: nl80211: completely disable per-link stats for now
Previous releases - regressions:
- dev_ioctl: take ops lock in hwtstamp lower paths
- netfilter:
- fix spurious set lookup failures
- fix lockdep splat due to missing annotation
- genetlink: fix genl_bind() invoking bind() after -EPERM
- phy: transfer phy_config_inband() locking responsibility to phylink
- can: xilinx_can: fix use-after-free of transmitted SKB
- hsr: fix lock warnings
- eth: igb: fix NULL pointer dereference in ethtool loopback test
- eth: i40e: fix Jumbo Frame support after iPXE boot
- eth: macsec: sync features on RTM_NEWLINK
Previous releases - always broken:
- tunnels: reset the GSO metadata before reusing the skb
- mptcp: make sync_socket_options propagate SOCK_KEEPOPEN
- can: j1939: implement NETDEV_UNREGISTER notification hanidler
- wifi: ath12k: fix WMI TLV header misalignment
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmjC4tYSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkNwcQAIh8d10PYJ6iJvyxqBKVNT2CTr4wVk27
lT9rB9t4Jdq4EhuDdRmolVPrkXEt4FiGnqPOsYIK+tPN5j6fgPaksA/lCKr3HMhT
N7qfhCzUFUJQqZEA3pBy5RnRIzNupdTP3rcXFG8QPnGfzdcOZ2m1Tu/36rjt6lmE
lB+9pLQBjI15r67v08ZzEGHfTX4FqlnlCu/jbcYhXNF6erv3jZRboneytJ3fxbMW
kRdmi9wctMprKmWVFmaA0OPkwigMBO8xILnYOCcFhQcGKLugc58YGsjzBJbp/yUA
Qmxb3Gl9pFe97u/URoLPXUE+2hF1X7ydT9hMXrle/gcXVnmU6rN+4xQuEzVPBOK7
J3qh8IH5joMLaQS4VKSD/Wh0RrtDHJdduLDcjtL8qRkFyS30FqY0USd8prMkIE7I
/s9Wdi96nC3WEZQtZKg0mRrSeTtbRR0/KO5gEb6MHaAE8ffKJa7MsMZlrh9VoJvM
PC4PJnOr0Qy5MNAA1MxBWK6Dcsnor4cBkvE35uMoCkUoxOeTnKuruo0kyZYqmrIh
VOnKrb3+w/I4lDXgQ4kIcpyQfrqMky8R6EGvQvjm2c2OLWZBCmBRU1c0Fjw789O4
jMdzjs7H9ild2KwBZK4e7KfdoUnlxIza2Yh4pFLUf2R3NqBZ2vQTkpoCF4AK+wzU
636F5GITLcVe
=XjZM
-----END PGP SIGNATURE-----
Merge tag 'net-6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from CAN, netfilter and wireless.
We have an IPv6 routing regression with the relevant fix still a WiP.
This includes a last-minute revert to avoid more problems.
Current release - new code bugs:
- wifi: nl80211: completely disable per-link stats for now
Previous releases - regressions:
- dev_ioctl: take ops lock in hwtstamp lower paths
- netfilter:
- fix spurious set lookup failures
- fix lockdep splat due to missing annotation
- genetlink: fix genl_bind() invoking bind() after -EPERM
- phy: transfer phy_config_inband() locking responsibility to phylink
- can: xilinx_can: fix use-after-free of transmitted SKB
- hsr: fix lock warnings
- eth:
- igb: fix NULL pointer dereference in ethtool loopback test
- i40e: fix Jumbo Frame support after iPXE boot
- macsec: sync features on RTM_NEWLINK
Previous releases - always broken:
- tunnels: reset the GSO metadata before reusing the skb
- mptcp: make sync_socket_options propagate SOCK_KEEPOPEN
- can: j1939: implement NETDEV_UNREGISTER notification hanidler
- wifi: ath12k: fix WMI TLV header misalignment"
* tag 'net-6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
Revert "net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups"
hsr: hold rcu and dev lock for hsr_get_port_ndev
hsr: use hsr_for_each_port_rtnl in hsr_port_get_hsr
hsr: use rtnl lock when iterating over ports
wifi: nl80211: completely disable per-link stats for now
net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups
net: ethtool: fix wrong type used in struct kernel_ethtool_ts_info
MAINTAINERS: add Phil as netfilter reviewer
netfilter: nf_tables: restart set lookup on base_seq change
netfilter: nf_tables: make nft_set_do_lookup available unconditionally
netfilter: nf_tables: place base_seq in struct net
netfilter: nft_set_rbtree: continue traversal if element is inactive
netfilter: nft_set_pipapo: don't check genbit from packetpath lookups
netfilter: nft_set_bitmap: fix lockdep splat due to missing annotation
can: rcar_can: rcar_can_resume(): fix s2ram with PSCI
can: xilinx_can: xcan_write_frame(): fix use-after-free of transmitted SKB
can: j1939: j1939_local_ecu_get(): undo increment when j1939_local_ecu_get() fails
can: j1939: j1939_sk_bind(): call j1939_priv_put() immediately when j1939_local_ecu_get() failed
can: j1939: implement NETDEV_UNREGISTER notification handler
selftests: can: enable CONFIG_CAN_VCAN as a module
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmjB/6gACgkQ6rmadz2v
bToSCg//Z8Q2ToTV/BOLTzFYLvcTm2YRlqqIe3SFxyxLQCIhC0kxQAT94baVQHky
/6ASbPjDWXdGVHNoMopA6lpMx22Tq4xi6qO5fzJHDuSqh5KTi8l5/GyJeA3egPzD
7RIvKvvgePpCx0xm9rm5O5vvUeFrsxhQPRRiN/fsOibiTJjBpRAJDp9k+pvnK6mb
HaZcHF+In5Vg7XozuHAUMzsp+4njzdLrMXL2Q54o2MrIoeBg8/oAnhLujskGMnXK
mgUA+skW42IEkw+TYUu9888/5PMDkto3BZIx0plcAIVAIvcU5BFzLt11llQswgVl
q740k50oRKrmwHyEVDwugV7WeGQMks48lMHtLKytYmdEhdTfEYUKHeBpcI87fUYy
IpOdSUT49nBxOmGl59ccBcdzsndTjo7Zrl7dMf4umN0SSjfdohwj0uu7rmZCaOdd
m/TxH13Ae7na4QzVx0N911qxBYw07uYNiq3Ati+x327ySozvvNfLIYK/sS/clJkd
lOpz3kpjwgV+PUfv2NBqEJm4nSPTtW7fiEQ8p/yBvK90nB6NnHIbq9a2rPBKeDKx
RpkDB9nhJ1J6OMKWkUDasMiP5tAXt9RrI+la/CgBxMcP/G4HxH6yf22Mf3Hzhe8V
UfjAgHqXvmrjqpgIbO0AVkfwDvlOM37DGvY0H3bMFeOXCk0DDw4=
=09/9
-----END PGP SIGNATURE-----
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:
"A number of fixes accumulated due to summer vacations
- Fix out-of-bounds dynptr write in bpf_crypto_crypt() kfunc which
was misidentified as a security issue (Daniel Borkmann)
- Update the list of BPF selftests maintainers (Eduard Zingerman)
- Fix selftests warnings with icecc compiler (Ilya Leoshkevich)
- Disable XDP/cpumap direct return optimization (Jesper Dangaard
Brouer)
- Fix unexpected get_helper_proto() result in unusual configuration
BPF_SYSCALL=y and BPF_EVENTS=n (Jiri Olsa)
- Allow fallback to interpreter when JIT support is limited (KaFai
Wan)
- Fix rqspinlock and choose trylock fallback for NMI waiters. Pick
the simplest fix. More involved fix is targeted bpf-next (Kumar
Kartikeya Dwivedi)
- Fix cleanup when tcp_bpf_send_verdict() fails to allocate
psock->cork (Kuniyuki Iwashima)
- Disallow bpf_timer in PREEMPT_RT for now. Proper solution is being
discussed for bpf-next. (Leon Hwang)
- Fix XSK cq descriptor production (Maciej Fijalkowski)
- Tell memcg to use allow_spinning=false path in bpf_timer_init() to
avoid lockup in cgroup_file_notify() (Peilin Ye)
- Fix bpf_strnstr() to handle suffix match cases (Rong Tao)"
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Skip timer cases when bpf_timer is not supported
bpf: Reject bpf_timer for PREEMPT_RT
tcp_bpf: Call sk_msg_free() when tcp_bpf_send_verdict() fails to allocate psock->cork.
bpf: Tell memcg to use allow_spinning=false path in bpf_timer_init()
bpf: Allow fall back to interpreter for programs with stack size <= 512
rqspinlock: Choose trylock fallback for NMI waiters
xsk: Fix immature cq descriptor production
bpf: Update the list of BPF selftests maintainers
selftests/bpf: Add tests for bpf_strnstr
selftests/bpf: Fix "expression result unused" warnings with icecc
bpf: Fix bpf_strnstr() to handle suffix match cases better
selftests/bpf: Extend crypto_sanity selftest with invalid dst buffer
bpf: Fix out-of-bounds dynptr write in bpf_crypto_crypt
bpf: Check the helper function is valid in get_helper_proto
bpf, cpumap: Disable page_pool direct xdp_return need larger scope
Create versions of the existing test cases where the routers generating
the ICMP error messages are using VRFs. Check that the source IPs of
these messages do not change in the presence of VRFs.
IPv6 always behaved correctly, but IPv4 fails when reverting "ipv4:
icmp: Fix source IP derivation in presence of VRFs".
Without IPv4 change:
# ./traceroute.sh
TEST: IPv6 traceroute [ OK ]
TEST: IPv6 traceroute with VRF [ OK ]
TEST: IPv4 traceroute [ OK ]
TEST: IPv4 traceroute with VRF [FAIL]
traceroute did not return 1.0.3.1
$ echo $?
1
The test fails because the ICMP error message is sent with the VRF
device's IP (1.0.4.1):
# traceroute -n -s 1.0.1.3 1.0.2.4
traceroute to 1.0.2.4 (1.0.2.4), 30 hops max, 60 byte packets
1 1.0.4.1 0.165 ms 0.110 ms 0.103 ms
2 1.0.2.4 0.098 ms 0.085 ms 0.078 ms
# traceroute -n -s 1.0.3.3 1.0.2.4
traceroute to 1.0.2.4 (1.0.2.4), 30 hops max, 60 byte packets
1 1.0.4.1 0.201 ms 0.138 ms 0.129 ms
2 1.0.2.4 0.123 ms 0.105 ms 0.098 ms
With IPv4 change:
# ./traceroute.sh
TEST: IPv6 traceroute [ OK ]
TEST: IPv6 traceroute with VRF [ OK ]
TEST: IPv4 traceroute [ OK ]
TEST: IPv4 traceroute with VRF [ OK ]
$ echo $?
0
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250908073238.119240-9-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When generating ICMP error messages, the kernel will prefer a source IP
that is on the same subnet as the destination IP (see
inet_select_addr()). Test this behavior by invoking traceroute with
different source IPs and checking that the ICMP error message is
generated with a source IP in the same subnet.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250908073238.119240-8-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Both of the addresses are configured as primary addresses, but the
kernel is expected to choose 10.0.1.1/24 as the source IP of the ICMP
error message since it is on the same subnet as the destination IP of
the message (10.0.1.3/24). Reword the comment to reflect that.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250908073238.119240-7-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Use require_command() so that the test will return SKIP (4) when a
required command is not present.
Before:
# ./traceroute.sh
SKIP: Could not run IPV6 test without traceroute6
SKIP: Could not run IPV4 test without traceroute
$ echo $?
0
After:
# ./traceroute.sh
TEST: traceroute6 not installed [SKIP]
$ echo $?
4
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250908073238.119240-6-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The test always returns success even if some tests were modified to
fail. Fix by converting the test to use the appropriate library
functions instead of using its own functions.
Before:
# ./traceroute.sh
TEST: IPV6 traceroute [FAIL]
TEST: IPV4 traceroute [ OK ]
Tests passed: 1
Tests failed: 1
$ echo $?
0
After:
# ./traceroute.sh
TEST: IPv6 traceroute [FAIL]
traceroute6 did not return 2000:102::2
TEST: IPv4 traceroute [ OK ]
$ echo $?
1
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250908073238.119240-5-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
There are currently no kernel tests that verify setting and getting
options of the team driver.
In the future, options may be added that implicitly change other
options, which will make it useful to have tests like these that show
nothing breaks. There will be a follow up patch to this that adds new
"rx_enabled" and "tx_enabled" options, which will implicitly affect the
"enabled" option value and vice versa.
The tests use teamnl to first set options to specific values and then
gets them to compare to the set values.
Signed-off-by: Marc Harvey <marcharvey@google.com>
Link: https://patch.msgid.link/20250905040441.2679296-1-marcharvey@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEEn/sM2K9nqF/8FWzzDHRl3/mQkZwFAmjBpO8THG1rbEBwZW5n
dXRyb25peC5kZQAKCRAMdGXf+ZCRnJOOB/9r35uZBD1+ZZVpvoByhil8XY7K42Vv
LrARJ8+kh1Q6mcuB4V/176GBPJ7JabSUTGqvqvYbJBAB5obONhLU5Bkb/7lM2z1G
MOUe0DMiBP+J3K8zqXcEKhqeW0qQO6pqfnv8iIj97Kxwd5JRPJq8pMpCIC/RLzOY
Ot7+fcsE+cSZt9VumnQTx/S1fzGoY1C9VCglSr9B0O6IiTZZBIPNg8EwIGvbm7oh
uofnFRocRZaWQ6OErdRSTLK4hu11DeFUmEAA6YN2ikpAdshUsf3U8HYaz2LIEEuu
zGflFXrMekXqDod1SvhklLPUY/ssC11dUon+4Cgz/yK6I+lFvD/cuLIP
=/YxL
-----END PGP SIGNATURE-----
Merge tag 'linux-can-fixes-for-6.17-20250910' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2025-09-10
The 1st patch is by Alex Tran and fixes the Documentation of the
struct bcm_msg_head.
Davide Caratti's patch enabled the VCAN driver as a module for the
Linux self tests.
Tetsuo Handa contributes 3 patches that fix various problems in the
CAN j1939 protocol.
Anssi Hannula's patch fixes a potential use-after-free in the
xilinx_can driver.
Geert Uytterhoeven's patch fixes the rcan_can's suspend to RAM on
R-Car Gen3 using PSCI.
* tag 'linux-can-fixes-for-6.17-20250910' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: rcar_can: rcar_can_resume(): fix s2ram with PSCI
can: xilinx_can: xcan_write_frame(): fix use-after-free of transmitted SKB
can: j1939: j1939_local_ecu_get(): undo increment when j1939_local_ecu_get() fails
can: j1939: j1939_sk_bind(): call j1939_priv_put() immediately when j1939_local_ecu_get() failed
can: j1939: implement NETDEV_UNREGISTER notification handler
selftests: can: enable CONFIG_CAN_VCAN as a module
docs: networking: can: change bcm_msg_head frames member to support flexible array
====================
Link: https://patch.msgid.link/20250910162907.948454-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
fcnal-test.sh already includes lib.sh, use relevant helpers
instead of sleeping. Replace sleep after starting nettest
as a server with wait_local_port_listen.
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250909223837.863217-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A proper kernel configuration for running kselftest can be obtained with:
$ yes | make kselftest-merge
Build of 'vcan' driver is currently missing, while the other required knobs
are already there because of net/link_netns.py [1]. Add a config file in
selftests/net/can to store the minimum set of kconfig needed for CAN
selftests.
[1] https://patch.msgid.link/20250219125039.18024-14-shaw.leon@gmail.com
Fixes: 77442ffa83 ("selftests: can: Import tst-filter from can-tests")
Reviewed-by: Vincent Mailhol <mailhol@kernel.org>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Link: https://patch.msgid.link/fa4c0ea262ec529f25e5f5aa9269d84764c67321.1757516009.git.dcaratti@redhat.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
ACPICA commit 710745713ad3a2543dbfb70e84764f31f0e46bdc
This has been renamed in more recent CXL specs, as
type3 (memory expanders) can also use HDM-DB for
device coherent memory.
Link: 710745713a
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://patch.msgid.link/20250908160034.86471-1-dave@stgolabs.net
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
When many ADD_ADDR need to be sent, it can take some time to send each
of them, and create new subflows. Some CIs seem to occasionally have
issues with these tests, especially with "debug" kernels.
Two subtests will now run for a slightly longer time: the last two where
3 or more ADD_ADDR are sent during the test.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250907-net-next-mptcp-add_addr-retrans-adapt-v1-3-824cc805772b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ADD_ADDR can be retransmitted, and with, the parent commit, these
retransmissions can be sent quicker: from 2 minutes to less than one
second.
To avoid false positives where retransmitted ADD_ADDR causes higher
counters than expected, it is required to be more tolerant. Errors are
now only reported when fewer ADD_ADDRs have been sent/received, except
if no ADD_ADDR are expected.
Before the parent commit, the tolerance was present for each tests where
the ADD_ADDR could be retransmitted in a reasonable time (1 sec). Now
that all tests can have retransmitted ADD_ADDR, it is normal to apply
the same tolerance for all tests.
An alternative could be to disable the ADD_ADDR retransmissions by
default, but that's changing the default kernel behaviour. Plus,
ADD_ADDR retransmissions can be required for some tests. To avoid adding
exceptions to many tests, it seems better to increase the tolerance.
Later, we could add a new MIB counter to identify the ADD_ADDR
retransmissions, and remove the tolerance when this counter is
available.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250907-net-next-mptcp-add_addr-retrans-adapt-v1-2-824cc805772b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This v0.11.0 version introduces SC2329:
Warn when (non-escaping) functions are never invoked.
Except that, similar to SC2317, ShellCheck is currently unable to figure
out functions that are invoked via trap, or indirectly, when calling
functions via variables. It is then needed to disable this new SC2329.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250908-net-mptcp-misc-fixes-6-17-rc5-v1-3-5f2168a66079@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The pmtu test takes nearly an hour when run on a debug kernel
(10min on a normal kernel, so the debug slow down is quite significant).
NIPA tries to ensure all results are delivered by a certain deadline
so this prevents it from retrying the test in case of a flake.
Looks like one of the slowest operations in the test is calling out
to ./openvswitch/ovs-dpctl.py to remove potential leftover OvS interfaces.
Check whether the interfaces exist in the first place in sysfs,
since it can be done directly in bash it is very fast.
This should save us around 20-30% of the test runtime.
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250906214535.3204785-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
fcnal-test.sh takes almost hour and a half to finish.
The tests are already grouped into ipv4, ipv6 and other.
Run those groups separately.
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250908201021.270681-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
icecc is a compiler wrapper that distributes compile jobs over a build
farm [1]. It works by sending toolchain binaries and preprocessed
source code to remote machines.
Unfortunately using it with BPF selftests causes build failures due to
a clang bug [2]. The problem is that clang suppresses the
-Wunused-value warning if the unused expression comes from a macro
expansion. Since icecc compiles preprocessed source code, this
information is not available. This leads to -Wunused-value false
positives.
obj_new_no_struct() and obj_new_acq() use the bpf_obj_new() macro and
discard the result. arena_spin_lock_slowpath() uses two macros that
produce values and ignores the results. Add (void) casts to explicitly
indicate that this is intentional and suppress the warning.
An alternative solution is to change the macros to not produce values.
This would work today for the arena_spin_lock_slowpath() issue, but in
the future there may appear users who need them. Another potential
solution is to replace these macros with functions. Unfortunately this
would not work, because these macros work with unknown types and
control flow.
[1] https://github.com/icecc/icecream
[2] https://github.com/llvm/llvm-project/issues/142614
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250829030017.102615-2-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Small cleanup and test extension to probe the bpf_crypto_{encrypt,decrypt}()
kfunc when a bad dst buffer is passed in to assert that an error is returned.
Also, encrypt_sanity() and skb_crypto_setup() were explicit to set the global
status variable to zero before any test, so do the same for decrypt_sanity().
Do not explicitly zero the on-stack err before bpf_crypto_ctx_create() given
the kfunc is expected to do it internally for the success case.
Before kernel fix:
# ./vmtest.sh -- ./test_progs -t crypto
[...]
[ 1.531200] bpf_testmod: loading out-of-tree module taints kernel.
[ 1.533388] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
#87/1 crypto_basic/crypto_release:OK
#87/2 crypto_basic/crypto_acquire:OK
#87 crypto_basic:OK
test_crypto_sanity:PASS:skel open 0 nsec
test_crypto_sanity:PASS:ip netns add crypto_sanity_ns 0 nsec
test_crypto_sanity:PASS:ip -net crypto_sanity_ns -6 addr add face::1/128 dev lo nodad 0 nsec
test_crypto_sanity:PASS:ip -net crypto_sanity_ns link set dev lo up 0 nsec
test_crypto_sanity:PASS:open_netns 0 nsec
test_crypto_sanity:PASS:AF_ALG init fail 0 nsec
test_crypto_sanity:PASS:if_nametoindex lo 0 nsec
test_crypto_sanity:PASS:skb_crypto_setup fd 0 nsec
test_crypto_sanity:PASS:skb_crypto_setup 0 nsec
test_crypto_sanity:PASS:skb_crypto_setup retval 0 nsec
test_crypto_sanity:PASS:skb_crypto_setup status 0 nsec
test_crypto_sanity:PASS:create qdisc hook 0 nsec
test_crypto_sanity:PASS:make_sockaddr 0 nsec
test_crypto_sanity:PASS:attach encrypt filter 0 nsec
test_crypto_sanity:PASS:encrypt socket 0 nsec
test_crypto_sanity:PASS:encrypt send 0 nsec
test_crypto_sanity:FAIL:encrypt status unexpected error: -5 (errno 95)
#88 crypto_sanity:FAIL
Summary: 1/2 PASSED, 0 SKIPPED, 1 FAILED
After kernel fix:
# ./vmtest.sh -- ./test_progs -t crypto
[...]
[ 1.540963] bpf_testmod: loading out-of-tree module taints kernel.
[ 1.542404] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
#87/1 crypto_basic/crypto_release:OK
#87/2 crypto_basic/crypto_acquire:OK
#87 crypto_basic:OK
#88 crypto_sanity:OK
Summary: 2/2 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://lore.kernel.org/r/20250829143657.318524-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The loop in bench_sockmap_prog_destroy() has two issues:
1. Using 'sizeof(ctx.fds)' as the loop bound results in the number of
bytes, not the number of file descriptors, causing the loop to iterate
far more times than intended.
2. The condition 'ctx.fds[0] > 0' incorrectly checks only the first fd for
all iterations, potentially leaving file descriptors unclosed. Change
it to 'ctx.fds[i] > 0' to check each fd properly.
These fixes ensure correct cleanup of all file descriptors when the
benchmark exits.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250909124721.191555-1-jiayuan.chen@linux.dev
Closes: https://lore.kernel.org/bpf/aLqfWuRR9R_KTe5e@stanley.mountain/
Add selftest for the IPv6 fragmentation regression which affected
several stable kernels.
Commit a18dfa9925 ("ipv6: save dontfrag in cork") was backported to
stable without some prerequisite commits. This caused a regression when
sending IPv6 UDP packets by preventing fragmentation and instead
returning -1 (EMSGSIZE).
Add selftest to check for this issue by attempting to send a packet
larger than the interface MTU. The packet will be fragmented on a
working kernel, with sendmsg(2) correctly returning the expected number
of bytes sent. When the regression is present, sendmsg returns -1 and
sets errno to EMSGSIZE.
Link: https://lore.kernel.org/stable/aElivdUXqd1OqgMY@karahi.gladserv.com
Signed-off-by: Brett A C Sheffield <bacs@librecast.net>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250903154925.13481-1-bacs@librecast.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add test to check the broadcast ethernet destination field is set
correctly.
This test sends a broadcast ping, captures it using tcpdump and
ensures that all bits of the 6 octet ethernet destination address
are correctly set by examining the output capture file.
Co-developed-by: Brett A C Sheffield <bacs@librecast.net>
Signed-off-by: Brett A C Sheffield <bacs@librecast.net>
Signed-off-by: Oscar Maes <oscmaes92@gmail.com>
Link: https://patch.msgid.link/20250902150240.4272-1-oscmaes92@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
vdso_test_abi provides the exact same functionality, properly uses
kselftest.h and explicitly calls into the vDSO without relying on the libc.
Drop the pointless testcase.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250812-vdso-tests-fixes-v2-8-90f499dd35f8@linutronix.de
To be y2038-safe, 32-bit userspace needs to explicitly call the 64-bit safe
time APIs. For this the 32-bit vDSOs contains a clock_gettime() variant
which always uses 64-bit time types.
Also test this vDSO function.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250812-vdso-tests-fixes-v2-7-90f499dd35f8@linutronix.de
The array relies on the numeric values of the clock IDs.
When reading the code it is not obvious that the order is correct.
Make the code easier to read by using explicit indices.
While at it make the array static.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250812-vdso-tests-fixes-v2-5-90f499dd35f8@linutronix.de
The test uses the kselftest.h framework and declares in its testplan to
always execute 16 testcases. If any of the clockids were not available,
the testplan would not be satisfied anymore and the test would fail.
Apparently that never happened, so the clockids are always available.
Remove the pointless checks.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250812-vdso-tests-fixes-v2-4-90f499dd35f8@linutronix.de
If AT_SYSINFO_EHDR is missing the whole test needs to be skipped.
Currently this results in the following output:
TAP version 13
1..16
# AT_SYSINFO_EHDR is not present!
This output is incorrect, as "1..16" still requires the subtest lines to
be printed, which isn't done however.
Switch to the correct skipping functions, so the output now correctly
indicates that no subtests are being run:
TAP version 13
1..0 # SKIP AT_SYSINFO_EHDR is not present!
Fixes: 693f5ca08c ("kselftest: Extend vDSO selftest")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250812-vdso-tests-fixes-v2-2-90f499dd35f8@linutronix.de
The _rval register variable is meant to be an output operand of the asm
statement but is instead used as input operand.
clang 20.1 notices this and triggers -Wuninitialized warnings:
tools/testing/selftests/timers/auxclock.c:154:10: error: variable '_rval' is uninitialized when used here [-Werror,-Wuninitialized]
154 | return VDSO_CALL(self->vdso_clock_gettime64, 2, clockid, ts);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
tools/testing/selftests/timers/../vDSO/vdso_call.h:59:10: note: expanded from macro 'VDSO_CALL'
59 | : "r" (_rval) \
| ^~~~~
tools/testing/selftests/timers/auxclock.c:154:10: note: variable '_rval' is declared here
tools/testing/selftests/timers/../vDSO/vdso_call.h:47:2: note: expanded from macro 'VDSO_CALL'
47 | register long _rval asm ("r3"); \
| ^
It seems the list of input and output operands have been switched around.
However as the argument registers are not always initialized they can not
be marked as pure inputs as that would trigger -Wuninitialized warnings.
Adding _rval as another input and output operand does also not work as it
would collide with the existing _r3 variable.
Instead reuse _r3 for both the argument and the return value.
Fixes: 6eda706a53 ("selftests: vDSO: fix the way vDSO functions are called for powerpc")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/all/20250812-vdso-tests-fixes-v2-1-90f499dd35f8@linutronix.de
Closes: https://lore.kernel.org/oe-kbuild-all/202506180223.BOOk5jDK-lkp@intel.com/
Add comprehensive selftest to verify:
- Per-port actor priority setting via ad_actor_port_prio
- Aggregator selection behavior with port_priority ad_select policy
Also move cmd_jq helper from forwarding/lib.sh to net/lib.sh for
broader reusability across network selftests.
Here is the result output
# ./bond_lacp_prio.sh
TEST: bond 802.3ad (ad_actor_port_prio setting) [ OK ]
TEST: bond 802.3ad (ad_actor_port_prio select) [ OK ]
TEST: bond 802.3ad (ad_actor_port_prio switch) [ OK ]
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250902064501.360822-4-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Fix a typo in hyperv_cpuid.c test assertion log:
replace "our of supported range" -> "out of supported range".
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Link: https://lore.kernel.org/r/20250824181642.629297-1-alok.a.tiwari@oracle.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Commit 9bb88c6596 ("selftests: net: test extacks in netlink dumps")
moved netlink-dumps from TEST_GEN_PROGS to YNL_GEN_FILES.
But _FILES are not for tests, rather for utilities / helpers.
Create YNL_GEN_PROGS and include netlink-dumps there.
This makes netlink-dumps part of executed tests, again.
Link: https://patch.msgid.link/20250906211351.3192412-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Recent changes to make netlink socket memory accounting must
have broken the implicit assumption of the netlink-dump test
that we can fit exactly 64 dumps into the socket. Handle the
failure mode properly, and increase the dump count to 80
to make sure we still run into the error condition if
the default buffer size increases in the future.
Link: https://patch.msgid.link/20250906211351.3192412-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The error message printed here only uses the previous err value,
which results in it being printed as 0.
When bpf_map__attach_struct_ops encounters an error,
it uses libbpf_err_ptr(err) to set errno = -err and returns NULL.
Therefore, Using -errno can fix this issue.
Fix before:
run_subtest:FAIL:1019 bpf_map__attach_struct_ops failed for map pro_epilogue: err=0
Fix after:
run_subtest:FAIL:1019 bpf_map__attach_struct_ops failed for map pro_epilogue: err=-9
Signed-off-by: Feng Yang <yangfeng@kylinos.cn>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250908060810.1054341-1-yangfeng59949@163.com
Fix grammatical error in <past tense verb> + <infinitive>
construct related to memory allocation checks.
In essence change "Failed to allocated" to "Failed to allocate".
Signed-off-by: Nikola Z. Ivanov <zlatistiv@gmail.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Will Deacon <will@kernel.org>
Added parentheses around sizeof to make the expression clearer
and improve readability. This change has no functional impact.
```
[command]
./scripts/checkpatch.pl tools/testing/selftests/arm64/fp/sve-ptrace.c
[output]
WARNING: sizeof *sve should be sizeof(*sve)
```
Signed-off-by: Vivek Yadav <vivekyadav1207731111@gmail.com>
Signed-off-by: Will Deacon <will@kernel.org>
The comment was correct, but `checkpatch` script flagged it with a warning
as shown in the output section. The comment is slightly modified
to improve readability, which also suppresses the warning.
```
[command]
./script/checkpatch.pl --strict -f tools/testing/selftests/arm64/fp/fp-stress.c
[output]
WARNING: Possible repeated word: 'on'
```
Signed-off-by: Vivek Yadav <vivekyadav1207731111@gmail.com>
Signed-off-by: Will Deacon <will@kernel.org>
Nolibc now does have getauxval(), use it.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
The return value was not assigned to 'ret', so the check afterwards
does not do anything.
Fixes: 3d37d4307e ("kselftest/arm64: Add very basic GCS test program")
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaL6SyQAKCRCRxhvAZXjc
ouTGAQDGiTnaENiOzRhzNl1XONTRv8a1uV0pxg4W3fNdiRlxgQEA/O90/+nM48KC
pdV3WHz5eGfcnMTpqgHxK6HYgwklJAY=
=oKnm
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.17-rc6.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"fuse:
- Prevent opening of non-regular backing files.
Fuse doesn't support non-regular files anyway.
- Check whether copy_file_range() returns a larger size than
requested.
- Prevent overflow in copy_file_range() as fuse currently only
supports 32-bit sized copies.
- Cache the blocksize value if the server returned a new value as
inode->i_blkbits isn't modified directly anymore.
- Fix i_blkbits handling for iomap partial writes.
By default i_blkbits is set to PAGE_SIZE which causes iomap to mark
the whole folio as uptodate even on a partial write. But fuseblk
filesystems support choosing a blocksize smaller than PAGE_SIZE
risking data corruption. Simply enforce PAGE_SIZE as blocksize for
fuseblk's internal inode for now.
- Prevent out-of-bounds acces in fuse_dev_write() when the number of
bytes to be retrieved is truncated to the fc->max_pages limit.
virtiofs:
- Fix page faults for DAX page addresses.
Misc:
- Tighten file handle decoding from userns.
Check that the decoded dentry itself has a valid idmapping in the
user namespace.
- Fix mount-notify selftests.
- Fix some indentation errors.
- Add an FMODE_ flag to indicate IOCB_HAS_METADATA availability.
This will be moved to an FOP_* flag with a bit more rework needed
for that to happen not suitable for a fix.
- Don't silently ignore metadata for sync read/write.
- Don't pointlessly log warning when reading coredump sysctls"
* tag 'vfs-6.17-rc6.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fuse: virtio_fs: fix page fault for DAX page address
selftests/fs/mount-notify: Fix compilation failure.
fhandle: use more consistent rules for decoding file handle from userns
fuse: Block access to folio overlimit
fuse: fix fuseblk i_blkbits for iomap partial writes
fuse: reflect cached blocksize if blocksize was changed
fuse: prevent overflow in copy_file_range return value
fuse: check if copy_file_range() returns larger than requested size
fuse: do not allow mapping a non-regular backing file
coredump: don't pointlessly check and spew warnings
fs: fix indentation style
block: don't silently ignore metadata for sync read/write
fs: add a FMODE_ flag to indicate IOCB_HAS_METADATA availability
Please enter a commit message to explain why this merge is necessary,
especially if it merges an updated upstream into a topic branch.
Thanks to -Waddress, the compiler warns that the ksft_test_result()
invocations in the arm64 tpidr2 selftest are always true. Oops.
Fix the test by, err, actually running the test functions.
Fixes: 6d80cb7313 ("kselftest/arm64: Convert tpidr2 test to use kselftest.h")
Signed-off-by: Bala-Vignesh-Reddy <reddybalavignesh9979@gmail.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Use ksft_perror() to report error codes from failing ptrace operations to
make it easier to interpret logs when things go wrong.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Fix -Wunused-result warning generated when compiled with gcc 13.3.0,
by checking fread's return value and handling errors, preventing
potential failures when reading from stdin.
Fixes compiler warning:
warning: ignoring return value of 'fread' declared with attribute
'warn_unused_result' [-Wunused-result]
Fixes: 806a15b254 ("kselftests/arm64: add PAuth test for whether exec() changes keys")
Signed-off-by: Bala-Vignesh-Reddy <reddybalavignesh9979@gmail.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
devmem test fails on NIPA. Most likely we get skb(s) with readable
frags (why?) but the failure manifests as an OOM. The OOM happens
because ncdevmem spams the following message:
recvmsg ret=-1
recvmsg: Bad address
As of today, ncdevmem can't deal with various reasons of EFAULT:
- falling back to regular recvmsg for non-devmem skbs
- increasing ctrl_data size (can't happen with ncdevmem's large buffer)
Exit (cleanly) with error when recvmsg returns EFAULT. This should at
least cause the test to cleanup its state.
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250904182710.1586473-1-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add the ability to dump BPF program instructions directly from veristat.
Previously, inspecting a program required separate bpftool invocations:
one to load and another to dump it, which meant running multiple
commands.
During active development, it's common for developers to use veristat
for testing verification. Integrating instruction dumping into veristat
reduces the need to switch tools and simplifies the workflow.
By making this information more readily accessible, this change aims
to streamline the BPF development cycle and improve usability for
developers.
This implementation leverages bpftool, by running it directly via popen
to avoid any code duplication and keep veristat simple.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250905140835.1416179-1-mykyta.yatsenko5@gmail.com
This test makes sure we do send a FIN on close()
if the receive queue contains data that was consumed.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://patch.msgid.link/20250903084720.1168904-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The workload hint may contain some other hints which are not defined.
So mask out unsupported types. Currently only lower 4 bits of workload
type hints are defined.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/20250828201541.931425-1-srinivas.pandruvada@linux.intel.com
[ rjw: Subject cleanup ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We're reverting the removal of a Sundance driver, a user has appeared.
This makes the PR rather large in terms of LoC.
There's a conspicuous absence of real, user-reported 6.17 issues.
Slightly worried that the summer distracted people from testing.
Previous releases - regressions:
- ax25: properly unshare skbs in ax25_kiss_rcv()
Previous releases - always broken:
- phylink: disable autoneg for interfaces that have no inband,
fix regression on pcs-lynx (NXP LS1088)
- vxlan: fix null-deref when using nexthop objects
- batman-adv: fix OOB read/write in network-coding decode
- icmp: icmp_ndo_send: fix reversing address translation for replies
- tcp: fix socket ref leak in TCP-AO failure handling for IPv6
- mctp:
- mctp_fraq_queue should take ownership of passed skb
- usb: initialise mac header in RX path, avoid WARN
- wifi: mac80211: do not permit 40 MHz EHT operation on 5/6 GHz,
respect device limitations
- wifi: wilc1000: avoid buffer overflow in WID string configuration
- wifi: mt76:
- fix regressions from mt7996 MLO support rework
- fix offchannel handling issues on mt7996
- fix multiple wcid linked list corruption issues
- mt7921: don't disconnect when AP requests switch to a channel which
requires radar detection
- mt7925u: use connac3 tx aggr check in tx complete
- wifi: intel:
- improve validation of ACPI DSM data
- cfg: restore some 1000 series configs
- wifi: ath:
- ath11k: a fix for GTK rekeying
- ath12k: a missed WiFi7 capability (multi-link EMLSR)
- eth: intel:
- ice: fix races in "low latency" firmware interface for Tx timestamps
- idpf: set mac type when adding and removing MAC filters
- i40e: remove racy read access to some debugfs files
Misc:
- Revert "eth: remove the DLink/Sundance (ST201) driver"
- netfilter: conntrack: helper: Replace -EEXIST by -EBUSY, avoid confusing
modprobe
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmi5tR8ACgkQMUZtbf5S
Irt/dA//YtVjqx/qVRADORDq2CF2SZk7P4Kcanci5gI36tP+7yVYOoJyGm1SFfSp
MSv5Oxqn2U37we34FBgaIbkPOc6Bg1ZDDlzgs/RzGYzNbz9mZkYMHry0kJ8bjbeA
c6tlOKJu49wG82K/3eMXRFQFUJZS0TeMEfZMGoqcKU8ziHC6+XpUzXj1IqBbTU/+
C6JLGzpWnWHI0o97rscnRF4w1h2tl5FF1eBV6MiIJzP1nuBuK7bDkwbGvA5DdBOr
nqWdi6iSfGwb/4XYMKeRMNucYSsZOwAt9DYAxlPb2AXhpKSO31OjZ1qp3e5Uhg4u
n6Oj3q9tPNBitSqizZOAftay7xZJtXISpuFVm2pijgAz0G2SITvx7Z0kPKDw7rXa
S6ZSmBhlamj/6uaOmmmIyG7zqMXarLKVCh6C+tU/+vCYVb1YhC5fcIU1QML68FxX
9YfJFOf6Fn/j68MtRdknB5pnEfh0acMcKgie5b0725KBbRPM0/QlpLL5vj4UXZD5
3yxpgQbgciU3HiYBDJM1vWwF0Z3Y89YDtZqjUavnrnrxRv+BkzTmGlmNuqICKm3f
rUCjdNm8TDPUwKgSLeUbXF2u8Aapgizd7ptXcmKT2mbjSKdhqdvafUnrdPZaBZBA
uKH0XLNezj2zxcQaTEVdV5mtDujZMoXPS/RnCrICaMfXkz5Whxg=
=e0Jg
-----END PGP SIGNATURE-----
Merge tag 'net-6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, wireless and Bluetooth.
We're reverting the removal of a Sundance driver, a user has appeared.
This makes the PR rather large in terms of LoC.
There's a conspicuous absence of real, user-reported 6.17 issues.
Slightly worried that the summer distracted people from testing.
Previous releases - regressions:
- ax25: properly unshare skbs in ax25_kiss_rcv()
Previous releases - always broken:
- phylink: disable autoneg for interfaces that have no inband, fix
regression on pcs-lynx (NXP LS1088)
- vxlan: fix null-deref when using nexthop objects
- batman-adv: fix OOB read/write in network-coding decode
- icmp: icmp_ndo_send: fix reversing address translation for replies
- tcp: fix socket ref leak in TCP-AO failure handling for IPv6
- mctp:
- mctp_fraq_queue should take ownership of passed skb
- usb: initialise mac header in RX path, avoid WARN
- wifi: mac80211: do not permit 40 MHz EHT operation on 5/6 GHz,
respect device limitations
- wifi: wilc1000: avoid buffer overflow in WID string configuration
- wifi: mt76:
- fix regressions from mt7996 MLO support rework
- fix offchannel handling issues on mt7996
- fix multiple wcid linked list corruption issues
- mt7921: don't disconnect when AP requests switch to a channel
which requires radar detection
- mt7925u: use connac3 tx aggr check in tx complete
- wifi: intel:
- improve validation of ACPI DSM data
- cfg: restore some 1000 series configs
- wifi: ath:
- ath11k: a fix for GTK rekeying
- ath12k: a missed WiFi7 capability (multi-link EMLSR)
- eth: intel:
- ice: fix races in "low latency" firmware interface for Tx timestamps
- idpf: set mac type when adding and removing MAC filters
- i40e: remove racy read access to some debugfs files
Misc:
- Revert "eth: remove the DLink/Sundance (ST201) driver"
- netfilter: conntrack: helper: Replace -EEXIST by -EBUSY, avoid
confusing modprobe"
* tag 'net-6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (99 commits)
phy: mscc: Stop taking ts_lock for tx_queue and use its own lock
selftest: net: Fix weird setsockopt() in bind_bhash.c.
MAINTAINERS: add Sabrina to TLS maintainers
gve: update MAINTAINERS
ppp: fix memory leak in pad_compress_skb
net: xilinx: axienet: Add error handling for RX metadata pointer retrieval
net: atm: fix memory leak in atm_register_sysfs when device_register fail
netfilter: nf_tables: Introduce NFTA_DEVICE_PREFIX
selftests: netfilter: fix udpclash tool hang
ax25: properly unshare skbs in ax25_kiss_rcv()
mctp: return -ENOPROTOOPT for unknown getsockopt options
net/smc: Remove validation of reserved bits in CLC Decline message
ipv4: Fix NULL vs error pointer check in inet_blackhole_dev_init()
net: thunder_bgx: decrement cleanup index before use
net: thunder_bgx: add a missing of_node_put
net: phylink: move PHY interrupt request to non-fail path
net: lockless sock_i_ino()
tools: ynl-gen: fix nested array counting
wifi: wilc1000: avoid buffer overflow in WID string configuration
wifi: cfg80211: sme: cap SSID length in __cfg80211_connect_result()
...
Add a timer test case to test 'bpf_in_interrupt()'.
cd tools/testing/selftests/bpf
./test_progs -t timer_interrupt
462 timer_interrupt:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20250903140438.59517-3-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Filtering pid_tgid is meanlingless when the current task is preempted by
an interrupt.
To address this, introduce 'bpf_in_interrupt()' helper function, which
allows BPF programs to determine whether they are executing in interrupt
context.
'get_preempt_count()':
* On x86, '*(int *) bpf_this_cpu_ptr(&__preempt_count)'.
* On arm64, 'bpf_get_current_task_btf()->thread_info.preempt.count'.
Then 'bpf_in_interrupt()' will be:
* If !PREEMPT_RT, 'get_preempt_count() & (NMI_MASK | HARDIRQ_MASK
| SOFTIRQ_MASK)'.
* If PREEMPT_RT, '(get_preempt_count() & (NMI_MASK | HARDIRQ_MASK))
| (bpf_get_current_task_btf()->softirq_disable_cnt & SOFTIRQ_MASK)'.
As for other archs, it can be added support by updating
'get_preempt_count()'.
Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20250903140438.59517-2-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
For now, the benchmark for kprobe-multi is single, which means there is
only 1 function is hooked during testing. Add the testing
"kprobe-multi-all", which will hook all the kernel functions during
the benchmark. And the "kretprobe-multi-all" is added too.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Link: https://lore.kernel.org/r/20250904021011.14069-4-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
We need to get all the kernel function that can be traced sometimes, so we
move the get_syms() and get_addrs() in kprobe_multi_test.c to
trace_helpers.c and rename it to bpf_get_ksyms() and bpf_get_addrs().
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Link: https://lore.kernel.org/r/20250904021011.14069-2-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
bind_bhash.c passes (SO_REUSEADDR | SO_REUSEPORT) to setsockopt().
In the asm-generic definition, the value happens to match with the
bare SO_REUSEPORT, (2 | 15) == 15, but not on some arch.
arch/alpha/include/uapi/asm/socket.h:18:#define SO_REUSEADDR 0x0004
arch/alpha/include/uapi/asm/socket.h:24:#define SO_REUSEPORT 0x0200
arch/mips/include/uapi/asm/socket.h:24:#define SO_REUSEADDR 0x0004 /* Allow reuse of local addresses. */
arch/mips/include/uapi/asm/socket.h:33:#define SO_REUSEPORT 0x0200 /* Allow local address and port reuse. */
arch/parisc/include/uapi/asm/socket.h:12:#define SO_REUSEADDR 0x0004
arch/parisc/include/uapi/asm/socket.h:18:#define SO_REUSEPORT 0x0200
arch/sparc/include/uapi/asm/socket.h:13:#define SO_REUSEADDR 0x0004
arch/sparc/include/uapi/asm/socket.h:20:#define SO_REUSEPORT 0x0200
include/uapi/asm-generic/socket.h:12:#define SO_REUSEADDR 2
include/uapi/asm-generic/socket.h:27:#define SO_REUSEPORT 15
Let's pass SO_REUSEPORT only.
Fixes: c35ecb95c4 ("selftests/net: Add test for timing a bind request to a port with a populated bhash entry")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250903222938.2601522-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
All architectures implementing time-related functionality in the vDSO are
using the generic vDSO library which handles time namespaces properly.
Remove the now unnecessary Kconfig symbol.
Enables the use of time namespaces on architectures, which use the
generic vDSO but did not enable GENERIC_VDSO_TIME_NS, namely MIPS and arm.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/all/20250826-vdso-cleanups-v1-10-d9b65750e49f@linutronix.de
Yi Chen reports that 'udpclash' loops forever depending on compiler
(and optimization level used); while (x == 1) gets optimized into
for (;;). Add volatile qualifier to avoid that.
While at it, also run it under timeout(1) and fix the resize script
to not ignore the timeout passed as second parameter to insert_flood.
Reported-by: Yi Chen <yiche@redhat.com>
Suggested-by: Yi Chen <yiche@redhat.com>
Fixes: 78a5883635 ("selftests: netfilter: add conntrack clash resolution test case")
Signed-off-by: Florian Westphal <fw@strlen.de>
Commit 4b30209255 ("selftests/xsk: Add tail adjustment tests and support
check") added a new global to xsk_xdp_progs.c, but left out the access in
the testapp_xdp_metadata_copy() function. Since bpf_map_update_elem() will
write to the whole bss section, it gets truncated. Fix by writing to
skel_rx->bss->count directly.
Fixes: 4b30209255 ("selftests/xsk: Add tail adjustment tests and support check")
Signed-off-by: Ricardo B. Marlière <rbm@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250829-selftests-bpf-xsk_regression_fix-v1-1-5f5acdb9fe6b@suse.com
Currently, even if some subtests fails, the end result will still yield
"ok 1 selftests: bpf: test_xsk.sh". Fix it by exiting with 1 if there are
any failures.
Signed-off-by: Ricardo B. Marlière <rbm@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20250828-selftests-bpf-test_xsk_ret-v1-1-e6656c01f397@suse.com
Add test cases for VXLAN with FDB nexthop groups, testing both IPv4 and
IPv6. Test basic Tx functionality as well as some corner cases.
Example output:
# ./test_vxlan_nh.sh
TEST: VXLAN FDB nexthop: IPv4 basic Tx [ OK ]
TEST: VXLAN FDB nexthop: IPv6 basic Tx [ OK ]
TEST: VXLAN FDB nexthop: learning [ OK ]
TEST: VXLAN FDB nexthop: IPv4 proxy [ OK ]
TEST: VXLAN FDB nexthop: IPv6 proxy [ OK ]
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250901065035.159644-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The buffer be used without free,fix it to avoid memory leak.
Signed-off-by: Zongmin Zhou <zhouzongmin@kylinos.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250901054557.32811-1-min_halo@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
rss_ctx.test_rss_key_indir implicitly expects at least 5 queues,
as it checks that the traffic on first 2 queues is lower than
the remaining queues when we use all queues. Special case fewer
queues.
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250901173139.881070-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The rss_ctx test has gotten pretty flaky after I increased
the queue count in NIPA 2->3. Not 100% clear why. We get
a lot of failures in the rss_ctx.test_hitless_key_update case.
Looking closer it appears that the failures are mostly due
to startup costs. I measured the following timing for ethtool -X:
- python cmd(shell=True) : 150-250msec
- python cmd(shell=False) : 50- 70msec
- timed in bash : 45- 55msec
- YNL Netlink call : 2- 4msec
- .set_rxfh callback : 1- 2msec
The target in the test was set to 200msec. We were mostly measuring
ethtool startup cost it seems. Switch to YNL since it's 100x faster.
Lower the pass criteria to 150msec, no real science behind this number
but we removed some overhead, drivers which previously passed 200msec
should easily pass 150msec now.
Separately we should probably follow up on defaulting to shell=False,
when script doesn't explicitly ask for True, because the overhead
is rather significant.
Switch from _rss_key_rand() to random.randbytes(), YNL takes a binary
array rather than array of ints.
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250901173139.881070-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Overhead of using shell=True is quite significant.
Micro-benchmark of running ethtool --help shows that
non-shell run is 2x faster.
Runtime of the XDP tests also shows improvement:
this patch: 2m34s 2m21s 2m18s 2m18s
before: 2m54s 2m36s 2m34s
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250830184317.696121-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Clean up tests which expect shell=True without explicitly passing
that param to cmd(). There seems to be only one such case, and
in fact it's better converted to a direct write.
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250830184317.696121-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch improves the utils.py module by removing unused imports
(errno, random), simplifying the fd_read_timeout() function by
eliminating unnecessary else clause, and cleaning up code style in the
defer class constructor.
Additionally, it renames the parameter in rand_port() from 'type' to
'stype' to avoid shadowing the built-in Python name 'type', improving
code clarity and preventing potential issues.
These changes enhance code readability and maintainability without
affecting functionality.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250901-fix-v1-1-df0abb67481e@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
or aren't considered necessary for -stable kernels. 11 of these fixes are
for MM.
This includes a three-patch series from Harry Yoo which fixes an
intermittent boot failure which can occur on x86 systems. And a two-patch
series from Alexander Gordeev which fixes a KASAN crash on S390 systems.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaLY4WAAKCRDdBJ7gKXxA
jp2qAP92JCCzscR87um+YSc4u6a/X6ucYWkzh9BGhM8bMT8p7wD/UhIuGbYRFLPw
XbSDkAD6lKpujQkRAudRFQTcZcU7gwg=
=mPUd
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2025-09-01-17-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"17 hotfixes. 13 are cc:stable and the remainder address post-6.16
issues or aren't considered necessary for -stable kernels. 11 of these
fixes are for MM.
This includes a three-patch series from Harry Yoo which fixes an
intermittent boot failure which can occur on x86 systems. And a
two-patch series from Alexander Gordeev which fixes a KASAN crash on
S390 systems"
* tag 'mm-hotfixes-stable-2025-09-01-17-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: fix possible deadlock in kmemleak
x86/mm/64: define ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings()
mm: introduce and use {pgd,p4d}_populate_kernel()
mm: move page table sync declarations to linux/pgtable.h
proc: fix missing pde_set_flags() for net proc files
mm: fix accounting of memmap pages
mm/damon/core: prevent unnecessary overflow in damos_set_effective_quota()
kexec: add KEXEC_FILE_NO_CMA as a legal flag
kasan: fix GCC mem-intrinsic prefix with sw tags
mm/kasan: avoid lazy MMU mode hazards
mm/kasan: fix vmalloc shadow memory (de-)population races
kunit: kasan_test: disable fortify string checker on kasan_strings() test
selftests/mm: fix FORCE_READ to read input value correctly
mm/userfaultfd: fix kmap_local LIFO ordering for CONFIG_HIGHPTE
ocfs2: prevent release journal inode after journal shutdown
rust: mm: mark VmaNew as transparent
of_numa: fix uninitialized memory nodes causing kernel panic
Fix several spelling and grammatical mistakes in output messages from
the net selftests to improve readability.
Only the message strings for the test output have been modified. No
changes to the functional logic of the tests have been made.
Signed-off-by: Praveen Balakrishnan <praveen.balakrishnan@magd.ox.ac.uk>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250828211100.51019-1-praveen.balakrishnan@magd.ox.ac.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use cfg.remote_ifname for arguments of remote command.
Without this UDP tests fail in NIPA where local interface
is called enp1s0 and remote enp0s4.
Fixes: 1d0dc857b5 ("selftests: drv-net: add checksum tests")
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250830183842.688935-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The expect non-zero macro was incorrect and never used. Fix its
definition.
Fixes: 362aecb2d8 ("selftests/nolibc: add basic infrastructure to ease creation of nolibc tests")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://lore.kernel.org/r/20250731201225.323254-2-benjamin@sipsolutions.net
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
wait4() is deprecated, non-standard and about to be removed from nolibc.
Switch to the equivalent waitpid() call.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20250821-nolibc-enosys-v1-6-4b63f2caaa89@weissschuh.net
On 32bit ARM systems gcc-12 will use 32bit timestamps while gcc-13 and later
will use 64bit timestamps. The problem is that SYS_futex will continue
pointing at the 32bit system call. This makes the futex_wait test fail like
this:
waiter failed errno 110
not ok 1 futex_wake private returned: 0 Success
waiter failed errno 110
not ok 2 futex_wake shared (page anon) returned: 0 Success
waiter failed errno 110
not ok 3 futex_wake shared (file backed) returned: 0 Success
Instead of compiling differently depending on the gcc version, use the
-D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64 options to ensure that 64bit timestamps
are used. Then use ifdefs to make SYS_futex point to the 64bit system call.
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Link: https://lore.kernel.org/20250827130011.677600-6-bigeasy@linutronix.de
Fix multiple typos and small grammar issues in help text, comments and test
messages in the futex_priv_hash test.
Signed-off-by: Gopi Krishna Menon <krishnagopi487@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Link: https://lore.kernel.org/20250827130011.677600-5-bigeasy@linutronix.de
Fix format-security warnings by using proper format strings when passing
message variables to ksft_exit_fail_msg(), ksft_test_result_pass(), and
ksft_test_result_skip() function.
Thus prevent potential security issues and eliminate compiler warnings when
building with -Wformat-security.
Signed-off-by: Nai-Chen Cheng <bleach1827@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250827130011.677600-4-bigeasy@linutronix.de
The "Memory out of range" subtest of futex_numa_mpol assumes that memory
access outside of the mmap'ed area is invalid. That may not be the case
depending on the actual memory layout of the test application. When that
subtest was run on an x86-64 system with latest upstream kernel, the test
passed as an error was returned from futex_wake(). On another PowerPC system,
the same subtest failed because futex_wake() returned 0.
Bail out! futex2_wake(64, 0x86) should fail, but didn't
Looking further into the passed subtest on x86-64, it was found that an
-EINVAL was returned instead of -EFAULT. The -EINVAL error was returned
because the node value test with FLAGS_NUMA set failed with a node value
of 0x7f7f. IOW, the futex memory was accessible and futex_wake() failed
because the supposed node number wasn't valid. If that memory location
happens to have a very small value (e.g. 0), the test will pass and no
error will be returned.
Since this subtest is non-deterministic, drop it unless a guard page beyond
the mmap region is explicitly set.
The other problematic test is the "Memory too small" test. The futex_wake()
function returns the -EINVAL error code because the given futex address isn't
8-byte aligned, not because only 4 of the 8 bytes are valid and the other
4 bytes are not. So change the name of this subtest to "Mis-aligned futex" to
reflect the reality.
[ bp: Massage commit message. ]
Fixes: 3163369407 ("selftests/futex: Add futex_numa_mpol")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/20250827130011.677600-3-bigeasy@linutronix.de
While the GCC and Clang compilers already define __ASSEMBLER__
automatically when compiling assembler code, __ASSEMBLY__ is a
macro that only gets defined by the Makefiles in the kernel.
This is bad since macros starting with two underscores are names
that are reserved by the C language. It can also be very confusing
for the developers when switching between userspace and kernelspace
coding, or when dealing with uapi headers that rather should use
__ASSEMBLER__ instead. So let's standardize now on the __ASSEMBLER__
macro that is provided by the compilers.
This is almost a completely mechanical patch (done with a simple
"sed -i" statement), apart from tweaking two comments manually in
arch/powerpc/include/asm/bug.h and arch/powerpc/include/asm/kasan.h
(which did not have proper underscores at the end) and fixing a
checkpatch error about spaces in arch/powerpc/include/asm/spu_csa.h.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250801082007.32904-3-thuth@redhat.com
- CFI failure due to kpti_ng_pgd_alloc() signature mismatch
- Underallocation bug in the SVE ptrace kselftest
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmiy2d8ACgkQa9axLQDI
XvEPOxAAhMBXB6RoYlUe9Xwnkazz9TndSJy4buxEYQhC6dUtRdHwSiDEo8caaZpt
lhIqLsmCL0SiP4SiZDobLhrcrD4U0sLNeZf9mkk6JnZ12ws3scEtZvBi2WhYn3rv
hlAoH0WvhBgELm3RNyvuwhovZNixPj/aJ4keblHV3AuQZ2V9bN7f2rH9XMUDBP2R
Q3yKUFxniyLh5UR/m2JD4lm5IvnzDT1uYdvChKviW5VMepY/UIerG6Y7fqfaKQIE
h4S/QNqulOI+hfpRICowJ+Ydpb9oTgDBTWYLSCsKpjA5aLlkv1uVQSVHayyx7hc2
XitLDcYNV8X2tBHa04Ip+TiRk1KM6UZvMsgGdnd6AzzucEjJg8glt4u4aM3gxkqq
N2jgVFOiWj4xQi2Y3wDM74/PrywzD+TwhKUW15hjwkYONzm2Ff+Q0RN86y45lCON
lchZ+khaQoZDG8eBuacsWlaIm0VGNySNsLbVNbj2+8fZWfWdCXo53DqMP71qzsWR
bKSMlQTek1RoJYzC6zbJ4NSuPXJTPFc0SbLKGUIlNL+X5Zfu+kcI5scakczVb3FB
gvqA0kek9dB/RIKUeXE2i9+1Ew9LsotnK8Woww/stYP+zUbq/Xsp6FVhk9kEZOls
JWU90GbegChS/Lgc1C8CimsUiCKb4GtZ+x1iUCSdZNSZ7yK/V+g=
=gHuh
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- CFI failure due to kpti_ng_pgd_alloc() signature mismatch
- Underallocation bug in the SVE ptrace kselftest
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
kselftest/arm64: Don't open code SVE_PT_SIZE() in fp-ptrace
arm64: mm: Fix CFI failure due to kpti_ng_pgd_alloc function signature
In fp-trace when allocating a buffer to write SVE register data we open
code the addition of the header size to the VL depeendent register data
size, which lead to an underallocation bug when we cut'n'pasted the code
for FPSIMD format writes. Use the SVE_PT_SIZE() macro that the kernel
UAPI provides for this.
Fixes: b84d2b2795 ("kselftest/arm64: Test FPSIMD format data writes via NT_ARM_SVE in fp-ptrace")
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20250812-arm64-fp-trace-macro-v1-1-317cfff986a5@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Jakub says:
nft_flowtable.sh is one of the most flake-atious test for netdev CI currently :(
The root cause is two-fold:
1. the failing part of the test is supposed to make sure that ip
fragments are forwarded for offloaded flows.
(flowtable has to pass them to classic forward path).
path mtu discovery for these subtests is disabled.
2. nft_flowtable.sh has two passes. One with fixed mtus/file size and
one where link mtus and file sizes are random.
The CI failures all have same pattern:
re-run with random mtus and file size: -o 27663 -l 4117 -r 10089 -s 54384840
[..]
PASS: dscp_egress: dscp packet counters match
FAIL: file mismatch for ns1 -> ns2
In some cases this error triggers a bit ealier, sometimes in a later
subtest:
re-run with random mtus and file size: -o 20201 -l 4555 -r 12657 -s 9405856
[..]
PASS: dscp_egress: dscp packet counters match
PASS: dscp_fwd: dscp packet counters match
2025/08/17 20:37:52 socat[18954] E write(7, 0x560716b96000, 8192): Broken pipe
FAIL: file mismatch for ns1 -> ns2
-rw------- 1 root root 9405856 Aug 17 20:36 /tmp/tmp.2n63vlTrQe
But all logs I saw show same scenario:
1. Failing tests have pmtu discovery off (i.e., ip fragmentation)
2. The test file is much larger than first-pass default (2M Byte)
3. peers have much larger MTUs compared to the 'network'.
These errors are very reproducible when re-running the test with
the same commandline arguments.
The timeout became much more prominent with
1d2fbaad7c ("tcp: stronger sk_rcvbuf checks"): reassembled packets
typically have a skb->truesize more than double the skb length.
As that commit is intentional and pmtud-off with
large-tcp-packets-as-fragments is not normal adjust the test to use a
smaller file for the pmtu-off subtests.
While at it, add more information to pass/fail messages and
also run the dscp alteration subtest with pmtu discovery enabled.
Link: https://netdev.bots.linux.dev/contest.html?test=nft-flowtable-sh
Fixes: f84ab63490 ("selftests: netfilter: nft_flowtable.sh: re-run with random mtu sizes")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20250822071330.4168f0db@kernel.org/
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20250828214918.3385-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Correctly handle 'invariant' system registers for protected VMs
- Improved handling of VNCR data aborts, including external aborts
- Fixes for handling of FEAT_RAS for NV guests, providing a sane
fault context during SEA injection and preventing the use of
RASv1p1 fault injection hardware
- Ensure that page table destruction when a VM is destroyed gives an
opportunity to reschedule
- Large fix to KVM's infrastructure for managing guest context loaded
on the CPU, addressing issues where the output of AT emulation
doesn't get reflected to the guest
- Fix AT S12 emulation to actually perform stage-2 translation when
necessary
- Avoid attempting vLPI irqbypass when GICv4 has been explicitly
disabled for a VM
- Minor KVM + selftest fixes
RISC-V:
- Fix pte settings within kvm_riscv_gstage_ioremap()
- Fix comments in kvm_riscv_check_vcpu_requests()
- Fix stack overrun when setting vlenb via ONE_REG
x86:
- Use array_index_nospec() to sanitize the target vCPU ID when handling PV
IPIs and yields as the ID is guest-controlled.
- Drop a superfluous cpumask_empty() check when reclaiming SEV memory, as
the common case, by far, is that at least one CPU will have entered the
VM, and wbnoinvd_on_cpus_mask() will naturally handle the rare case where
the set of have_run_cpus is empty.
Selftests (not KVM):
- Rename the is_signed_type() macro in kselftest_harness.h to is_signed_var()
to fix a collision with linux/overflow.h. The collision generates compiler
warnings due to the two macros having different meaning.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmix3OMUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroOZGAf+K+xTAhbMuY4bK5Sn93/QssYUVsFv
wWc/q5FXUd8t21eAN+b/qhGF4d71eDuoIUNzOBwbJ9qY/0F42Xgihfr7BarSBBqD
anqQBnhhtCyPCa1tF8SyBv34HewNKts3bgSxnwo2V2CBGWqomm6cZ9Uh3yALFBGJ
kqHi0kKql+QL9G9DbRQ8lEJAPnCnktFFtA94T5B+o7yh1vvPeBsK40chH8bi19nh
vCdoGhNLr+k+MoYpfJ8lyOJ7QctijJBK7OlsteksMvCXKQdfz1/X7TnoF11rb4yV
MPfMUDOGlIVEBaVBkokyHXXPv0Fg4zGlt/SYzOZWRHIYgQNQ+aSscAKODA==
=W51r
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Correctly handle 'invariant' system registers for protected VMs
- Improved handling of VNCR data aborts, including external aborts
- Fixes for handling of FEAT_RAS for NV guests, providing a sane
fault context during SEA injection and preventing the use of
RASv1p1 fault injection hardware
- Ensure that page table destruction when a VM is destroyed gives an
opportunity to reschedule
- Large fix to KVM's infrastructure for managing guest context loaded
on the CPU, addressing issues where the output of AT emulation
doesn't get reflected to the guest
- Fix AT S12 emulation to actually perform stage-2 translation when
necessary
- Avoid attempting vLPI irqbypass when GICv4 has been explicitly
disabled for a VM
- Minor KVM + selftest fixes
RISC-V:
- Fix pte settings within kvm_riscv_gstage_ioremap()
- Fix comments in kvm_riscv_check_vcpu_requests()
- Fix stack overrun when setting vlenb via ONE_REG
x86:
- Use array_index_nospec() to sanitize the target vCPU ID when
handling PV IPIs and yields as the ID is guest-controlled.
- Drop a superfluous cpumask_empty() check when reclaiming SEV
memory, as the common case, by far, is that at least one CPU will
have entered the VM, and wbnoinvd_on_cpus_mask() will naturally
handle the rare case where the set of have_run_cpus is empty.
Selftests (not KVM):
- Rename the is_signed_type() macro in kselftest_harness.h to
is_signed_var() to fix a collision with linux/overflow.h. The
collision generates compiler warnings due to the two macros having
different meaning"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (29 commits)
KVM: arm64: nv: Fix ATS12 handling of single-stage translation
KVM: arm64: Remove __vcpu_{read,write}_sys_reg_{from,to}_cpu()
KVM: arm64: Fix vcpu_{read,write}_sys_reg() accessors
KVM: arm64: Simplify sysreg access on exception delivery
KVM: arm64: Check for SYSREGS_ON_CPU before accessing the 32bit state
RISC-V: KVM: fix stack overrun when loading vlenb
RISC-V: KVM: Correct kvm_riscv_check_vcpu_requests() comment
RISC-V: KVM: Fix pte settings within kvm_riscv_gstage_ioremap()
KVM: arm64: selftests: Sync ID_AA64MMFR3_EL1 in set_id_regs
KVM: arm64: Get rid of ARM64_FEATURE_MASK()
KVM: arm64: Make ID_AA64PFR1_EL1.RAS_frac writable
KVM: arm64: Make ID_AA64PFR0_EL1.RAS writable
KVM: arm64: Ignore HCR_EL2.FIEN set by L1 guest's EL2
KVM: arm64: Handle RASv1p1 registers
arm64: Add capability denoting FEAT_RASv1p1
KVM: arm64: Reschedule as needed when destroying the stage-2 page-tables
KVM: arm64: Split kvm_pgtable_stage2_destroy()
selftests: harness: Rename is_signed_type() to avoid collision with overflow.h
KVM: SEV: don't check have_run_cpus in sev_writeback_caches()
KVM: arm64: Correctly populate FAR_EL2 on nested SEA injection
...
Cross-merge networking fixes after downstream PR (net-6.17-rc4).
No conflicts.
Adjacent changes:
drivers/net/ethernet/intel/idpf/idpf_txrx.c
02614eee26 ("idpf: do not linearize big TSO packets")
6c4e684802 ("idpf: remove obsolete stashing code")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Correctly handle 'invariant' system registers for protected VMs
- Improved handling of VNCR data aborts, including external aborts
- Fixes for handling of FEAT_RAS for NV guests, providing a sane
fault context during SEA injection and preventing the use of
RASv1p1 fault injection hardware
- Ensure that page table destruction when a VM is destroyed gives an
opportunity to reschedule
- Large fix to KVM's infrastructure for managing guest context loaded
on the CPU, addressing issues where the output of AT emulation
doesn't get reflected to the guest
- Fix AT S12 emulation to actually perform stage-2 translation when
necessary
- Avoid attempting vLPI irqbypass when GICv4 has been explicitly
disabled for a VM
- Minor KVM + selftest fixes
-----BEGIN PGP SIGNATURE-----
iI0EABYIADUWIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCaLC0JBccb2xpdmVyLnVw
dG9uQGxpbnV4LmRldgAKCRCivnWIJHzdFogJAQCyxHd5tuvXWWT/iC2EYFlPWYkU
LOQbNhus16QjQ9f2ggD8CoA+6UAxzYW7ZU6IzYkDhJkN/3dKQEQhh8Cx0GXXRAs=
=uky+
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-fixes-6.17-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 changes for 6.17, take #2
- Correctly handle 'invariant' system registers for protected VMs
- Improved handling of VNCR data aborts, including external aborts
- Fixes for handling of FEAT_RAS for NV guests, providing a sane
fault context during SEA injection and preventing the use of
RASv1p1 fault injection hardware
- Ensure that page table destruction when a VM is destroyed gives an
opportunity to reschedule
- Large fix to KVM's infrastructure for managing guest context loaded
on the CPU, addressing issues where the output of AT emulation
doesn't get reflected to the guest
- Fix AT S12 emulation to actually perform stage-2 translation when
necessary
- Avoid attempting vLPI irqbypass when GICv4 has been explicitly
disabled for a VM
- Minor KVM + selftest fixes
The -g parameter was meant to the test the immutable global hash instead of the
private hash which has been made immutable. The global hash is tested as part
at the end of the regular test. The immutable private hash been removed.
Remove last traces of the immutable private hash.
Fixes: 16adc7f136 ("selftests/futex: Remove support for IMMUTABLE")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Link: https://lore.kernel.org/20250827130011.677600-2-bigeasy@linutronix.de
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmiwwhkQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpnooD/92QqpW1IOn0hs0WGOiKl7se5JgBuLV9bd0
j7M8EpGyFK63zeDgAO0/Qu2arEva9tlDaW9S/e78kt1PwQ7it82RSlPC47nKWEv+
Xn5npn2SOEkyY/5/a+iVkOsSD++RIM/DnHNBe9vmU6wePR7AthG5Nhl6kOEi5cyw
e1eLbwB/0eJZzhHYCaowLwdtnMCcwQAEh/FFYz7tSyrgOsDpovKid6D7s2Sv9sGA
+lOzwimgszr3RF5bOMudU12RtP35gAvyF56iDVMQylHXybuaYYzzVAmxRLD5vrOf
vC4HSzRxwUCZnIW62TTP29dAB9mik3va069e1xV1le94vdEsgj2HmuV4tpCzs3LN
4a1q8oEC83QI+cixzsjxf00DaJSkk9msUNsqE+6rcEK0M6z0tRz3mbLXYiCo7V4Y
eLD7eMsxXkJUSpTwWOIVYiXuM+OvSmFxIEoz4lGnQESKyV0dA86RT5TpOiUCrkW1
G9nTccxVPZG3i8FKJXmgZLPmpviw+wwzdpjlVShqSGA++/bBXKwAnjAQPSsaVDIS
HaqhG1IngL9sCcnAy8ZBTEy2TYibasUL48vfCeRhP2u2RZa21zRSOPNFRV9cVvRE
/wOWQikOqC9ys7zvbLG0OfLQAlejGn1k+k6oEOI6P9x2a0vbjq/UZxIQSkdYEhXG
x73gWYk28g==
=/1eG
-----END PGP SIGNATURE-----
Merge tag 'block-6.17-20250828' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- Fix a lockdep spotted issue on recursive locking for zoned writes, in
case of errors
- Update bcache MAINTAINERS entry address for Coly
- Fix for a ublk release issue, with selftests
- Fix for a regression introduced in this cycle, where it assumed
q->rq_qos was always set if the bio flag indicated that
- Fix for a regression introduced in this cycle, where loop retrieving
block device sizes got broken
* tag 'block-6.17-20250828' of git://git.kernel.dk/linux:
bcache: change maintainer's email address
ublk selftests: add --no_ublk_fixed_fd for not using registered ublk char device
ublk: avoid ublk_io_release() called after ublk char dev is closed
block: validate QoS before calling __rq_qos_done_bio()
blk-zoned: Fix a lockdep complaint about recursive locking
loop: fix zero sized loop for block special file
Commit 0d6ccfe6b3 ("selftests: drv-net: rss_ctx: check for all-zero keys")
added a skip exception if NIC has fewer than 3 queues enabled,
but it's just constructing the object, it's not actually rising
this exception.
Before:
# Exception| net.lib.py.utils.CmdExitFailure: Command failed: ethtool -X enp1s0 equal 3 hkey d1:cc:77:47:9d:ea:15:f2:b9:6c:ef:68:62:c0:45:d5:b0:99:7d:cf:29:53:40:06:3d:8e:b9:bc:d4:70:89:b8:8d:59:04:ea:a9:c2:21:b3:55:b8:ab:6b:d9:48:b4:bd:4c:ff:a5:f0:a8:c2
not ok 1 rss_ctx.test_rss_key_indir
After:
ok 1 rss_ctx.test_rss_key_indir # SKIP Device has fewer than 3 queues (or doesn't support queue stats)
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250827173558.3259072-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix the rules in tools/testing/selftests/vfio/.gitignore to not ignore
some already tracked files (.gitignore, Makefile, lib/libvfio.mk).
This change should be a no-op, since these files are already tracked by git
and thus git will not ignore updates to them even though they match the
ignore rules in the VFIO selftests .gitignore file.
However, they do generate warnings with W=1, as reported by the kernel test
robot.
$ KBUILD_EXTRA_WARN=1 scripts/misc-check
tools/testing/selftests/vfio/.gitignore: warning: ignored by one of the .gitignore files
tools/testing/selftests/vfio/Makefile: warning: ignored by one of the .gitignore files
tools/testing/selftests/vfio/lib/libvfio.mk: warning: ignored by one of the .gitignore files
Fix this by explicitly un-ignoring the tracked files.
Fixes: 292e9ee22b ("selftests: Create tools/testing/selftests/vfio")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202508280918.rFRyiLEU-lkp@intel.com/
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20250828185815.382215-1-dmatlack@google.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Commit e9fc3ce99b ("libbpf: Streamline error reporting for high-level
APIs") redefined the way that bpf_prog_detach2() returns. Therefore, adapt
the usage in test_lirc_mode2_user.c.
Signed-off-by: Ricardo B. Marlière <rbm@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250828-selftests-bpf-v1-1-c7811cd8b98c@suse.com
Add a new command line option --no_ublk_fixed_fd that excludes the ublk
control device (/dev/ublkcN) from io_uring's registered files array.
When this option is used, only backing files are registered starting
from index 1, while the ublk control device is accessed using its raw
file descriptor.
Add ublk_get_registered_fd() helper function that returns the appropriate
file descriptor for use with io_uring operations.
Key optimizations implemented:
- Cache UBLKS_Q_NO_UBLK_FIXED_FD flag in ublk_queue.flags to avoid
reading dev->no_ublk_fixed_fd in fast path
- Cache ublk char device fd in ublk_queue.ublk_fd for fast access
- Update ublk_get_registered_fd() to use ublk_queue * parameter
- Update io_uring_prep_buf_register/unregister() to use ublk_queue *
- Replace ublk_device * access with ublk_queue * access in fast paths
Also pass --no_ublk_fixed_fd to test_stress_04.sh for covering
plain ublk char device mode.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250827121602.2619736-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When a packet flood hits one or more UDP sockets, many cpus
have to update sk->sk_drops.
This slows down other cpus, because currently
sk_drops is in sock_write_rx group.
Add a socket_drop_counters structure to udp sockets.
Using dedicated cache lines to hold drop counters
makes sure that consumers no longer suffer from
false sharing if/when producers only change sk->sk_drops.
This adds 128 bytes per UDP socket.
Tested with the following stress test, sending about 11 Mpps
to a dual socket AMD EPYC 7B13 64-Core.
super_netperf 20 -t UDP_STREAM -H DUT -l10 -- -n -P,1000 -m 120
Note: due to socket lookup, only one UDP socket is receiving
packets on DUT.
Then measure receiver (DUT) behavior. We can see both
consumer and BH handlers can process more packets per second.
Before:
nstat -n ; sleep 1 ; nstat | grep Udp
Udp6InDatagrams 615091 0.0
Udp6InErrors 3904277 0.0
Udp6RcvbufErrors 3904277 0.0
After:
nstat -n ; sleep 1 ; nstat | grep Udp
Udp6InDatagrams 816281 0.0
Udp6InErrors 7497093 0.0
Udp6RcvbufErrors 7497093 0.0
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250826125031.1578842-5-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
FORCE_READ() converts input value x to its pointer type then reads from
address x. This is wrong. If x is a non-pointer, it would be caught it
easily. But all FORCE_READ() callers are trying to read from a pointer
and FORCE_READ() basically reads a pointer to a pointer instead of the
original typed pointer. Almost no access violation was found, except the
one from split_huge_page_test.
Fix it by implementing a simplified READ_ONCE() instead.
Link: https://lkml.kernel.org/r/20250805175140.241656-1-ziy@nvidia.com
Fixes: 3f6bfd4789 ("selftests/mm: reuse FORCE_READ to replace "asm volatile("" : "+r" (XXX));"")
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: wang lian <lianux.mm@gmail.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Kairui Song <ryncsn@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add benchmarks for the standard set of operations: LOOKUP, INSERT,
UPDATE, DELETE. Also include benchmarks to measure the overhead of the
bench framework itself (NOOP) as well as the overhead of generating keys
(BASELINE). Lastly, this includes a benchmark for FREE (trie_free())
which is known to have terrible performance for maps with many entries.
Benchmarks operate on tries without gaps in the key range, i.e. each
test begins or ends with a trie with valid keys in the range [0,
nr_entries). This is intended to cause maximum branching when traversing
the trie.
LOOKUP, UPDATE, DELETE, and FREE fill a BPF LPM trie from userspace
using bpf_map_update_batch() and run the corresponding benchmark
operation via bpf_loop(). INSERT starts with an empty map and fills it
kernel-side from bpf_loop(). FREE records the time to free a filled LPM
trie by attaching and destroying a BPF prog. NOOP measures the overhead
of the test harness by running an empty function with bpf_loop().
BASELINE is similar to NOOP except that the function generates a key.
Each operation runs 10,000 times using bpf_loop(). Note that this value
is intentionally independent of the number of entries in the LPM trie so
that the stability of the results isn't affected by the number of
entries.
For those benchmarks that need to reset the LPM trie once it's full
(INSERT) or empty (DELETE), throughput and latency results are scaled by
the fraction of a second the operation actually ran to ignore any time
spent reinitialising the trie.
By default, benchmarks run using sequential keys in the range [0,
nr_entries). BASELINE, LOOKUP, and UPDATE can use random keys via the
--random parameter but beware there is a runtime cost involved in
generating random keys. Other benchmarks are prohibited from using
random keys because it can skew the results, e.g. when inserting an
existing key or deleting a missing one.
All measurements are recorded from within the kernel to eliminate
syscall overhead. Most benchmarks run an XDP program to generate stats
but FREE needs to collect latencies using fentry/fexit on
map_free_deferred() because it's not possible to use fentry directly on
lpm_trie.c since commit c83508da56 ("bpf: Avoid deadlock caused by
nested kprobe and fentry bpf programs") and there's no way to
create/destroy a map from within an XDP program.
Here is example output from an AMD EPYC 9684X 96-Core machine for each
of the benchmarks using a trie with 10K entries and a 32-bit prefix
length, e.g.
$ ./bench lpm-trie-$op \
--prefix_len=32 \
--producers=1 \
--nr_entries=10000
noop: throughput 74.417 ± 0.032 M ops/s ( 74.417M ops/prod), latency 13.438 ns/op
baseline: throughput 70.107 ± 0.171 M ops/s ( 70.107M ops/prod), latency 14.264 ns/op
lookup: throughput 8.467 ± 0.047 M ops/s ( 8.467M ops/prod), latency 118.109 ns/op
insert: throughput 2.440 ± 0.015 M ops/s ( 2.440M ops/prod), latency 409.290 ns/op
update: throughput 2.806 ± 0.042 M ops/s ( 2.806M ops/prod), latency 356.322 ns/op
delete: throughput 4.625 ± 0.011 M ops/s ( 4.625M ops/prod), latency 215.613 ns/op
free: throughput 0.578 ± 0.006 K ops/s ( 0.578K ops/prod), latency 1.730 ms/op
And the same benchmarks using random keys:
$ ./bench lpm-trie-$op \
--prefix_len=32 \
--producers=1 \
--nr_entries=10000 \
--random
noop: throughput 74.259 ± 0.335 M ops/s ( 74.259M ops/prod), latency 13.466 ns/op
baseline: throughput 35.150 ± 0.144 M ops/s ( 35.150M ops/prod), latency 28.450 ns/op
lookup: throughput 7.119 ± 0.048 M ops/s ( 7.119M ops/prod), latency 140.469 ns/op
insert: N/A
update: throughput 2.736 ± 0.012 M ops/s ( 2.736M ops/prod), latency 365.523 ns/op
delete: N/A
free: N/A
Signed-off-by: Matt Fleming <mfleming@cloudflare.com>
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20250827140149.1001557-1-matt@readmodwrite.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When using GCC on x86-64 to compile an usdt prog with -O1 or higher
optimization, the compiler will generate SIB addressing mode for global
array, e.g. "1@-96(%rbp,%rax,8)".
In this patch:
- enrich subtest_basic_usdt test case to cover SIB addressing usdt argument spec
handling logic
Signed-off-by: Jiawei Zhao <phoenix500526@163.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250827053128.1301287-3-phoenix500526@163.com
Add new selftest to test the abstract multiplication technique(s) used
by the verifier, following the recent improvement in tnum
multiplication (tnum_mul). One of the newly added programs,
verifier_mul/mul_precise, results in a false positive with the old
tnum_mul, while the program passes with the latest one.
Signed-off-by: Nandakumar Edamana <nandakumar@nandakumar.co.in>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20250826034524.2159515-2-nandakumar@nandakumar.co.in
Introduce run.sh, a script to help with running VFIO selftests. The
script is intended to be used for both humans manually running VFIO
selftests, and to incorporate into test automation where VFIO selftests
may run alongside other tests. As such the script aims to be hermetic,
returning the system to the state it was before the test started.
The script takes as input the BDF of a device to use and a command to
run (typically the command would be a VFIO selftest). e.g.
$ ./run.sh -d 0000:6a:01.0 ./vfio_pci_device_test
or
$ ./run.sh -d 0000:6a:01.0 -- ./vfio_pci_device_test
The script then handles unbinding device 0000:6a:01.0 from its current
driver, binding it to vfio-pci, running the test, unbinding from
vfio-pci, and binding back to the original driver.
When run.sh runs the provided test, it does so by appending the BDF as
the last parameter. For example:
$ ./run.sh -d 0000:6a:01.0 -- echo hello
Results in the following being printed to stdout:
hello 0000:6a:01.0
The script also supports a mode where it can break out into a shell so
that multiple tests can be run manually.
$ ./run.sh -d 0000:6a:01.0 -s
$ echo $VFIO_SELFTESTS_BDF
$ ./vfio_pci_device_test
$ exit
Choosing which device to use is up to the user.
In the future this script should be extensible to tests that want to use
multiple devices. The script can support accepting -d BDF multiple times
and parse them into an array, setup all the devices, pass the list of
BDFs to the test, and then cleanup all the devices.
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20250822212518.4156428-31-dmatlack@google.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Now that VFIO selftests support iommufd, make it the default mode.
IOMMUFD is the successor to VFIO_TYPE1{,v2}_IOMMU and all new features
are being added there, so it's a slightly better fit as the default
mode.
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20250822212518.4156428-30-dmatlack@google.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Add a new IOMMU mode for using iommufd directly. In this mode userspace
opens /dev/iommu and binds it to a device FD acquired through
/dev/vfio/devices/vfioX.
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20250822212518.4156428-29-dmatlack@google.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Add new IOMMU modes for using iommufd in compatibility mode with
VFIO_TYPE1_IOMMU and VFIO_TYPE1v2_IOMMU.
In these modes, VFIO selftests will open /dev/iommu and treats it as a
container FD (as if it had opened /dev/vfio/vfio) and the kernel
translates the container ioctls to iommufd calls transparently.
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20250822212518.4156428-28-dmatlack@google.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Add a new IOMMU mode for using VFIO_TYPE1v2_IOMMU.
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20250822212518.4156428-27-dmatlack@google.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Automatically replicate vfio_dma_mapping_test and vfio_pci_driver_test
across all supported IOMMU modes using fixture variants. Both of these
tests exercise DMA mapping to some degree so having automatic coverage
across all IOMMU modes will help catch bugs.
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20250822212518.4156428-26-dmatlack@google.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Encapsulate the "IOMMU mode" a test should use behind a new struct.
In the future this will be used to support other types of IOMMUs besides
VFIO_TYPE1_IOMMU, and allow users to select the mode on the command
line.
No functional change intended.
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20250822212518.4156428-25-dmatlack@google.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Move the helper function to get the VFIO cdev path to libvfio so that it
can be used in libvfio in a subsequent commit.
No functional change intended.
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20250822212518.4156428-24-dmatlack@google.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Add a driver to VFIO selftests for Intel DSA devices.
For now the driver only supports up to 32 batches and 1024 copies per
batch, which were the limits of the hardware this commit was tested
with. This is sufficient to generate 9+ minutes of DMA memcpys at a rate
of over 30 GB/s. This should be plenty to stress test VFIO and the IOMMU.
The driver does not yet support requesting interrupt handles, as this
commit was not tested against hardware that requires it.
Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20250822212518.4156428-23-dmatlack@google.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Add a driver for the Intel CBDMA device. This driver is based on and
named after the Linux driver for this device (drivers/dma/ioat/) and
also based on previous work from Peter Shier <pshier@google.com>.
The driver aims to be as simple as possible. It uses a single descriptor
to issue DMA operations, and only supports the copy operation. For "DMA
storms", the driver kicks off the maximum number of maximum-sized DMA
operations. On Skylake server parts, this was 2^16-1 copies of size 2M
and lasts about 15 seconds.
Create symlinks to drivers/dma/ioat/{hw.h,registers.h} to get access to
various macros (e.g. IOAT_CHANCMD_RESET) and struct ioat_dma_descriptor.
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20250822212518.4156428-20-dmatlack@google.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Add a new selftest that tests all driver operations. This test serves
both as a demonstration of the driver framework, and also as a
correctness test for future drivers.
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20250822212518.4156428-14-dmatlack@google.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Add a driver framework to VFIO selftests, so that devices can generate
DMA and interrupts in a common way that can be then utilized by tests.
This will enable VFIO selftests to exercise real hardware DMA and
interrupt paths, without needing any device-specific code in the test
itself.
Subsequent commits will introduce drivers for specific devices.
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20250822212518.4156428-13-dmatlack@google.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Add a helper function for matching a device against a given vendor and
device ID. This will be used in a subsequent commit to match devices
against drivers.
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20250822212518.4156428-12-dmatlack@google.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Make it possible to assert that a given MSI eventfd did _not_ fire by
adding a helper to mark an eventfd non-blocking. Demonstrate this in
vfio_pci_device_test by asserting the MSI eventfd did not fire before
vfio_pci_irq_trigger().
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20250822212518.4156428-11-dmatlack@google.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Keep track of the list of DMA regions that are mapped into the device
using a linked list and a new struct vfio_dma_region and use that to add
{__,}to_iova() for converting host virtual addresses into IOVAs.
This will be used in a subsequent commit to map multiple DMA regions
into a device that are then used by drivers.
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20250822212518.4156428-10-dmatlack@google.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Update vfio dma mapping test to verify that the IOMMU uses 2M and 1G
mappings when 2M and 1G HugeTLB pages are mapped into a device
respectively.
This validation is done by inspecting the contents of the I/O page
tables via /sys/kernel/debug/iommu/intel/. This validation is skipped if
that directory is not available (i.e. non-Intel IOMMUs).
Signed-off-by: Josh Hilke <jrhilke@google.com>
[reword commit message, refactor code]
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20250822212518.4156428-9-dmatlack@google.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Add test coverage of mapping 2M and 1G HugeTLB to vfio_dma_mapping_test
using a fixture variant. If there isn't enough HugeTLB memory available
for the test, just skip them.
Signed-off-by: Josh Hilke <jrhilke@google.com>
[switch from command line option to fixture variant]
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20250822212518.4156428-8-dmatlack@google.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Add a test to vfio_pci_device_test which resets the device. If reset is
not supported by the device, the test is skipped.
Signed-off-by: Josh Hilke <jrhilke@google.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20250822212518.4156428-7-dmatlack@google.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Move the dma_map_unmap test from vfio_pci_device_test to a new test:
vfio_dma_mapping_test. We are going to add more complex dma mapping
tests, so it makes sense to separate this from the vfio pci device
test which is more of a sanity check for vfio pci functionality.
Signed-off-by: Josh Hilke <jrhilke@google.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20250822212518.4156428-6-dmatlack@google.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Add a vfio test suite which verifies that userspace can bind and unbind
devices, allocate I/O address space, and attach a device to an IOMMU
domain using the cdev + IOMMUfd VFIO interface.
Signed-off-by: Josh Hilke <jrhilke@google.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20250822212518.4156428-5-dmatlack@google.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Introduce a basic VFIO selftest called vfio_pci_device_test to
demonstrate the functionality of the VFIO selftest library and provide
some test coverage of basic VFIO operations, including:
- Mapping and unmapping DMA
- Mapping and unmapping BARs
- Enabling, triggering, and disabling MSI and MSI-x
- Reading and writing to PCI config space
This test should work with most PCI devices, as long as they are bound
to vfio-pci.
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20250822212518.4156428-4-dmatlack@google.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Add a basic helper library to be used by VFIO selftests.
The basic unit of the library is struct vfio_pci_device, which
represents a single PCI device that is bound to the vfio-pci driver. The
library currently only supports a single device per group and container,
and VFIO IOMMU types.
The code in this library was heavily based on prior work done by
Raghavendra Rao Ananta <rananta@google.com>, and the VFIO_ASSERT*()
macros were written by Vipin Sharma <vipinsh@google.com>.
Separate that Makefile rules for building the library into a separate
script so that the library can be built by and linked into KVM selftests
in a subsequent commit.
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20250822212518.4156428-3-dmatlack@google.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Create the directory tools/testing/selftests/vfio with a stub Makefile
and hook it up to the top-level selftests Makefile.
This directory will be used in subsequent commits to host selftests for
the VFIO subsystem.
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20250822212518.4156428-2-dmatlack@google.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The predicates in test expect event counting from 73e75e6fc3
("cgroup/pids: Separate semantics of pids.events related to pids.max")
and the test would fail on older kernels. We want to have one version of
tests for all, so detect the feature and skip the test on old kernels.
(The test could even switch to check v1 semantics based on the flag but
keep it simple for now.)
Fixes: 9f34c56602 ("selftests: cgroup: Add basic tests for pids controller")
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Tested-by: Sebastian Chlad <sebastian.chlad@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The pci_endpoint_test tests the entire MSI/MSI-X range, which generates
false errors on platforms that do not support the whole range.
Skip the test in such cases and report accordingly.
Signed-off-by: Christian Bruel <christian.bruel@foss.st.com>
[mani: reworded description]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20250804170916.3212221-4-christian.bruel@foss.st.com
This patch add a simple functional test for the "abort" file
in fusectlfs (/sys/fs/fuse/connections/ID/abort).
A simple fuse daemon is added for testing.
Related discussion can be found in the link below.
Link: https://lore.kernel.org/all/CAOQ4uxjKFXOKQxPpxtS6G_nR0tpw95w0GiO68UcWg_OBhmSY=Q@mail.gmail.com/
Signed-off-by: Chen Linxuan <chenlinxuan@uniontech.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Add a guest_memfd testcase to verify that a vCPU can fault-in guest_memfd
memory that supports mmap(), but that is not currently mapped into host
userspace and/or has a userspace address (in the memslot) that points at
something other than the target guest_memfd range. Mapping guest_memfd
memory into the guest is supposed to operate completely independently from
any userspace mappings.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250729225455.670324-25-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Expand the guest_memfd selftests to comprehensively test host userspace
mmap functionality for guest_memfd-backed memory when supported by the
VM type.
Introduce new test cases to verify the following:
* Successful mmap operations: Ensure that MAP_SHARED mappings succeed
when guest_memfd mmap is enabled.
* Data integrity: Validate that data written to the mmap'd region is
correctly persistent and readable.
* fallocate interaction: Test that fallocate(FALLOC_FL_PUNCH_HOLE)
correctly zeros out mapped pages.
* Out-of-bounds access: Verify that accessing memory beyond the
guest_memfd's size correctly triggers a SIGBUS signal.
* Unsupported mmap: Confirm that mmap attempts fail as expected when
guest_memfd mmap support is not enabled for the specific guest_memfd
instance or VM type.
* Flag validity: Introduce test_vm_type_gmem_flag_validity() to
systematically test that only allowed guest_memfd creation flags are
accepted for different VM types (e.g., GUEST_MEMFD_FLAG_MMAP for
default VMs, no flags for CoCo VMs).
The existing tests for guest_memfd creation (multiple instances, invalid
sizes), file read/write, file size, and invalid punch hole operations
are integrated into the new test_with_type() framework to allow testing
across different VM types.
Cc: James Houghton <jthoughton@google.com>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Shivank Garg <shivankg@amd.com>
Co-developed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250729225455.670324-24-seanjc@google.com>
[Fix default vm_types to use BIT() - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Update the guest_memfd_test selftest to use getpagesize() instead of
hardcoded 4KB page size values.
Using hardcoded page sizes can cause test failures on architectures or
systems configured with larger page sizes, such as arm64 with 64KB
pages. By dynamically querying the system's page size, the test becomes
more portable and robust across different environments.
Additionally, build the guest_memfd_test selftest for arm64.
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Suggested-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250729225455.670324-23-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- Use array_index_nospec() to sanitize the target vCPU ID when handling PV
IPIs and yields as the ID is guest-controlled.
- Drop a superfluous cpumask_empty() check when reclaiming SEV memory, as
the common case, by far, is that at least one CPU will have entered the
VM, and wbnoinvd_on_cpus_mask() will naturally handle the rare case where
the set of have_run_cpus is empty.
- Rename the is_signed_type() macro in kselftest_harness.h to is_signed_var()
to fix a collision with linux/overflow.h. The collision generates compiler
warnings due to the two macros having different implementations.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAminjaMACgkQOlYIJqCj
N/0QDhAAhZgUqW2BGqGzOU/pjzXr0riJvVsNeAP85pcxygCc8qO8Hg1OWQz50YL5
q4sitjZ+Ot39bSzjDMiwkrtuX25OsvlTnZDeN/liIim5rKMiYwvoKqQe5PPNxx5M
NI4dc4B2AMedJ42gP7thBO+sMGf3J07445C69nJ4K9BppHAoZHH40grVeV0oDw+0
XoujnyjI0KjghkbWgJlg51TZg6et14prjNZeiAuSulSTaMaPBfjPadkjlG1bsBFV
lDeZypPvsh/ZLhhAgUFjZCKl7+XCKwKeze5MnpwqFYKhEBL8QqS11WGhyNmPFd1u
spDe7MjMiNMOOyPlWpJktjMJXz908MJKjrn1Rd78iieqVeM1HQyhHAeC26a+A5Xi
gFI9lrnNbZ4mlas/xyiX+Tld2yR4Ns3zF4D+eSM4KwII6MF+kEcF3j++U+PqLRvh
M7r+OKjdvry9cIgHZ/5pa3VshdAfTE6EwNPakdPl+D0hVhPqKHIzi0H8rPmkuNwM
aIKYSCa9SVmU6DS2vh0qWmTgYsc4Nk7W0bBmce7NftI+PDCYl7+GJAiZ1DBt4N9P
+i9dKK19tYJCRButj5GZXnYzRpQ3WuPBzEv9C63GPwNaRuzAxvJP5ErrkxT2xE/5
2WJgd+/J+JvQ14o8HtALc7fZckdWflGlN+pyvGyyQnkNRFpBNN0=
=8zIJ
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-fixes-6.17-rc7' of https://github.com/kvm-x86/linux into HEAD
* Use array_index_nospec() to sanitize the target vCPU ID when handling PV
IPIs and yields as the ID is guest-controlled.
* Drop a superfluous cpumask_empty() check when reclaiming SEV memory, as
the common case, by far, is that at least one CPU will have entered the
VM, and wbnoinvd_on_cpus_mask() will naturally handle the rare case where
the set of have_run_cpus is empty.
* Rename the is_signed_type() macro in kselftest_harness.h to is_signed_var()
to fix a collision with linux/overflow.h. The collision generates compiler
warnings due to the two macros having different implementations.
- Use array_index_nospec() to sanitize the target vCPU ID when handling PV
IPIs and yields as the ID is guest-controlled.
- Drop a superfluous cpumask_empty() check when reclaiming SEV memory, as
the common case, by far, is that at least one CPU will have entered the
VM, and wbnoinvd_on_cpus_mask() will naturally handle the rare case where
the set of have_run_cpus is empty.
- Rename the is_signed_type() macro in kselftest_harness.h to is_signed_var()
to fix a collision with linux/overflow.h. The collision generates compiler
warnings due to the two macros having different implementations.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAminjaMACgkQOlYIJqCj
N/0QDhAAhZgUqW2BGqGzOU/pjzXr0riJvVsNeAP85pcxygCc8qO8Hg1OWQz50YL5
q4sitjZ+Ot39bSzjDMiwkrtuX25OsvlTnZDeN/liIim5rKMiYwvoKqQe5PPNxx5M
NI4dc4B2AMedJ42gP7thBO+sMGf3J07445C69nJ4K9BppHAoZHH40grVeV0oDw+0
XoujnyjI0KjghkbWgJlg51TZg6et14prjNZeiAuSulSTaMaPBfjPadkjlG1bsBFV
lDeZypPvsh/ZLhhAgUFjZCKl7+XCKwKeze5MnpwqFYKhEBL8QqS11WGhyNmPFd1u
spDe7MjMiNMOOyPlWpJktjMJXz908MJKjrn1Rd78iieqVeM1HQyhHAeC26a+A5Xi
gFI9lrnNbZ4mlas/xyiX+Tld2yR4Ns3zF4D+eSM4KwII6MF+kEcF3j++U+PqLRvh
M7r+OKjdvry9cIgHZ/5pa3VshdAfTE6EwNPakdPl+D0hVhPqKHIzi0H8rPmkuNwM
aIKYSCa9SVmU6DS2vh0qWmTgYsc4Nk7W0bBmce7NftI+PDCYl7+GJAiZ1DBt4N9P
+i9dKK19tYJCRButj5GZXnYzRpQ3WuPBzEv9C63GPwNaRuzAxvJP5ErrkxT2xE/5
2WJgd+/J+JvQ14o8HtALc7fZckdWflGlN+pyvGyyQnkNRFpBNN0=
=8zIJ
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-fixes-6.17-rc7' of https://github.com/kvm-x86/linux into HEAD
KVM x86 fixes and a selftest fix for 6.17-rcN
- Use array_index_nospec() to sanitize the target vCPU ID when handling PV
IPIs and yields as the ID is guest-controlled.
- Drop a superfluous cpumask_empty() check when reclaiming SEV memory, as
the common case, by far, is that at least one CPU will have entered the
VM, and wbnoinvd_on_cpus_mask() will naturally handle the rare case where
the set of have_run_cpus is empty.
- Rename the is_signed_type() macro in kselftest_harness.h to is_signed_var()
to fix a collision with linux/overflow.h. The collision generates compiler
warnings due to the two macros having different implementations.
I can never remember whether --raw_output takes 'all' or 'full'. No
reason we can't support both.
For the record, 'all' is the recommended, documented option.
Link: https://lore.kernel.org/r/20250730031624.1911689-1-davidgow@google.com
Signed-off-by: David Gow <davidgow@google.com>
Reviewed-by: Rae Moar <rmoar@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Make sure we set HDS threshold to 0 if the device supports changing it.
It's required for ZC.
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250825180447.2252977-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In case changing channel count with provider bound succeeds
unexpectedly - make sure we return to original settings.
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250825180447.2252977-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Using error() makes it impossible for callers to unwind their
changes. Replace error() calls with proper error handling.
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250825180447.2252977-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The test currently modifies the HDS settings and doesn't restore them.
This may cause subsequent tests to fail (or pass when they should not).
Add defer()ed reset handling.
Link: https://patch.msgid.link/20250825175939.2249165-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The may_goto instruction is now fully supported on s390x, including the
timed implementation, so remove the respective test from the denylist.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20250821113339.292434-6-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Now that the timed may_goto implementation is available on s390x,
enable the respective verifier tests.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20250821113339.292434-5-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Make it possible to limit certain tests to s390x, just like it's
already done for x86_64, arm64, and riscv64.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20250821113339.292434-4-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Fix error messages like this one:
parse_test_spec:FAIL:569 bad arch spec: 's390x'process_subtest:FAIL:1153 Can't parse test spec for program 'may_goto_simple'
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20250821113339.292434-3-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
`config.{arch}` had entries already present in `config`.
When generating the config used by vmtest, concatenate the `config` file
with the `config.{arch}` one, making those entries duplicated, so remove
those duplications.
Use the following command to get the differences:
$ comm -1 -2 <(sort tools/testing/selftests/bpf/config.x86_64) <(sort tools/testing/selftests/bpf/config)
$ comm -1 -2 <(sort tools/testing/selftests/bpf/config.aarch64) <(sort tools/testing/selftests/bpf/config)
$ comm -1 -2 <(sort tools/testing/selftests/bpf/config.riscv64) <(sort tools/testing/selftests/bpf/config)
$ comm -1 -2 <(sort tools/testing/selftests/bpf/config.ppc64el) <(sort tools/testing/selftests/bpf/config)
$ comm -1 -2 <(sort tools/testing/selftests/bpf/config.s390x) <(sort tools/testing/selftests/bpf/config)
This is similar with commit 7a42af4b94 ("selftests/bpf: Remove entries
from config.s390x already present in config").
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20250826065057.11415-1-yangtiezhu@loongson.cn
Add a test for ensuring that the dst hint mechanism is used for
directed broadcast addresses.
This test relies on mausezahn for sending directed broadcast packets.
Additionally, a high GRO flush timeout is set to ensure that packets
will be received as lists.
The test determines if the hint mechanism was used by checking
the in_brd statistic using lnstat.
Signed-off-by: Oscar Maes <oscmaes92@gmail.com>
Link: https://patch.msgid.link/20250819174642.5148-3-oscmaes92@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some rtnetlink selftests assume the presence of ifconfig and iproute2
support for the `proto` keyword in `ip address` commands. These
assumptions can cause test failures on modern systems (e.g. Debian
Bookworm) where:
- ifconfig is not installed by default
- The iproute2 version lacks support for address protocol
This patch improves test robustness by:
- Skipping kci_test_promote_secondaries if ifconfig is missing
- Skipping do_test_address_proto if ip address help does not mention
proto
These changes ensure the tests degrade gracefully by reporting SKIP
instead of FAIL when prerequisites are not met, improving portability
across systems.
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Alessandro Ratti <alessandro@0x65c.net>
Link: https://patch.msgid.link/20250822140633.891360-2-alessandro@0x65c.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kernel tries to be helpful and attach the XDP program in generic
mode if the driver has no BPF ndo at all. Since the xdp.py tests
all have "native" in their names this can be quite confusing.
Force native / "drv" attachment. Note that netdevsim re-uses
the generic handler as its "native" handler, so we'll maintain
the test coverage of the generic mode that way. No need to test
both explicitly, I reckon.
Link: https://patch.msgid.link/20250822195645.1673390-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test cgroup v2 freezer time stat. Freezer time accounting should
be independent of other cgroups in the hierarchy and should increase
iff a cgroup is CGRP_FREEZE (regardless of whether it reaches
CGRP_FROZEN).
Skip these tests on systems without freeze time accounting.
Signed-off-by: Tiffany Yang <ynaffit@google.com>
Cc: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
This patch adds tests for the new jeq and jne logic in
is_scalar_branch_taken. The following shows the first test failing
before the previous patch is applied. Once the previous patch is
applied, the verifier can use the tnum values to deduce that instruction
7 is dead code.
0: call bpf_get_prandom_u32#7 ; R0_w=scalar()
1: w0 = w0 ; R0_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
2: r0 >>= 30 ; R0_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=3,var_off=(0x0; 0x3))
3: r0 <<= 30 ; R0_w=scalar(smin=0,smax=umax=umax32=0xc0000000,smax32=0x40000000,var_off=(0x0; 0xc0000000))
4: r1 = r0 ; R0_w=scalar(id=1,smin=0,smax=umax=umax32=0xc0000000,smax32=0x40000000,var_off=(0x0; 0xc0000000)) R1_w=scalar(id=1,smin=0,smax=umax=umax32=0xc0000000,smax32=0x40000000,var_off=(0x0; 0xc0000000))
5: r1 += 1024 ; R1_w=scalar(smin=umin=umin32=1024,smax=umax=umax32=0xc0000400,smin32=0x80000400,smax32=0x40000400,var_off=(0x400; 0xc0000000))
6: if r1 != r0 goto pc+1 ; R0_w=scalar(id=1,smin=umin=umin32=1024,smax=umax=umax32=0xc0000000,smin32=0x80000400,smax32=0x40000000,var_off=(0x400; 0xc0000000)) R1_w=scalar(smin=umin=umin32=1024,smax=umax=umax32=0xc0000000,smin32=0x80000400,smax32=0x40000400,var_off=(0x400; 0xc0000000))
7: r10 = 0
frame pointer is read only
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/550004f935e2553bdb2fb1f09cbde7d0452112d0.1755694148.git.paul.chaignon@gmail.com
Test single-buffer XDP_TX for packets with various payload sizes.
Update the socat TX command to generate packets with 0 length payloads.
Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com>
Link: https://patch.msgid.link/20250821014023.1481662-3-dimitri.daskalakis1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In preparation of single-buffer XDP_TX tests, refactor common test code
into the _test_xdp_native_tx method. Add support for multiple payload
sizes, and additional validation for RX packet count. Pass the -n flag
to echo to avoid adding an extra byte into the TX packet.
Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com>
Link: https://patch.msgid.link/20250821014023.1481662-2-dimitri.daskalakis1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmiobRkQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpnvDEAC6ybsqvNAOSV1Tdk1EQZ/mIrmIb7tVrp/P
zRReWTK9jF7kzOLn2Mqgu0c4RFLCMABXmPb5F2aLx72uSxMSFq2sI9QZCgGzZQeZ
yjOIxFBAPsdgr+gyIOdS3zH04+IKfJw20ojJb83irCgd5M1hpmVwzZ3iGMq8Gs9q
VJQYvKny7tjjpuLpk3DWl7t1J0YV+0sGQhk3iZdWEHrui7mqmfh6DkeB5forTu6z
Gn5e4DNbZvmcvkJQ+Rnkua1UmTZ4hr/+3YV9mqzsWYv+1hOTx/uomGbY7DjSdSyK
vWWNwN97sgAjwhaFgWvB2iRk1pdAb4A3zP+NV1MXheOhHnAT3C6i43DaS1fivone
YKLEqy4v3IzB5WcdlwclJW2qizoLtopu7A4pRURv9v+Q0wb4Q2YM0gRum59QgxZN
+YUhglR5ucazYPmIAxOZMaU/WMIN6m4h3hRa1RkFRNXkBvPGxV2fQxi8exX0QWqf
oxSSfImO0QVjYPlAL7oi0eWwHtqXtebXXdrUNozQdnrEQnimTrxPAuSnfRIv63un
swlaCzfqXXhtl25t9p6Sx7xM7aKF2k7tYnZdSM7JjiOS7KXHFaZcYt3YcoFfdLc7
X/vtT9OQWwnEtqzFKnK8EvcjSN+4KbXwI4neVLmsWK81dwqI2huScB+Xe5eBPidU
6d6dZzUikA==
=mbqK
-----END PGP SIGNATURE-----
Merge tag 'block-6.17-20250822' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
"A set of fixes for block that should go into this tree. A bit larger
than what I usually have at this point in time, a lot of that is the
continued fixing of the lockdep annotation for queue freezing that we
recently added, which has highlighted a number of little issues here
and there. This contains:
- MD pull request via Yu:
- Add a legacy_async_del_gendisk mode, to prevent a user tools
regression. New user tools releases will not use such a mode,
the old release with a new kernel now will have warning about
deprecated behavior, and we prepare to remove this legacy mode
after about a year later
- The rename in kernel causing user tools build failure, revert
the rename in mdp_superblock_s
- Fix a regression that interrupted resync can be shown as
recover from mdstat or sysfs
- Improve file size detection for loop, particularly for networked
file systems, by using getattr to get the size rather than the
cached inode size.
- Hotplug CPU lock vs queue freeze fix
- Lockdep fix while updating the number of hardware queues
- Fix stacking for PI devices
- Silence bio_check_eod() for the known case of device removal where
the size is truncated to 0 sectors"
* tag 'block-6.17-20250822' of git://git.kernel.dk/linux:
block: avoid cpu_hotplug_lock depedency on freeze_lock
block: decrement block_rq_qos static key in rq_qos_del()
block: skip q->rq_qos check in rq_qos_done_bio()
blk-mq: fix lockdep warning in __blk_mq_update_nr_hw_queues
block: tone down bio_check_eod
loop: use vfs_getattr_nosec for accurate file size
loop: Consolidate size calculation logic into lo_calculate_size()
block: remove newlines from the warnings in blk_validate_integrity_limits
block: handle pi_tuple_size in queue_limits_stack_integrity
selftests: ublk: Use ARRAY_SIZE() macro to improve code
md: fix sync_action incorrect display during resync
md: add helper rdev_needs_recovery()
md: keep recovery_cp in mdp_superblock_s
md: add legacy_async_del_gendisk mode
or aren't considered necessary for -stable kernels. 17 of these fixes are
for MM.
As usual, singletons all over the place, apart from a three-patch series
of KHO followup work from Pasha which is actually also a bunch of
singletons.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaKfFVwAKCRDdBJ7gKXxA
jvZGAQCCRTRgwnYsH0op9Rlxs72zokENbErSzXweWLez31pNpAD/S7bVSjjk1mXr
BQ24ZadKUUomWkghwCusb9VomMeneg0=
=+uBT
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2025-08-21-18-17' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"20 hotfixes. 10 are cc:stable and the remainder address post-6.16
issues or aren't considered necessary for -stable kernels. 17 of these
fixes are for MM.
As usual, singletons all over the place, apart from a three-patch
series of KHO followup work from Pasha which is actually also a bunch
of singletons"
* tag 'mm-hotfixes-stable-2025-08-21-18-17' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/mremap: fix WARN with uffd that has remap events disabled
mm/damon/sysfs-schemes: put damos dests dir after removing its files
mm/migrate: fix NULL movable_ops if CONFIG_ZSMALLOC=m
mm/damon/core: fix damos_commit_filter not changing allow
mm/memory-failure: fix infinite UCE for VM_PFNMAP pfn
MAINTAINERS: mark MGLRU as maintained
mm: rust: add page.rs to MEMORY MANAGEMENT - RUST
iov_iter: iterate_folioq: fix handling of offset >= folio size
selftests/damon: fix selftests by installing drgn related script
.mailmap: add entry for Easwar Hariharan
selftests/mm: add test for invalid multi VMA operations
mm/mremap: catch invalid multi VMA moves earlier
mm/mremap: allow multi-VMA move when filesystem uses thp_get_unmapped_area
mm/damon/core: fix commit_ops_filters by using correct nth function
tools/testing: add linux/args.h header and fix radix, VMA tests
mm/debug_vm_pgtable: clear page table entries at destroy_args()
squashfs: fix memory leak in squashfs_fill_super
kho: warn if KHO is disabled due to an error
kho: mm: don't allow deferred struct page with KHO
kho: init new_physxa->phys_bits to fix lockdep
Remove 0/NULL global variable assignment in mixer-test.c and pcm-test.c
Signed-off-by: Nikola Z. Ivanov <zlatistiv@gmail.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://patch.msgid.link/20250821200132.1218850-1-zlatistiv@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The ARM64_FEATURE_MASK() macro was a hack introduce whilst the
automatic generation of sysreg encoding was introduced, and was
too unreliable to be entirely trusted.
We are in a better place now, and we could really do without this
macro. Get rid of it altogether.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250817202158.395078-7-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQQ6NaUOruQGUkvPdG4raS+Z+3y5EwUCaKdsDgAKCRAraS+Z+3y5
E85CAP0aLdMc7glIhzXITfY8If4HntXCKyDjEABDKlCKMea3MAD/d/lIHXkfL7vE
ZCZMhFOpDVdQ1ZojpfEs7wLipfbl/Ao=
=iUY0
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Martin KaFai Lau says:
====================
pull-request: bpf-next 2025-08-21
We've added 9 non-merge commits during the last 3 day(s) which contain
a total of 13 files changed, 1027 insertions(+), 27 deletions(-).
The main changes are:
1) Added bpf dynptr support for accessing the metadata of a skb,
from Jakub Sitnicki.
The patches are merged from a stable branch bpf-next/skb-meta-dynptr.
The same patches have also been merged into bpf-next/master.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
selftests/bpf: Cover metadata access from a modified skb clone
selftests/bpf: Cover read/write to skb metadata at an offset
selftests/bpf: Cover write access to skb metadata via dynptr
selftests/bpf: Cover read access to skb metadata via dynptr
selftests/bpf: Parametrize test_xdp_context_tuntap
selftests/bpf: Pass just bpf_map to xdp_context_test helper
selftests/bpf: Cover verifier checks for skb_meta dynptr type
bpf: Enable read/write access to skb metadata through a dynptr
bpf: Add dynptr type for skb metadata
====================
Link: https://patch.msgid.link/20250821191827.2099022-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Fix a subtle bug during SCX enabling where a dead task skips init but
doesn't skip sched class switch leading to invalid task state transition
warning.
- Cosmetic fix in selftests.
-----BEGIN PGP SIGNATURE-----
iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaKdWkg4cdGpAa2VybmVs
Lm9yZwAKCRCxYfJx3gVYGWI2AP9e+OTPPHa+sHeM7g3ngigF44nyvvRIPIMJHmZO
7CYT9AD/e+YI+atHzo5iSBcpGwjW8BSLc0ozdrkI0N7XFLXC4go=
=7Ti1
-----END PGP SIGNATURE-----
Merge tag 'sched_ext-for-6.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:
- Fix a subtle bug during SCX enabling where a dead task skips init
but doesn't skip sched class switch leading to invalid task state
transition warning
- Cosmetic fix in selftests
* tag 'sched_ext-for-6.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
selftests/sched_ext: Remove duplicate sched.h header
sched/ext: Fix invalid task state transitions on class switch
Some of the bpf test progs still use linux/libc headers.
Let's use vmlinux.h instead like the rest of test progs.
This will also ease cross compiling.
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250821030254.398826-1-hengqi.chen@gmail.com
Adding uprobe checks into the current uretprobe tests.
All the related tests are now executed with attached uprobe
or uretprobe or without any probe.
Renaming the test fixture to uprobe, because it seems better.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250720112133.244369-22-jolsa@kernel.org
Now that we have uprobe syscall working properly with shadow stack,
we can remove testing limitations for shadow stack tests and make
sure uprobe gets properly optimized.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250821141557.13233-1-jolsa@kernel.org
Changing the test_uretprobe_regs_change test to test both uprobe
and uretprobe by adding entry consumer handler to the testmod
and making it to change one of the registers.
Making sure that changed values both uprobe and uretprobe handlers
propagate to the user space.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20250720112133.244369-20-jolsa@kernel.org
Changing uretprobe_regs_trigger to allow the test for both
uprobe and uretprobe and renaming it to uprobe_regs_equal.
We check that both uprobe and uretprobe probes (bpf programs)
see expected registers with few exceptions.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20250720112133.244369-19-jolsa@kernel.org
Adding optimized usdt variant for basic usdt test to check that
usdt arguments are properly passed in optimized code path.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20250720112133.244369-18-jolsa@kernel.org
Make sure that calling uprobe syscall from outside uprobe trampoline
results in sigill signal.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20250720112133.244369-17-jolsa@kernel.org
Adding test that makes sure parallel execution of the uprobe and
attach/detach of optimized uprobe on it works properly.
By default the test runs for 500ms, which is adjustable by using
BPF_SELFTESTS_UPROBE_SYSCALL_RACE_MSEC env variable.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20250720112133.244369-16-jolsa@kernel.org
Adding tests for optimized uprobe/usdt probes.
Checking that we get expected trampoline and attached bpf programs
get executed properly.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20250720112133.244369-15-jolsa@kernel.org
Renaming uprobe_syscall_executed prog to test_uretprobe_multi
to fit properly in the following changes that add more programs.
Plus adding pid filter and increasing executed variable.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20250720112133.244369-14-jolsa@kernel.org
Adding __test_uprobe_syscall with non x86_64 stub to execute all the tests,
so we don't need to keep adding non x86_64 stub functions for new tests.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20250720112133.244369-13-jolsa@kernel.org
Current release - fix to a fix:
- usb: asix_devices: fix PHY address mask in MDIO bus initialization
Current release - regressions:
- Bluetooth: fixes for the split between BIS_LINK and PA_LINK
- Revert "net: cadence: macb: sama7g5_emac: Remove USARIO CLKEN flag",
breaks compatibility with some existing device tree blobs
- dsa: b53: fix reserved register access in b53_fdb_dump()
Current release - new code bugs:
- sched: dualpi2: run probability update timer in BH to avoid deadlock
- eth: libwx: fix the size in RSS hash key population
- pse-pd: pd692x0: improve power budget error paths and handling
Previous releases - regressions:
- tls: fix handling of zero-length records on the rx_list
- hsr: reject HSR frame if skb can't hold tag
- bonding: fix negotiation flapping in 802.3ad passive mode
Previous releases - always broken:
- gso: forbid IPv6 TSO with extensions on devices with only IPV6_CSUM
- sched: make cake_enqueue return NET_XMIT_CN when past buffer_limit,
avoid packet drops with low buffer_limit, remove unnecessary WARN()
- sched: fix backlog accounting after modifying config of a qdisc
in the middle of the hierarchy
- mptcp: improve handling of skb extension allocation failures
- eth: mlx5:
- fixes for the "HW Steering" flow management method
- fixes for QoS and device buffer management
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAminVEUACgkQMUZtbf5S
Irtt/g//YoT8G6Wzv3unTf+qYl/pia7936VqW/W51A7UyCZp2bfDi8/r8eIl+bCf
8KbUH2nSTUPEsngRrzoXhbVAXs6AFe3WKA89efCeal+GFPPA43A5+BZKaXBRE14M
DgSCo1mYjrVIMENv33LttjQOYjLIBu4finciqIkLj/rOeyRZUURlModQW+aA800K
QhXKt0BYniL7tg4ChW3/TQMNpRVLnfae6cu/CKjv/rmzdioVZRChC/cGPI0xctpq
kAIJa49i6dNGtNMWFtS+YaDjTFWS05R9y5bSiylLUnqply9+sWrF1VZIGHXMOcv8
uzxLIlZhvSQycETMinPf0w02H3PuuqpKH8X+1c77Z7/wHmCn5ht2QVv1BwgFu4hv
MZm+zRYKZDdkdYCqujD/XmqF/D7t49LIql8WctxwgWus2yTfg089+9H2VHEj2kyl
S6FBI6f3ZwwLGh6MCAcqswbGc/6QSVFg2ndW6QFxl85Py3fTefV0Y3OknSkXqom/
2azpWNX6zoKfB6QSW4ZrfPFvuDAV8Ai5OJCTpw/3gkbquUMgx+mN2wf2/t8hPbci
Qkh5Y9/44gQFY9KaLx8N7Xr1FBTQ4c21QnG67gvflw3mNMdr3ot0isGTCtjty/eC
abQ+KvQwxy8rVnlunYC8REAAFxQi3AtCiFBE/vioVApNQtsqm68=
=waf0
-----END PGP SIGNATURE-----
Merge tag 'net-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from Bluetooth.
Current release - fix to a fix:
- usb: asix_devices: fix PHY address mask in MDIO bus initialization
Current release - regressions:
- Bluetooth: fixes for the split between BIS_LINK and PA_LINK
- Revert "net: cadence: macb: sama7g5_emac: Remove USARIO CLKEN
flag", breaks compatibility with some existing device tree blobs
- dsa: b53: fix reserved register access in b53_fdb_dump()
Current release - new code bugs:
- sched: dualpi2: run probability update timer in BH to avoid
deadlock
- eth: libwx: fix the size in RSS hash key population
- pse-pd: pd692x0: improve power budget error paths and handling
Previous releases - regressions:
- tls: fix handling of zero-length records on the rx_list
- hsr: reject HSR frame if skb can't hold tag
- bonding: fix negotiation flapping in 802.3ad passive mode
Previous releases - always broken:
- gso: forbid IPv6 TSO with extensions on devices with only IPV6_CSUM
- sched: make cake_enqueue return NET_XMIT_CN when past buffer_limit,
avoid packet drops with low buffer_limit, remove unnecessary WARN()
- sched: fix backlog accounting after modifying config of a qdisc in
the middle of the hierarchy
- mptcp: improve handling of skb extension allocation failures
- eth: mlx5:
- fixes for the "HW Steering" flow management method
- fixes for QoS and device buffer management"
* tag 'net-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits)
netfilter: nf_reject: don't leak dst refcount for loopback packets
net/mlx5e: Preserve shared buffer capacity during headroom updates
net/mlx5e: Query FW for buffer ownership
net/mlx5: Restore missing scheduling node cleanup on vport enable failure
net/mlx5: Fix QoS reference leak in vport enable error path
net/mlx5: Destroy vport QoS element when no configuration remains
net/mlx5e: Preserve tc-bw during parent changes
net/mlx5: Remove default QoS group and attach vports directly to root TSAR
net/mlx5: Base ECVF devlink port attrs from 0
net: pse-pd: pd692x0: Skip power budget configuration when undefined
net: pse-pd: pd692x0: Fix power budget leak in manager setup error path
Octeontx2-af: Skip overlap check for SPI field
selftests: tls: add tests for zero-length records
tls: fix handling of zero-length records on the rx_list
net: airoha: ppe: Do not invalid PPE entries in case of SW hash collision
selftests: bonding: add test for passive LACP mode
bonding: send LACPDUs periodically in passive mode after receiving partner's LACPDU
bonding: update LACP activity flag after setting lacp_active
Revert "net: cadence: macb: sama7g5_emac: Remove USARIO CLKEN flag"
ipv6: sr: Fix MAC comparison to be constant-time
...
Test various combinations of zero-length records.
Unfortunately, kernel cannot be coerced into producing those,
so hardcode the ciphertext messages in the test.
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250820021952.143068-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a selftest to verify bonding behavior when `lacp_active` is set to `off`.
The test checks the following:
- The passive LACP bond should not send LACPDUs before receiving a partner's
LACPDU.
- The transmitted LACPDUs must not include the active flag.
- After transitioning to EXPIRED and DEFAULTED states, the passive side should
still not initiate LACPDUs.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250815062000.22220-4-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Currently, we cannot write IPv4 or IPv6 specific packetdrill tests
as ksft_runner.sh runs each .pkt file for both protocols.
Let's support single protocol test by checking --ip_version in the
.pkt file.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250819231527.1427361-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Even with slowwait used to avoid system sleep in the preferred_lft test,
failures can still occur after long runtimes.
Print the device address info when the test fails to provide better
troubleshooting data.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250819074749.388064-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The test sometimes fails due to an unexpected number of loaded programs. e.g
FAIL: 2 BPF programs loaded, expected 1
File "/usr/libexec/kselftests/net/./bpf_offload.py", line 940, in <module>
progs = bpftool_prog_list(expected=1)
File "/usr/libexec/kselftests/net/./bpf_offload.py", line 187, in bpftool_prog_list
fail(True, "%d BPF programs loaded, expected %d" %
File "/usr/libexec/kselftests/net/./bpf_offload.py", line 89, in fail
tb = "".join(traceback.extract_stack().format())
However, the logs do not show which programs were actually loaded, making it
difficult to debug the failure.
Add printing of the loaded programs when a mismatch is detected to help
troubleshoot such errors. The list is printed on a new line to avoid breaking
the current log format.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250819073348.387972-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
socket.c: In function ‘run_tests’:
socket.c:59:25: warning: ignoring return value of ‘strerror_r’ \
declared with attribute ‘warn_unused_result’ [-Wunused-result]
59 | strerror_r(-s->expect, err_string1, ERR_STRING_SZ);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
socket.c:60:25: warning: ignoring return value of ‘strerror_r’ \
declared with attribute ‘warn_unused_result’ [-Wunused-result]
60 | strerror_r(errno, err_string2, ERR_STRING_SZ);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
socket.c:73:33: warning: ignoring return value of ‘strerror_r’ \
declared with attribute ‘warn_unused_result’ [-Wunused-result]
73 | strerror_r(errno, err_string1, ERR_STRING_SZ);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
changelog:
v2
- const char* messages and fixed patch warnings of max 75 chars
per line
Signed-off-by: Alex Tran <alex.t.tran@gmail.com>
Link: https://patch.msgid.link/20250819025227.239885-1-alex.t.tran@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rename is_signed_type() to is_signed_var() to avoid colliding with a macro
of the same name defined by tools' linux/overflow.h. This fixes warnings
(and presumably potential test failures) in tests that utilize the
selftests harness and happen to (indirectly) include overflow.h.
In file included from tools/include/linux/bits.h:34,
from tools/include/linux/bitops.h:14,
from tools/include/linux/hashtable.h:13,
from include/kvm_util.h:11,
from x86/userspace_msr_exit_test.c:11:
tools/include/linux/overflow.h:31:9: error: "is_signed_type" redefined [-Werror]
31 | #define is_signed_type(type) (((type)(-1)) < (type)1)
| ^~~~~~~~~~~~~~
In file included from include/kvm_test_harness.h:11,
from x86/userspace_msr_exit_test.c:9:
../kselftest_harness.h:754:9: note: this is the location of the previous definition
754 | #define is_signed_type(var) (!!(((__typeof__(var))(-1)) < (__typeof__(var))1))
| ^~~~~~~~~~~~~~
Use a separate definition, at least for now, as many selftests build
without tools/include in their include path.
Fixes: fc92099902 ("tools headers: Synchronize linux/bits.h with the kernel sources")
Cc: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20250624231930.583689-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
xfrm_policy.sh, nft_flowtable.sh, and vrf-xfrm-tests.sh use 'ip xfrm'
with SHA-1, either 'auth sha1' or 'auth-trunc hmac(sha1)'. That
requires CONFIG_CRYPTO_SHA1, which CONFIG_INET_ESP intentionally doesn't
select (as per its help text). Previously, the config for these tests
relied on CONFIG_CRYPTO_SHA1 being selected by the unrelated option
CONFIG_IP_SCTP. Since CONFIG_IP_SCTP is being changed to no longer do
that, instead add CONFIG_CRYPTO_SHA1 to the configs explicitly.
Reported-by: Paolo Abeni <pabeni@redhat.com>
Closes: https://lore.kernel.org/r/766e4508-aaba-4cdc-92b4-e116e52ae13b@redhat.com
Suggested-by: Florian Westphal <fw@strlen.de>
Acked-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Link: https://patch.msgid.link/20250818205426.30222-2-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ncdevmem tests that the kernel correctly rejects attempts
to deactivate queues with MPs bound.
Make the configure_channels() test support combined channels.
Currently it tries to set the queue counts to rx N tx N-1,
which only makes sense for devices which have IRQs per ring
type. Most modern devices used combined IRQs/channels with
both Rx and Tx queues. Since the math is total Rx == combined+Rx
setting Rx when combined is non-zero will be increasing the total
queue count, not decreasing as the test intends.
Note that the test would previously also try to set the Tx
ring count to Rx - 1, for some reason. Which would be 0
if the device has only 2 queues configured.
With this change (device with 2 queues):
setting channel count rx:1 tx:1
YNL set channels: Kernel error: 'requested channel counts are too low for existing memory provider setting (2)'
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250815231513.381652-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We see quite a few flakes during the TSO test against virtualized
devices in NIPA. There's often 10-30 retransmissions during the
test. Sometimes as many as 100. Set the retransmission threshold
at 1/4th of the wire frame target.
Link: https://patch.msgid.link/20250815224100.363438-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
drgn_dump_damon_status is not installed during kselftest setup. It can
break other tests which depend on drgn_dump_damon_status. Install
drgn_dump_damon_status files to fix broken test.
Link: https://lkml.kernel.org/r/20250812140046.660486-1-ekffu200098@gmail.com
Fixes: f3e8e1e513 ("selftests/damon: add drgn script for extracting damon status")
Signed-off-by: Sang-Heon Jeon <ekffu200098@gmail.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Honggyu Kim <honggyu.kim@sk.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
We can use UFFD to easily assert invalid multi VMA moves, so do so,
asserting expected behaviour when VMAs invalid for a multi VMA operation
are encountered.
We assert both that such operations are not permitted, and that we do not
even attempt to move the first VMA under these circumstances.
We also assert that we can still move a single VMA regardless.
We then assert that a partial failure can occur if the invalid VMA appears
later in the range of multiple VMAs, both at the very next VMA, and also at
the end of the range.
As part of this change, we are using the is_range_valid() helper more
aggressively. Therefore, fix a bug where stale buffered data would hang
around on success, causing subsequent calls to is_range_valid() to
potentially give invalid results.
We simply have to fflush() the stream on success to resolve this issue.
Link: https://lkml.kernel.org/r/c4fb86dd5ba37610583ad5fc0e0c2306ddf318b9.1754218667.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Commit 857d18f23a ("cleanup: Introduce ACQUIRE() and ACQUIRE_ERR() for
conditional locks") accidentally broke the radix tree, VMA userland tests
by including linux/args.h which is not present in the tools/include
directory.
This patch copies this over and adds an #ifdef block to avoid duplicate
__CONCAT declaration in conflict with system headers when we ultimately
include this.
Link: https://lkml.kernel.org/r/20250811052654.33286-1-lorenzo.stoakes@oracle.com
Fixes: 857d18f23a ("cleanup: Introduce ACQUIRE() and ACQUIRE_ERR() for conditional locks")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaKRttQAKCRCRxhvAZXjc
onwmAP98oaMku7CttHEVwJj8KD7luXvZWbvB23TPGmF6BNWg9wEAraks5EzZZJy3
+4xWn10b6R+gXUqvwqr+bf0ufk3c+gc=
=Nbg0
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.17-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- Fix two memory leaks in pidfs
- Prevent changing the idmapping of an already idmapped mount without
OPEN_TREE_CLONE through open_tree_attr()
- Don't fail listing extended attributes in kernfs when no extended
attributes are set
- Fix the return value in coredump_parse()
- Fix the error handling for unbuffered writes in netfs
- Fix broken data integrity guarantees for O_SYNC writes via iomap
- Fix UAF in __mark_inode_dirty()
- Keep inode->i_blkbits constant in fuse
- Fix coredump selftests
- Fix get_unused_fd_flags() usage in do_handle_open()
- Rename EXPORT_SYMBOL_GPL_FOR_MODULES to EXPORT_SYMBOL_FOR_MODULES
- Fix use-after-free in bh_read()
- Fix incorrect lflags value in the move_mount() syscall
* tag 'vfs-6.17-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
signal: Fix memory leak for PIDFD_SELF* sentinels
kernfs: don't fail listing extended attributes
coredump: Fix return value in coredump_parse()
fs/buffer: fix use-after-free when call bh_read() helper
pidfs: Fix memory leak in pidfd_info()
netfs: Fix unbuffered write error handling
fhandle: do_handle_open() should get FD with user flags
module: Rename EXPORT_SYMBOL_GPL_FOR_MODULES to EXPORT_SYMBOL_FOR_MODULES
fs: fix incorrect lflags value in the move_mount syscall
selftests/coredump: Remove the read() that fails the test
fuse: keep inode->i_blkbits constant
iomap: Fix broken data integrity guarantees for O_SYNC writes
selftests/mount_setattr: add smoke tests for open_tree_attr(2) bug
open_tree_attr: do not allow id-mapping changes without OPEN_TREE_CLONE
fs: writeback: fix use-after-free in __mark_inode_dirty()
Move the x86 specific helpers for getting kvm_{amd,intel} module params to
x86 where they belong. Expose the module-agnostic helpers globally, there
is nothing secret about the logic.
Link: https://lore.kernel.org/r/20250806225159.1687326-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Check that the return value of KVM_GET_VCPU_MMAP_SIZE is non-negative
before comparing with sizeof(kvm_run). If KVM_GET_VCPU_MMAP_SIZE fails,
it will return -1, and `-1 > sizeof(kvm_run)` is true, so the ASSERT
passes.
There are no other locations in tools/testing/selftests/kvm that make
the same mistake.
Signed-off-by: James Houghton <jthoughton@google.com>
Link: https://lore.kernel.org/r/20250711001742.1965347-1-jthoughton@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Test that threaded state (in the persistent NAPI config) gets updated
even when NAPI with given ID is not allocated at the time.
This test is validating commit ccba9f6baa ("net: update NAPI threaded
config even for disabled NAPIs").
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20250815013314.2237512-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add "transaction_report" to the binderfs feature list, to help userspace
determine if the "BINDER_CMD_REPORT" generic netlink api is supported by
the binder driver.
Signed-off-by: Li Li <dualli@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20250727182932.2499194-5-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
GCC was complaining about the new label:
mptcp_inq.c:79:2: warning: label followed by a declaration is a C23 extension [-Wc23-extensions]
79 | int err = getaddrinfo(node, service, hints, res);
| ^
mptcp_sockopt.c:166:2: warning: label followed by a declaration is a C23 extension [-Wc23-extensions]
166 | int err = getaddrinfo(node, service, hints, res);
| ^
Simply declare 'err' before the label to avoid this warning.
Fixes: dd367e81b7 ("selftests: mptcp: sockopt: use IPPROTO_MPTCP for getaddrinfo")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250815-net-mptcp-misc-fixes-6-17-rc2-v1-8-521fe9957892@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
GCC was complaining about the new label:
mptcp_connect.c:187:2: warning: label followed by a declaration is a C23 extension [-Wc23-extensions]
187 | int err = getaddrinfo(node, service, hints, res);
| ^
Simply declare 'err' before the label to avoid this warning.
Fixes: a862771d1a ("selftests: mptcp: use IPPROTO_MPTCP for getaddrinfo")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250815-net-mptcp-misc-fixes-6-17-rc2-v1-7-521fe9957892@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
To prevent test instability in the "delete re-add signal" test caused by
ADD_ADDR retransmissions, disable retransmissions for this test by setting
net.mptcp.add_addr_timeout to 0.
Suggested-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250815-net-mptcp-misc-fixes-6-17-rc2-v1-6-521fe9957892@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This modification is linked to the parent commit where the received
ADD_ADDR limit was accidentally reset when the endpoints were flushed.
To validate that, the test is now flushing endpoints after having set
new limits, and before checking them.
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.
Fixes: 01cacb00b3 ("mptcp: add netlink-based PM")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250815-net-mptcp-misc-fixes-6-17-rc2-v1-3-521fe9957892@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Demonstrate that, when processing an skb clone, the metadata gets truncated
if the program contains a direct write to either the payload or the
metadata, due to an implicit unclone in the prologue, and otherwise the
dynptr to the metadata is limited to being read-only.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250814-skb-metadata-thru-dynptr-v7-9-8a39e636e0fb@cloudflare.com
Exercise r/w access to skb metadata through an offset-adjusted dynptr,
read/write helper with an offset argument, and a slice starting at an
offset.
Also check for the expected errors when the offset is out of bounds.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Jesse Brandeburg <jbrandeburg@cloudflare.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://patch.msgid.link/20250814-skb-metadata-thru-dynptr-v7-8-8a39e636e0fb@cloudflare.com
Add tests what exercise writes to skb metadata in two ways:
1. indirectly, using bpf_dynptr_write helper,
2. directly, using a read-write dynptr slice.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Jesse Brandeburg <jbrandeburg@cloudflare.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://patch.msgid.link/20250814-skb-metadata-thru-dynptr-v7-7-8a39e636e0fb@cloudflare.com
Exercise reading from SKB metadata area in two new ways:
1. indirectly, with bpf_dynptr_read(), and
2. directly, with bpf_dynptr_slice().
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Jesse Brandeburg <jbrandeburg@cloudflare.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://patch.msgid.link/20250814-skb-metadata-thru-dynptr-v7-6-8a39e636e0fb@cloudflare.com
We want to add more test cases to cover different ways to access the
metadata area. Prepare for it. Pull up the skeleton management.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Jesse Brandeburg <jbrandeburg@cloudflare.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://patch.msgid.link/20250814-skb-metadata-thru-dynptr-v7-5-8a39e636e0fb@cloudflare.com
Prepare for parametrizing the xdp_context tests. The assert_test_result
helper doesn't need the whole skeleton. Pass just what it needs.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Jesse Brandeburg <jbrandeburg@cloudflare.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://patch.msgid.link/20250814-skb-metadata-thru-dynptr-v7-4-8a39e636e0fb@cloudflare.com
dynptr for skb metadata behaves the same way as the dynptr for skb data
with one exception - writes to skb_meta dynptr don't invalidate existing
skb and skb_meta slices.
Duplicate those the skb dynptr tests which we can, since
bpf_dynptr_from_skb_meta kfunc can be called only from TC BPF, to cover the
skb_meta dynptr verifier checks.
Also add a couple of new tests (skb_data_valid_*) to ensure we don't
invalidate the slices in the mentioned case, which are specific to skb_meta
dynptr.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Jesse Brandeburg <jbrandeburg@cloudflare.com>
Link: https://patch.msgid.link/20250814-skb-metadata-thru-dynptr-v7-3-8a39e636e0fb@cloudflare.com
LLVM/clang can not build the kernel for all architectures supported by
nolibc. The current setup uses the same compiler to build the kernel as
is used for nolibc-test. This prevents using the full qemu-system tests
for LLVM builds.
Instead always build the kernel with GCC. For the nolibc testsuite the
kernel does not need to be built with LLVM.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250719-nolibc-llvm-system-v1-3-1730216ce171@weissschuh.net
Various targets of the testsuite call back into the toplevel kernel
Makefile. These calls use various parameters and are quite long.
Introduce a common variable to make future changes smaller and the lines
shorter.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250719-nolibc-llvm-system-v1-1-1730216ce171@weissschuh.net
Only one of these variables is used.
$CC is preferred over $CROSS_COMPILE.
Make this clear in the help message.
Suggested-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/lkml/20250817093905.GA14213@1wt.eu/
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Clobbering a lot of registers and stack slots helps exposing tail call
counter overwrite bugs in JITs.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20250813121016.163375-5-iii@linux.ibm.com
Use ARRAY_SIZE() macro while calculating size of an array to improve
code readability and reduce potential sizing errors.
Implement this suggestion given by spatch tool by running
coccinelle script - scripts/coccinelle/misc/array_size.cocci
Follow ARRAY_SIZE() macro usage pattern in ublk.c introduced by,
commit ec12009318 ("selftests: ublk: fix ublk_find_tgt()")
wherever appropriate to maintain consistency.
Signed-off-by: Akhilesh Patil <akhilesh@ee.iitb.ac.in>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/aKGihYui6/Pcijbk@bhairav-test.ee.iitb.ac.in
Signed-off-by: Jens Axboe <axboe@kernel.dk>
clock_nanosleep() returns a positive error value. Unlike other libc
functions it *does not* return -1 nor set errno.
Fix the return value and also adapt nanosleep().
Fixes: 7c02bc4088 ("tools/nolibc: add support for clock_nanosleep() and nanosleep()")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250731-nolibc-clock_nanosleep-ret-v1-1-9e4af7855e61@linutronix.de
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
The CI has hit a couple of cases of:
RUN global.data_steal ...
tls.c:2762:data_steal:Expected recv(cfd, buf2, sizeof(buf2), MSG_DONTWAIT) (20000) == -1 (-1)
data_steal: Test terminated by timeout
FAIL global.data_steal
Looks like the 2msec sleep is not long enough. Make the sleep longer,
and then instead of second sleep wait for the thieving process to exit.
That way we can be sure it called recv() before us.
While at it also avoid trying to steal more than a record, this seems
to be causing issues in manual testing as well.
Fixes: d7e82594a4 ("selftests: tls: test TCP stealing data from under the TLS socket")
Link: https://patch.msgid.link/20250814194323.2014650-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The test covers basic re-use of a pinned DEVMAP map,
with both matching and mismatching parameters.
Signed-off-by: Yureka Lilian <yuka@yuka.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20250814180113.1245565-4-yuka@yuka.dev
Based on a bisect, it appears that commit 7ee9887703 ("timers:
Implement the hierarchical pull model") has somehow inadvertently
broken BPF selftest test_tcpnotify_user. The error that is being
generated by this test is as follows:
FAILED: Wrong stats Expected 10 calls, got 8
It looks like the change allows timer functions to be run on CPUs
different from the one they are armed on. The test had pinned itself
to CPU 0, and in the past the retransmit attempts also occurred on CPU
0. The test had set the max_entries attribute for
BPF_MAP_TYPE_PERF_EVENT_ARRAY to 2 and was calling
bpf_perf_event_output() with BPF_F_CURRENT_CPU, so the entry was
likely to be in range. With the change to allow timers to run on other
CPUs, the current CPU tasked with performing the retransmit might be
bumped and in turn fall out of range, as the event will be filtered
out via __bpf_perf_event_output() using:
if (unlikely(index >= array->map.max_entries))
return -E2BIG;
A possible change would be to explicitly set the max_entries attribute
for perf_event_map in test_tcpnotify_kern.c to a value that's at least
as large as the number of CPUs. As it turns out however, if the field
is left unset, then the libbpf will determine the number of CPUs available
on the underlying system and update the max_entries attribute accordingly
in map_set_def_max_entries().
A further problem with the test is that it has a thread that continues
running up until the program exits. The main thread cleans up some
LIBBPF data structures, while the other thread continues to use them,
which inevitably will trigger a SIGSEGV. This can be dealt with by
telling the thread to run for as long as necessary and doing a
pthread_join on it before exiting the program.
Finally, I don't think binding the process to CPU 0 is meaningful for
this test any more, so get rid of that.
Fixes: 435f90a338 ("selftests/bpf: add a test case for sock_ops perf-event notification")
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/aJ8kHhwgATmA3rLf@google.com
Skipped tests reported by kselftest.h use a different format than KTAP,
there is no explicit test name. Normally the test name is part of the
free-form string after the SKIP keyword:
ok 3 # SKIP test: some reason
Extend the parser to handle those correctly. Use the free-form string as
test name instead.
Link: https://lore.kernel.org/r/20250813-kunit-kselftesth-skip-v1-1-57ae3de4f109@linutronix.de
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Enable arena atomics tests for RV64.
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20250719091730.2660197-11-pulehui@huaweicloud.com
Add tests to ensure proper backlog accounting in hhf, codel, pie, fq,
fq_pie, and fq_codel qdiscs. We check for the bug pattern originally
found in fq, fq_pie, and fq_codel, which was an underflow in the tbf
parent backlog stats upon child qdisc removal.
Signed-off-by: William Liu <will@willsroot.io>
Reviewed-by: Savino Dicanosa <savy@syst3mfailure.io>
Link: https://patch.msgid.link/20250812235808.45281-1-will@willsroot.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test that packets are redirected to the backup port when the primary
port is administratively down.
With the previous patch:
# ./test_bridge_backup_port.sh
[...]
TEST: swp1 administratively down [ OK ]
TEST: No forwarding out of swp1 [ OK ]
TEST: Forwarding out of vx0 [ OK ]
TEST: swp1 administratively up [ OK ]
TEST: Forwarding out of swp1 [ OK ]
TEST: No forwarding out of vx0 [ OK ]
[...]
Tests passed: 89
Tests failed: 0
Without the previous patch:
# ./test_bridge_backup_port.sh
[...]
TEST: swp1 administratively down [ OK ]
TEST: No forwarding out of swp1 [ OK ]
TEST: Forwarding out of vx0 [FAIL]
TEST: swp1 administratively up [ OK ]
TEST: Forwarding out of swp1 [ OK ]
[...]
Tests passed: 85
Tests failed: 4
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250812080213.325298-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On fast machines the tests run in quick succession so even
when tests clean up after themselves the carrier may need
some time to come back.
Specifically in NIPA when ping.py runs right after netpoll_basic.py
the first ping command fails.
Since the context manager callbacks are now common NetDrvEpEnv
gets an ip link up call as well.
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20250812142054.750282-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The --do-normal parameter was missing from the torture.sh script's help
text, so this commit adds it. Hopefully better late than never!
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
An embarrassing syntax error in jitter.sh makes for fixed spin time.
This commit therefore makes it be variable, as intended, albeit with
very coarse-grained adjustment.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cross-merge networking fixes after downstream PR (net-6.17-rc2).
No conflicts.
Adjacent changes:
drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
d7a276a576 ("net: stmmac: rk: convert to suspend()/resume() methods")
de1e963ad0 ("net: stmmac: rk: put the PHY clock on remove")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Current release - regressions:
- netfilter: nft_set_pipapo:
- don't return bogus extension pointer
- fix null deref for empty set
Current release - new code bugs:
- core: prevent deadlocks when enabling NAPIs with mixed kthread config
- eth: netdevsim: Fix wild pointer access in nsim_queue_free().
Previous releases - regressions:
- page_pool: allow enabling recycling late, fix false positive warning
- sched: ets: use old 'nbands' while purging unused classes
- xfrm:
- restore GSO for SW crypto
- bring back device check in validate_xmit_xfrm
- tls: handle data disappearing from under the TLS ULP
- ptp: prevent possible ABBA deadlock in ptp_clock_freerun()
- eth: bnxt: fill data page pool with frags if PAGE_SIZE > BNXT_RX_PAGE_SIZE
- eth: hv_netvsc: fix panic during namespace deletion with VF
Previous releases - always broken:
- netfilter: fix refcount leak on table dump
- vsock: do not allow binding to VMADDR_PORT_ANY
- sctp: linearize cloned gso packets in sctp_rcv
- eth: hibmcge: fix the division by zero issue
- eth: microchip: fix KSZ8863 reset problem
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmidxhgSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkPckP/AhJ0kARPgo72OhElbW6KvkHhXVNUTJb
6m15j9Z8ybQPBupSorxUEBx7pxczv/GBloN1xoJqUY/p7B6kLO2HDOpaReNFjZvF
J9hPl6ZF6CWzwgfcUfwI3UB1zhALKyHDClclfd8FBoKjAYXCrZXuPv/AV4oqYsA1
y7g24zpI76Cu+M+Nf5YhrlIVSmQ1/DXX8gifdcHFYnAKmCn7KxNv2lwvm2/yE2lL
9/Xl/D1cG/BiAaCUUXR4BP8RU5gdW72+lM3qmC/QFO/7jcBaoE89Y9Anona8p0PQ
dqDerHd0GDUH9QA6bht3asCS+IW+Zo2gf25o53OzlYIMAxDmEZLUBCwetJhvNJBq
DUQ6agzfNRxsCnlOc4JhMOqNr7rdU7d+9c9KuZWA/m8KRWdlvTYGJd/qzSlTWOhq
s9228dl+4oTb9Mnq8Bqafi42+TImeOyFRW9ZgF8ptjlF0l/lyv6moIrRCmVXppRZ
awABNDdG+i004XmAOAeOhjbUT7clLkLr+KEnsfH16qCa2o3dM6rlhvWYp2sucVJf
SyRvMdz5VqMLgruefpQS/DuK52UklpRawgvgngzU6UDYQUaxQKToeusMjRU7xUnW
hVI1y7/oNH6+r7Zr/iLTLKRR007B+RVC7VSbeMpxmAW+n6puMb+z7RnrJlnFapGM
qqqtk2/jItuK
=Ydk1
-----END PGP SIGNATURE-----
Merge tag 'net-6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from Netfilter and IPsec.
Current release - regressions:
- netfilter: nft_set_pipapo:
- don't return bogus extension pointer
- fix null deref for empty set
Current release - new code bugs:
- core: prevent deadlocks when enabling NAPIs with mixed kthread
config
- eth: netdevsim: Fix wild pointer access in nsim_queue_free().
Previous releases - regressions:
- page_pool: allow enabling recycling late, fix false positive
warning
- sched: ets: use old 'nbands' while purging unused classes
- xfrm:
- restore GSO for SW crypto
- bring back device check in validate_xmit_xfrm
- tls: handle data disappearing from under the TLS ULP
- ptp: prevent possible ABBA deadlock in ptp_clock_freerun()
- eth:
- bnxt: fill data page pool with frags if PAGE_SIZE > BNXT_RX_PAGE_SIZE
- hv_netvsc: fix panic during namespace deletion with VF
Previous releases - always broken:
- netfilter: fix refcount leak on table dump
- vsock: do not allow binding to VMADDR_PORT_ANY
- sctp: linearize cloned gso packets in sctp_rcv
- eth:
- hibmcge: fix the division by zero issue
- microchip: fix KSZ8863 reset problem"
* tag 'net-6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (54 commits)
net: usb: asix_devices: add phy_mask for ax88772 mdio bus
net: kcm: Fix race condition in kcm_unattach()
selftests: net/forwarding: test purge of active DWRR classes
net/sched: ets: use old 'nbands' while purging unused classes
bnxt: fill data page pool with frags if PAGE_SIZE > BNXT_RX_PAGE_SIZE
netdevsim: Fix wild pointer access in nsim_queue_free().
net: mctp: Fix bad kfree_skb in bind lookup test
netfilter: nf_tables: reject duplicate device on updates
ipvs: Fix estimator kthreads preferred affinity
netfilter: nft_set_pipapo: fix null deref for empty set
selftests: tls: test TCP stealing data from under the TLS socket
tls: handle data disappearing from under the TLS ULP
ptp: prevent possible ABBA deadlock in ptp_clock_freerun()
ixgbe: prevent from unwanted interface name changes
devlink: let driver opt out of automatic phys_port_name generation
net: prevent deadlocks when enabling NAPIs with mixed kthread config
net: update NAPI threaded config even for disabled NAPIs
selftests: drv-net: don't assume device has only 2 queues
docs: Fix name for net.ipv4.udp_child_hash_entries
riscv: dts: thead: Add APB clocks for TH1520 GMACs
...
Add a simple test for checking that RSS on flow label works,
and that its rejected for IPv4 flows.
# ./tools/testing/selftests/drivers/net/hw/rss_flow_label.py
TAP version 13
1..2
ok 1 rss_flow_label.test_rss_flow_label
ok 2 rss_flow_label.test_rss_flow_label_6only
# Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20250811234212.580748-5-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Extend the existing netconsole cmdline selftest to also validate that
interface selection can be performed via MAC address.
The test now validates that netconsole works with both interface name
and MAC address, improving test coverage.
Suggested-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Andre Carvalho <asantostc@gmail.com>
Link: https://patch.msgid.link/20250812-netcons-cmdline-selftest-v2-1-8099fb7afa9e@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test various aspects of FDB activity notification control:
* Transitioning of an FDB entry from inactive to active state.
* Transitioning of an FDB entry from active to inactive state.
* Avoiding the resetting of an FDB entry's last activity time (i.e.,
"updated" time) using the "norefresh" keyword.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250812071810.312346-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit d6212d82bf ("selftests/bpf: Consolidate kernel modules into
common directory") consolidated the Makefile of test_kmods. However,
since it removed test_kmods from TEST_GEN_PROGS_EXTENDED, the kernel
modules required by bpf selftests are now missing from kselftest_install
when "make install". Fix it by adding test_kmod to TEST_GEN_FILES.
Fixes: d6212d82bf ("selftests/bpf: Consolidate kernel modules into common directory")
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250812175039.2323570-1-ameryhung@gmail.com
or aren't considered necessary for -stable kernels. 10 of these fixes are
for MM.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaJwLpgAKCRDdBJ7gKXxA
js1sAP0c/XlVJhICq9aNJluu4Nj7cKTlzN7nvD/YRivZrG8XJQD/Q5nvFY6yeOdi
/HxCAdTDY5HsWv28nEbvnxKKbuFligU=
=mtKs
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2025-08-12-20-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"12 hotfixes. 5 are cc:stable and the remainder address post-6.16
issues or aren't considered necessary for -stable kernels.
10 of these fixes are for MM"
* tag 'mm-hotfixes-stable-2025-08-12-20-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
proc: proc_maps_open allow proc_mem_open to return NULL
mm/mremap: avoid expensive folio lookup on mremap folio pte batch
userfaultfd: fix a crash in UFFDIO_MOVE when PMD is a migration entry
mm: pass page directly instead of using folio_page
selftests/proc: fix string literal warning in proc-maps-race.c
fs/proc/task_mmu: hold PTL in pagemap_hugetlb_range and gather_hugetlb_stats
mm/smaps: fix race between smaps_hugetlb_range and migration
mm: fix the race between collapse and PT_RECLAIM under per-vma lock
mm/kmemleak: avoid soft lockup in __kmemleak_do_cleanup()
MAINTAINERS: add Masami as a reviewer of hung task detector
mm/kmemleak: avoid deadlock by moving pr_warn() outside kmemleak_lock
kasan/test: fix protection against compiler elision
Check a race where data disappears from the TCP socket after
TLS signaled that its ready to receive.
ok 6 global.data_steal
# RUN tls_basic.base_base ...
# OK tls_basic.base_base
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250807232907.600366-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The Device Under Test should always be the local system.
While the Rx test gets this right the Tx test is sending
from remote to local. So Tx of DMABUF memory happens on remote.
These tests never run in NIPA since we don't have a compatible
device so we haven't caught this.
Reviewed-by: Joe Damato <joe@dama.to>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250811231334.561137-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is a number of:
with bkg("socat ..LISTEN..", exit_wait=True)
uses in the tests. If whatever is supposed to send the traffic
fails we will get stuck in the bkg(). Try to kill the process
in case of exception, to avoid the long wait.
A specific example where this happens is the devmem Tx tests.
Reviewed-by: Joe Damato <joe@dama.to>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250811231334.561137-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We need to use bracketed IPv6 addresses for socat.
Reviewed-by: Joe Damato <joe@dama.to>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250811231334.561137-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The general expectations for network HW selftests is that they
will be run as root. sudo doesn't seem to work on NIPA VMs.
While it's probably something solvable in the setup I think we should
remove the sudos. devmem is the only networking test using sudo.
Reviewed-by: Joe Damato <joe@dama.to>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250811231334.561137-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Looks like neither IO_URING nor UDMABUF are enabled even tho
iou-zcrx.py and devmem.py (respectively) need those.
IO_URING gets enabled by default but UDMABUF is missing.
Reviewed-by: Joe Damato <joe@dama.to>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250811231334.561137-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-Wall and -Wflex-array-member-not-at-end caught some warnings that
will be fixed in later patches.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250811215432.3379570-2-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
scm_rights.c has no problem in functionality, but when compiled with
-Wflex-array-member-not-at-end, it shows this warning:
scm_rights.c: In function ‘__send_fd’:
scm_rights.c:275:32: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
275 | struct cmsghdr cmsghdr;
| ^~~~~~~
Let's silence it.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250811215432.3379570-4-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
scm_inq.c has no problem in functionality, but when compiled with
-Wflex-array-member-not-at-end, it shows this warning:
scm_inq.c:15:24: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
15 | struct cmsghdr cmsghdr;
| ^~~~~~~
Let's silence it.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250811215432.3379570-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The test is implicitly assuming the device only has 2 queues.
A real device will likely have more. The exact problem is that
because NAPIs get added to the list from the head, the netlink
dump reports them in reverse order. So the naive napis[0] will
actually likely give us the _last_ NAPI, not the first one.
Re-enable all the NAPIs instead of hard-coding 2 in the test.
This way the NAPIs we operated on will always reappear,
doesn't matter where they were in the registration order.
Fixes: e6d7626881 ("net: Update threaded state in napi config in netif_set_threaded")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20250809001205.1147153-2-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This change resolves non literal string format warning invoked for
proc-maps-race.c while compiling.
proc-maps-race.c:205:17: warning: format not a string literal and no format arguments [-Wformat-security]
205 | printf(text);
| ^~~~~~
proc-maps-race.c:209:17: warning: format not a string literal and no format arguments [-Wformat-security]
209 | printf(text);
| ^~~~~~
proc-maps-race.c: In function `print_last_lines':
proc-maps-race.c:224:9: warning: format not a string literal and no format arguments [-Wformat-security]
224 | printf(start);
| ^~~~~~
Add string format specifier %s for the printf calls in both
print_first_lines() and print_last_lines() thus resolving the warnings.
The test executes fine after this change thus causing no effect to the
functional behavior of the test.
Link: https://lkml.kernel.org/r/20250804225633.841777-1-hsukrut3@gmail.com
Fixes: aadc099c48 ("selftests/proc: add verbose mode for /proc/pid/maps tearing tests")
Signed-off-by: Sukrut Heroorkar <hsukrut3@gmail.com>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Cc: David Hunter <david.hunter.linux@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The get_next_frame() function in psock_tpacket.c was missing a return
statement in its default switch case, leading to a compiler warning.
This was caused by a `bug_on(1)` call, which is defined as an
`assert()`, being compiled out because NDEBUG is defined during the
build.
Instead of adding a `return NULL;` which would silently hide the error
and could lead to crashes later, this change restores the original
author's intent. By adding `#undef NDEBUG` before including <assert.h>,
we ensure the assertion is active and will cause the test to abort if
this unreachable code is ever executed.
Signed-off-by: Wake Liu <wakel@google.com>
Link: https://patch.msgid.link/20250809062013.2407822-1-wakel@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The `__WORDSIZE` macro, defined in the non-standard `<bits/wordsize.h>`
header, is a GNU extension and not universally available with all
toolchains, such as Clang when used with musl libc.
This can lead to build failures in environments where this header is
missing.
The intention of the code is to determine the bit width of a C `long`.
Replace the non-portable `__WORDSIZE` with the standard and portable
`sizeof(long) * 8` expression to achieve the same result.
This change also removes the inclusion of the now-unused
`<bits/wordsize.h>` header.
Signed-off-by: Wake Liu <wakel@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
./tools/testing/selftests/sched_ext/hotplug.c: sched.h is included more than once.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=22941
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Acked-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Resolve a conflict between
commit 6a68d28066 ("selftests/coredump: Fix "socket_detect_userspace_client" test failure")
and
commit 994dc26302 ("selftests/coredump: fix build")
The first commit adds a read() to wait for write() from another thread to
finish. But the second commit removes the write().
Now that the two commits are in the same tree, the read() now gets EOF and
the test fails.
Remove this read() so that the test passes.
Signed-off-by: Nam Cao <namcao@linutronix.de>
Link: https://lore.kernel.org/20250811074957.4079616-1-namcao@linutronix.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
There appear to be no other open_tree_attr(2) tests at the moment, but
as a minimal solution just add some additional checks in the existing
MOUNT_ATTR_IDMAP tests to make sure that open_tree_attr(2) cannot be
used to bypass the tested restrictions that apply to mount_setattr(2).
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Link: https://lore.kernel.org/20250808-open_tree_attr-bugfix-idmap-v1-2-0ec7bc05646c@cyphar.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
- netlink: avoid infinite retry looping in netlink_unicast()
Previous releases - always broken:
- packet: fix a race in packet_set_ring() and packet_notifier()
- ipv6: reject malicious packets in ipv6_gso_segment()
- sched: mqprio: fix stack out-of-bounds write in tc entry parsing
- net: drop UFO packets (injected via virtio) in udp_rcv_segment()
- eth: mlx5: correctly set gso_segs when LRO is used, avoid false
positive checksum validation errors
- netpoll: prevent hanging NAPI when netcons gets enabled
- phy: mscc: fix parsing of unicast frames for PTP timestamping
- number of device tree / OF reference leak fixes
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmiUvMcACgkQMUZtbf5S
IrucuA//bQGZdQkpRo/2zWDFS4wN7hV8PkR+kI0F4rUyNQjFDyUOrccYnhHl8VRn
DfnmVC9oiWxwuW2QgrgH1KKSw9dxhDPVmhLezAHaZv9XAPPgPO/Yb2Dr3oKHF+yK
+/QW57FN8rNg8mHUzMS/26Y+rH6OubTaf3MKcsT2uZhuIXKHScOUedmr6EeEkp/L
c32MpWfcVC7M2jjvCH+HO4k4RawWaB8W93iMUrLMSVI1oE3Tsjyhl5Cv9TBixh7g
3KZW98qVqMXKBQr0QTF4LBR0IWgKOS4KMVJqrgQ3CZszbDWbmBsFfi8olr/AS0Nt
vgk6hTd8vVHXa+sOvdFpDbocdjXBq6vf5bDd9p57bt3JyxFMfFUWbG1eDj8bUpMI
YuKAhQg9mTr6R8DaLwqANY8zrSET2FiYbNkUsP9TD8++q2j34U5/a468cxymfjJ/
90vSrBGUpwxYPhx2B+Slu1SZJ1g0NbUL34jrKBuSNloVoZDRuzq2zd310BigG/35
T5CTr5OH+USlFjL6KpgGHnygiMQB/h2WgEjWbFoZHnIb+exVKCgS6IiNBVV3b2OK
Nv6iVyFKPdKJ8qxeaiYMWA16wG8buOytlBn/1hYKLBvyVKAAx9Ib4ao2TK9w/1Vs
50sX21h+Awj2a/17sXlSXA+RVdeBcAzHdsGWHr1ql31htn2Xl10=
=nwy7
-----END PGP SIGNATURE-----
Merge tag 'net-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
Previous releases - regressions:
- netlink: avoid infinite retry looping in netlink_unicast()
Previous releases - always broken:
- packet: fix a race in packet_set_ring() and packet_notifier()
- ipv6: reject malicious packets in ipv6_gso_segment()
- sched: mqprio: fix stack out-of-bounds write in tc entry parsing
- net: drop UFO packets (injected via virtio) in udp_rcv_segment()
- eth: mlx5: correctly set gso_segs when LRO is used, avoid false
positive checksum validation errors
- netpoll: prevent hanging NAPI when netcons gets enabled
- phy: mscc: fix parsing of unicast frames for PTP timestamping
- a number of device tree / OF reference leak fixes"
* tag 'net-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (44 commits)
pptp: fix pptp_xmit() error path
net: ti: icssg-prueth: Fix skb handling for XDP_PASS
net: Update threaded state in napi config in netif_set_threaded
selftests: netdevsim: Xfail nexthop test on slow machines
eth: fbnic: Lock the tx_dropped update
eth: fbnic: Fix tx_dropped reporting
eth: fbnic: remove the debugging trick of super high page bias
net: ftgmac100: fix potential NULL pointer access in ftgmac100_phy_disconnect
dt-bindings: net: Replace bouncing Alexandru Tachici emails
dpll: zl3073x: ZL3073X_I2C and ZL3073X_SPI should depend on NET
net/sched: mqprio: fix stack out-of-bounds write in tc entry parsing
Revert "net: mdio_bus: Use devm for getting reset GPIO"
selftests: net: packetdrill: xfail all problems on slow machines
net/packet: fix a race in packet_set_ring() and packet_notifier()
benet: fix BUG when creating VFs
net: airoha: npu: Add missing MODULE_FIRMWARE macros
net: devmem: fix DMA direction on unmapping
ipa: fix compile-testing with qcom-mdt=m
eth: fbnic: unlink NAPIs from queues on error to open
net: Add locking to protect skb->dev access in ip_output
...
Test multi_st_ops and demonstrate how different bpf programs can call
a kfuncs that refers to the struct_ops instance in the same source file
by id. The id is defined as a global vairable and initialized before
attaching the skeleton. Kfuncs that take the id can hide the argument
with a macro to make it almost transparent to bpf program developers.
The test involves two struct_ops returning different values from
.test_1. In syscall and tracing programs, check if the correct value is
returned by a kfunc that calls .test_1.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250806162540.681679-4-ameryhung@gmail.com
Current struct_ops in bpf_testmod only support attaching single instance.
Add multi_st_ops that supports multiple instances. The struct_ops uses map
id as the struct_ops id and will reject attachment with an existing id.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250806162540.681679-3-ameryhung@gmail.com
reference count leaks which are caused by:
- VMA splits, which change the offset or size of a mapping, which causes
perf_mmap_close() to ignore the unmap or unmap the wrong buffer.
- Several internal issues of perf_mmap(), which can cause reference count
leaks in the perf mmap, corrupt accounting or cause leaks in perf
drivers.
The main fix is to prevent VMA splits by implementing the [may_]split()
callback for vm operations. The other issues are addressed by rearranging
code, early returns on failure and invocation of cleanups.
Also provide a selftest to validate the fixes.
The reference counting should be converted to refcount_t, but that requires
larger refactoring of the code and will be done once these fixes are
upstream.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmiSd0gTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoWCbD/9niCHArXIWfdIp1K2ZwM2+tsCU7Ntl
XnkfnaPoCVpQQXcAN11WIEK6DlwaiY2sfOco1cpEqr5px0Hv5qzM+OGm0r3KQBpr
9d1Ox8+tqUOIxw5zsKFXpw6/WX4zzdxGHjzU/T0H+fPP3jB+hj/Q3hOk4u10+f3v
7f3Q4sOfkmOauQez2HEaDwUip6lLZFEaf8IK0tYEkOJxcStwsC2TnLvmlEmOA0Yx
PnAXOicrpbe9d8KNq6VxU0OtV6XAT+YJtf9T5cTNR1NhIkqyaMwbdzkuh9RZgxAE
oRblaAHubAUMmv2DgYOTUGoYivsXY13XOtjfXdLmxt19HmkSOyaCFO8nJgjAPOL7
gxGXS7zKxhNac7bfVgBANPUHOOWtV30H5CqYOzxaPlQs8gzOsl8l+NDZuwVlP4P6
CMdN3rz3eMpnMpuzy0mmUJhowytKDA8N81yamCP5L9hWWZVfp4boZfIXMMLtJdQa
nv/T2HxLL8HweFrI6Wd7YDhXMKhsNDAqJvtSv0z+5U+PWWd9rcOFsgS9sUHIiJuB
pLvNwLxPntzF6qw4qIp1W1AHfLz2VF/tR8WyINpEZe4oafP1TccI+aLQdIJ/vVqp
gQ0bCTiZb16IGsHruu4L9C0fe40TdSuiwEK5X9Opk4aP11oagsqQ+GxzssvQZnZc
Jx2XqouabWBBvQ==
=B9L/
-----END PGP SIGNATURE-----
Merge tag 'perf-fixes-27504' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
Pull perf fixes from Thomas Gleixner:
"Perf fixes for perf_mmap() reference counting to prevent potential
reference count leaks which are caused by:
- VMA splits, which change the offset or size of a mapping, which
causes perf_mmap_close() to ignore the unmap or unmap the wrong
buffer.
- Several internal issues of perf_mmap(), which can cause reference
count leaks in the perf mmap, corrupt accounting or cause leaks in
perf drivers.
The main fix is to prevent VMA splits by implementing the
[may_]split() callback for vm operations.
The other issues are addressed by rearranging code, early returns on
failure and invocation of cleanups.
Also provide a selftest to validate the fixes.
The reference counting should be converted to refcount_t, but that
requires larger refactoring of the code and will be done once these
fixes are upstream"
* tag 'perf-fixes-27504' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git:
selftests/perf_events: Add a mmap() correctness test
perf/core: Prevent VMA split of buffer mappings
perf/core: Handle buffer mapping fail correctly in perf_mmap()
perf/core: Exit early on perf_mmap() fail
perf/core: Don't leak AUX buffer refcount on allocation failure
perf/core: Preserve AUX buffer allocation failure result
Commit 2677010e77 ("Add support to set NAPI threaded for individual
NAPI") added support to enable/disable threaded napi using netlink. This
also extended the napi config save/restore functionality to set the napi
threaded state. This breaks netdev reset for drivers that use napi
threaded at device level and also use napi config save/restore on
napi_disable/napi_enable. Basically on netdev with napi threaded enabled
at device level, a napi_enable call will get stuck trying to stop the
napi kthread. This is because the napi->config->threaded is set to
disabled when threaded is enabled at device level.
The issue can be reproduced on virtio-net device using qemu. To
reproduce the issue run following,
echo 1 > /sys/class/net/threaded
ethtool -L eth0 combined 1
Update the threaded state in napi config in netif_set_threaded and add a
new test that verifies this scenario.
Tested on qemu with virtio-net:
NETIF=eth0 ./tools/testing/selftests/drivers/net/napi_threaded.py
TAP version 13
1..2
ok 1 napi_threaded.change_num_queues
ok 2 napi_threaded.enable_dev_threaded_disable_napi_threaded
# Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0
Fixes: 2677010e77 ("Add support to set NAPI threaded for individual NAPI")
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Link: https://patch.msgid.link/20250804164457.2494390-1-skhawaja@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A lot of test cases in the file are related to the idle and unbalanced
timers of resilient nexthop groups and these tests are reported to be
flaky on slow machines running debug kernels.
Rather than marking a lot of individual tests with xfail_on_slow(),
simply mark all the tests. Note that the test is stable on non-debug
machines and that with debug kernels we are mainly interested in the
output of various sanitizers in order to determine pass / fail.
Before:
# make -C tools/testing/selftests KSFT_MACHINE_SLOW=yes \
TARGETS=drivers/net/netdevsim TEST_PROGS=nexthop.sh \
TEST_GEN_PROGS="" run_tests
[...]
# TEST: Bucket migration after idle timer (with delete) [FAIL]
# Group expected to still be unbalanced
[...]
not ok 1 selftests: drivers/net/netdevsim: nexthop.sh # exit=1
After:
# make -C tools/testing/selftests KSFT_MACHINE_SLOW=yes \
TARGETS=drivers/net/netdevsim TEST_PROGS=nexthop.sh \
TEST_GEN_PROGS="" run_tests
[...]
# TEST: Bucket migration after idle timer (with delete) [XFAIL]
# Group expected to still be unbalanced
[...]
ok 1 selftests: drivers/net/netdevsim: nexthop.sh
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20250729160609.02e0f157@kernel.org/
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250804114320.193203-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Exercise various mmap(), munmap() and mremap() invocations, which might
cause a perf buffer mapping to be split or truncated.
To avoid hard coding the perf event and having dependencies on
architectures and configuration options, scan through event types in sysfs
and try to open them. On success, try to mmap() and if that succeeds try to
mmap() the AUX buffer.
In case that no AUX buffer supporting event is found, only test the base
buffer mapping. If no mappable event is found or permissions are not
sufficient, skip the tests.
Reserve a PROT_NONE region for both rb and aux tests to allow testing the
case where mremap unmaps beyond the end of a mapped VMA to prevent it from
unmapping unrelated mappings.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Co-developed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
- The 4 patch series "mseal cleanups" from Lorenzo Stoakes erforms some
mseal cleaning with no intended functional change.
- The 3 patch series "Optimizations for khugepaged" from David
Hildenbrand improves khugepaged throughput by batching PTE operations
for large folios. This gain is mainly for arm64.
- The 8 patch series "x86: enable EXECMEM_ROX_CACHE for ftrace and
kprobes" from Mike Rapoport provides a bugfix, additional debug code and
cleanups to the execmem code.
- The 7 patch series "mm/shmem, swap: bugfix and improvement of mTHP
swap in" from Kairui Song provides bugfixes, cleanups and performance
improvememnts to the mTHP swapin code.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaI+6HQAKCRDdBJ7gKXxA
jv7lAQCAKE5dUhdZ0pOYbhBKTlDapQh2KqHrlV3QFcxXgknEoQD/c3gG01rY3fLh
Cnf5l9+cdyfKxFniO48sUPx6IpriRg8=
=HT5/
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2025-08-03-12-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:
"Significant patch series in this pull request:
- "mseal cleanups" (Lorenzo Stoakes)
Some mseal cleaning with no intended functional change.
- "Optimizations for khugepaged" (David Hildenbrand)
Improve khugepaged throughput by batching PTE operations for large
folios. This gain is mainly for arm64.
- "x86: enable EXECMEM_ROX_CACHE for ftrace and kprobes" (Mike Rapoport)
A bugfix, additional debug code and cleanups to the execmem code.
- "mm/shmem, swap: bugfix and improvement of mTHP swap in" (Kairui Song)
Bugfixes, cleanups and performance improvememnts to the mTHP swapin
code"
* tag 'mm-stable-2025-08-03-12-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (38 commits)
mm: mempool: fix crash in mempool_free() for zero-minimum pools
mm: correct type for vmalloc vm_flags fields
mm/shmem, swap: fix major fault counting
mm/shmem, swap: rework swap entry and index calculation for large swapin
mm/shmem, swap: simplify swapin path and result handling
mm/shmem, swap: never use swap cache and readahead for SWP_SYNCHRONOUS_IO
mm/shmem, swap: tidy up swap entry splitting
mm/shmem, swap: tidy up THP swapin checks
mm/shmem, swap: avoid redundant Xarray lookup during swapin
x86/ftrace: enable EXECMEM_ROX_CACHE for ftrace allocations
x86/kprobes: enable EXECMEM_ROX_CACHE for kprobes allocations
execmem: drop writable parameter from execmem_fill_trapping_insns()
execmem: add fallback for failures in vmalloc(VM_ALLOW_HUGE_VMAP)
execmem: move execmem_force_rw() and execmem_restore_rox() before use
execmem: rework execmem_cache_free()
execmem: introduce execmem_alloc_rw()
execmem: drop unused execmem_update_copy()
mm: fix a UAF when vma->mm is freed after vma->vm_refcnt got dropped
mm/rmap: add anon_vma lifetime debug check
mm: remove mm/io-mapping.c
...
We keep seeing flakes on packetdrill on debug kernels, while
non-debug kernels are stable, not a single flake in 200 runs.
Time to give up, debug kernels appear to suffer from 10msec
latency spikes and any timing-sensitive test is bound to flake.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250801181638.2483531-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Fixes for several issues in the powernv PCI hotplug path
- Fix htmldoc generation for htm.rst in toctree
- Add jit support for load_acquire and store_release in ppc64 bpf jit
Thanks to: Bjorn Helgaas, Hari Bathini, Puranjay Mohan, Saket Kumar Bhaskar,
Shawn Anastasio, Timothy Pearson, Vishal Parmar
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEqX2DNAOgU8sBX3pRpnEsdPSHZJQFAmiQDOcACgkQpnEsdPSH
ZJRqTBAAkpaZRrbzX6P1fjoTb89esIfY7YSymmXrgMwoF1fgTMzPSjg2Lzwzcjfb
Ednlo8Hy+Ei2SDvctNaqtOcZuidn81AJN41aFIu44S/Mj1cde/cuBMKUTqx+ZBO0
gI4Bd4+V7yEv484PF7ZJga9K1VM85THCLgVWJ00VNHVHyvxgAuFiXdWmh6/qMUyi
thvvLR+ANuAQz3S4VwbBg3AifDl6LXx2s5VB30xYxnPKzFNKZmnGXKwuOJH7rQe2
J8v99n0tcXW1tRGE4pVykzXg4EXL5zgWT9fJ5EZxbeXaW9sqMxi4VjO4jSsrSZ+K
q2v362Dyjgygel9aC2rzN8Q+P5horX2QBR7knJJGa0VtztUiPWKR8za7vGLbzlcm
rUvnTuoY1dwFB72Sy3amALalvWscssL/1sHazvRv65RcciW7/PZNVEiZ0xh1RKqb
J8nWlb+iNBf2z12qLmS5DvUaveaZG1eyndLeD/knsEC49DEOoEy3t7QM1F7KscRR
mYPsEpfjF/D0r+vzb6zl2ykwhJf/t7BNu7MdXH7xbIpj5iwtrhSUfvmx6g2MThzA
Vee2QvQACscdop3W/6xgATH4xoq96v1XxMmCLnZ/HVl2PorxO27ad4EMNO4sBG8c
5agWHT7EnoUgwNF30DRtIHd7jNK/jt8++3kZx6CG9hdSboAM/pM=
=tTms
-----END PGP SIGNATURE-----
Merge tag 'powerpc-6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Madhavan Srinivasan:
- Fixes for several issues in the powernv PCI hotplug path
- Fix htmldoc generation for htm.rst in toctree
- Add jit support for load_acquire and store_release in ppc64 bpf jit
Thanks to Bjorn Helgaas, Hari Bathini, Puranjay Mohan, Saket Kumar
Bhaskar, Shawn Anastasio, Timothy Pearson, and Vishal Parmar
* tag 'powerpc-6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc64/bpf: Add jit support for load_acquire and store_release
docs: powerpc: add htm.rst to toctree
PCI: pnv_php: Enable third attention indicator state
PCI: pnv_php: Fix surprise plug detection and recovery
powerpc/eeh: Make EEH driver device hotplug safe
powerpc/eeh: Export eeh_unfreeze_pe()
PCI: pnv_php: Work around switches with broken presence detection
PCI: pnv_php: Clean up allocated IRQs on unplug
- The 2 patch series "squashfs: Remove page->mapping references" from
Matthew Wilcox gets us closer to being able to remove page->mapping.
- The 5 patch series "relayfs: misc changes" from Jason Xing does some
maintenance and minor feature addition work in relayfs.
- The 5 patch series "kdump: crashkernel reservation from CMA" from Jiri
Bohac switches us from static preallocation of the kdump crashkernel's
working memory over to dynamic allocation. So the difficulty of
a-priori estimation of the second kernel's needs is removed and the
first kernel obtains extra memory.
- The 5 patch series "generalize panic_print's dump function to be used
by other kernel parts" from Feng Tang implements some consolidation and
rationalizatio of the various ways in which a faiing kernel splats
information at the operator.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaI+82gAKCRDdBJ7gKXxA
jj4JAP9xb+w9DrBY6sa+7KTPIb+aTqQ7Zw3o9O2m+riKQJv6jAEA6aEwRnDA0451
fDT5IqVlCWGvnVikdZHSnvhdD7TGsQ0=
=rT71
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
"Significant patch series in this pull request:
- "squashfs: Remove page->mapping references" (Matthew Wilcox) gets
us closer to being able to remove page->mapping
- "relayfs: misc changes" (Jason Xing) does some maintenance and
minor feature addition work in relayfs
- "kdump: crashkernel reservation from CMA" (Jiri Bohac) switches
us from static preallocation of the kdump crashkernel's working
memory over to dynamic allocation. So the difficulty of a-priori
estimation of the second kernel's needs is removed and the first
kernel obtains extra memory
- "generalize panic_print's dump function to be used by other
kernel parts" (Feng Tang) implements some consolidation and
rationalization of the various ways in which a failing kernel
splats information at the operator
* tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (80 commits)
tools/getdelays: add backward compatibility for taskstats version
kho: add test for kexec handover
delaytop: enhance error logging and add PSI feature description
samples: Kconfig: fix spelling mistake "instancess" -> "instances"
fat: fix too many log in fat_chain_add()
scripts/spelling.txt: add notifer||notifier to spelling.txt
xen/xenbus: fix typo "notifer"
net: mvneta: fix typo "notifer"
drm/xe: fix typo "notifer"
cxl: mce: fix typo "notifer"
KVM: x86: fix typo "notifer"
MAINTAINERS: add maintainers for delaytop
ucount: use atomic_long_try_cmpxchg() in atomic_long_inc_below()
ucount: fix atomic_long_inc_below() argument type
kexec: enable CMA based contiguous allocation
stackdepot: make max number of pools boot-time configurable
lib/xxhash: remove unused functions
init/Kconfig: restore CONFIG_BROKEN help text
lib/raid6: update recov_rvv.c zero page usage
docs: update docs after introducing delaytop
...
Patch series "mseal cleanups", v4.
Perform a number of cleanups to the mseal logic. Firstly, VM_SEALED is
treated differently from every other VMA flag, it really doesn't make
sense to do this, so we start by making this consistent with everything
else.
Next we place the madvise logic where it belongs - in mm/madvise.c. It
really makes no sense to abstract this elsewhere. In doing so, we go to
great lengths to explain very clearly the previously very confusing logic
as to what sealed mappings are impacted here.
In doing so, we retain existing logic regarding treatment of madvise()
discard operations for a sealed, read-only MAP_PRIVATE file-backed
mapping. This is something we likely need to revisit.
We then abstract out and explain the 'are there are any gaps in this range
in the mm?' check being performed as a prerequisite to mseal being
performed.
Finally, we simplify the actual mseal logic which is really quite
straightforward.
No functional change is intended.
This patch (of 4):
There is no reason to treat VM_SEALED in a special way, in each other case
in which a VMA flag is unavailable due to configuration, we simply assign
that flag to VM_NONE, so make VM_SEALED consistent with all other VMA
flags in this respect.
Additionally, use the next available bit for VM_SEALED, 42, rather than
arbitrarily putting it at 63 and update the declaration to match all other
VMA flags.
No functional change intended.
Link: https://lkml.kernel.org/r/cover.1753431105.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/aeb398a77029b6e7377cd944328bc9bbc3c90537.1753431105.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add a cohesive test case that verifies cachestat behavior with
memory-mapped files using mmap(). Also refactor the test logic to reduce
redundancy, improve error reporting, and clarify failure messages for both
shmem and mmap file types.
[akpm@linux-foundation.org: coding-style cleanups]
Link: https://lkml.kernel.org/r/20250709174657.6916-1-suresh.k.chandrappa@gmail.com
Signed-off-by: Suresh K C <suresh.k.chandrappa@gmail.com>
Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Tested-by: Nhat Pham <nphamcs@gmail.com>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add tests for process_madvise(), focusing on verifying behavior under
various conditions including valid usage and error cases.
[lianux.mm@gmail.com: v7]
Link: https://lkml.kernel.org/r/20250729113109.12272-1-lianux.mm@gmail.com
Link: https://lkml.kernel.org/r/20250729113109.12272-1-lianux.mm@gmail.com
Link: https://lkml.kernel.org/r/20250721114614.40996-1-lianux.mm@gmail.com
Signed-off-by: wang lian <lianux.mm@gmail.com>
Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Suggested-by: Zi Yan <ziy@nvidia.com>
Suggested-by: Mark Brown <broonie@kernel.org>
Acked-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Tested-by: Zi Yan <ziy@nvidia.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Kairui Song <ryncsn@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Testing kexec handover requires a kernel driver that will generate some
data and preserve it with KHO on the first boot and then restore that data
and verify it was preserved properly after kexec.
To facilitate such test, along with the kernel driver responsible for data
generation, preservation and restoration add a script that runs a kernel
in a VM with a minimal /init. The /init enables KHO, loads a kernel image
for kexec and runs kexec reboot. After the boot of the kexeced kernel,
the driver verifies that the data was properly preserved.
[rppt@kernel.org: fix section mismatch]
Link: https://lkml.kernel.org/r/aIiRC8fXiOXKbPM_@kernel.org
Link: https://lkml.kernel.org/r/20250727083733.2590139-1-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: Changyuan Lyu <changyuanl@google.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Pratyush Yadav <pratyush@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Test thread-safety of tld_create_key(). Since tld_create_key() does
not rely on locks but memory barriers and atomic operations to protect
the shared metadata, the thread-safety of the function is non-trivial.
Make sure concurrent tld_key_create(), both valid and invalid, can not
race and corrupt metatada, which may leads to TLDs not being thread-
specific or duplicate TLDs with the same name.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20250730185903.3574598-5-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Test basic operations of task local data with valid and invalid
tld_create_key().
For invalid calls, make sure they return the right error code and check
that the TLDs are not inserted by running tld_get_data("
value_not_exists") on the bpf side. The call should a null pointer.
For valid calls, first make sure the TLDs are created by calling
tld_get_data() on the bpf side. The call should return a valid pointer.
Finally, verify that the TLDs are indeed task-specific (i.e., their
addresses do not overlap) with multiple user threads. This done by
writing values unique to each thread, reading them from both user space
and bpf, and checking if the value read back matches the value written.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20250730185903.3574598-4-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Task local data defines an abstract storage type for storing task-
specific data (TLD). This patch provides user space and bpf
implementation as header-only libraries for accessing task local data.
Task local data is a bpf task local storage map with two UPTRs:
- tld_meta_u, shared by all tasks of a process, consists of the total
count and size of TLDs and an array of metadata of TLDs. A TLD
metadata contains the size and name. The name is used to identify a
specific TLD in bpf programs.
- u_tld_data points to a task-specific memory. It stores TLD data and
the starting offset of data in a page.
Task local design decouple user space and bpf programs. Since bpf
program does not know the size of TLDs in compile time, u_tld_data
is declared as a page to accommodate TLDs up to a page. As a result,
while user space will likely allocate memory smaller than a page for
actual TLDs, it needs to pin a page to kernel. It will pin the page
that contains enough memory if the allocated memory spans across the
page boundary.
The library also creates another task local storage map, tld_key_map,
to cache keys for bpf programs to speed up the access.
Below are the core task local data API:
User space BPF
Define TLD TLD_DEFINE_KEY(), tld_create_key() -
Init TLD object - tld_object_init()
Get TLD data tld_get_data() tld_get_data()
- TLD_DEFINE_KEY(), tld_create_key()
A TLD is first defined by the user space with TLD_DEFINE_KEY() or
tld_create_key(). TLD_DEFINE_KEY() defines a TLD statically and
allocates just enough memory during initialization. tld_create_key()
allows creating TLDs on the fly, but has a fix memory budget,
TLD_DYN_DATA_SIZE.
Internally, they all call __tld_create_key(), which iterates
tld_meta_u->metadata to check if a TLD can be added. The total TLD
size needs to fit into a page (limit of UPTR), and no two TLDs can
have the same name. If a TLD can be added, u_tld_meta->cnt is
increased using cmpxchg as there may be other concurrent
__tld_create_key(). After a successful cmpxchg, the last available
tld_meta_u->metadata now belongs to the calling thread. To prevent
other threads from reading incomplete metadata while it is being
updated, tld_meta_u->metadata->size is used to signal the completion.
Finally, the offset, derived from adding up prior TLD sizes is then
encapsulated as an opaque object key to prevent user misuse. The
offset is guaranteed to be 8-byte aligned to prevent load/store
tearing and allow atomic operations on it.
- tld_get_data()
User space programs can pass the key to tld_get_data() to get a
pointer to the associated TLD. The pointer will remain valid for the
lifetime of the thread.
tld_data_u is lazily allocated on the first call to tld_get_data().
Trying to read task local data from bpf will result in -ENODATA
during tld_object_init(). The task-specific memory need to be freed
manually by calling tld_free() on thread exit to prevent memory leak
or use TLD_FREE_DATA_ON_THREAD_EXIT.
- tld_object_init() (BPF)
BPF programs need to call tld_object_init() before calling
tld_get_data(). This is to avoid redundant map lookup in
tld_get_data() by storing pointers to the map values on stack.
The pointers are encapsulated as tld_object.
tld_key_map is also created on the first time tld_object_init()
is called to cache TLD keys successfully fetched by tld_get_data().
bpf_task_storage_get(.., F_CREATE) needs to be retried since it may
fail when another thread has already taken the percpu counter lock
for the task local storage.
- tld_get_data() (BPF)
BPF programs can also get a pointer to a TLD with tld_get_data().
It uses the cached key in tld_key_map to locate the data in
tld_data_u->data. If the cached key is not set yet (<= 0),
__tld_fetch_key() will be called to iterate tld_meta_u->metadata
and find the TLD by name. To prevent redundant string comparison
in the future when the search fail, the tld_meta_u->cnt is stored
in the non-positive range of the key. Next time, __tld_fetch_key()
will be called only if there are new TLDs and the search will start
from the newly added tld_meta_u->metadata using the old
tld_meta_u-cnt.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20250730185903.3574598-3-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmiNNksACgkQ6rmadz2v
bTrKRhAAnju4bbFRHU88Y68p6Meq/jxgjxHZAkTqZA0Nvbu2cItPRL7XHAAhTWE7
OBEIm3UKCH4gs4fY8rDHiIgnnaQavXUmvXZblOIOjxnqRKJpU3px+wwJvGFq5Enq
WP6UZV8tj+O2tNfNNYS+mgQvvIpUISHGpKimvx7ede3e1U3cJBkppbT3gooMHYuc
5s1QtYHWaPY/1DpkHgqJ2UPGcbT9/HSPGMHRNaHKjQTcNcLcrj7RRjchgXqcc7Vs
hVijvVrLiuK0MyU42ritmaqvjjgD6hKPZguRQe2/hAtrOo0Alf+4mXkMgam7simN
iHfGc7nhw1xAFTPj4WXahja89G00FdDN5NR37Rgurm/i2fY7BuXAkMjiMiwGB3C3
jk2wG3RSifYeC2rxhkYJdqcx8Cz6m+pjgyJ2o9Jy5dn426VXg/kzkUXpl6u5jaPZ
SmKoo9Xu1r7xqTaUc9kk8pJI5Xt9vD5oQjF2KQuPZXxNidiwW6k2OGbW+wF26nEi
Q6pfDu3pvHAd/UE6cD5yFe97o3Cc2XfGwI/Sv2k99UVPvNcvfAvVo9fsItHBhCPn
zHkihW2S0zmbBlhcrB+PrLclNgLleP9JukFN+5scc0a9lbQxIm6v2TNKGlBfDQtO
I+Kn266oqT4BEgnQGlCQquINnQAdmS8VMnnunGOu6+rwPUtkI7E=
=XLHS
-----END PGP SIGNATURE-----
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:
- Fix kCFI failures in JITed BPF code on arm64 (Sami Tolvanen, Puranjay
Mohan, Mark Rutland, Maxwell Bland)
- Disallow tail calls between BPF programs that use different cgroup
local storage maps to prevent out-of-bounds access (Daniel Borkmann)
- Fix unaligned access in flow_dissector and netfilter BPF programs
(Paul Chaignon)
- Avoid possible use of uninitialized mod_len in libbpf (Achill
Gilgenast)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Test for unaligned flow_dissector ctx access
bpf: Improve ctx access verifier error message
bpf: Check netfilter ctx accesses are aligned
bpf: Check flow_dissector ctx accesses are aligned
arm64/cfi,bpf: Support kCFI + BPF on arm64
cfi: Move BPF CFI types and helpers to generic code
cfi: add C CFI type macro
libbpf: Avoid possible use of uninitialized mod_len
bpf: Fix oob access in cgroup local storage
bpf: Move cgroup iterator helpers to bpf.h
bpf: Move bpf map owner out of common struct
bpf: Add cookie object to bpf maps
- Add documentation template for CXL conventions to document CXL platform quirks
- Replace mutex_lock_io() with mutex_lock() for mailbox
- Add location limit for fake CFMWS range for cxl_test, ARM platform enabling
- CXL documentation typo and clarity fixes
- Use correct format specifier for function cxl_set_ecs_threshold()
- Make cxl_bus_type constant
- Introduce new helper cxl_resource_contains_addr() to check address availability
- Fix wrong DPA checking for PPR operation
- Remove core/acpi.c and CXL core dependency on ACPI
- Introduce ACQUIRE() and ACQUIRE_ERR() for conditional locks
- Add CXL updates utilizing ACQUIRE() macro to remove gotos and improve
readability
- Add return for the dummy version of cxl_decoder_detach() without CONFIG_CXL_REGION
- CXL events updates for spec r3.2
- Fix return of __cxl_decoder_detach() error path
- CXL debugfs documentation fix
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5DAy15EJMCV1R6v9YGjFFmlTOEoFAmiM6yQACgkQYGjFFmlT
OErolBAAnUcYHhpmPJV46KWIhahOx5yLzPTVTzDjpeYQVsbkWLNSTBAL2ygKZt3d
1zfWSI2f11xyA7nMWf0/2oP6oVRhrN9ty+NtoO118olqT909KQ3PGFnCGbYLLIUb
SxPnZCxdWQ7xOq3KUgnF+2TSYoL+pa5Gv1m2bGCgf29KgPMrOk6wl/IoAwLEFfCU
HVgIPDOo9IeYJkTZEZFSTPP4tr+JXygoFrd5k2+kj/GMqQJ5iDMYZxN4dPLhalic
e4wOXQRMrD23defdegg6VDDWfftBGU3yQqixDNN7sFpWIM/hsnOMVPSz9sCbO0Bg
2130sTPpsJXMVMUJoniPO5Ad3clhM4M8PoZ/QAO1J8Uodwv5wnKKwAWYFVmu6JU1
mpn5xroWci1twXXxXCdlyM+sVA9TqvyFbndYvVWo5CTAyCkVHdx1oAvUbV0YzDXN
5oeJTz42Y9PfawzTDiE5cMlGMGWux/VCiY9n909hg66gmdkUU/7zjjGYPEGT3tNK
GjCQ0SGIss6bNE9b0g55of/o0O+U00n+8TdvzCkqcc/lG0Yqt5jDGQb94cZKNVn0
OX01iyZj87w8vyroA1g2vi0aF4HB+88X7GKeiLuQCLe3iyYSkhNVz5EFyS5dQi7H
r5gN5u7Fq9BfCVIWA6Hh1AU4BGVR1An3hxfib8dV7Ik0Bv1X9jA=
=79ot
-----END PGP SIGNATURE-----
Merge tag 'cxl-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull CXL updates from Dave Jiang:
"The most significant changes in this pull request is the series that
introduces ACQUIRE() and ACQUIRE_ERR() macros to replace conditional
locking and ease the pain points of scoped_cond_guard().
The series also includes follow on changes that refactor the CXL
sub-system to utilize the new macros.
Detail summary:
- Add documentation template for CXL conventions to document CXL
platform quirks
- Replace mutex_lock_io() with mutex_lock() for mailbox
- Add location limit for fake CFMWS range for cxl_test, ARM platform
enabling
- CXL documentation typo and clarity fixes
- Use correct format specifier for function cxl_set_ecs_threshold()
- Make cxl_bus_type constant
- Introduce new helper cxl_resource_contains_addr() to check address
availability
- Fix wrong DPA checking for PPR operation
- Remove core/acpi.c and CXL core dependency on ACPI
- Introduce ACQUIRE() and ACQUIRE_ERR() for conditional locks
- Add CXL updates utilizing ACQUIRE() macro to remove gotos and
improve readability
- Add return for the dummy version of cxl_decoder_detach() without
CONFIG_CXL_REGION
- CXL events updates for spec r3.2
- Fix return of __cxl_decoder_detach() error path
- CXL debugfs documentation fix"
* tag 'cxl-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (28 commits)
Documentation/ABI/testing/debugfs-cxl: Add 'cxl' to clear_poison path
cxl/region: Fix an ERR_PTR() vs NULL bug
cxl/events: Trace Memory Sparing Event Record
cxl/events: Add extra validity checks for CVME count in DRAM Event Record
cxl/events: Add extra validity checks for corrected memory error count in General Media Event Record
cxl/events: Update Common Event Record to CXL spec rev 3.2
cxl: Fix -Werror=return-type in cxl_decoder_detach()
cleanup: Fix documentation build error for ACQUIRE updates
cxl: Convert to ACQUIRE() for conditional rwsem locking
cxl/region: Consolidate cxl_decoder_kill_region() and cxl_region_detach()
cxl/region: Move ready-to-probe state check to a helper
cxl/region: Split commit_store() into __commit() and queue_reset() helpers
cxl/decoder: Drop pointless locking
cxl/decoder: Move decoder register programming to a helper
cxl/mbox: Convert poison list mutex to ACQUIRE()
cleanup: Introduce ACQUIRE() and ACQUIRE_ERR() for conditional locks
cxl: Remove core/acpi.c and cxl core dependency on ACPI
cxl/core: Using cxl_resource_contains_addr() to check address availability
cxl/edac: Fix wrong dpa checking for PPR operation
cxl/core: Introduce a new helper cxl_resource_contains_addr()
...
This patch adds tests for two context fields where unaligned accesses
were not properly rejected.
Note the new macro is similar to the existing narrow_load macro, but we
need a different description and access offset. Combining the two
macros into one is probably doable but I don't think it would help
readability.
vmlinux.h is included in place of bpf.h so we have the definition of
struct bpf_nf_ctx.
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/bf014046ddcf41677fb8b98d150c14027e9fddba.1754039605.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
ifconfig is deprecated and not always present, use ip command instead.
Fixes: e0f3b3e5c7 ("selftests: Add test cases for vlan_filter modification during runtime")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dong Chenchen <dongchenchen2@huawei.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250730115313.3356036-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmiL3OkUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vz9bhAAqiD9REYlNUgGX/bEBgCVPFdtjjTz
FpSLzG23vWd2J0FEy04qtQWH9j71IXnM+yMybzsMe9SsPt2HhczzSCIMpPj0FZNN
ccOf3gA/KqPux7FORrS3mpM8OO4ICt3XZhCji3nNg5iW5XlH+NrQKPVxRlvBB0rP
+7RxSjDClUdZ97QSSmp1uZ7Qh1qyV0Ht0qjPMwecrnB2kApt4ZaMphAaKPEjX/4f
RgZPFqbIpRWt9e87Z8ADr5c2jokZAzIV0zauQ2fhbjBkTcXIXL3yOzUbR+ngBWDD
oq21rXJBUCQheA7J6j2SKabgF9AZaI5NI9ERld5vJ1inXSZCyuyKopN1AzuKZquG
N+jyYJqZC99ePvMLbTWs/spU58J03A6TOwaJNE3ISRgbnxFkhvLl7h68XuTDonZm
hYGloXXUj+i+rh7/eJIDDWa9MTpEvl2p1zc6EDIZ/umlnHwg9rGlGQVARMCs6Ist
EiJQEtjMMlXiBJMkFhpxesOdyonGkxAL9WtT6MoEOFF7dqgsTqSKiDUPa+6MHV+I
tsTB630J3ROsWGfQD1uJI2BrCm+op4j6faamH6UMqCrUU0TUZMHiRR3qVWbM6qgU
/WL1gZ96uy5I7UoE0+gH+wMhMClO2BnsxffocToDE5wOYpGDd5BwPEoY8ej8U2lu
CBMCkMor1jDtS8Y=
=ipv3
-----END PGP SIGNATURE-----
Merge tag 'pci-v6.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI updates from Bjorn Helgaas:
"Enumeration:
- Allow built-in drivers, not just modular drivers, to use async
initial probing (Lukas Wunner)
- Support Immediate Readiness even on devices with no PM Capability
(Sean Christopherson)
- Consolidate definition of PCIE_RESET_CONFIG_WAIT_MS (100ms), the
required delay between a reset and sending config requests to a
device (Niklas Cassel)
- Add pci_is_display() to check for "Display" base class and use it
in ALSA hda, vfio, vga_switcheroo, vt-d (Mario Limonciello)
- Allow 'isolated PCI functions' (multi-function devices without a
function 0) for LoongArch, similar to s390 and jailhouse (Huacai
Chen)
Power control:
- Add ability to enable optional slot clock for cases where the PCIe
host controller and the slot are supplied by different clocks
(Marek Vasut)
PCIe native device hotplug:
- Fix runtime PM ref imbalance on Hot-Plug Capable ports caused by
misinterpreting a config read failure after a device has been
removed (Lukas Wunner)
- Avoid creating a useless PCIe port service device for pciehp if the
slot is handled by the ACPI hotplug driver (Lukas Wunner)
- Ignore ACPI hotplug slots when calculating depth of pciehp hotplug
ports (Lukas Wunner)
Virtualization:
- Save VF resizable BAR state and restore it after reset (Michał
Winiarski)
- Allow IOV resources (VF BARs) to be resized (Michał Winiarski)
- Add pci_iov_vf_bar_set_size() so drivers can control VF BAR size
(Michał Winiarski)
Endpoint framework:
- Add RC-to-EP doorbell support using platform MSI controller,
including a test case (Frank Li)
- Allow BAR assignment via configfs so platforms have flexibility in
determining BAR usage (Jerome Brunet)
Native PCIe controller drivers:
- Convert amazon,al-alpine-v[23]-pcie, apm,xgene-pcie,
axis,artpec6-pcie, marvell,armada-3700-pcie, st,spear1340-pcie to
DT schema format (Rob Herring)
- Use dev_fwnode() instead of of_fwnode_handle() to remove OF
dependency in altera (fixes an unused variable), designware-host,
mediatek, mediatek-gen3, mobiveil, plda, xilinx, xilinx-dma,
xilinx-nwl (Jiri Slaby, Arnd Bergmann)
- Convert aardvark, altera, brcmstb, designware-host, iproc,
mediatek, mediatek-gen3, mobiveil, plda, rcar-host, vmd, xilinx,
xilinx-dma, xilinx-nwl from using pci_msi_create_irq_domain() to
using msi_create_parent_irq_domain() instead; this makes the
interrupt controller per-PCI device, allows dynamic allocation of
vectors after initialization, and allows support of IMS (Nam Cao)
APM X-Gene PCIe controller driver:
- Rewrite MSI handling to MSI CPU affinity, drop useless CPU hotplug
bits, use device-managed memory allocations, and clean things up
(Marc Zyngier)
- Probe xgene-msi as a standard platform driver rather than a
subsys_initcall (Marc Zyngier)
Broadcom STB PCIe controller driver:
- Add optional DT 'num-lanes' property and if present, use it to
override the Maximum Link Width advertised in Link Capabilities
(Jim Quinlan)
Cadence PCIe controller driver:
- Use PCIe Message routing types from the PCI core rather than
defining private ones (Hans Zhang)
Freescale i.MX6 PCIe controller driver:
- Add IMX8MQ_EP third 64-bit BAR in epc_features (Richard Zhu)
- Add IMX8MM_EP and IMX8MP_EP fixed 256-byte BAR 4 in epc_features
(Richard Zhu)
- Configure LUT for MSI/IOMMU in Endpoint mode so Root Complex can
trigger doorbel on Endpoint (Frank Li)
- Remove apps_reset (LTSSM_EN) from
imx_pcie_{assert,deassert}_core_reset(), which fixes a hotplug
regression on i.MX8MM (Richard Zhu)
- Delay Endpoint link start until configfs 'start' written (Richard
Zhu)
Intel VMD host bridge driver:
- Add Intel Panther Lake (PTL)-H/P/U Vendor ID (George D Sworo)
Qualcomm PCIe controller driver:
- Add DT binding and driver support for SA8255p, which supports ECAM
for Configuration Space access (Mayank Rana)
- Update DT binding and driver to describe PHYs and per-Root Port
resets in a Root Port stanza and deprecate describing them in the
host bridge; this makes it possible to support multiple Root Ports
in the future (Krishna Chaitanya Chundru)
- Add Qualcomm QCS615 to SM8150 DT binding (Ziyue Zhang)
- Add Qualcomm QCS8300 to SA8775p DT binding (Ziyue Zhang)
- Drop TBU and ref clocks from Qualcomm SM8150 and SC8180x DT
bindings (Konrad Dybcio)
- Document 'link_down' reset in Qualcomm SA8775P DT binding (Ziyue
Zhang)
- Add required PCIE_RESET_CONFIG_WAIT_MS delay after Link up IRQ
(Niklas Cassel)
Rockchip PCIe controller driver:
- Drop unused PCIe Message routing and code definitions (Hans Zhang)
- Remove several unused header includes (Hans Zhang)
- Use standard PCIe config register definitions instead of
rockchip-specific redefinitions (Geraldo Nascimento)
- Set Target Link Speed to 5.0 GT/s before retraining so we have a
chance to train at a higher speed (Geraldo Nascimento)
Rockchip DesignWare PCIe controller driver:
- Prevent race between link training and register update via DBI by
inhibiting link training after hot reset and link down (Wilfred
Mallawa)
- Add required PCIE_RESET_CONFIG_WAIT_MS delay after Link up IRQ
(Niklas Cassel)
Sophgo PCIe controller driver:
- Add DT binding and driver for Sophgo SG2044 PCIe controller driver
in Root Complex mode (Inochi Amaoto)
Synopsys DesignWare PCIe controller driver:
- Add required PCIE_RESET_CONFIG_WAIT_MS after waiting for Link up on
Ports that support > 5.0 GT/s. Slower Ports still rely on the
not-quite-correct PCIE_LINK_WAIT_SLEEP_MS 90ms default delay while
waiting for the Link (Niklas Cassel)"
* tag 'pci-v6.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (116 commits)
dt-bindings: PCI: qcom,pcie-sa8775p: Document 'link_down' reset
dt-bindings: PCI: Remove 83xx-512x-pci.txt
dt-bindings: PCI: Convert amazon,al-alpine-v[23]-pcie to DT schema
dt-bindings: PCI: Convert marvell,armada-3700-pcie to DT schema
dt-bindings: PCI: Convert apm,xgene-pcie to DT schema
dt-bindings: PCI: Convert axis,artpec6-pcie to DT schema
dt-bindings: PCI: Convert st,spear1340-pcie to DT schema
PCI: Move is_pciehp check out of pciehp_is_native()
PCI: pciehp: Use is_pciehp instead of is_hotplug_bridge
PCI/portdrv: Use is_pciehp instead of is_hotplug_bridge
PCI/ACPI: Fix runtime PM ref imbalance on Hot-Plug Capable ports
selftests: pci_endpoint: Add doorbell test case
misc: pci_endpoint_test: Add doorbell test case
PCI: endpoint: pci-epf-test: Add doorbell test support
PCI: endpoint: Add pci_epf_align_inbound_addr() helper for inbound address alignment
PCI: endpoint: pci-ep-msi: Add checks for MSI parent and mutability
PCI: endpoint: Add RC-to-EP doorbell support using platform MSI controller
PCI: dwc: Add Sophgo SG2044 PCIe controller driver in Root Complex mode
PCI: vmd: Switch to msi_create_parent_irq_domain()
PCI: vmd: Convert to lock guards
...
The purpose of the "Periodic garbage collection" test case is to make
sure that "extern_valid" neighbors are not flushed during periodic
garbage collection, unlike regular neighbor entries.
The test case is currently doing the following:
1. Changing the base reachable time to 10 seconds so that periodic
garbage collection will run every 5 seconds.
2. Changing the garbage collection stale time to 5 seconds so that
neighbors that have not been used in the last 5 seconds will be
considered for removal.
3. Waiting for the base reachable time change to take effect.
4. Adding an "extern_valid" neighbor, a non-"extern_valid" neighbor and
a bunch of other neighbors so that the threshold ("thresh1") will be
crossed and stale neighbors will be flushed during garbage
collection.
5. Waiting for 10 seconds to give garbage collection a chance to run.
6. Checking that the "extern_valid" neighbor was not flushed and that
the non-"extern_valid" neighbor was flushed.
The test sometimes fails in the netdev CI because the non-"extern_valid"
neighbor was not flushed. I am unable to reproduce this locally, but my
theory that since we do not know exactly when the periodic garbage
collection runs, it is possible for it to run at a time when the
non-"extern_valid" neighbor is still not considered stale.
Fix by moving the addition of the two neighbors before step 3 and by
reducing the garbage collection stale time to 1 second, to ensure that
both neighbors are considered stale when garbage collection runs.
Fixes: 171f2ee31a ("selftests: net: Add a selftest for externally validated neighbor entries")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20250728093504.4ebbd73c@kernel.org/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250731110914.506890-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For catching up the remaining stuff for 6.17: only small updates
and the rest are mostly small fixes.
- Fixes in HD-audio codec driver Kconfig, so that configurations
can be more easily/safely carried between different versions
- Fixes in ASoC SDCA, FSL xcvr, AW88399
- ASoC IMX WM8524 support
- HD-audio and USB-audio quirks and fixes
- A minor selftest fix
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmiMtFsOHHRpd2FpQHN1
c2UuZGUACgkQLtJE4w1nLE/VTA/+M9QtrWs3IUGUncud9ku4Jd7YnKqR+PE8hUey
Y5rPXpj1jH7wx3bXtSbIzrn5rr2uaZ4LwX6cygmXxh9xwKJDOyCXCQ7v9X19wVFu
cw9SYFnokBVXgQ0fibXixps0YB4Ll86mflckv7TgVT0W4kYmu3XLDRsmtdnXKRrG
7LOdXON+6jbluTQGXJcxh3pXNOMmeJlQ6FxajHgPQ91iQlB5fZc6tEiLZ06fzOaE
BOyWNwitXBcsUjitHralJNUfY1POvSBBai13chY3L2xkntVR7uoojia507+YyCqA
dqGErDclSo85dHd2ZLwGKsT63tPSgPMfP6q7BS08uZrkqrMAH9KsSyg3HB7SHXcm
VvS9Is4ThovPAkq0naaGoT3UIHNuLBAG0dPy56U6YHRQa0rmp/ZBBGhLTvW735lF
uugtfDmimoZ12YmC2DkalptcOeOelB+V6CVWYF9D4558DGqLTwHtvn4zs5/oR7yt
4r6qNiBzrcKqxcMjlXgQYHHdhkCbiqbXw87daDhoLJtBtIXASpRfRBJpOx+lDe9Y
LCRDq9juGsclDQdr+rzeO7NVhLrEiZmkabenYr6VWZSCjle7d5ro7yRc8qLAlu0Z
WRUqw7P0wnIZYgSXieaMaEBndiFo0PJljPJGYS+lGjVKduNM9eTdR12PNwZeVRP+
wFwtKzM=
=Q+iZ
-----END PGP SIGNATURE-----
Merge tag 'sound-6.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull more sound updates from Takashi Iwai:
"For catching up the remaining stuff for 6.17: only small updates and
the rest are mostly small fixes.
- Fixes in HD-audio codec driver Kconfig, so that configurations can
be more easily/safely carried between different versions
- Fixes in ASoC SDCA, FSL xcvr, AW88399
- ASoC IMX WM8524 support
- HD-audio and USB-audio quirks and fixes
- A minor selftest fix"
* tag 'sound-6.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (21 commits)
ALSA: usb: scarlett2: Fix missing NULL check
mips: Update HD-audio configs again
LoongArch: Update HD-audio codec configs
arm: Update HD-audio configs again
selftests: ALSA: fix memory leak in utimer test
ALSA: usb-audio: Add DSD support for Comtrue USB Audio device
ALSA: hda/hdmi: Enable drivers as default
ALSA: hda/cirrus: Enable drivers as default
ALSA: hda/realtek: Enable drivers as default
ALSA: hda/realtek - Fix mute LED for HP Victus 16-d1xxx (MB 8A26)
ALSA: hda/realtek - Fix mute LED for HP Victus 16-s0xxx
ALSA: hda: Fix the wrong register was used for DVC of TAS2770
ALSA: scarlett2: Add retry on -EPROTO from scarlett2_usb_tx()
ALSA: hda/realtek - Fix mute LED for HP Victus 16-r1xxx
ASoC: codecs: Add acpi_match_table for aw88399 driver
ASoC: dt-bindings: atmel,at91-ssc: add microchip,sam9x7-ssc
ASoC: imx-card: Add WM8524 support
ASoC: fsl_xcvr: get channel status data with firmware exists
ASoC: fsl_xcvr: get channel status data when PHY is not exists
ASoC: SDCA: Add support for -cn- value properties
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEL65usyKPHcrRDEicpmLzj2vtYEkFAmiL2DAACgkQpmLzj2vt
YEk9aRAApXAYs/LNZ6hS2FvqxSKJhThLEm4sib6CvdQ82Er92aqrZEutkpb8Rq0/
KMPLYTv9QsZgFKilZJnxIRwZRpy37C7SGO8Y3PSARpakxpCPe76RyqcY+DghrB9+
kBRUzfDq9JR8RM8lcxw5dgEjgI3cz/5gS8OFh/LOVqe2nqub3U3Dr2kQkDE3DSiG
BqIPhIG//TW0al6IxJQVL59egzVqzzqcgK/efBwT7WjToaCjXuZTClDiKASVLcmW
p/xvsPjIBeVuEdOTfboiHowOs3RyXa3zWlHS76/x9JXVDPOrPcleJyoxEedU4BBG
selhkdVEYcCOYMlzz7/p1BrIkkpczWfye5rSGQ05qmOeApMAxicKRp0DfTLm0EGa
bdmDLIncSBRBb5iATrRS+rQcV6PSzLCaB9+rGiWS/EQPFCKh54SZSNvl5rw0WCHY
+b9zmxmr2KG7B8eQQXx0/x6RNCC3gANqkVO4+TXamcGhJaoW2M0+xT9tzid0sL+S
Y7t5C8T4KoI4dc2be76KNs/LlR6F5LK1E0HTkapdG+ZoiLc/PsqK0pnGc0aXd+/z
Ee246MZEMsURgOEA68dEm4XBdh2xp9xVZIsZtaWcfSSQZJnfpx89v9Tl4MUlfqzj
8BPLeYxxg9gxfxpGcP+oGMjXbeKDjSbmRJroOP4jw/WdPZ2rs0s=
=kYcB
-----END PGP SIGNATURE-----
Merge tag 'hid-for-linus-2025073101' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID updates from Jiri Kosina:
- hardening of HID core parser against conversion to 0 bits in s32ton()
by buggy/malicious devices (Alan Stern)
- fix for potential NULL pointer dereference in hid-apple that could be
caused by malicious device with APPLE_MAGIC_BACKLIGHT quirk present
triggering overflow in data field (Qasim Ijaz)
- support for Wake-on-touch in intel-thc (Even Xu)
- support for "Input max input size control" and "Input interrupt
delay" I2C features in order to improve compatibility of THC devices
with legacy HIDI2C touch devices (Even Xu)
- support for Touch Bars on x86 MacBook Pros (Kerem Karabay)
- support for XP-PEN Artist 22R Pro (Joshua Goins)
- third party trackpart support for MacBookPro15,1 (Aditya Garg)
- Apple Magic Keyboard A311[89] USB-C support (Aditya Garg, Grigorii
Sokoli)
- support for operating modes in amd-sfh (Basavaraj Natikar)
- avoid setting up battery timer for Apple and Magicmouse devices
without battery (Aditya Garg)
- fix for behavior of the hid-mcp2221 driver for !CONFIG_IIO cases
(Heiko Schocher)
- other assorted fixups and device ID additions
* tag 'hid-for-linus-2025073101' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: (54 commits)
HID: core: Harden s32ton() against conversion to 0 bits
HID: apple: validate feature-report field count to prevent NULL pointer dereference
HID: core: Improve the kerneldoc for hid_report_len()
selftests/hid: sync python tests to hid-tools 0.10
selftests/hid: sync the python tests to hid-tools 0.8
selftests/hid: run ruff format on the python part
HID: magicmouse: use secs_to_jiffies() for battery timeout
HID: apple: use secs_to_jiffies() for battery timeout
HID: magicmouse: avoid setting up battery timer when not needed
HID: apple: avoid setting up battery timer for devices without battery
HID: amd_sfh: Enable operating mode
HID: uclogic: Add support for XP-PEN Artist 22R Pro
HID: rate-limit hid_warn to prevent log flooding
HID: replace scnprintf() with sysfs_emit()
HID: uclogic: make read-only array reconnect_event static const
HID: mcp-2221: Replace manual comparison with min() macro
HID: intel-thc-hid: Separate max input size control conditional list
HID: mcp2221: set gpio pin mode
HID: multitouch: add device ID for Apple Touch Bar
HID: multitouch: specify that Apple Touch Bar is direct
...
- Add support for cgroup "cpu.max" interface.
- Code organization cleanup so that ext_idle.c doesn't depend on the
source-file-inclusion build method of sched/.
- Drop UP paths in accordance with sched core changes.
- Documentation and other misc changes.
-----BEGIN PGP SIGNATURE-----
iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaIqnxg4cdGpAa2VybmVs
Lm9yZwAKCRCxYfJx3gVYGUh5AQC6YM7ggRPYRmy28m5B0nubpKtCHqPOAHSd/QbY
MCiThgD+JuE9ewg3wYO/jvJx3NyIRB1McMnAaG59hf6R0Plh5Qo=
=TeLF
-----END PGP SIGNATURE-----
Merge tag 'sched_ext-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext updates from Tejun Heo:
- Add support for cgroup "cpu.max" interface
- Code organization cleanup so that ext_idle.c doesn't depend on the
source-file-inclusion build method of sched/
- Drop UP paths in accordance with sched core changes
- Documentation and other misc changes
* tag 'sched_ext-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: Fix scx_bpf_reenqueue_local() reference
sched_ext: Drop kfuncs marked for removal in 6.15
sched_ext, rcu: Eject BPF scheduler on RCU CPU stall panic
kernel/sched/ext.c: fix typo "occured" -> "occurred" in comments
sched_ext: Add support for cgroup bandwidth control interface
sched_ext, sched/core: Factor out struct scx_task_group
sched_ext: Return NULL in llc_span
sched_ext: Always use SMP versions in kernel/sched/ext_idle.h
sched_ext: Always use SMP versions in kernel/sched/ext_idle.c
sched_ext: Always use SMP versions in kernel/sched/ext.h
sched_ext: Always use SMP versions in kernel/sched/ext.c
sched_ext: Documentation: Clarify time slice handling in task lifecycle
sched_ext: Make scx_locked_rq() inline
sched_ext: Make scx_rq_bypassing() inline
sched_ext: idle: Make local functions static in ext_idle.c
sched_ext: idle: Remove unnecessary ifdef in scx_bpf_cpu_node()
- Allow css_rstat_updated() in NMI context to enable memory accounting for
allocations in NMI context.
- /proc/cgroups doesn't contain useful information for cgroup2 and was
updated to only show v1 controllers. This unfortunately broke something in
the wild. Add an option to bring back the old behavior to ease transition.
- selftest updates and other cleanups.
-----BEGIN PGP SIGNATURE-----
iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaIqlxQ4cdGpAa2VybmVs
Lm9yZwAKCRCxYfJx3gVYGcTMAQDUlGf50ATWB9hDU7zUG4lVn8s8n8/+x8QFGHn4
e4NERQD9FpU/jLN+cwGgspKo+L9qpu/1g+t36cJLcOuEKKoaQwI=
=FLwx
-----END PGP SIGNATURE-----
Merge tag 'cgroup-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- Allow css_rstat_updated() in NMI context to enable memory accounting
for allocations in NMI context.
- /proc/cgroups doesn't contain useful information for cgroup2 and was
updated to only show v1 controllers. This unfortunately broke
something in the wild. Add an option to bring back the old behavior
to ease transition.
- selftest updates and other cleanups.
* tag 'cgroup-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: Add compatibility option for content of /proc/cgroups
selftests/cgroup: fix cpu.max tests
cgroup: llist: avoid memory tears for llist_node
selftests: cgroup: Fix missing newline in test_zswap_writeback_one
selftests: cgroup: Allow longer timeout for kmem_dead_cgroups cleanup
memcg: cgroup: call css_rstat_updated irrespective of in_nmi()
cgroup: remove per-cpu per-subsystem locks
cgroup: make css_rstat_updated nmi safe
cgroup: support to enable nmi-safe css_rstat_updated
selftests: cgroup: Fix compilation on pre-cgroupns kernels
selftests: cgroup: Optionally set up v1 environment
selftests: cgroup: Add support for named v1 hierarchies in test_core
selftests: cgroup_util: Add helpers for testing named v1 hierarchies
Documentation: cgroup: add section explaining controller availability
cgroup: Drop sock_cgroup_classid() dummy implementation
- The 4 patch series "mm: ksm: prevent KSM from breaking merging of new
VMAs" from Lorenzo Stoakes addresses an issue with KSM's
PR_SET_MEMORY_MERGE mode: newly mapped VMAs were not eligible for
merging with existing adjacent VMAs.
- The 4 patch series "mm/damon: introduce DAMON_STAT for simple and
practical access monitoring" from SeongJae Park adds a new kernel module
which simplifies the setup and usage of DAMON in production
environments.
- The 6 patch series "stop passing a writeback_control to swap/shmem
writeout" from Christoph Hellwig is a cleanup to the writeback code
which removes a couple of pointers from struct writeback_control.
- The 7 patch series "drivers/base/node.c: optimization and cleanups"
from Donet Tom contains largely uncorrelated cleanups to the NUMA node
setup and management code.
- The 4 patch series "mm: userfaultfd: assorted fixes and cleanups" from
Tal Zussman does some maintenance work on the userfaultfd code.
- The 5 patch series "Readahead tweaks for larger folios" from Ryan
Roberts implements some tuneups for pagecache readahead when it is
reading into order>0 folios.
- The 4 patch series "selftests/mm: Tweaks to the cow test" from Mark
Brown provides some cleanups and consistency improvements to the
selftests code.
- The 4 patch series "Optimize mremap() for large folios" from Dev Jain
does that. A 37% reduction in execution time was measured in a
memset+mremap+munmap microbenchmark.
- The 5 patch series "Remove zero_user()" from Matthew Wilcox expunges
zero_user() in favor of the more modern memzero_page().
- The 3 patch series "mm/huge_memory: vmf_insert_folio_*() and
vmf_insert_pfn_pud() fixes" from David Hildenbrand addresses some warts
which David noticed in the huge page code. These were not known to be
causing any issues at this time.
- The 3 patch series "mm/damon: use alloc_migrate_target() for
DAMOS_MIGRATE_{HOT,COLD" from SeongJae Park provides some cleanup and
consolidation work in DAMON.
- The 3 patch series "use vm_flags_t consistently" from Lorenzo Stoakes
uses vm_flags_t in places where we were inappropriately using other
types.
- The 3 patch series "mm/memfd: Reserve hugetlb folios before
allocation" from Vivek Kasireddy increases the reliability of large page
allocation in the memfd code.
- The 14 patch series "mm: Remove pXX_devmap page table bit and pfn_t
type" from Alistair Popple removes several now-unneeded PFN_* flags.
- The 5 patch series "mm/damon: decouple sysfs from core" from SeongJae
Park implememnts some cleanup and maintainability work in the DAMON
sysfs layer.
- The 5 patch series "madvise cleanup" from Lorenzo Stoakes does quite a
lot of cleanup/maintenance work in the madvise() code.
- The 4 patch series "madvise anon_name cleanups" from Vlastimil Babka
provides additional cleanups on top or Lorenzo's effort.
- The 11 patch series "Implement numa node notifier" from Oscar Salvador
creates a standalone notifier for NUMA node memory state changes.
Previously these were lumped under the more general memory on/offline
notifier.
- The 6 patch series "Make MIGRATE_ISOLATE a standalone bit" from Zi Yan
cleans up the pageblock isolation code and fixes a potential issue which
doesn't seem to cause any problems in practice.
- The 5 patch series "selftests/damon: add python and drgn based DAMON
sysfs functionality tests" from SeongJae Park adds additional drgn- and
python-based DAMON selftests which are more comprehensive than the
existing selftest suite.
- The 5 patch series "Misc rework on hugetlb faulting path" from Oscar
Salvador fixes a rather obscure deadlock in the hugetlb fault code and
follows that fix with a series of cleanups.
- The 3 patch series "cma: factor out allocation logic from
__cma_declare_contiguous_nid" from Mike Rapoport rationalizes and cleans
up the highmem-specific code in the CMA allocator.
- The 28 patch series "mm/migration: rework movable_ops page migration
(part 1)" from David Hildenbrand provides cleanups and
future-preparedness to the migration code.
- The 2 patch series "mm/damon: add trace events for auto-tuned
monitoring intervals and DAMOS quota" from SeongJae Park adds some
tracepoints to some DAMON auto-tuning code.
- The 6 patch series "mm/damon: fix misc bugs in DAMON modules" from
SeongJae Park does that.
- The 6 patch series "mm/damon: misc cleanups" from SeongJae Park also
does what it claims.
- The 4 patch series "mm: folio_pte_batch() improvements" from David
Hildenbrand cleans up the large folio PTE batching code.
- The 13 patch series "mm/damon/vaddr: Allow interleaving in
migrate_{hot,cold} actions" from SeongJae Park facilitates dynamic
alteration of DAMON's inter-node allocation policy.
- The 3 patch series "Remove unmap_and_put_page()" from Vishal Moola
provides a couple of page->folio conversions.
- The 4 patch series "mm: per-node proactive reclaim" from Davidlohr
Bueso implements a per-node control of proactive reclaim - beyond the
current memcg-based implementation.
- The 14 patch series "mm/damon: remove damon_callback" from SeongJae
Park replaces the damon_callback interface with a more general and
powerful damon_call()+damos_walk() interface.
- The 10 patch series "mm/mremap: permit mremap() move of multiple VMAs"
from Lorenzo Stoakes implements a number of mremap cleanups (of course)
in preparation for adding new mremap() functionality: newly permit the
remapping of multiple VMAs when the user is specifying MREMAP_FIXED. It
still excludes some specialized situations where this cannot be
performed reliably.
- The 3 patch series "drop hugetlb_free_pgd_range()" from Anthony Yznaga
switches some sparc hugetlb code over to the generic version and removes
the thus-unneeded hugetlb_free_pgd_range().
- The 4 patch series "mm/damon/sysfs: support periodic and automated
stats update" from SeongJae Park augments the present
userspace-requested update of DAMON sysfs monitoring files. Automatic
update is now provided, along with a tunable to control the update
interval.
- The 4 patch series "Some randome fixes and cleanups to swapfile" from
Kemeng Shi does what is claims.
- The 4 patch series "mm: introduce snapshot_page" from Luiz Capitulino
and David Hildenbrand provides (and uses) a means by which debug-style
functions can grab a copy of a pageframe and inspect it locklessly
without tripping over the races inherent in operating on the live
pageframe directly.
- The 6 patch series "use per-vma locks for /proc/pid/maps reads" from
Suren Baghdasaryan addresses the large contention issues which can be
triggered by reads from that procfs file. Latencies are reduced by more
than half in some situations. The series also introduces several new
selftests for the /proc/pid/maps interface.
- The 6 patch series "__folio_split() clean up" from Zi Yan cleans up
__folio_split()!
- The 7 patch series "Optimize mprotect() for large folios" from Dev
Jain provides some quite large (>3x) speedups to mprotect() when dealing
with large folios.
- The 2 patch series "selftests/mm: reuse FORCE_READ to replace "asm
volatile("" : "+r" (XXX));" and some cleanup" from wang lian does some
cleanup work in the selftests code.
- The 3 patch series "tools/testing: expand mremap testing" from Lorenzo
Stoakes extends the mremap() selftest in several ways, including adding
more checking of Lorenzo's recently added "permit mremap() move of
multiple VMAs" feature.
- The 22 patch series "selftests/damon/sysfs.py: test all parameters"
from SeongJae Park extends the DAMON sysfs interface selftest so that it
tests all possible user-requested parameters. Rather than the present
minimal subset.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaIqcCgAKCRDdBJ7gKXxA
jkVBAQCCn9DR1QP0CRk961ot0cKzOgioSc0aA03DPb2KXRt2kQEAzDAz0ARurFhL
8BzbvI0c+4tntHLXvIlrC33n9KWAOQM=
=XsFy
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"As usual, many cleanups. The below blurbiage describes 42 patchsets.
21 of those are partially or fully cleanup work. "cleans up",
"cleanup", "maintainability", "rationalizes", etc.
I never knew the MM code was so dirty.
"mm: ksm: prevent KSM from breaking merging of new VMAs" (Lorenzo Stoakes)
addresses an issue with KSM's PR_SET_MEMORY_MERGE mode: newly
mapped VMAs were not eligible for merging with existing adjacent
VMAs.
"mm/damon: introduce DAMON_STAT for simple and practical access monitoring" (SeongJae Park)
adds a new kernel module which simplifies the setup and usage of
DAMON in production environments.
"stop passing a writeback_control to swap/shmem writeout" (Christoph Hellwig)
is a cleanup to the writeback code which removes a couple of
pointers from struct writeback_control.
"drivers/base/node.c: optimization and cleanups" (Donet Tom)
contains largely uncorrelated cleanups to the NUMA node setup and
management code.
"mm: userfaultfd: assorted fixes and cleanups" (Tal Zussman)
does some maintenance work on the userfaultfd code.
"Readahead tweaks for larger folios" (Ryan Roberts)
implements some tuneups for pagecache readahead when it is reading
into order>0 folios.
"selftests/mm: Tweaks to the cow test" (Mark Brown)
provides some cleanups and consistency improvements to the
selftests code.
"Optimize mremap() for large folios" (Dev Jain)
does that. A 37% reduction in execution time was measured in a
memset+mremap+munmap microbenchmark.
"Remove zero_user()" (Matthew Wilcox)
expunges zero_user() in favor of the more modern memzero_page().
"mm/huge_memory: vmf_insert_folio_*() and vmf_insert_pfn_pud() fixes" (David Hildenbrand)
addresses some warts which David noticed in the huge page code.
These were not known to be causing any issues at this time.
"mm/damon: use alloc_migrate_target() for DAMOS_MIGRATE_{HOT,COLD" (SeongJae Park)
provides some cleanup and consolidation work in DAMON.
"use vm_flags_t consistently" (Lorenzo Stoakes)
uses vm_flags_t in places where we were inappropriately using other
types.
"mm/memfd: Reserve hugetlb folios before allocation" (Vivek Kasireddy)
increases the reliability of large page allocation in the memfd
code.
"mm: Remove pXX_devmap page table bit and pfn_t type" (Alistair Popple)
removes several now-unneeded PFN_* flags.
"mm/damon: decouple sysfs from core" (SeongJae Park)
implememnts some cleanup and maintainability work in the DAMON
sysfs layer.
"madvise cleanup" (Lorenzo Stoakes)
does quite a lot of cleanup/maintenance work in the madvise() code.
"madvise anon_name cleanups" (Vlastimil Babka)
provides additional cleanups on top or Lorenzo's effort.
"Implement numa node notifier" (Oscar Salvador)
creates a standalone notifier for NUMA node memory state changes.
Previously these were lumped under the more general memory
on/offline notifier.
"Make MIGRATE_ISOLATE a standalone bit" (Zi Yan)
cleans up the pageblock isolation code and fixes a potential issue
which doesn't seem to cause any problems in practice.
"selftests/damon: add python and drgn based DAMON sysfs functionality tests" (SeongJae Park)
adds additional drgn- and python-based DAMON selftests which are
more comprehensive than the existing selftest suite.
"Misc rework on hugetlb faulting path" (Oscar Salvador)
fixes a rather obscure deadlock in the hugetlb fault code and
follows that fix with a series of cleanups.
"cma: factor out allocation logic from __cma_declare_contiguous_nid" (Mike Rapoport)
rationalizes and cleans up the highmem-specific code in the CMA
allocator.
"mm/migration: rework movable_ops page migration (part 1)" (David Hildenbrand)
provides cleanups and future-preparedness to the migration code.
"mm/damon: add trace events for auto-tuned monitoring intervals and DAMOS quota" (SeongJae Park)
adds some tracepoints to some DAMON auto-tuning code.
"mm/damon: fix misc bugs in DAMON modules" (SeongJae Park)
does that.
"mm/damon: misc cleanups" (SeongJae Park)
also does what it claims.
"mm: folio_pte_batch() improvements" (David Hildenbrand)
cleans up the large folio PTE batching code.
"mm/damon/vaddr: Allow interleaving in migrate_{hot,cold} actions" (SeongJae Park)
facilitates dynamic alteration of DAMON's inter-node allocation
policy.
"Remove unmap_and_put_page()" (Vishal Moola)
provides a couple of page->folio conversions.
"mm: per-node proactive reclaim" (Davidlohr Bueso)
implements a per-node control of proactive reclaim - beyond the
current memcg-based implementation.
"mm/damon: remove damon_callback" (SeongJae Park)
replaces the damon_callback interface with a more general and
powerful damon_call()+damos_walk() interface.
"mm/mremap: permit mremap() move of multiple VMAs" (Lorenzo Stoakes)
implements a number of mremap cleanups (of course) in preparation
for adding new mremap() functionality: newly permit the remapping
of multiple VMAs when the user is specifying MREMAP_FIXED. It still
excludes some specialized situations where this cannot be performed
reliably.
"drop hugetlb_free_pgd_range()" (Anthony Yznaga)
switches some sparc hugetlb code over to the generic version and
removes the thus-unneeded hugetlb_free_pgd_range().
"mm/damon/sysfs: support periodic and automated stats update" (SeongJae Park)
augments the present userspace-requested update of DAMON sysfs
monitoring files. Automatic update is now provided, along with a
tunable to control the update interval.
"Some randome fixes and cleanups to swapfile" (Kemeng Shi)
does what is claims.
"mm: introduce snapshot_page" (Luiz Capitulino and David Hildenbrand)
provides (and uses) a means by which debug-style functions can grab
a copy of a pageframe and inspect it locklessly without tripping
over the races inherent in operating on the live pageframe
directly.
"use per-vma locks for /proc/pid/maps reads" (Suren Baghdasaryan)
addresses the large contention issues which can be triggered by
reads from that procfs file. Latencies are reduced by more than
half in some situations. The series also introduces several new
selftests for the /proc/pid/maps interface.
"__folio_split() clean up" (Zi Yan)
cleans up __folio_split()!
"Optimize mprotect() for large folios" (Dev Jain)
provides some quite large (>3x) speedups to mprotect() when dealing
with large folios.
"selftests/mm: reuse FORCE_READ to replace "asm volatile("" : "+r" (XXX));" and some cleanup" (wang lian)
does some cleanup work in the selftests code.
"tools/testing: expand mremap testing" (Lorenzo Stoakes)
extends the mremap() selftest in several ways, including adding
more checking of Lorenzo's recently added "permit mremap() move of
multiple VMAs" feature.
"selftests/damon/sysfs.py: test all parameters" (SeongJae Park)
extends the DAMON sysfs interface selftest so that it tests all
possible user-requested parameters. Rather than the present minimal
subset"
* tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (370 commits)
MAINTAINERS: add missing headers to mempory policy & migration section
MAINTAINERS: add missing file to cgroup section
MAINTAINERS: add MM MISC section, add missing files to MISC and CORE
MAINTAINERS: add missing zsmalloc file
MAINTAINERS: add missing files to page alloc section
MAINTAINERS: add missing shrinker files
MAINTAINERS: move memremap.[ch] to hotplug section
MAINTAINERS: add missing mm_slot.h file THP section
MAINTAINERS: add missing interval_tree.c to memory mapping section
MAINTAINERS: add missing percpu-internal.h file to per-cpu section
mm/page_alloc: remove trace_mm_alloc_contig_migrate_range_info()
selftests/damon: introduce _common.sh to host shared function
selftests/damon/sysfs.py: test runtime reduction of DAMON parameters
selftests/damon/sysfs.py: test non-default parameters runtime commit
selftests/damon/sysfs.py: generalize DAMON context commit assertion
selftests/damon/sysfs.py: generalize monitoring attributes commit assertion
selftests/damon/sysfs.py: generalize DAMOS schemes commit assertion
selftests/damon/sysfs.py: test DAMOS filters commitment
selftests/damon/sysfs.py: generalize DAMOS scheme commit assertion
selftests/damon/sysfs.py: test DAMOS destinations commitment
...
- IOMMU HW now has features to directly assign HW command queues to a
guest VM. In this mode the command queue operates on a limited set of
invalidation commands that are suitable for improving guest invalidation
performance and easy for the HW to virtualize.
This PR brings the generic infrastructure to allow IOMMU drivers to
expose such command queues through the iommufd uAPI, mmap the doorbell
pages, and get the guest physical range for the command queue ring
itself.
- An implementation for the NVIDIA SMMUv3 extension "cmdqv" is built on
the new iommufd command queue features. It works with the existing SMMU
driver support for cmdqv in guest VMs.
- Many precursor cleanups and improvements to support the above cleanly,
changes to the general ioctl and object helpers, driver support for
VDEVICE, and mmap pgoff cookie infrastructure.
- Sequence VDEVICE destruction to always happen before VFIO device
destruction. When using the above type features, and also in future
confidential compute, the internal virtual device representation becomes
linked to HW or CC TSM configuration and objects. If a VFIO device is
removed from iommufd those HW objects should also be cleaned up to
prevent a sort of UAF. This became important now that we have HW backing
the VDEVICE.
- Fix one syzkaller found error related to math overflows during iova
allocation
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCaIpl9AAKCRCFwuHvBreF
YS5tAP9MDIRML5a/2IOhzcsc4LiDkWTMKm2m1wcRYd+iU2aFVQEAjdghINLHrUlx
HVuIDvNvWIUED/oTAp5kCxQ7PBFN4gU=
=NmCO
-----END PGP SIGNATURE-----
Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
Pull iommufd updates from Jason Gunthorpe:
"This broadly brings the assigned HW command queue support to iommufd.
This feature is used to improve SVA performance in VMs by avoiding
paravirtualization traps during SVA invalidations.
Along the way I think some of the core logic is in a much better state
to support future driver backed features.
Summary:
- IOMMU HW now has features to directly assign HW command queues to a
guest VM. In this mode the command queue operates on a limited set
of invalidation commands that are suitable for improving guest
invalidation performance and easy for the HW to virtualize.
This brings the generic infrastructure to allow IOMMU drivers to
expose such command queues through the iommufd uAPI, mmap the
doorbell pages, and get the guest physical range for the command
queue ring itself.
- An implementation for the NVIDIA SMMUv3 extension "cmdqv" is built
on the new iommufd command queue features. It works with the
existing SMMU driver support for cmdqv in guest VMs.
- Many precursor cleanups and improvements to support the above
cleanly, changes to the general ioctl and object helpers, driver
support for VDEVICE, and mmap pgoff cookie infrastructure.
- Sequence VDEVICE destruction to always happen before VFIO device
destruction. When using the above type features, and also in future
confidential compute, the internal virtual device representation
becomes linked to HW or CC TSM configuration and objects. If a VFIO
device is removed from iommufd those HW objects should also be
cleaned up to prevent a sort of UAF. This became important now that
we have HW backing the VDEVICE.
- Fix one syzkaller found error related to math overflows during iova
allocation"
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (57 commits)
iommu/arm-smmu-v3: Replace vsmmu_size/type with get_viommu_size
iommu/arm-smmu-v3: Do not bother impl_ops if IOMMU_VIOMMU_TYPE_ARM_SMMUV3
iommufd: Rename some shortterm-related identifiers
iommufd/selftest: Add coverage for vdevice tombstone
iommufd/selftest: Explicitly skip tests for inapplicable variant
iommufd/vdevice: Remove struct device reference from struct vdevice
iommufd: Destroy vdevice on idevice destroy
iommufd: Add a pre_destroy() op for objects
iommufd: Add iommufd_object_tombstone_user() helper
iommufd/viommu: Roll back to use iommufd_object_alloc() for vdevice
iommufd/selftest: Test reserved regions near ULONG_MAX
iommufd: Prevent ALIGN() overflow
iommu/tegra241-cmdqv: import IOMMUFD module namespace
iommufd: Do not allow _iommufd_object_alloc_ucmd if abort op is set
iommu/tegra241-cmdqv: Add IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV support
iommu/tegra241-cmdqv: Add user-space use support
iommu/tegra241-cmdqv: Do not statically map LVCMDQs
iommu/tegra241-cmdqv: Simplify deinit flow in tegra241_cmdqv_remove_vintf()
iommu/tegra241-cmdqv: Use request_threaded_irq
iommu/arm-smmu-v3-iommufd: Add hw_info to impl_ops
...
- Host driver for GICv5, the next generation interrupt controller for
arm64, including support for interrupt routing, MSIs, interrupt
translation and wired interrupts.
- Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
GICv5 hardware, leveraging the legacy VGIC interface.
- Userspace control of the 'nASSGIcap' GICv3 feature, allowing
userspace to disable support for SGIs w/o an active state on hardware
that previously advertised it unconditionally.
- Map supporting endpoints with cacheable memory attributes on systems
with FEAT_S2FWB and DIC where KVM no longer needs to perform cache
maintenance on the address range.
- Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest
hypervisor to inject external aborts into an L2 VM and take traps of
masked external aborts to the hypervisor.
- Convert more system register sanitization to the config-driven
implementation.
- Fixes to the visibility of EL2 registers, namely making VGICv3 system
registers accessible through the VGIC device instead of the ONE_REG
vCPU ioctls.
- Various cleanups and minor fixes.
LoongArch:
- Add stat information for in-kernel irqchip
- Add tracepoints for CPUCFG and CSR emulation exits
- Enhance in-kernel irqchip emulation
- Various cleanups.
RISC-V:
- Enable ring-based dirty memory tracking
- Improve perf kvm stat to report interrupt events
- Delegate illegal instruction trap to VS-mode
- MMU improvements related to upcoming nested virtualization
s390x
- Fixes
x86:
- Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC,
PIC, and PIT emulation at compile time.
- Share device posted IRQ code between SVM and VMX and
harden it against bugs and runtime errors.
- Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1)
instead of O(n).
- For MMIO stale data mitigation, track whether or not a vCPU has access to
(host) MMIO based on whether the page tables have MMIO pfns mapped; using
VFIO is prone to false negatives
- Rework the MSR interception code so that the SVM and VMX APIs are more or
less identical.
- Recalculate all MSR intercepts from scratch on MSR filter changes,
instead of maintaining shadow bitmaps.
- Advertise support for LKGS (Load Kernel GS base), a new instruction
that's loosely related to FRED, but is supported and enumerated
independently.
- Fix a user-triggerable WARN that syzkaller found by setting the vCPU
in INIT_RECEIVED state (aka wait-for-SIPI), and then putting the vCPU
into VMX Root Mode (post-VMXON). Trying to detect every possible path
leading to architecturally forbidden states is hard and even risks
breaking userspace (if it goes from valid to valid state but passes
through invalid states), so just wait until KVM_RUN to detect that
the vCPU state isn't allowed.
- Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling interception of
APERF/MPERF reads, so that a "properly" configured VM can access
APERF/MPERF. This has many caveats (APERF/MPERF cannot be zeroed
on vCPU creation or saved/restored on suspend and resume, or preserved
over thread migration let alone VM migration) but can be useful whenever
you're interested in letting Linux guests see the effective physical CPU
frequency in /proc/cpuinfo.
- Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been
created, as there's no known use case for changing the default
frequency for other VM types and it goes counter to the very reason
why the ioctl was added to the vm file descriptor. And also, there
would be no way to make it work for confidential VMs with a "secure"
TSC, so kill two birds with one stone.
- Dynamically allocation the shadow MMU's hashed page list, and defer
allocating the hashed list until it's actually needed (the TDP MMU
doesn't use the list).
- Extract many of KVM's helpers for accessing architectural local APIC
state to common x86 so that they can be shared by guest-side code for
Secure AVIC.
- Various cleanups and fixes.
x86 (Intel):
- Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest.
Failure to honor FREEZE_IN_SMM can leak host state into guests.
- Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to prevent
L1 from running L2 with features that KVM doesn't support, e.g. BTF.
x86 (AMD):
- WARN and reject loading kvm-amd.ko instead of panicking the kernel if the
nested SVM MSRPM offsets tracker can't handle an MSR (which is pretty
much a static condition and therefore should never happen, but still).
- Fix a variety of flaws and bugs in the AVIC device posted IRQ code.
- Inhibit AVIC if a vCPU's ID is too big (relative to what hardware
supports) instead of rejecting vCPU creation.
- Extend enable_ipiv module param support to SVM, by simply leaving
IsRunning clear in the vCPU's physical ID table entry.
- Disable IPI virtualization, via enable_ipiv, if the CPU is affected by
erratum #1235, to allow (safely) enabling AVIC on such CPUs.
- Request GA Log interrupts if and only if the target vCPU is blocking,
i.e. only if KVM needs a notification in order to wake the vCPU.
- Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to the
vCPU's CPUID model.
- Accept any SNP policy that is accepted by the firmware with respect to
SMT and single-socket restrictions. An incompatible policy doesn't put
the kernel at risk in any way, so there's no reason for KVM to care.
- Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and
use WBNOINVD instead of WBINVD when possible for SEV cache maintenance.
- When reclaiming memory from an SEV guest, only do cache flushes on CPUs
that have ever run a vCPU for the guest, i.e. don't flush the caches for
CPUs that can't possibly have cache lines with dirty, encrypted data.
Generic:
- Rework irqbypass to track/match producers and consumers via an xarray
instead of a linked list. Using a linked list leads to O(n^2) insertion
times, which is hugely problematic for use cases that create large
numbers of VMs. Such use cases typically don't actually use irqbypass,
but eliminating the pointless registration is a future problem to
solve as it likely requires new uAPI.
- Track irqbypass's "token" as "struct eventfd_ctx *" instead of a "void *",
to avoid making a simple concept unnecessarily difficult to understand.
- Decouple device posted IRQs from VFIO device assignment, as binding a VM
to a VFIO group is not a requirement for enabling device posted IRQs.
- Clean up and document/comment the irqfd assignment code.
- Disallow binding multiple irqfds to an eventfd with a priority waiter,
i.e. ensure an eventfd is bound to at most one irqfd through the entire
host, and add a selftest to verify eventfd:irqfd bindings are globally
unique.
- Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues
related to private <=> shared memory conversions.
- Drop guest_memfd's .getattr() implementation as the VFS layer will call
generic_fillattr() if inode_operations.getattr is NULL.
- Fix issues with dirty ring harvesting where KVM doesn't bound the
processing of entries in any way, which allows userspace to keep KVM
in a tight loop indefinitely.
- Kill off kvm_arch_{start,end}_assignment() and x86's associated tracking,
now that KVM no longer uses assigned_device_count as a heuristic for
either irqbypass usage or MDS mitigation.
Selftests:
- Fix a comment typo.
- Verify KVM is loaded when getting any KVM module param so that attempting
to run a selftest without kvm.ko loaded results in a SKIP message about
KVM not being loaded/enabled (versus some random parameter not existing).
- Skip tests that hit EACCES when attempting to access a file, and rpint
a "Root required?" help message. In most cases, the test just needs to
be run with elevated permissions.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmiKXMgUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroMhMQf/QDhC/CP1aGXph2whuyeD2NMqPKiU
9KdnDNST+ftPwjg9QxZ9mTaa8zeVz/wly6XlxD9OQHy+opM1wcys3k0GZAFFEEQm
YrThgURdzEZ3nwJZgb+m0t4wjJQtpiFIBwAf7qq6z1VrqQBEmHXJ/8QxGuqO+BNC
j5q/X+q6KZwehKI6lgFBrrOKWFaxqhnRAYfW6rGBxRXxzTJuna37fvDpodQnNceN
zOiq+avfriUMArTXTqOteJNKU0229HjiPSnjILLnFQ+B3akBlwNG0jk7TMaAKR6q
IZWG1EIS9q1BAkGXaw6DE1y6d/YwtXCR5qgAIkiGwaPt5yj9Oj6kRN2Ytw==
=j2At
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- Host driver for GICv5, the next generation interrupt controller for
arm64, including support for interrupt routing, MSIs, interrupt
translation and wired interrupts
- Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
GICv5 hardware, leveraging the legacy VGIC interface
- Userspace control of the 'nASSGIcap' GICv3 feature, allowing
userspace to disable support for SGIs w/o an active state on
hardware that previously advertised it unconditionally
- Map supporting endpoints with cacheable memory attributes on
systems with FEAT_S2FWB and DIC where KVM no longer needs to
perform cache maintenance on the address range
- Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the
guest hypervisor to inject external aborts into an L2 VM and take
traps of masked external aborts to the hypervisor
- Convert more system register sanitization to the config-driven
implementation
- Fixes to the visibility of EL2 registers, namely making VGICv3
system registers accessible through the VGIC device instead of the
ONE_REG vCPU ioctls
- Various cleanups and minor fixes
LoongArch:
- Add stat information for in-kernel irqchip
- Add tracepoints for CPUCFG and CSR emulation exits
- Enhance in-kernel irqchip emulation
- Various cleanups
RISC-V:
- Enable ring-based dirty memory tracking
- Improve perf kvm stat to report interrupt events
- Delegate illegal instruction trap to VS-mode
- MMU improvements related to upcoming nested virtualization
s390x
- Fixes
x86:
- Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O
APIC, PIC, and PIT emulation at compile time
- Share device posted IRQ code between SVM and VMX and harden it
against bugs and runtime errors
- Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups
O(1) instead of O(n)
- For MMIO stale data mitigation, track whether or not a vCPU has
access to (host) MMIO based on whether the page tables have MMIO
pfns mapped; using VFIO is prone to false negatives
- Rework the MSR interception code so that the SVM and VMX APIs are
more or less identical
- Recalculate all MSR intercepts from scratch on MSR filter changes,
instead of maintaining shadow bitmaps
- Advertise support for LKGS (Load Kernel GS base), a new instruction
that's loosely related to FRED, but is supported and enumerated
independently
- Fix a user-triggerable WARN that syzkaller found by setting the
vCPU in INIT_RECEIVED state (aka wait-for-SIPI), and then putting
the vCPU into VMX Root Mode (post-VMXON). Trying to detect every
possible path leading to architecturally forbidden states is hard
and even risks breaking userspace (if it goes from valid to valid
state but passes through invalid states), so just wait until
KVM_RUN to detect that the vCPU state isn't allowed
- Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling
interception of APERF/MPERF reads, so that a "properly" configured
VM can access APERF/MPERF. This has many caveats (APERF/MPERF
cannot be zeroed on vCPU creation or saved/restored on suspend and
resume, or preserved over thread migration let alone VM migration)
but can be useful whenever you're interested in letting Linux
guests see the effective physical CPU frequency in /proc/cpuinfo
- Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been
created, as there's no known use case for changing the default
frequency for other VM types and it goes counter to the very reason
why the ioctl was added to the vm file descriptor. And also, there
would be no way to make it work for confidential VMs with a
"secure" TSC, so kill two birds with one stone
- Dynamically allocation the shadow MMU's hashed page list, and defer
allocating the hashed list until it's actually needed (the TDP MMU
doesn't use the list)
- Extract many of KVM's helpers for accessing architectural local
APIC state to common x86 so that they can be shared by guest-side
code for Secure AVIC
- Various cleanups and fixes
x86 (Intel):
- Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest.
Failure to honor FREEZE_IN_SMM can leak host state into guests
- Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to
prevent L1 from running L2 with features that KVM doesn't support,
e.g. BTF
x86 (AMD):
- WARN and reject loading kvm-amd.ko instead of panicking the kernel
if the nested SVM MSRPM offsets tracker can't handle an MSR (which
is pretty much a static condition and therefore should never
happen, but still)
- Fix a variety of flaws and bugs in the AVIC device posted IRQ code
- Inhibit AVIC if a vCPU's ID is too big (relative to what hardware
supports) instead of rejecting vCPU creation
- Extend enable_ipiv module param support to SVM, by simply leaving
IsRunning clear in the vCPU's physical ID table entry
- Disable IPI virtualization, via enable_ipiv, if the CPU is affected
by erratum #1235, to allow (safely) enabling AVIC on such CPUs
- Request GA Log interrupts if and only if the target vCPU is
blocking, i.e. only if KVM needs a notification in order to wake
the vCPU
- Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to
the vCPU's CPUID model
- Accept any SNP policy that is accepted by the firmware with respect
to SMT and single-socket restrictions. An incompatible policy
doesn't put the kernel at risk in any way, so there's no reason for
KVM to care
- Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and
use WBNOINVD instead of WBINVD when possible for SEV cache
maintenance
- When reclaiming memory from an SEV guest, only do cache flushes on
CPUs that have ever run a vCPU for the guest, i.e. don't flush the
caches for CPUs that can't possibly have cache lines with dirty,
encrypted data
Generic:
- Rework irqbypass to track/match producers and consumers via an
xarray instead of a linked list. Using a linked list leads to
O(n^2) insertion times, which is hugely problematic for use cases
that create large numbers of VMs. Such use cases typically don't
actually use irqbypass, but eliminating the pointless registration
is a future problem to solve as it likely requires new uAPI
- Track irqbypass's "token" as "struct eventfd_ctx *" instead of a
"void *", to avoid making a simple concept unnecessarily difficult
to understand
- Decouple device posted IRQs from VFIO device assignment, as binding
a VM to a VFIO group is not a requirement for enabling device
posted IRQs
- Clean up and document/comment the irqfd assignment code
- Disallow binding multiple irqfds to an eventfd with a priority
waiter, i.e. ensure an eventfd is bound to at most one irqfd
through the entire host, and add a selftest to verify eventfd:irqfd
bindings are globally unique
- Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues
related to private <=> shared memory conversions
- Drop guest_memfd's .getattr() implementation as the VFS layer will
call generic_fillattr() if inode_operations.getattr is NULL
- Fix issues with dirty ring harvesting where KVM doesn't bound the
processing of entries in any way, which allows userspace to keep
KVM in a tight loop indefinitely
- Kill off kvm_arch_{start,end}_assignment() and x86's associated
tracking, now that KVM no longer uses assigned_device_count as a
heuristic for either irqbypass usage or MDS mitigation
Selftests:
- Fix a comment typo
- Verify KVM is loaded when getting any KVM module param so that
attempting to run a selftest without kvm.ko loaded results in a
SKIP message about KVM not being loaded/enabled (versus some random
parameter not existing)
- Skip tests that hit EACCES when attempting to access a file, and
print a "Root required?" help message. In most cases, the test just
needs to be run with elevated permissions"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (340 commits)
Documentation: KVM: Use unordered list for pre-init VGIC registers
RISC-V: KVM: Avoid re-acquiring memslot in kvm_riscv_gstage_map()
RISC-V: KVM: Use find_vma_intersection() to search for intersecting VMAs
RISC-V: perf/kvm: Add reporting of interrupt events
RISC-V: KVM: Enable ring-based dirty memory tracking
RISC-V: KVM: Fix inclusion of Smnpm in the guest ISA bitmap
RISC-V: KVM: Delegate illegal instruction fault to VS mode
RISC-V: KVM: Pass VMID as parameter to kvm_riscv_hfence_xyz() APIs
RISC-V: KVM: Factor-out g-stage page table management
RISC-V: KVM: Add vmid field to struct kvm_riscv_hfence
RISC-V: KVM: Introduce struct kvm_gstage_mapping
RISC-V: KVM: Factor-out MMU related declarations into separate headers
RISC-V: KVM: Use ncsr_xyz() in kvm_riscv_vcpu_trap_redirect()
RISC-V: KVM: Implement kvm_arch_flush_remote_tlbs_range()
RISC-V: KVM: Don't flush TLB when PTE is unchanged
RISC-V: KVM: Replace KVM_REQ_HFENCE_GVMA_VMID_ALL with KVM_REQ_TLB_FLUSH
RISC-V: KVM: Rename and move kvm_riscv_local_tlb_sanitize()
RISC-V: KVM: Drop the return value of kvm_riscv_vcpu_aia_init()
RISC-V: KVM: Check kvm_riscv_vcpu_alloc_vector_context() return value
KVM: arm64: selftests: Add FEAT_RAS EL2 registers to get-reg-list
...
- Add new -D option that allows to override variables and options
For example:
./ktest.pl -DPATCH_START:=HEAD~1 -DOUTPUT_DIR=/work/build/urgent config
The above sets the variable "PATCH_START" to HEAD~1 and the OUTPUT_DIR
option to "/work/build/urgent".
This is useful because currently the only way to make a slight change to a
config file is by modifying that config file. For one time changes, this
can be annoying. Having a way to do a one time override from the command
line simplifies the workflow.
Temp variables (PATCH_START) will override every temp variable in the
config file, whereas options will act like a normal OVERRIDE option and
will only affect the session they define.
-DBUILD_OUTPUT=/work/git/linux.git
Replaces the default BUILD_OUTPUT option.
'-DBUILD_OUTPUT[2]=/work/git/linux.git'
Only replaces the BUILD_OUTPUT variable for test #2.
- If an option contains itself, just drop it instead of going into an
infinite loop and failing to parse (it doesn't crash, it detects the
recursion after 100 iterations anyway).
Some configs may define a variable with the same name as the option:
ADD_CONFIG := $(ADD_CONFIG)
But if the option doesn't exist, it the above will fail to parse. In these
cases, just ignore evaluating the option inside the definition of another
option if it has the same name.
- Display the BUILD_DIR and OUTPUT_DIR options at the start of every test
It is useful to know which kernel source and what destination a test is
using when it starts, in case a mistake is made. This makes it easier to
abort the test if the wrong source or destination is being used instead of
waiting until the test completes.
- Add new PATCHCHECK_SKIP option
When testing a series of commits that also includes changes to the Linux
tools directory, it is useless to test the changes in tools as they may
not affect the kernel itself. Doing tests on the kernel for changes that
do not affect the kernel is a waste of time.
Add a PATCHCHECK_SKIP that takes a series of shas that will be skipped
while doing the individual commit tests.
-----BEGIN PGP SIGNATURE-----
iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaIjO+hQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qmJiAP93udmfI+j4+IcyrYg6hkm2F+stakot
7TgM/YCgB8tTSAEA4iMGkAGxEUlHVv5Has7JFW1ajsqk2ZHlXlu9lSxvVwg=
=JDuF
-----END PGP SIGNATURE-----
Merge tag 'ktest-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest
Pull ktest updates from Steven Rostedt:
- Add new -D option that allows to override variables and options
For example:
./ktest.pl -DPATCH_START:=HEAD~1 -DOUTPUT_DIR=/work/build/urgent config
The above sets the variable "PATCH_START" to HEAD~1 and the
OUTPUT_DIR option to "/work/build/urgent".
This is useful because currently the only way to make a slight change
to a config file is by modifying that config file. For one time
changes, this can be annoying. Having a way to do a one time override
from the command line simplifies the workflow.
Temp variables (PATCH_START) will override every temp variable in the
config file, whereas options will act like a normal OVERRIDE option
and will only affect the session they define.
-DBUILD_OUTPUT=/work/git/linux.git
Replaces the default BUILD_OUTPUT option.
'-DBUILD_OUTPUT[2]=/work/git/linux.git'
Only replaces the BUILD_OUTPUT variable for test #2.
- If an option contains itself, just drop it instead of going into an
infinite loop and failing to parse (it doesn't crash, it detects the
recursion after 100 iterations anyway).
Some configs may define a variable with the same name as the option:
ADD_CONFIG := $(ADD_CONFIG)
But if the option doesn't exist, it the above will fail to parse. In
these cases, just ignore evaluating the option inside the definition
of another option if it has the same name.
- Display the BUILD_DIR and OUTPUT_DIR options at the start of every
test
It is useful to know which kernel source and what destination a test
is using when it starts, in case a mistake is made. This makes it
easier to abort the test if the wrong source or destination is being
used instead of waiting until the test completes.
- Add new PATCHCHECK_SKIP option
When testing a series of commits that also includes changes to the
Linux tools directory, it is useless to test the changes in tools as
they may not affect the kernel itself. Doing tests on the kernel for
changes that do not affect the kernel is a waste of time.
Add a PATCHCHECK_SKIP that takes a series of shas that will be
skipped while doing the individual commit tests.
* tag 'ktest-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
ktest.pl: Add new PATCHCHECK_SKIP option to skip testing individual commits
ktest.pl: Always display BUILD_DIR and OUTPUT_DIR at the start of tests
ktest.pl: Prevent recursion of default variable options
ktest.pl: Have -D option work without a space
ktest.pl: Allow command option -D to override temp variables
ktest.pl: Add -D option to override options
- Stack usage reduction for probe events:
- Allocate string buffers from the heap for uprobe, eprobe, kprobe,
and fprobe events to avoid stack overflow.
- Allocate traceprobe_parse_context from the heap to prevent
potential stack overflow.
- Fix a typo in the above commit.
- New features for eprobe and tprobe events:
- Add support for arrays in eprobes.
- Support multiple tprobes on the same tracepoint.
- Improve efficiency:
- Register fprobe-events only when it is enabled to reduce overhead.
- Register tracepoints for tprobe events only when enabled to
resolve a lock dependency.
- Code Cleanup:
- Add kerneldoc for traceprobe_parse_event_name() and __get_insn_slot().
- Sort #include alphabetically in the probes code.
- Remove the unused 'mod' field from the tprobe-event.
- Clean up the entry-arg storing code in probe-events.
- Selftest update
- Enable fprobe events before checking enable_functions in selftests.
-----BEGIN PGP SIGNATURE-----
iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmiJ2DQbHG1hc2FtaS5o
aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8bSfkH/06Zn5I55rU85FKSBQll
FN4hipmef/9Nd13skDwpEuFyzLPNS4P1up/UBUuyDQUTlO74+t2zSFO2dpcNrWmu
sPTenQ+6h82H3K591WTIC23VzF54syIbFLXEj8iMBALT3wyU4Nn0bs4DCbnTo5HX
R3NVo77rk6wxNJoKYOtT6ALf/lHonuNlGF+KTUGWP8UbWsIY3fIp0RWWy572M0bt
+YBE8D8RIVrw+ZY+vNKn1LdZdWlR1ton518XDf1gV9isTCfKErcd/6HJKwuj5q2v
qMgwiaKK+Gne/ylAKmWLEg2oNDo7kpyfW+612oiECitgZkqxOXhyYYfWgRt1lFNp
Wb8=
=E+Z6
-----END PGP SIGNATURE-----
Merge tag 'probes-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes updates from Masami Hiramatsu:
"Stack usage reduction for probe events:
- Allocate string buffers from the heap for uprobe, eprobe, kprobe,
and fprobe events to avoid stack overflow
- Allocate traceprobe_parse_context from the heap to prevent
potential stack overflow
- Fix a typo in the above commit
New features for eprobe and tprobe events:
- Add support for arrays in eprobes
- Support multiple tprobes on the same tracepoint
Improve efficiency:
- Register fprobe-events only when it is enabled to reduce overhead
- Register tracepoints for tprobe events only when enabled to resolve
a lock dependency
Code Cleanup:
- Add kerneldoc for traceprobe_parse_event_name() and
__get_insn_slot()
- Sort #include alphabetically in the probes code
- Remove the unused 'mod' field from the tprobe-event
- Clean up the entry-arg storing code in probe-events
Selftest update
- Enable fprobe events before checking enable_functions in selftests"
* tag 'probes-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: trace_fprobe: Fix typo of the semicolon
tracing: Have eprobes handle arrays
tracing: probes: Add a kerneldoc for traceprobe_parse_event_name()
tracing: uprobe-event: Allocate string buffers from heap
tracing: eprobe-event: Allocate string buffers from heap
tracing: kprobe-event: Allocate string buffers from heap
tracing: fprobe-event: Allocate string buffers from heap
tracing: probe: Allocate traceprobe_parse_context from heap
tracing: probes: Sort #include alphabetically
kprobes: Add missing kerneldoc for __get_insn_slot
tracing: tprobe-events: Register tracepoint when enable tprobe event
selftests: tracing: Enable fprobe events before checking enable_functions
tracing: fprobe-events: Register fprobe-events only when it is enabled
tracing: tprobe-events: Support multiple tprobes on the same tracepoint
tracing: tprobe-events: Remove mod field from tprobe-event
tracing: probe-events: Cleanup entry-arg storing code
This pull request contains the following branches:
rcu-exp.23.07.2025
- Protect against early RCU exp quiescent state reporting during exp
grace period initialization.
- Remove superfluous barrier in task unblock path.
- Remove the CPU online quiescent state report optimization, which is
error prone for certain scenarios.
- Add warning for unexpected pending requested expedited quiescent
state on dying CPU.
rcu.22.07.2025
- Robustify rcu_is_cpu_rrupt_from_idle() by using more accurate
indicators of the actual context tracking state of a CPU.
- Handle ->defer_qs_iw_pending field data race.
- Enable rcu_normal_wake_from_gp by default on systems with <= 16 CPUs.
- Fix lockup in rcu_read_unlock() due to recursive irq_exit() calls.
- Refactor expedited handling condition in rcu_read_unlock_special().
- Documentation updates for hotplug and GP init scan ordering,
separation of rcu_state and rnp's gp_seq states, quiescent state
reporting for offline CPUs.
torture-scripts.16.07.2025
- Cleanup and improve scripts : remove superfluous warnings for disabled
tests; better handling of kvm.sh --kconfig arg; suppress some confusing
diagnostics; tolerate bad kvm.sh args; add new diagnostic for build
output; fail allmodconfig testing on warnings.
- Include RCU_TORTURE_TEST_CHK_RDR_STATE config for KCSAN kernels.
- Disable default RCU-tasks and clocksource-wdog testing on arm64.
- Add EXPERT Kconfig option for arm64 KCSAN runs.
- Remove SRCU-lite testing.
rcutorture.16.07.2025
- Start torture writer threads creation after reader threads to handle
race in SRCU-P scenario.
- Add SRCU down_read()/up_read() test.
- Add diagnostics for delayed SRCU up_read(), unmatched up_read(), print
number of up/down readers and the number of such readers which
migrated to other CPU.
- Ignore certain unsupported configurations for trivial RCU test.
- Fix splats in RT kernels due to inaccurate checks for BH-disabled
context.
- Enable checks and logs to capture intentionally exercised unexpected
scenarios (too short readers) for BUSTED test.
- Remove SRCU-lite testing.
srcu.19.07.2025
- Expedite SRCU-fast grace periods.
- Remove SRCU-lite implementation.
- Add guards for SRCU-fast readers.
rcu.nocb.18.07.2025
- Dump NOCB group leader state on stall detection.
- Robustify nocb_cb_kthread pointer accesses.
- Fix delayed execution of hurry callbacks when LAZY_RCU is enabled.
refscale.07.07.2025
- Fix multiplication overflow in "loops" and "nreaders" calculations.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSi2tPIQIc2VEtjarIAHS7/6Z0wpQUCaINnRwAKCRAAHS7/6Z0w
pRYJAQC97ZDW2wBegDbQPsg5ECLX9Lyd6+IC65sdi38IENl+TQEA4/oMzUUceIH+
CDCnxv3fAMhPncJfvIukOLzMJpKw0go=
=8t4O
-----END PGP SIGNATURE-----
Merge tag 'rcu.release.v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU updates from Neeraj Upadhyay:
"Expedited grace period updates:
- Protect against early RCU exp quiescent state reporting during exp
grace period initialization
- Remove superfluous barrier in task unblock path
- Remove the CPU online quiescent state report optimization, which is
error prone for certain scenarios
- Add warning for unexpected pending requested expedited quiescent
state on dying CPU
Core:
- Robustify rcu_is_cpu_rrupt_from_idle() by using more accurate
indicators of the actual context tracking state of a CPU
- Handle ->defer_qs_iw_pending field data race
- Enable rcu_normal_wake_from_gp by default on systems with <= 16
CPUs
- Fix lockup in rcu_read_unlock() due to recursive irq_exit() calls
- Refactor expedited handling condition in rcu_read_unlock_special()
- Documentation updates for hotplug and GP init scan ordering,
separation of rcu_state and rnp's gp_seq states, quiescent state
reporting for offline CPUs
torture-scripts:
- Cleanup and improve scripts : remove superfluous warnings for
disabled tests; better handling of kvm.sh --kconfig arg; suppress
some confusing diagnostics; tolerate bad kvm.sh args; add new
diagnostic for build output; fail allmodconfig testing on warnings
- Include RCU_TORTURE_TEST_CHK_RDR_STATE config for KCSAN kernels
- Disable default RCU-tasks and clocksource-wdog testing on arm64
- Add EXPERT Kconfig option for arm64 KCSAN runs
- Remove SRCU-lite testing
rcutorture:
- Start torture writer threads creation after reader threads to
handle race in SRCU-P scenario
- Add SRCU down_read()/up_read() test
- Add diagnostics for delayed SRCU up_read(), unmatched up_read(),
print number of up/down readers and the number of such readers
which migrated to other CPU
- Ignore certain unsupported configurations for trivial RCU test
- Fix splats in RT kernels due to inaccurate checks for BH-disabled
context
- Enable checks and logs to capture intentionally exercised
unexpected scenarios (too short readers) for BUSTED test
- Remove SRCU-lite testing
srcu:
- Expedite SRCU-fast grace periods
- Remove SRCU-lite implementation
- Add guards for SRCU-fast readers
rcu nocb:
- Dump NOCB group leader state on stall detection
- Robustify nocb_cb_kthread pointer accesses
- Fix delayed execution of hurry callbacks when LAZY_RCU is enabled
refscale:
- Fix multiplication overflow in "loops" and "nreaders" calculations"
* tag 'rcu.release.v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (49 commits)
rcu: Document concurrent quiescent state reporting for offline CPUs
rcu: Document separation of rcu_state and rnp's gp_seq
rcu: Document GP init vs hotplug-scan ordering requirements
srcu: Add guards for SRCU-fast readers
rcu: Fix delayed execution of hurry callbacks
rcu: Refactor expedited handling check in rcu_read_unlock_special()
checkpatch: Remove SRCU-lite deprecation
srcu: Remove SRCU-lite implementation
srcu: Expedite SRCU-fast grace periods
rcutorture: Remove support for SRCU-lite
rcutorture: Remove SRCU-lite scenarios
torture: Remove support for SRCU-lite
torture: Make torture.sh --allmodconfig testing fail on warnings
torture: Add "ERROR" diagnostic for testing kernel-build output
torture: Make torture.sh tolerate runs having bad kvm.sh arguments
torture: Add textid.txt file to --do-allmodconfig and --do-rcu-rust runs
torture: Extract testid.txt generation to separate script
torture: Suppress "find" diagnostics from torture.sh --do-none run
torture: Provide EXPERT Kconfig option for arm64 KCSAN torture.sh runs
rcu: Fix rcu_read_unlock() deadloop due to IRQ work
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmiINnEACgkQ6rmadz2v
bToBnA/9F+A3R6rTwGk4HK3xpfc/nm2Tanl3oRN7S2ub/mskDOtWSIyG6cVFZ0UG
1fK6IkByyRIpAF/5qhdlw8drRXHkQtGLA0lP2L9llm4X1mHLofB18y9OeLrDE1WN
KwNP06+IGX9W802lCGSIXOY+VmRscVfXSMokyQt2ilHplKjOnDqJcYkWupi3T2rC
mz79FY9aEl2YrIcpj9RXz+8nwP49pZBuW2P0IM5PAIj4BJBXShrUp8T1nz94okNe
NFsnAyRxjWpUT0McEgtA9WvpD9lZqujYD8Qp0KlGZWmI3vNpV5d9S1+dBcEb1n7q
dyNMkTF3oRrJhhg4VqoHc6fVpzSEoZ9ZxV5Hx4cs+ganH75D4YbdGqx/7mR3DUgH
MZh6rHF1pGnK7TAm7h5gl3ZRAOkZOaahbe1i01NKo9CEe5fSh3AqMyzJYoyGHRKi
xDN39eQdWBNA+hm1VkbK2Bv93Rbjrka2Kj+D3sSSO9Bo/u3ntcknr7LW39idKz62
Q8dkKHcCEtun7gjk0YXPF013y81nEohj1C+52BmJ2l5JitM57xfr6YOaQpu7DPDE
AJbHx6ASxKdyEETecd0b+cXUPQ349zmRXy0+CDMAGKpBicC0H0mHhL14cwOY1Hfu
EIpIjmIJGI3JNF6T5kybcQGSBOYebdV0FFgwSllzPvuYt7YsHCs=
=/O3j
-----END PGP SIGNATURE-----
Merge tag 'bpf-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
- Remove usermode driver (UMD) framework (Thomas Weißschuh)
- Introduce Strongly Connected Component (SCC) in the verifier to
detect loops and refine register liveness (Eduard Zingerman)
- Allow 'void *' cast using bpf_rdonly_cast() and corresponding
'__arg_untrusted' for global function parameters (Eduard Zingerman)
- Improve precision for BPF_ADD and BPF_SUB operations in the verifier
(Harishankar Vishwanathan)
- Teach the verifier that constant pointer to a map cannot be NULL
(Ihor Solodrai)
- Introduce BPF streams for error reporting of various conditions
detected by BPF runtime (Kumar Kartikeya Dwivedi)
- Teach the verifier to insert runtime speculation barrier (lfence on
x86) to mitigate speculative execution instead of rejecting the
programs (Luis Gerhorst)
- Various improvements for 'veristat' (Mykyta Yatsenko)
- For CONFIG_DEBUG_KERNEL config warn on internal verifier errors to
improve bug detection by syzbot (Paul Chaignon)
- Support BPF private stack on arm64 (Puranjay Mohan)
- Introduce bpf_cgroup_read_xattr() kfunc to read xattr of cgroup's
node (Song Liu)
- Introduce kfuncs for read-only string opreations (Viktor Malik)
- Implement show_fdinfo() for bpf_links (Tao Chen)
- Reduce verifier's stack consumption (Yonghong Song)
- Implement mprog API for cgroup-bpf programs (Yonghong Song)
* tag 'bpf-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (192 commits)
selftests/bpf: Migrate fexit_noreturns case into tracing_failure test suite
selftests/bpf: Add selftest for attaching tracing programs to functions in deny list
bpf: Add log for attaching tracing programs to functions in deny list
bpf: Show precise rejected function when attaching fexit/fmod_ret to __noreturn functions
bpf: Fix various typos in verifier.c comments
bpf: Add third round of bounds deduction
selftests/bpf: Test invariants on JSLT crossing sign
selftests/bpf: Test cross-sign 64bits range refinement
selftests/bpf: Update reg_bound range refinement logic
bpf: Improve bounds when s64 crosses sign boundary
bpf: Simplify bounds refinement from s32
selftests/bpf: Enable private stack tests for arm64
bpf, arm64: JIT support for private stack
bpf: Move bpf_jit_get_prog_name() to core.c
bpf, arm64: Fix fp initialization for exception boundary
umd: Remove usermode driver framework
bpf/preload: Don't select USERMODE_DRIVER
selftests/bpf: Fix test dynptr/test_dynptr_memset_xdp_chunks failure
selftests/bpf: Fix test dynptr/test_dynptr_copy_xdp failure
selftests/bpf: Increase xdp data size for arm64 64K page size
...
Core & protocols
----------------
- Wrap datapath globals into net_aligned_data, to avoid false sharing.
- Preserve MSG_ZEROCOPY in forwarding (e.g. out of a container).
- Add SO_INQ and SCM_INQ support to AF_UNIX.
- Add SIOCINQ support to AF_VSOCK.
- Add TCP_MAXSEG sockopt to MPTCP.
- Add IPv6 force_forwarding sysctl to enable forwarding per interface.
- Make TCP validation of whether packet fully fits in the receive
window and the rcv_buf more strict. With increased use of HW
aggregation a single "packet" can be multiple 100s of kB.
- Add MSG_MORE flag to optimize large TCP transmissions via sockmap,
improves latency up to 33% for sockmap users.
- Convert TCP send queue handling from tasklet to BH workque.
- Improve BPF iteration over TCP sockets to see each socket exactly once.
- Remove obsolete and unused TCP RFC3517/RFC6675 loss recovery code.
- Support enabling kernel threads for NAPI processing on per-NAPI
instance basis rather than a whole device. Fully stop the kernel NAPI
thread when threaded NAPI gets disabled. Previously thread would stick
around until ifdown due to tricky synchronization.
- Allow multicast routing to take effect on locally-generated packets.
- Add output interface argument for End.X in segment routing.
- MCTP: add support for gateway routing, improve bind() handling.
- Don't require rtnl_lock when fetching an IPv6 neighbor over Netlink.
- Add a new neighbor flag ("extern_valid"), which cedes refresh
responsibilities to userspace. This is needed for EVPN multi-homing
where a neighbor entry for a multi-homed host needs to be synced
across all the VTEPs among which the host is multi-homed.
- Support NUD_PERMANENT for proxy neighbor entries.
- Add a new queuing discipline for IETF RFC9332 DualQ Coupled AQM.
- Add sequence numbers to netconsole messages. Unregister netconsole's
console when all net targets are removed. Code refactoring.
Add a number of selftests.
- Align IPSec inbound SA lookup to RFC 4301. Only SPI and protocol
should be used for an inbound SA lookup.
- Support inspecting ref_tracker state via DebugFS.
- Don't force bonding advertisement frames tx to ~333 ms boundaries.
Add broadcast_neighbor option to send ARP/ND on all bonded links.
- Allow providing upcall pid for the 'execute' command in openvswitch.
- Remove DCCP support from Netfilter's conntrack.
- Disallow multiple packet duplications in the queuing layer.
- Prevent use of deprecated iptables code on PREEMPT_RT.
Driver API
----------
- Support RSS and hashing configuration over ethtool Netlink.
- Add dedicated ethtool callbacks for getting and setting hashing fields.
- Add support for power budget evaluation strategy in PSE /
Power-over-Ethernet. Generate Netlink events for overcurrent etc.
- Support DPLL phase offset monitoring across all device inputs.
Support providing clock reference and SYNC over separate DPLL
inputs.
- Support traffic classes in devlink rate API for bandwidth management.
- Remove rtnl_lock dependency from UDP tunnel port configuration.
Device drivers
--------------
- Add a new Broadcom driver for 800G Ethernet (bnge).
- Add a standalone driver for Microchip ZL3073x DPLL.
- Remove IBM's NETIUCV device driver.
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support zero-copy Tx of DMABUF memory
- take page size into account for page pool recycling rings
- Intel (100G, ice, idpf):
- idpf: XDP and AF_XDP support preparations
- idpf: add flow steering
- add link_down_events statistic
- clean up the TSPLL code
- preparations for live VM migration
- nVidia/Mellanox:
- support zero-copy Rx/Tx interfaces (DMABUF and io_uring)
- optimize context memory usage for matchers
- expose serial numbers in devlink info
- support PCIe congestion metrics
- Meta (fbnic):
- add 25G, 50G, and 100G link modes to phylink
- support dumping FW logs
- Marvell/Cavium:
- support for CN20K generation of the Octeon chips
- Amazon:
- add HW clock (without timestamping, just hypervisor time access)
- Ethernet virtual:
- VirtIO net:
- support segmentation of UDP-tunnel-encapsulated packets
- Google (gve):
- support packet timestamping and clock synchronization
- Microsoft vNIC:
- add handler for device-originated servicing events
- allow dynamic MSI-X vector allocation
- support Tx bandwidth clamping
- Ethernet NICs consumer, and embedded:
- AMD:
- amd-xgbe: hardware timestamping and PTP clock support
- Broadcom integrated MACs (bcmgenet, bcmasp):
- use napi_complete_done() return value to support NAPI polling
- add support for re-starting auto-negotiation
- Broadcom switches (b53):
- support BCM5325 switches
- add bcm63xx EPHY power control
- Synopsys (stmmac):
- lots of code refactoring and cleanups
- TI:
- icssg-prueth: read firmware-names from device tree
- icssg: PRP offload support
- Microchip:
- lan78xx: convert to PHYLINK for improved PHY and MAC management
- ksz: add KSZ8463 switch support
- Intel:
- support similar queue priority scheme in multi-queue and
time-sensitive networking (taprio)
- support packet pre-emption in both
- RealTek (r8169):
- enable EEE at 5Gbps on RTL8126
- Airoha:
- add PPPoE offload support
- MDIO bus controller for Airoha AN7583
- Ethernet PHYs:
- support for the IPQ5018 internal GE PHY
- micrel KSZ9477 switch-integrated PHYs:
- add MDI/MDI-X control support
- add RX error counters
- add cable test support
- add Signal Quality Indicator (SQI) reporting
- dp83tg720: improve reset handling and reduce link recovery time
- support bcm54811 (and its MII-Lite interface type)
- air_en8811h: support resume/suspend
- support PHY counters for QCA807x and QCA808x
- support WoL for QCA807x
- CAN drivers:
- rcar_canfd: support for Transceiver Delay Compensation
- kvaser: report FW versions via devlink dev info
- WiFi:
- extended regulatory info support (6 GHz)
- add statistics and beacon monitor for Multi-Link Operation (MLO)
- support S1G aggregation, improve S1G support
- add Radio Measurement action fields
- support per-radio RTS threshold
- some work around how FIPS affects wifi, which was wrong (RC4 is used
by TKIP, not only WEP)
- improvements for unsolicited probe response handling
- WiFi drivers:
- RealTek (rtw88):
- IBSS mode for SDIO devices
- RealTek (rtw89):
- BT coexistence for MLO/WiFi7
- concurrent station + P2P support
- support for USB devices RTL8851BU/RTL8852BU
- Intel (iwlwifi):
- use embedded PNVM in (to be released) FW images to fix
compatibility issues
- many cleanups (unused FW APIs, PCIe code, WoWLAN)
- some FIPS interoperability
- MediaTek (mt76):
- firmware recovery improvements
- more MLO work
- Qualcomm/Atheros (ath12k):
- fix scan on multi-radio devices
- more EHT/Wi-Fi 7 features
- encapsulation/decapsulation offload
- Broadcom (brcm80211):
- support SDIO 43751 device
- Bluetooth:
- hci_event: add support for handling LE BIG Sync Lost event
- ISO: add socket option to report packet seqnum via CMSG
- ISO: support SCM_TIMESTAMPING for ISO TS
- Bluetooth drivers:
- intel_pcie: support Function Level Reset
- nxpuart: add support for 4M baudrate
- nxpuart: implement powerup sequence, reset, FW dump, and FW loading
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmiFgLgACgkQMUZtbf5S
IrvafxAAnQRwYBoIG+piCILx6z5pRvBGHkmEQ4AQgSCFuq2eO3ubwMFIqEybfma1
5+QFjUZAV3OgGgKRBS2KGWxtSzdiF+/JGV1VOIN67sX3Mm0a2QgjA4n5CgKL0FPr
o6BEzjX5XwG1zvGcBNQ5BZ19xUUKjoZQgTtnea8sZ57Fsp5RtRgmYRqoewNvNk/n
uImh0NFsDVb0UeOpSzC34VD9l1dJvLGdui4zJAjno/vpvmT1DkXjoK419J/r52SS
X+5WgsfJ6DkjHqVN1tIhhK34yWqBOcwGFZJgEnWHMkFIl2FqRfFKMHyqtfLlVnLA
mnIpSyz8Sq2AHtx0TlgZ3At/Ri8p5+yYJgHOXcDKyABa8y8Zf4wrycmr6cV9JLuL
z54nLEVnJuvfDVDVJjsLYdJXyhMpZFq6+uAItdxKaw8Ugp/QqG4QtoRj+XIHz4ZW
z6OohkCiCzTwEISFK+pSTxPS30eOxq43kCspcvuLiwCCStJBRkRb5GdZA4dm7LA+
1Od4ADAkHjyrFtBqTyyC2scX8UJ33DlAIpAYyIeS6w9Cj9EXxtp1z33IAAAZ03MW
jJwIaJuc8bK2fWKMmiG7ucIXjPo4t//KiWlpkwwqLhPbjZgfDAcxq1AC2TLoqHBL
y4EOgKpHDCMAghSyiFIAn2JprGcEt8dp+11B0JRXIn4Pm/eYDH8=
=lqbe
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Wrap datapath globals into net_aligned_data, to avoid false sharing
- Preserve MSG_ZEROCOPY in forwarding (e.g. out of a container)
- Add SO_INQ and SCM_INQ support to AF_UNIX
- Add SIOCINQ support to AF_VSOCK
- Add TCP_MAXSEG sockopt to MPTCP
- Add IPv6 force_forwarding sysctl to enable forwarding per interface
- Make TCP validation of whether packet fully fits in the receive
window and the rcv_buf more strict. With increased use of HW
aggregation a single "packet" can be multiple 100s of kB
- Add MSG_MORE flag to optimize large TCP transmissions via sockmap,
improves latency up to 33% for sockmap users
- Convert TCP send queue handling from tasklet to BH workque
- Improve BPF iteration over TCP sockets to see each socket exactly
once
- Remove obsolete and unused TCP RFC3517/RFC6675 loss recovery code
- Support enabling kernel threads for NAPI processing on per-NAPI
instance basis rather than a whole device. Fully stop the kernel
NAPI thread when threaded NAPI gets disabled. Previously thread
would stick around until ifdown due to tricky synchronization
- Allow multicast routing to take effect on locally-generated packets
- Add output interface argument for End.X in segment routing
- MCTP: add support for gateway routing, improve bind() handling
- Don't require rtnl_lock when fetching an IPv6 neighbor over Netlink
- Add a new neighbor flag ("extern_valid"), which cedes refresh
responsibilities to userspace. This is needed for EVPN multi-homing
where a neighbor entry for a multi-homed host needs to be synced
across all the VTEPs among which the host is multi-homed
- Support NUD_PERMANENT for proxy neighbor entries
- Add a new queuing discipline for IETF RFC9332 DualQ Coupled AQM
- Add sequence numbers to netconsole messages. Unregister
netconsole's console when all net targets are removed. Code
refactoring. Add a number of selftests
- Align IPSec inbound SA lookup to RFC 4301. Only SPI and protocol
should be used for an inbound SA lookup
- Support inspecting ref_tracker state via DebugFS
- Don't force bonding advertisement frames tx to ~333 ms boundaries.
Add broadcast_neighbor option to send ARP/ND on all bonded links
- Allow providing upcall pid for the 'execute' command in openvswitch
- Remove DCCP support from Netfilter's conntrack
- Disallow multiple packet duplications in the queuing layer
- Prevent use of deprecated iptables code on PREEMPT_RT
Driver API:
- Support RSS and hashing configuration over ethtool Netlink
- Add dedicated ethtool callbacks for getting and setting hashing
fields
- Add support for power budget evaluation strategy in PSE /
Power-over-Ethernet. Generate Netlink events for overcurrent etc
- Support DPLL phase offset monitoring across all device inputs.
Support providing clock reference and SYNC over separate DPLL
inputs
- Support traffic classes in devlink rate API for bandwidth
management
- Remove rtnl_lock dependency from UDP tunnel port configuration
Device drivers:
- Add a new Broadcom driver for 800G Ethernet (bnge)
- Add a standalone driver for Microchip ZL3073x DPLL
- Remove IBM's NETIUCV device driver
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support zero-copy Tx of DMABUF memory
- take page size into account for page pool recycling rings
- Intel (100G, ice, idpf):
- idpf: XDP and AF_XDP support preparations
- idpf: add flow steering
- add link_down_events statistic
- clean up the TSPLL code
- preparations for live VM migration
- nVidia/Mellanox:
- support zero-copy Rx/Tx interfaces (DMABUF and io_uring)
- optimize context memory usage for matchers
- expose serial numbers in devlink info
- support PCIe congestion metrics
- Meta (fbnic):
- add 25G, 50G, and 100G link modes to phylink
- support dumping FW logs
- Marvell/Cavium:
- support for CN20K generation of the Octeon chips
- Amazon:
- add HW clock (without timestamping, just hypervisor time access)
- Ethernet virtual:
- VirtIO net:
- support segmentation of UDP-tunnel-encapsulated packets
- Google (gve):
- support packet timestamping and clock synchronization
- Microsoft vNIC:
- add handler for device-originated servicing events
- allow dynamic MSI-X vector allocation
- support Tx bandwidth clamping
- Ethernet NICs consumer, and embedded:
- AMD:
- amd-xgbe: hardware timestamping and PTP clock support
- Broadcom integrated MACs (bcmgenet, bcmasp):
- use napi_complete_done() return value to support NAPI polling
- add support for re-starting auto-negotiation
- Broadcom switches (b53):
- support BCM5325 switches
- add bcm63xx EPHY power control
- Synopsys (stmmac):
- lots of code refactoring and cleanups
- TI:
- icssg-prueth: read firmware-names from device tree
- icssg: PRP offload support
- Microchip:
- lan78xx: convert to PHYLINK for improved PHY and MAC management
- ksz: add KSZ8463 switch support
- Intel:
- support similar queue priority scheme in multi-queue and
time-sensitive networking (taprio)
- support packet pre-emption in both
- RealTek (r8169):
- enable EEE at 5Gbps on RTL8126
- Airoha:
- add PPPoE offload support
- MDIO bus controller for Airoha AN7583
- Ethernet PHYs:
- support for the IPQ5018 internal GE PHY
- micrel KSZ9477 switch-integrated PHYs:
- add MDI/MDI-X control support
- add RX error counters
- add cable test support
- add Signal Quality Indicator (SQI) reporting
- dp83tg720: improve reset handling and reduce link recovery time
- support bcm54811 (and its MII-Lite interface type)
- air_en8811h: support resume/suspend
- support PHY counters for QCA807x and QCA808x
- support WoL for QCA807x
- CAN drivers:
- rcar_canfd: support for Transceiver Delay Compensation
- kvaser: report FW versions via devlink dev info
- WiFi:
- extended regulatory info support (6 GHz)
- add statistics and beacon monitor for Multi-Link Operation (MLO)
- support S1G aggregation, improve S1G support
- add Radio Measurement action fields
- support per-radio RTS threshold
- some work around how FIPS affects wifi, which was wrong (RC4 is
used by TKIP, not only WEP)
- improvements for unsolicited probe response handling
- WiFi drivers:
- RealTek (rtw88):
- IBSS mode for SDIO devices
- RealTek (rtw89):
- BT coexistence for MLO/WiFi7
- concurrent station + P2P support
- support for USB devices RTL8851BU/RTL8852BU
- Intel (iwlwifi):
- use embedded PNVM in (to be released) FW images to fix
compatibility issues
- many cleanups (unused FW APIs, PCIe code, WoWLAN)
- some FIPS interoperability
- MediaTek (mt76):
- firmware recovery improvements
- more MLO work
- Qualcomm/Atheros (ath12k):
- fix scan on multi-radio devices
- more EHT/Wi-Fi 7 features
- encapsulation/decapsulation offload
- Broadcom (brcm80211):
- support SDIO 43751 device
- Bluetooth:
- hci_event: add support for handling LE BIG Sync Lost event
- ISO: add socket option to report packet seqnum via CMSG
- ISO: support SCM_TIMESTAMPING for ISO TS
- Bluetooth drivers:
- intel_pcie: support Function Level Reset
- nxpuart: add support for 4M baudrate
- nxpuart: implement powerup sequence, reset, FW dump, and FW loading"
* tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1742 commits)
dpll: zl3073x: Fix build failure
selftests: bpf: fix legacy netfilter options
ipv6: annotate data-races around rt->fib6_nsiblings
ipv6: fix possible infinite loop in fib6_info_uses_dev()
ipv6: prevent infinite loop in rt6_nlmsg_size()
ipv6: add a retry logic in net6_rt_notify()
vrf: Drop existing dst reference in vrf_ip6_input_dst
net/sched: taprio: align entry index attr validation with mqprio
net: fsl_pq_mdio: use dev_err_probe
selftests: rtnetlink.sh: remove esp4_offload after test
vsock: remove unnecessary null check in vsock_getname()
igb: xsk: solve negative overflow of nb_pkts in zerocopy mode
stmmac: xsk: fix negative overflow of budget in zerocopy mode
dt-bindings: ieee802154: Convert at86rf230.txt yaml format
net: dsa: microchip: Disable PTP function of KSZ8463
net: dsa: microchip: Setup fiber ports for KSZ8463
net: dsa: microchip: Write switch MAC address differently for KSZ8463
net: dsa: microchip: Use different registers for KSZ8463
net: dsa: microchip: Add KSZ8463 switch support to KSZ DSA driver
dt-bindings: net: dsa: microchip: Add KSZ8463 switch support
...
* Move sysctls out of the kern_table array
This is the final move of ctl_tables into their respective subsystems. Only 5
(out of the original 50) will remain in kernel/sysctl.c file; these handle
either sysctl or common arch variables.
By decentralizing sysctl registrations, subsystem maintainers regain control
over their sysctl interfaces, improving maintainability and reducing the
likelihood of merge conflicts.
* docs: Remove false positives from check-sysctl-docs
Stopped falsely identifying sysctls as undocumented or unimplemented in the
check-sysctl-docs script. This script can now be used to automatically
identify if documentation is missing.
* Testing
All these have been in linux-next since rc3, giving them a solid 3 to 4 weeks
worth of testing. Additionally, sysctl selftests and kunit were also run
locally on my x86_64
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEErkcJVyXmMSXOyyeQupfNUreWQU8FAmiAvd8ACgkQupfNUreW
QU+9nAv/dtxaKoL4BXJSzsA2+49bbo9QfiK5Vjz1wSRYRQTb+jhGr9QdS5hG+NeX
uN2ilvcNQqW7ENdiblU10lvcbPjIn2hw4lbMcpv/+QXnrudtGYlBFXlkWqW5nv7X
AVvHU8y3uzfs6JbRIpROUA7Cn2cDOlfP2mMtwxCXR3iP+orS1ziuVEi1JRoirIyG
iq5I/1rJMJBU3FjqqDTq6yljspLx8AlXO1yc5xUxAM67IcY4ew3ZTxqiZr6M9AhV
DUbR2lu/88wcFNERt8DJmuQ50dSGGqOEpK3FURTmkwtMFxzNLmenFDQeBKKahz3Q
2ntXSDfp2y+ppZNmcOP8tZZkra03Xpy1DQyoOgQ2r9uGekPxyr+wmKXwYPOeJIPO
YWTNBm8omX9qr49zVzaZ1f2foRGfgStHL6aa6xLIf34zzScSDEPtO3og2+5Hw/30
gnp+7v9E19uKpoE6oiGE0PtiFzAi/I6nFxzG2RRqrlMLFXyKVccTKygzY6tCnI3P
6144s/Bt
=R369
-----END PGP SIGNATURE-----
Merge tag 'sysctl-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl
Pull sysctl updates from Joel Granados:
- Move sysctls out of the kern_table array
This is the final move of ctl_tables into their respective
subsystems. Only 5 (out of the original 50) will remain in
kernel/sysctl.c file; these handle either sysctl or common arch
variables.
By decentralizing sysctl registrations, subsystem maintainers regain
control over their sysctl interfaces, improving maintainability and
reducing the likelihood of merge conflicts.
- docs: Remove false positives from check-sysctl-docs
Stopped falsely identifying sysctls as undocumented or unimplemented
in the check-sysctl-docs script. This script can now be used to
automatically identify if documentation is missing.
* tag 'sysctl-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: (23 commits)
docs: Downgrade arm64 & riscv from titles to comment
docs: Replace spaces with tabs in check-sysctl-docs
docs: Remove colon from ctltable title in vm.rst
docs: Add awk section for ucount sysctl entries
docs: Use skiplist when checking sysctl admin-guide
docs: nixify check-sysctl-docs
sysctl: rename kern_table -> sysctl_subsys_table
kernel/sys.c: Move overflow{uid,gid} sysctl into kernel/sys.c
uevent: mv uevent_helper into kobject_uevent.c
sysctl: Removed unused variable
sysctl: Nixify sysctl.sh
sysctl: Remove superfluous includes from kernel/sysctl.c
sysctl: Remove (very) old file changelog
sysctl: Move sysctl_panic_on_stackoverflow to kernel/panic.c
sysctl: move cad_pid into kernel/pid.c
sysctl: Move tainted ctl_table into kernel/panic.c
Input: sysrq: mv sysrq into drivers/tty/sysrq.c
fork: mv threads-max into kernel/fork.c
parisc/power: Move soft-power into power.c
mm: move randomize_va_space into memory.c
...
Perf and PMU updates:
- Add support for new (v3) Hisilicon SLLC and DDRC PMUs
- Add support for Arm-NI PMU integrations that share interrupts between
clock domains within a given instance
- Allow SPE to be configured with a lower sample period than the
minimum recommendation advertised by PMSIDR_EL1.Interval
- Add suppport for Arm's "Branch Record Buffer Extension" (BRBE)
- Adjust the perf watchdog period according to cpu frequency changes
- Minor driver fixes and cleanups
Hardware features:
- Support for MTE store-only checking (FEAT_MTE_STORE_ONLY)
- Support for reporting the non-address bits during a synchronous MTE
tag check fault (FEAT_MTE_TAGGED_FAR)
- Optimise the TLBI when folding/unfolding contiguous PTEs on hardware
with FEAT_BBM (break-before-make) level 2 and no TLB conflict aborts
Software features:
- Enable HAVE_LIVEPATCH after implementing arch_stack_walk_reliable()
and using the text-poke API for late module relocations
- Force VMAP_STACK always on and change arm64_efi_rt_init() to use
arch_alloc_vmap_stack() in order to avoid KASAN false positives
ACPI:
- Improve SPCR handling and messaging on systems lacking an SPCR table
Debug:
- Simplify the debug exception entry path
- Drop redundant DBG_MDSCR_* macros
Kselftests:
- Cleanups and improvements for SME, SVE and FPSIMD tests
Miscellaneous:
- Optimise loop to reduce redundant operations in contpte_ptep_get()
- Remove ISB when resetting POR_EL0 during signal handling
- Mark the kernel as tainted on SEA and SError panic
- Remove redundant gcs_free() call
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmiDkgoACgkQa9axLQDI
XvFucQ//bYugRP5/Sdlrq5eDKWBGi1HufYzwfDEBLc4S75Eu8mGL/tuThfu9yFn+
qCowtt4U84HdWsZDTSVo6lym6v2vJUpGOMgXzepvJaFBRnqGv9X9NxH6RQO1LTnu
Pm7rO+7I9tNpfuc7Zu9pHDggsJEw+WzVfmEF6WPSFlT9mUNv6NbSx4rbLQKU86Dm
ouTqXaePEQZ5oiRXVasxyT0otGtiACD20WpgOtNjYGzsfUVwCf/C83V/2DLwwbhr
9cW9lCtFxA/yFdQcA9ThRzWZ9Eo5LAHqjGIq00+zOjuzgDbBtcTT79gpChkhovIR
FBIsWHd9j9i3nYxzf4V4eRKQnyqS3NQWv7g7uKFwNgARif1Zk0VJ77QIlAYk5xLI
ENTRjLKz5WNGGnhdkeCvDlVyxX+OktgcVTp3vqRxAKCRahMMUqBrwxiM8RzVF37e
yzkEQayL8F7uZqy9H7Sjn48UpHZux6frJ1bBQw1oEvR9QmAoAdqavPMSAYIOT3Zr
ze4WIljq/cFr3kBPIFP5pK1e0qYMHXZpSKIm8MAv6y/7KmQuVbMjZthpuPbLSIw0
Q7C0KalB8lToPIbO7qMni/he0dCN4K2+E1YHFTR+pzfcoLuW4rjSg7i8tqMLKMJ8
H+SeGLyPtM5A6bdAPTTpqefcgUUe7064ENUqrGUpDEynGXA7boE=
=5h1C
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"A quick summary: perf support for Branch Record Buffer Extensions
(BRBE), typical PMU hardware updates, small additions to MTE for
store-only tag checking and exposing non-address bits to signal
handlers, HAVE_LIVEPATCH enabled on arm64, VMAP_STACK forced on.
There is also a TLBI optimisation on hardware that does not require
break-before-make when changing the user PTEs between contiguous and
non-contiguous.
More details:
Perf and PMU updates:
- Add support for new (v3) Hisilicon SLLC and DDRC PMUs
- Add support for Arm-NI PMU integrations that share interrupts
between clock domains within a given instance
- Allow SPE to be configured with a lower sample period than the
minimum recommendation advertised by PMSIDR_EL1.Interval
- Add suppport for Arm's "Branch Record Buffer Extension" (BRBE)
- Adjust the perf watchdog period according to cpu frequency changes
- Minor driver fixes and cleanups
Hardware features:
- Support for MTE store-only checking (FEAT_MTE_STORE_ONLY)
- Support for reporting the non-address bits during a synchronous MTE
tag check fault (FEAT_MTE_TAGGED_FAR)
- Optimise the TLBI when folding/unfolding contiguous PTEs on
hardware with FEAT_BBM (break-before-make) level 2 and no TLB
conflict aborts
Software features:
- Enable HAVE_LIVEPATCH after implementing arch_stack_walk_reliable()
and using the text-poke API for late module relocations
- Force VMAP_STACK always on and change arm64_efi_rt_init() to use
arch_alloc_vmap_stack() in order to avoid KASAN false positives
ACPI:
- Improve SPCR handling and messaging on systems lacking an SPCR
table
Debug:
- Simplify the debug exception entry path
- Drop redundant DBG_MDSCR_* macros
Kselftests:
- Cleanups and improvements for SME, SVE and FPSIMD tests
Miscellaneous:
- Optimise loop to reduce redundant operations in contpte_ptep_get()
- Remove ISB when resetting POR_EL0 during signal handling
- Mark the kernel as tainted on SEA and SError panic
- Remove redundant gcs_free() call"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (93 commits)
arm64/gcs: task_gcs_el0_enable() should use passed task
arm64: Kconfig: Keep selects somewhat alphabetically ordered
arm64: signal: Remove ISB when resetting POR_EL0
kselftest/arm64: Handle attempts to disable SM on SME only systems
kselftest/arm64: Fix SVE write data generation for SME only systems
kselftest/arm64: Test SME on SME only systems in fp-ptrace
kselftest/arm64: Test FPSIMD format data writes via NT_ARM_SVE in fp-ptrace
kselftest/arm64: Allow sve-ptrace to run on SME only systems
arm64/mm: Drop redundant addr increment in set_huge_pte_at()
kselftest/arm4: Provide local defines for AT_HWCAP3
arm64: Mark kernel as tainted on SAE and SError panic
arm64/gcs: Don't call gcs_free() when releasing task_struct
drivers/perf: hisi: Support PMUs with no interrupt
drivers/perf: hisi: Relax the event number check of v2 PMUs
drivers/perf: hisi: Add support for HiSilicon SLLC v3 PMU driver
drivers/perf: hisi: Use ACPI driver_data to retrieve SLLC PMU information
drivers/perf: hisi: Add support for HiSilicon DDRC v3 PMU driver
drivers/perf: hisi: Simplify the probe process for each DDRC version
perf/arm-ni: Support sharing IRQs within an NI instance
perf/arm-ni: Consolidate CPU affinity handling
...
Highlights:
* New supported architectures: SuperH, x32, MIPS n32/n64
* Adopt general kernel architectures names
* Integrate the nolibc selftests into the kselftests framework
* Various fixes and new syscall wrappers
Two non-nolibc changes:
* New arm64 selftest which depends on nolibc changes
* General tools/ cross-compilation bugfix for s390 clang
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTg4lxklFHAidmUs57B+h1jyw5bOAUCaIKVZQAKCRDB+h1jyw5b
OFU7AP9pFk+eIO8M68GHCRVQoOjWTYa/A0lPfx31pa3HTHYiDAD+PLcTEBP4nc21
ZQ4MFxwe+O9YXKX+Y1LkqkU7yOu5eQo=
=xQ6f
-----END PGP SIGNATURE-----
Merge tag 'nolibc-20250724-for-6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/nolibc/linux-nolibc
Pull nolibc updates from Thomas Weißschuh:
"Highlights:
- New supported architectures: SuperH, x32, MIPS n32/n64
- Adopt general kernel architectures names
- Integrate the nolibc selftests into the kselftests framework
- Various fixes and new syscall wrappers
Two non-nolibc changes:
- New arm64 selftest which depends on nolibc changes
- General tools/ cross-compilation bugfix for s390 clang"
* tag 'nolibc-20250724-for-6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/nolibc/linux-nolibc: (30 commits)
selftests/nolibc: add x32 test configuration
tools/nolibc: define time_t in terms of __kernel_old_time_t
selftests/nolibc: show failed run if test process crashes
tools/nolibc: drop s390 clang target override
tools/build: Fix s390(x) cross-compilation with clang
tools/nolibc: avoid false-positive -Wmaybe-uninitialized through waitpid()
selftests/nolibc: correctly report errors from printf() and friends
selftests/nolibc: create /dev/full when running as PID 1
tools/nolibc: add support for clock_nanosleep() and nanosleep()
kselftest/arm64: Add a test for vfork() with GCS
selftests/nolibc: Add coverage of vfork()
tools/nolibc: Provide vfork()
tools/nolibc: Replace ifdef with if defined() in sys.h
tools/nolibc: add support for SuperH
selftests/nolibc: use file driver for QEMU serial
selftests/nolibc: fix EXTRACONFIG variables ordering
tools/nolibc: MIPS: add support for N64 and N32 ABIs
tools/nolibc: MIPS: drop noreorder option
tools/nolibc: MIPS: drop manual stack pointer alignment
tools/nolibc: MIPS: drop $gp setup
...
- Split the code into syscall and exception/interrupt parts to ease the
conversion of ARM[64] to the generic entry infrastructure
- Extend syscall user dispatching to support a single intercepted range
instead of the default single non-intercepted range. That allows
monitoring/analysis of a specific executable range, e.g. a library, and
also provides flexibility for sandboxing scenarios.
- Cleanup and extend the user dispatch selftest
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmiIg6UTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoWn7EACTvQpu7tGd1rN9hCjiB1W5po7nvlCd
gKghjS9Kp0KttTDQPLVcmnH06BhDHWNNn1HXZ1ORea4bpLywiKHtVgqUAsJDsBsv
ETeTHYNphk0sktvAqp3XusA6HF4T0s1KXJQj3W1ACrYZWRkK/VystCLYwBRGpc3r
cj7jAFmJyNpU236R5XYJ7ooHfPYpzZ8VAHBO8ykK7muHDfyBRXEIlmkGep++ctSv
v0uZXAy6LONljKg87YJTien0UA7ze9lFgPTuV1y/qfaLbYNekUaJSDjfuhOpZZUw
TzSh9OYoIvKpd0ylHwB1qMLd5CaXNicaeLfTW3xbX06KaXa7WNAonS35sK0EjhtZ
0bBA9g6bRhphyh0tzR4saF9bczNvJydNCn7/QFo9dKbQUEL/FRXtJiIeusVx/0fJ
+ZqWRTcEdDw2Rsyv52hKgyEJi7F3nL9ovabUN9P1/0aPcTdM3WekMpSOJm1U6wVF
e6oSyeoeNdjcdxgWbQrgRNbmq5CPEV3ig5J+G418r5DTF3ifqZX+WscijUtKTu5K
V5GpLc0PL9eoigQ37LmGkwK/4xoB9SAPTQuzUs9qgh9NidwT0cCfoNxpeGh6GeHX
GLHPGU61vZaefxpwuAuv+SQSgxXSKk2/H/ijPzSjrX/PkUp7MoX9XoOQAh4FxZjO
ok5YEUGXzSJfXQ==
=yaCQ
-----END PGP SIGNATURE-----
Merge tag 'core-entry-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull generic entry code updates from Thomas Gleixner:
- Split the code into syscall and exception/interrupt parts to ease the
conversion of ARM[64] to the generic entry infrastructure
- Extend syscall user dispatching to support a single intercepted range
instead of the default single non-intercepted range. That allows
monitoring/analysis of a specific executable range, e.g. a library,
and also provides flexibility for sandboxing scenarios
- Cleanup and extend the user dispatch selftest
* tag 'core-entry-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
entry: Split generic entry into generic exception and syscall entry
selftests: Add tests for PR_SYS_DISPATCH_INCLUSIVE_ON
syscall_user_dispatch: Add PR_SYS_DISPATCH_INCLUSIVE_ON
selftests: Fix errno checking in syscall_user_dispatch test
- Skip the chaca test when the architecture does not provide the random
infrastructure in the VDSO
- Switch back to a symlink for vdso_standalone_test_x86 to avoid code
duplication.
- Improve code quality and TAP output compliance
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmiIihwTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoTWhD/0cI+VRRCuOqSrs/7GOeVjIjIK5IVn0
DLXNurHokOu9aY6/BTUvLgHf4R+no+RIcnWSKc614UKjftIzkt1HVD/ekUXQbDnm
hn6gOOEGMjUFzb7wBUToWkJ4O50/YjUmzjdEECwGafetOyre7a97cdVHOie4HrpO
A7zy22J8kdRNcTB2FZfICZsLqvpxznNtZ9U1pHkNLddVLnC7ojMJKUTki4Ah5z18
rb5I7K8FFQQ+QfOtMKJ0b90wFLQiYoEZjw8qp8LDq/r/fYHbm9ri3cO/rFWMtyjd
/TLGl57H54IJUmq/M84cAI+TzZ+vZkWJqJwBF7Jt8TR01+00y3MEJG+p5Z3sG8ZQ
sDCNypmMZPud8OsBhwZGW0vppkF1fG+64Ro5GnXZ2Q4aSScT9lyfs0lBQ6rDdb5U
wM4wR/8UWsVY5F6Lm8nSdpKZo92U4mgaizPvWCwUl51cfRbUz71/qdXaoMW8YIOk
Us+uDbwtHl7KFGcSehFxmrr74HJCzUezJgMPoaOjK0L/uFhXTJbvasXb7GKyS3aW
WSYc2x62CaTJvXTEHwjZlcMFPs2OHwDSpiR2XJ+zlhiuDb++2Luhlje8cD+VieiI
Rz7XmYiIeHsouvwD+G23ZiTOiMJH8+dVi3+oeDchITXcbGxFz+lPM1kay4w7k4zk
q4IzgToknCuDRQ==
=IYtA
-----END PGP SIGNATURE-----
Merge tag 'timers-vdso-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull VDSO selftest updates from Thomas Gleixner:
- Skip the chacha test when the architecture does not provide the
random infrastructure in the VDSO
- Switch back to a symlink for vdso_standalone_test_x86 to avoid code
duplication.
- Improve code quality and TAP output compliance
* tag 'timers-vdso-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftests: vDSO: vdso_standalone_test_x86: Replace source file with symlink
selftests: vDSO: vdso_test_getrandom: Always print TAP header
selftests: vDSO: vdso_test_correctness: Fix -Wstrict-prototypes
selftests: vDSO: Enable -Wall
selftests: vDSO: vdso_config: Avoid -Wunused-variables
selftests: vDSO: vdso_test_getrandom: Avoid -Wunused
selftests: vDSO: vdso_test_getrandom: Drop unused include of linux/compiler.h
selftests: vDSO: clock_getres: Drop unused include of err.h
selftests: vDSO: chacha: Correctly skip test if necessary
- Switch the reference counting to a RCU based per-CPU reference to
address a performance bottleneck vs. the single instance rcuref
variant.
- Make the futex selftest build on 32-bit architectures which only
support 64-bit time_t, e.g. RISCV-32.
- Cleanups and improvements in selftests and futex bench
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmiIiDITHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoblTD/0eV9w21tFVmn6ICrhgQgsrejJ0BANs
mm5mE/0d29MZHEhnJO2CSccGXBDfykuk/gxHXHsUZ9tiVSOgjz9dDl1bcrZ8Je9V
YNWMXiHASQrLctmrKLPSdjlcxQnPIxCm+K4lajoa+CyvReHE24sUDgCN8GC3P9pH
VxTmQ7UjGrzvIRlfd4AL9GJBF1IGKNnpPHCeSwjn/cmlDxu4RxEdjRWTbW8Tbz9N
1ay/T8vEE1SykI2qZOXIP16sYZw2dP9FOgARO90Ahb6hwAwbI72MvC69GpZe3lh5
1B1ZgpEiUMa4IT5jJ43Wkm3k8BF6meW+rIUjUBt+y8yjNgaR4degvgnDx44YPZ94
5Ek3cJgpTpVnWbfRxn2b2vRL8rZkRBIq9ezswp0/8KLgC7Gd+zPuQKPvoo2m+n3S
UMufGGT2h5oJbx0qGry5rxZz03eGE6oWAm3H/WRl2wIw5D/kvU5ol6AYKJ5eGTyj
JdPJVzzPBH319iCMZ1olqo/h5er148aYL16ga7w6w9pqhPuxGud30BFf8SHQ8F1R
NIZiu6O3L2ge0RLb/8wxukFkDz3R1gZBWeTLxLEymTJG3TaA3uIByOI6UO03zgW/
QBbNLr7ndkIcm8E31hAWamGQy+EAXj1/e5GYREvhhHOwUV+y/E1FTrrdwtT4GA0S
tBYACfeCbOojsA==
=WqFq
-----END PGP SIGNATURE-----
Merge tag 'locking-futex-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull futex updates from Thomas Gleixner:
- Switch the reference counting to a RCU based per-CPU reference to
address a performance bottleneck vs the single instance rcuref
variant
- Make the futex selftest build on 32-bit architectures which only
support 64-bit time_t, e.g. RISCV-32
- Cleanups and improvements in selftests and futex bench
* tag 'locking-futex-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftests/futex: Fix spelling mistake "Succeffuly" -> "Successfully"
selftests/futex: Define SYS_futex on 32-bit architectures with 64-bit time_t
perf bench futex: Remove support for IMMUTABLE
selftests/futex: Remove support for IMMUTABLE
futex: Remove support for IMMUTABLE
futex: Make futex_private_hash_get() static
futex: Use RCU-based per-CPU reference counting instead of rcuref_t
selftests/futex: Adapt the private hash test to RCU related changes
If WAIT_KILLABLE_RECV was specified, and an event is received, the
tracee's syscall is not supposed to be interruptible. This was not properly
ensured if the reply was sent too fast, and an interrupting signal was
received before the reply was processed on the tracee side.
Add a test for this, that consists of:
- a tracee with a timer that keeps sending it signals while repeatedly
running a traced syscall in a loop,
- a tracer that repeatedly handles all syscalls from the tracee in a
loop, and
- a shared pipe between both, on which the tracee sends one byte per
syscall attempted and the tracer reads one byte per syscall handled.
If the syscall for the tracee is restarted after the tracer received the
event for it due to this bug, the tracee will not have sent a second
token on the pipe, which the tracer will notice and fail the test.
The tests also uses SECCOMP_IOCTL_NOTIF_ADDFD with SECCOMP_ADDFD_FLAG_SEND
for the reply, as the fix for the bug has an additional code path
change for handling addfd, which would not be exercised by a simple
SECCOMP_IOCTL_NOTIF_SEND, and it is possible to fix the bug while leaving
the same race intact for the addfd case.
This test is not guaranteed to reproduce the bug on every run, but the
parameters (signal frequency and number of repeated syscalls) have been
chosen so that on my machine this test:
- takes ~0.8s in the good case (+1s in the failure case), and
- detects the bug in 999 of 1000 runs.
Signed-off-by: Johannes Nixdorf <johannes@nixdorf.dev>
Link: https://lore.kernel.org/r/20250725-seccomp-races-v2-2-cf8b9d139596@nixdorf.dev
Signed-off-by: Kees Cook <kees@kernel.org>
Fixes
- false failure of subsystem event test
- glob filter test to use mutex_unlock() instead of mutex_trylock()
- several spelling errors in tests
- test_kexec_jump build errors
- pidfd test duplicate-symbol warnings for SCHED_ CPP symbols
Adds a reliable check for suspend to breakpoints suspend test
Improvements to ipc test
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmiH76IACgkQCwJExA0N
QxyHAw/+PWj6IvX+3qQEhPL1l/huPpknPVqd/UVEozAudtkDMNwTXAzmc8TZAGU1
BiQTMRBothDZVxLa6YyQCNgHcFuE9X4PIDPCeRrtHQmoSIJz5Db7wrgy1LtrStBN
9bOrsoYUFyy5xWlnidreu5IHuZFrLGUdwF4BhlBL9IvNBv2FKujHfzGl0mbc2agH
/pbJuLiDKRWk+Wqgyv8LOBu9wsQSraunyiKxYZ+ICqzhangnH8SBVNejum9B/KUT
P2Jzyjjo9hyZxR85B7951HqubzWZjhrS2I9boDYaI0x8RyOM75pO6cdh681Gdzz9
lluospIzQRsKR1brtP+M5vH7F6yGnrQRvjp/PvW5a410JFYeIiuhCuDhGZzYsPdj
nFFXP/CLlTJ4U1TVAzZ0OpwapoLfSmnUzrmVHFJRDW6YpuhliFbxb4YErCRxmBLX
MLxN7NnPG0fUxO4voy49GiqXfFuYnvgZO1FWJuczKv4NXxPUbc77Y1K9Tv4mV97n
W5e1EO3JDLghUAkoQyDy9oC7g6ZYET1EcqhOeTujazOAVPlIKJ0YOnLqlFppc3F7
oqlXJbRdjamn9w+UcAjWNPQxYAJeK8jyB+S2LkCDLULD3dgJEzBj7szC67Ou3yQB
qKuh9u6xhFKc0xgW2YSm3+Cpj8BPGlKzNf+Ks4TIGF8pvurPCy0=
=DyuD
-----END PGP SIGNATURE-----
Merge tag 'linux_kselftest-next-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest updates from Shuah Khan:
- Fixes:
- false failure of subsystem event test
- glob filter test to use mutex_unlock() instead of mutex_trylock()
- several spelling errors in tests
- test_kexec_jump build errors
- pidfd test duplicate-symbol warnings for SCHED_ CPP symbols
- Add a reliable check for suspend to breakpoints suspend test
- Improvements to ipc test
* tag 'linux_kselftest-next-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/pidfd: Fix duplicate-symbol warnings for SCHED_ CPP symbols
selftests/tracing: Fix false failure of subsystem event test
selftests/kexec: fix test_kexec_jump build
selftests: breakpoints: use suspend_stats to reliably check suspend success
selftests: tracing: Use mutex_unlock for testing glob filter
selftests: print installation complete message
selftests/ptrace: Fix spelling mistake "multible" -> "multiple"
selftests: ipc: Replace fail print statements with ksft_test_result_fail
selftests: Add version file to kselftest installation dir
selftests/cpu-hotplug: fix typo in hotplaggable_offline_cpus function name
- Host driver for GICv5, the next generation interrupt controller for
arm64, including support for interrupt routing, MSIs, interrupt
translation and wired interrupts.
- Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
GICv5 hardware, leveraging the legacy VGIC interface.
- Userspace control of the 'nASSGIcap' GICv3 feature, allowing
userspace to disable support for SGIs w/o an active state on hardware
that previously advertised it unconditionally.
- Map supporting endpoints with cacheable memory attributes on systems
with FEAT_S2FWB and DIC where KVM no longer needs to perform cache
maintenance on the address range.
- Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest
hypervisor to inject external aborts into an L2 VM and take traps of
masked external aborts to the hypervisor.
- Convert more system register sanitization to the config-driven
implementation.
- Fixes to the visibility of EL2 registers, namely making VGICv3 system
registers accessible through the VGIC device instead of the ONE_REG
vCPU ioctls.
- Various cleanups and minor fixes.
-----BEGIN PGP SIGNATURE-----
iI0EABYIADUWIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCaIezbRccb2xpdmVyLnVw
dG9uQGxpbnV4LmRldgAKCRCivnWIJHzdFr/eAQDY5NIG5cR6ZcAWnPQLmGWpz2ou
pq4Jhn9E/mGR3n5L1AEAsJpfLLpOsmnLBdwfbjmW59gGsa8k3i5tjWEOJ6yzAwk=
=r+sp
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-6.17' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 changes for 6.17, round #1
- Host driver for GICv5, the next generation interrupt controller for
arm64, including support for interrupt routing, MSIs, interrupt
translation and wired interrupts.
- Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
GICv5 hardware, leveraging the legacy VGIC interface.
- Userspace control of the 'nASSGIcap' GICv3 feature, allowing
userspace to disable support for SGIs w/o an active state on hardware
that previously advertised it unconditionally.
- Map supporting endpoints with cacheable memory attributes on systems
with FEAT_S2FWB and DIC where KVM no longer needs to perform cache
maintenance on the address range.
- Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest
hypervisor to inject external aborts into an L2 VM and take traps of
masked external aborts to the hypervisor.
- Convert more system register sanitization to the config-driven
implementation.
- Fixes to the visibility of EL2 registers, namely making VGICv3 system
registers accessible through the VGIC device instead of the ONE_REG
vCPU ioctls.
- Various cleanups and minor fixes.
When testing a series of commits that also includes changes to the Linux
tools directory, it is useless to test the changes in tools as they may
not affect the kernel itself. Doing tests on the kernel for changes that
do not affect the kernel is a waste of time.
Add a PATCHCHECK_SKIP that takes a series of shas that will be skipped
while doing the individual commit tests.
For example, the runtime verification may have a series of commits like:
$ git log --abbrev-commit --pretty=oneline fac5493251a6~1..HEAD
3d3800b4f7 rv: Remove rv_reactor's reference counter
3d3c376118 rv: Merge struct rv_reactor_def into struct rv_reactor
24cbfe18d5 rv: Merge struct rv_monitor_def into struct rv_monitor
b0c08dd534 rv: Remove unused field in struct rv_monitor_def
58d5f0d437 (debiantesting-x86-64/trace/rv/core) rv: Return init error when registering monitors
560473f2e2 verification/rvgen: Organise Kconfig entries for nested monitors
9efcf59082 tools/dot2c: Fix generated files going over 100 column limit
1160ccaf77 tools/rv: Stop gracefully also on SIGTERM
f60227f344 tools/rv: Do not skip idle in trace
f3735df628 verification/rvgen: Do not generate unused variables
6fb37c2a27 verification/rvgen: Generate each variable definition only once
8cfcf9b0e9 verification/rvgen: Support the 'next' operator
fac5493251 rv: Allow to configure the number of per-task monitor
Where the first commit touches the kernel followed by a series of commits
that do not, and ends with commits that do. Instead of having to add
multiple patchcheck tests to handle the gaps, just include the commits
that should not be tested:
$ git log --abbrev-commit --pretty=oneline fac5493251a6~1..HEAD |
grep -e verification -e tools/ | cut -d' ' -f1 |
while read a ; do echo -n "$a "; done
560473f2e29efcf590821160ccaf77f60227f344f3735df6286fb37c2a278cfcf9b0e9
Then set PATCHCHECK_SKIP to that, and those commits will be skipped.
PATCHCHECK_SKIP = 560473f2e29efcf590821160ccaf77f60227f344f3735df6286fb37c2a278cfcf9b0e9
Cc: John 'Warthog9' Hawley <warthog9@kernel.org>
Cc: Dhaval Giani <dhaval.giani@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/20250725112153.1dd06b84@gandalf.local.home
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
KVM selftests changes for 6.17
- Fix a comment typo.
- Verify KVM is loaded when getting any KVM module param so that attempting to
run a selftest without kvm.ko loaded results in a SKIP message about KVM not
being loaded/enabled, versus some random parameter not existing.
- SKIP tests that hit EACCES when attempting to access a file, with a "Root
required?" help message. In most cases, the test just needs to be run with
elevated permissions.
KVM x86 misc changes for 6.17
- Prevert the host's DEBUGCTL.FREEZE_IN_SMM (Intel only) when running the
guest. Failure to honor FREEZE_IN_SMM can bleed host state into the guest.
- Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter (Intel only) to
prevent L1 from running L2 with features that KVM doesn't support, e.g. BTF.
- Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to the
vCPU's CPUID model.
- Rework the MSR interception code so that the SVM and VMX APIs are more or
less identical.
- Recalculate all MSR intercepts from the "source" on MSR filter changes, and
drop the dedicated "shadow" bitmaps (and their awful "max" size defines).
- WARN and reject loading kvm-amd.ko instead of panicking the kernel if the
nested SVM MSRPM offsets tracker can't handle an MSR.
- Advertise support for LKGS (Load Kernel GS base), a new instruction that's
loosely related to FRED, but is supported and enumerated independently.
- Fix a user-triggerable WARN that syzkaller found by stuffing INIT_RECEIVED,
a.k.a. WFS, and then putting the vCPU into VMX Root Mode (post-VMXON). Use
the same approach KVM uses for dealing with "impossible" emulation when
running a !URG guest, and simply wait until KVM_RUN to detect that the vCPU
has architecturally impossible state.
- Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling interception of
APERF/MPERF reads, so that a "properly" configured VM can "virtualize"
APERF/MPERF (with many caveats).
- Reject KVM_SET_TSC_KHZ if vCPUs have been created, as changing the "default"
frequency is unsupported for VMs with a "secure" TSC, and there's no known
use case for changing the default frequency for other VM types.
KVM IRQ changes for 6.17
- Rework irqbypass to track/match producers and consumers via an xarray
instead of a linked list. Using a linked list leads to O(n^2) insertion
times, which is hugely problematic for use cases that create large numbers
of VMs. Such use cases typically don't actually use irqbypass, but
eliminating the pointless registration is a future problem to solve as it
likely requires new uAPI.
- Track irqbypass's "token" as "struct eventfd_ctx *" instead of a "void *",
to avoid making a simple concept unnecessarily difficult to understand.
- Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC, PIC,
and PIT emulation at compile time.
- Drop x86's irq_comm.c, and move a pile of IRQ related code into irq.c.
- Fix a variety of flaws and bugs in the AVIC device posted IRQ code.
- Inhibited AVIC if a vCPU's ID is too big (relative to what hardware
supports) instead of rejecting vCPU creation.
- Extend enable_ipiv module param support to SVM, by simply leaving IsRunning
clear in the vCPU's physical ID table entry.
- Disable IPI virtualization, via enable_ipiv, if the CPU is affected by
erratum #1235, to allow (safely) enabling AVIC on such CPUs.
- Dedup x86's device posted IRQ code, as the vast majority of functionality
can be shared verbatime between SVM and VMX.
- Harden the device posted IRQ code against bugs and runtime errors.
- Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1)
instead of O(n).
- Generate GA Log interrupts if and only if the target vCPU is blocking, i.e.
only if KVM needs a notification in order to wake the vCPU.
- Decouple device posted IRQs from VFIO device assignment, as binding a VM to
a VFIO group is not a requirement for enabling device posted IRQs.
- Clean up and document/comment the irqfd assignment code.
- Disallow binding multiple irqfds to an eventfd with a priority waiter, i.e.
ensure an eventfd is bound to at most one irqfd through the entire host,
and add a selftest to verify eventfd:irqfd bindings are globally unique.
With this change, we know the precise rejected function name when
attaching fexit/fmod_ret to __noreturn functions from log.
$ ./fexit
libbpf: prog 'fexit': BPF program load failed: -EINVAL
libbpf: prog 'fexit': -- BEGIN PROG LOAD LOG --
Attaching fexit/fmod_ret to __noreturn function 'do_exit' is rejected.
Suggested-by: Leon Hwang <leon.hwang@linux.dev>
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250724151454.499040-2-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-----BEGIN PGP SIGNATURE-----
iIYEABYKAC4WIQSVyBthFV4iTW/VU1/l49DojIL20gUCaINQ2hAcbWljQGRpZ2lr
b2QubmV0AAoJEOXj0OiMgvbS8DcA/RvnXD7NcvINU94pkY6w+SxtdhsIe/w7EcGF
LJdwxrKQAP0WpMNTfKdzfe6ub7qE+AkZYhagKAuMCloS1qE35nZgDg==
=WKRS
-----END PGP SIGNATURE-----
Merge tag 'landlock-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock update from Mickaël Salaün:
"Fix test issues, improve build compatibility, and add new tests"
* tag 'landlock-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
landlock: Fix cosmetic change
samples/landlock: Fix building on musl libc
landlock: Fix warning from KUnit tests
selftests/landlock: Add test to check rule tied to covered mount point
selftests/landlock: Fix build of audit_test
selftests/landlock: Fix readlink check
This is the main crypto library pull request for 6.17. The main focus
this cycle is on reorganizing the SHA-1 and SHA-2 code, providing
high-quality library APIs for SHA-1 and SHA-2 including HMAC support,
and establishing conventions for lib/crypto/ going forward:
- Migrate the SHA-1 and SHA-512 code (and also SHA-384 which shares
most of the SHA-512 code) into lib/crypto/. This includes both the
generic and architecture-optimized code. Greatly simplify how the
architecture-optimized code is integrated. Add an easy-to-use
library API for each SHA variant, including HMAC support. Finally,
reimplement the crypto_shash support on top of the library API.
- Apply the same reorganization to the SHA-256 code (and also SHA-224
which shares most of the SHA-256 code). This is a somewhat smaller
change, due to my earlier work on SHA-256. But this brings in all
the same additional improvements that I made for SHA-1 and SHA-512.
There are also some smaller changes:
- Move the architecture-optimized ChaCha, Poly1305, and BLAKE2s code
from arch/$(SRCARCH)/lib/crypto/ to lib/crypto/$(SRCARCH)/. For
these algorithms it's just a move, not a full reorganization yet.
- Fix the MIPS chacha-core.S to build with the clang assembler.
- Fix the Poly1305 functions to work in all contexts.
- Fix a performance regression in the x86_64 Poly1305 code.
- Clean up the x86_64 SHA-NI optimized SHA-1 assembly code.
Note that since the new organization of the SHA code is much simpler,
the diffstat of this pull request is negative, despite the addition of
new fully-documented library APIs for multiple SHA and HMAC-SHA
variants. These APIs will allow further simplifications across the
kernel as users start using them instead of the old-school crypto API.
(I've already written a lot of such conversion patches, removing over
1000 more lines of code. But most of those will target 6.18 or later.)
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCaIZ93BQcZWJpZ2dlcnNA
a2VybmVsLm9yZwAKCRDzXCl4vpKOK8HCAQD3O9P0qd6wscne5XuRwaybzKHQ2AqU
OlhlDZWQQEvYAgD/aa6KP/DS+8RKGj0TBn6bACAJyXyDygFXq5a5s9pGzAs=
=UmMM
-----END PGP SIGNATURE-----
Merge tag 'libcrypto-updates-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull crypto library updates from Eric Biggers:
"This is the main crypto library pull request for 6.17. The main focus
this cycle is on reorganizing the SHA-1 and SHA-2 code, providing
high-quality library APIs for SHA-1 and SHA-2 including HMAC support,
and establishing conventions for lib/crypto/ going forward:
- Migrate the SHA-1 and SHA-512 code (and also SHA-384 which shares
most of the SHA-512 code) into lib/crypto/. This includes both the
generic and architecture-optimized code. Greatly simplify how the
architecture-optimized code is integrated. Add an easy-to-use
library API for each SHA variant, including HMAC support. Finally,
reimplement the crypto_shash support on top of the library API.
- Apply the same reorganization to the SHA-256 code (and also SHA-224
which shares most of the SHA-256 code). This is a somewhat smaller
change, due to my earlier work on SHA-256. But this brings in all
the same additional improvements that I made for SHA-1 and SHA-512.
There are also some smaller changes:
- Move the architecture-optimized ChaCha, Poly1305, and BLAKE2s code
from arch/$(SRCARCH)/lib/crypto/ to lib/crypto/$(SRCARCH)/. For
these algorithms it's just a move, not a full reorganization yet.
- Fix the MIPS chacha-core.S to build with the clang assembler.
- Fix the Poly1305 functions to work in all contexts.
- Fix a performance regression in the x86_64 Poly1305 code.
- Clean up the x86_64 SHA-NI optimized SHA-1 assembly code.
Note that since the new organization of the SHA code is much simpler,
the diffstat of this pull request is negative, despite the addition of
new fully-documented library APIs for multiple SHA and HMAC-SHA
variants.
These APIs will allow further simplifications across the kernel as
users start using them instead of the old-school crypto API. (I've
already written a lot of such conversion patches, removing over 1000
more lines of code. But most of those will target 6.18 or later)"
* tag 'libcrypto-updates-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (67 commits)
lib/crypto: arm64/sha512-ce: Drop compatibility macros for older binutils
lib/crypto: x86/sha1-ni: Convert to use rounds macros
lib/crypto: x86/sha1-ni: Minor optimizations and cleanup
crypto: sha1 - Remove sha1_base.h
lib/crypto: x86/sha1: Migrate optimized code into library
lib/crypto: sparc/sha1: Migrate optimized code into library
lib/crypto: s390/sha1: Migrate optimized code into library
lib/crypto: powerpc/sha1: Migrate optimized code into library
lib/crypto: mips/sha1: Migrate optimized code into library
lib/crypto: arm64/sha1: Migrate optimized code into library
lib/crypto: arm/sha1: Migrate optimized code into library
crypto: sha1 - Use same state format as legacy drivers
crypto: sha1 - Wrap library and add HMAC support
lib/crypto: sha1: Add HMAC support
lib/crypto: sha1: Add SHA-1 library functions
lib/crypto: sha1: Rename sha1_init() to sha1_init_raw()
crypto: x86/sha1 - Rename conflicting symbol
lib/crypto: sha2: Add hmac_sha*_init_usingrawkey()
lib/crypto: arm/poly1305: Remove unneeded empty weak function
lib/crypto: x86/poly1305: Fix performance regression on short messages
...
- Introduce and start using TRAILING_OVERLAP() helper for fixing
embedded flex array instances (Gustavo A. R. Silva)
- mux: Convert mux_control_ops to a flex array member in mux_chip
(Thorsten Blum)
- string: Group str_has_prefix() and strstarts() (Andy Shevchenko)
- Remove KCOV instrumentation from __init and __head (Ritesh Harjani,
Kees Cook)
- Refactor and rename stackleak feature to support Clang
- Add KUnit test for seq_buf API
- Fix KUnit fortify test under LTO
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCaIfUkgAKCRA2KwveOeQk
uypLAP92r6f47sWcOw/5B9aVffX6Bypsb7dqBJQpCNxI5U1xcAEAiCrZ98UJyOeQ
JQgnXd4N67K4EsS2JDc+FutRn3Yi+A8=
=+5Bq
-----END PGP SIGNATURE-----
Merge tag 'hardening-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
- Introduce and start using TRAILING_OVERLAP() helper for fixing
embedded flex array instances (Gustavo A. R. Silva)
- mux: Convert mux_control_ops to a flex array member in mux_chip
(Thorsten Blum)
- string: Group str_has_prefix() and strstarts() (Andy Shevchenko)
- Remove KCOV instrumentation from __init and __head (Ritesh Harjani,
Kees Cook)
- Refactor and rename stackleak feature to support Clang
- Add KUnit test for seq_buf API
- Fix KUnit fortify test under LTO
* tag 'hardening-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (22 commits)
sched/task_stack: Add missing const qualifier to end_of_stack()
kstack_erase: Support Clang stack depth tracking
kstack_erase: Add -mgeneral-regs-only to silence Clang warnings
init.h: Disable sanitizer coverage for __init and __head
kstack_erase: Disable kstack_erase for all of arm compressed boot code
x86: Handle KCOV __init vs inline mismatches
arm64: Handle KCOV __init vs inline mismatches
s390: Handle KCOV __init vs inline mismatches
arm: Handle KCOV __init vs inline mismatches
mips: Handle KCOV __init vs inline mismatch
powerpc/mm/book3s64: Move kfence and debug_pagealloc related calls to __init section
configs/hardening: Enable CONFIG_INIT_ON_FREE_DEFAULT_ON
configs/hardening: Enable CONFIG_KSTACK_ERASE
stackleak: Split KSTACK_ERASE_CFLAGS from GCC_PLUGINS_CFLAGS
stackleak: Rename stackleak_track_stack to __sanitizer_cov_stack_depth
stackleak: Rename STACKLEAK to KSTACK_ERASE
seq_buf: Introduce KUnit tests
string: Group str_has_prefix() and strstarts()
kunit/fortify: Add back "volatile" for sizeof() constants
acpi: nfit: intel: avoid multiple -Wflex-array-member-not-at-end warnings
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmiHdZ8QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgptRED/9o3dQ1QHL5yNM/AyCCGox0V4zra8qGS/Vc
cBWpAVrmPGRw0IYlLZENtN9PdwKcbMzJq3l6cxeC7dBnAZP0AxTzP4YYJYUNVsqo
WtJ3d/k5+cVp0OyOp4uabaqNeMeLoPk9/JXe1Ml2KxtDmHtj5yee0JRh7zlPZmZj
tsrpIUTeHgAPn6yR1EI+0ybx/mjCb05Mv2Y8gF5hkUPA2PuON+MTFixJmqoy2ySh
n+22mz/prqlyOSYh/VVv1+9jcQ94wMjcW0JIpg9lM3Kg8BCPU4IetvO1UiX6X33v
154zEh2aJJDBx+yORS4BM4JMXjRZI7lYea2dkHM8Cajctu1Wpja9bNwnK9ibXvEc
WtyBwztleLbAZef25fA/W87JE23fGa/r3nwIb2cF4QqkAFslCvhjA93WkOzNJCgQ
qsWOrlCh3IK2NUu4b1Ncs3ZHOPvc51+zzjMzC6SUr54xhrxDK+gngDPhRy7XDqWJ
DTMpIlr366o8GdJqnib0/e/CPBrThS6Vl6u0tgLnNbwdpK1svgo/uHW5ksKvDqHX
kGEIhyRRJJC+4wyl4dsYKXa2twcyFrlWdAE+pZguEC2nZRYqYl9uXftOtvfp1x0y
/skDX0FIDjvyjRqCLcqF03FSGqwCGS8WuWXZjPhVhcfz47NvbHeFDh1G/jMzsbpj
S9zrPve/DQ==
=e86T
-----END PGP SIGNATURE-----
Merge tag 'for-6.17/block-20250728' of git://git.kernel.dk/linux
Pull block updates from Jens Axboe:
- MD pull request via Yu:
- call del_gendisk synchronously (Xiao)
- cleanup unused variable (John)
- cleanup workqueue flags (Ryo)
- fix faulty rdev can't be removed during resync (Qixing)
- NVMe pull request via Christoph:
- try PCIe function level reset on init failure (Keith Busch)
- log TLS handshake failures at error level (Maurizio Lombardi)
- pci-epf: do not complete commands twice if nvmet_req_init()
fails (Rick Wertenbroek)
- misc cleanups (Alok Tiwari)
- Removal of the pktcdvd driver
This has been more than a decade coming at this point, and some
recently revealed breakages that had it causing issues even for cases
where it isn't required made me re-pull the trigger on this one. It's
known broken and nobody has stepped up to maintain the code
- Series for ublk supporting batch commands, enabling the use of
multishot where appropriate
- Speed up ublk exit handling
- Fix for the two-stage elevator fixing which could leak data
- Convert NVMe to use the new IOVA based API
- Increase default max transfer size to something more reasonable
- Series fixing write operations on zoned DM devices
- Add tracepoints for zoned block device operations
- Prep series working towards improving blk-mq queue management in the
presence of isolated CPUs
- Don't allow updating of the block size of a loop device that is
currently under exclusively ownership/open
- Set chunk sectors from stacked device stripe size and use it for the
atomic write size limit
- Switch to folios in bcache read_super()
- Fix for CD-ROM MRW exit flush handling
- Various tweaks, fixes, and cleanups
* tag 'for-6.17/block-20250728' of git://git.kernel.dk/linux: (94 commits)
block: restore two stage elevator switch while running nr_hw_queue update
cdrom: Call cdrom_mrw_exit from cdrom_release function
sunvdc: Balance device refcount in vdc_port_mpgroup_check
nvme-pci: try function level reset on init failure
dm: split write BIOs on zone boundaries when zone append is not emulated
block: use chunk_sectors when evaluating stacked atomic write limits
dm-stripe: limit chunk_sectors to the stripe size
md/raid10: set chunk_sectors limit
md/raid0: set chunk_sectors limit
block: sanitize chunk_sectors for atomic write limits
ilog2: add max_pow_of_two_factor()
nvmet: pci-epf: Do not complete commands twice if nvmet_req_init() fails
nvme-tcp: log TLS handshake failures at error level
docs: nvme: fix grammar in nvme-pci-endpoint-target.rst
nvme: fix typo in status code constant for self-test in progress
nvmet: remove redundant assignment of error code in nvmet_ns_enable()
nvme: fix incorrect variable in io cqes error message
nvme: fix multiple spelling and grammar issues in host drivers
block: fix blk_zone_append_update_request_bio() kernel-doc
md/raid10: fix set but not used variable in sync_request_write()
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINCjwAKCRCRxhvAZXjc
osnVAQCv4rM7sF4yJvGlm1myIJcJy5Sabk2q31qMdI1VHmkcOwD+Mxs7d1aByTS8
/6djhVleq6lcT2LpP9j8YI3Rb+x30QY=
=PF3o
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.17-rc1.bpf' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs bpf updates from Christian Brauner:
"These changes allow bpf to read extended attributes from cgroupfs.
This is useful in redirecting AF_UNIX socket connections based on
cgroup membership of the socket. One use-case is the ability to
implement log namespaces in systemd so services and containers are
redirected to different journals"
* tag 'vfs-6.17-rc1.bpf' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
selftests/kernfs: test xattr retrieval
selftests/bpf: Add tests for bpf_cgroup_read_xattr
bpf: Mark cgroup_subsys_state->cgroup RCU safe
bpf: Introduce bpf_cgroup_read_xattr to read xattr of cgroup's node
kernfs: remove iattr_mutex
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINCiQAKCRCRxhvAZXjc
orltAQDq3y1anYETz5/FD6P2gXY1W5hXdSm3EHHeacQ1JjTXvgEA2g1lWO7J4anf
oOVE8aSvMow/FOjivLZBYmI65pkYJAE=
=oDKB
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.17-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull pidfs updates from Christian Brauner:
- persistent info
Persist exit and coredump information independent of whether anyone
currently holds a pidfd for the struct pid.
The current scheme allocated pidfs dentries on-demand repeatedly.
This scheme is reaching it's limits as it makes it impossible to pin
information that needs to be available after the task has exited or
coredumped and that should not be lost simply because the pidfd got
closed temporarily. The next opener should still see the stashed
information.
This is also a prerequisite for supporting extended attributes on
pidfds to allow attaching meta information to them.
If someone opens a pidfd for a struct pid a pidfs dentry is allocated
and stashed in pid->stashed. Once the last pidfd for the struct pid
is closed the pidfs dentry is released and removed from pid->stashed.
So if 10 callers create a pidfs dentry for the same struct pid
sequentially, i.e., each closing the pidfd before the other creates a
new one then a new pidfs dentry is allocated every time.
Because multiple tasks acquiring and releasing a pidfd for the same
struct pid can race with each another a task may still find a valid
pidfs entry from the previous task in pid->stashed and reuse it. Or
it might find a dead dentry in there and fail to reuse it and so
stashes a new pidfs dentry. Multiple tasks may race to stash a new
pidfs dentry but only one will succeed, the other ones will put their
dentry.
The current scheme aims to ensure that a pidfs dentry for a struct
pid can only be created if the task is still alive or if a pidfs
dentry already existed before the task was reaped and so exit
information has been was stashed in the pidfs inode.
That's great except that it's buggy. If a pidfs dentry is stashed in
pid->stashed after pidfs_exit() but before __unhash_process() is
called we will return a pidfd for a reaped task without exit
information being available.
The pidfds_pid_valid() check does not guard against this race as it
doens't sync at all with pidfs_exit(). The pid_has_task() check might
be successful simply because we're before __unhash_process() but
after pidfs_exit().
Introduce a new scheme where the lifetime of information associated
with a pidfs entry (coredump and exit information) isn't bound to the
lifetime of the pidfs inode but the struct pid itself.
The first time a pidfs dentry is allocated for a struct pid a struct
pidfs_attr will be allocated which will be used to store exit and
coredump information.
If all pidfs for the pidfs dentry are closed the dentry and inode can
be cleaned up but the struct pidfs_attr will stick until the struct
pid itself is freed. This will ensure minimal memory usage while
persisting relevant information.
The new scheme has various advantages. First, it allows to close the
race where we end up handing out a pidfd for a reaped task for which
no exit information is available. Second, it minimizes memory usage.
Third, it allows to remove complex lifetime tracking via dentries
when registering a struct pid with pidfs. There's no need to get or
put a reference. Instead, the lifetime of exit and coredump
information associated with a struct pid is bound to the lifetime of
struct pid itself.
- extended attributes
Now that we have a way to persist information for pidfs dentries we
can start supporting extended attributes on pidfds. This will allow
userspace to attach meta information to tasks.
One natural extension would be to introduce a custom pidfs.* extended
attribute space and allow for the inheritance of extended attributes
across fork() and exec().
The first simple scheme will allow privileged userspace to set
trusted extended attributes on pidfs inodes.
- Allow autonomous pidfs file handles
Various filesystems such as pidfs and drm support opening file
handles without having to require a file descriptor to identify the
filesystem. The filesystem are global single instances and can be
trivially identified solely on the information encoded in the file
handle.
This makes it possible to not have to keep or acquire a sentinal file
descriptor just to pass it to open_by_handle_at() to identify the
filesystem. That's especially useful when such sentinel file
descriptor cannot or should not be acquired.
For pidfs this means a file handle can function as full replacement
for storing a pid in a file. Instead a file handle can be stored and
reopened purely based on the file handle.
Such autonomous file handles can be opened with or without specifying
a a file descriptor. If no proper file descriptor is used the
FD_PIDFS_ROOT sentinel must be passed. This allows us to define
further special negative fd sentinels in the future.
Userspace can trivially test for support by trying to open the file
handle with an invalid file descriptor.
- Allow pidfds for reaped tasks with SCM_PIDFD messages
This is a logical continuation of the earlier work to create pidfds
for reaped tasks through the SO_PEERPIDFD socket option merged in
923ea4d448 ("Merge patch series "net, pidfs: enable handing out
pidfds for reaped sk->sk_peer_pid"").
- Two minor fixes:
* Fold fs_struct->{lock,seq} into a seqlock
* Don't bother with path_{get,put}() in unix_open_file()
* tag 'vfs-6.17-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (37 commits)
don't bother with path_get()/path_put() in unix_open_file()
fold fs_struct->{lock,seq} into a seqlock
selftests: net: extend SCM_PIDFD test to cover stale pidfds
af_unix: enable handing out pidfds for reaped tasks in SCM_PIDFD
af_unix: stash pidfs dentry when needed
af_unix/scm: fix whitespace errors
af_unix: introduce and use scm_replace_pid() helper
af_unix: introduce unix_skb_to_scm helper
af_unix: rework unix_maybe_add_creds() to allow sleep
selftests/pidfd: decode pidfd file handles withou having to specify an fd
fhandle, pidfs: support open_by_handle_at() purely based on file handle
uapi/fcntl: add FD_PIDFS_ROOT
uapi/fcntl: add FD_INVALID
fcntl/pidfd: redefine PIDFD_SELF_THREAD_GROUP
uapi/fcntl: mark range as reserved
fhandle: reflow get_path_anchor()
pidfs: add pidfs_root_path() helper
fhandle: rename to get_path_anchor()
fhandle: hoist copy_from_user() above get_path_from_fd()
fhandle: raise FILEID_IS_DIR in handle_type
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINCgQAKCRCRxhvAZXjc
os+nAP9LFHUwWO6EBzHJJGEVjJvvzsbzqeYrRFamYiMc5ulPJwD+KW4RIgJa/MWO
pcYE40CacaekD8rFWwYUyszpgmv6ewc=
=wCwp
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull mmap_prepare updates from Christian Brauner:
"Last cycle we introduce f_op->mmap_prepare() in c84bf6dd2b ("mm:
introduce new .mmap_prepare() file callback").
This is preferred to the existing f_op->mmap() hook as it does require
a VMA to be established yet, thus allowing the mmap logic to invoke
this hook far, far earlier, prior to inserting a VMA into the virtual
address space, or performing any other heavy handed operations.
This allows for much simpler unwinding on error, and for there to be a
single attempt at merging a VMA rather than having to possibly
reattempt a merge based on potentially altered VMA state.
Far more importantly, it prevents inappropriate manipulation of
incompletely initialised VMA state, which is something that has been
the cause of bugs and complexity in the past.
The intent is to gradually deprecate f_op->mmap, and in that vein this
series coverts the majority of file systems to using f_op->mmap_prepare.
Prerequisite steps are taken - firstly ensuring all checks for mmap
capabilities use the file_has_valid_mmap_hooks() helper rather than
directly checking for f_op->mmap (which is now not a valid check) and
secondly updating daxdev_mapping_supported() to not require a VMA
parameter to allow ext4 and xfs to be converted.
Commit bb666b7c27 ("mm: add mmap_prepare() compatibility layer for
nested file systems") handles the nasty edge-case of nested file
systems like overlayfs, which introduces a compatibility shim to allow
f_op->mmap_prepare() to be invoked from an f_op->mmap() callback.
This allows for nested filesystems to continue to function correctly
with all file systems regardless of which callback is used. Once we
finally convert all file systems, this shim can be removed.
As a result, ecryptfs, fuse, and overlayfs remain unaltered so they
can nest all other file systems.
We additionally do not update resctl - as this requires an update to
remap_pfn_range() (or an alternative to it) which we defer to a later
series, equally we do not update cramfs which needs a mixed mapping
insertion with the same issue, nor do we update procfs, hugetlbfs,
syfs or kernfs all of which require VMAs for internal state and hooks.
We shall return to all of these later"
* tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
doc: update porting, vfs documentation to describe mmap_prepare()
fs: replace mmap hook with .mmap_prepare for simple mappings
fs: convert most other generic_file_*mmap() users to .mmap_prepare()
fs: convert simple use of generic_file_*_mmap() to .mmap_prepare()
mm/filemap: introduce generic_file_*_mmap_prepare() helpers
fs/xfs: transition from deprecated .mmap hook to .mmap_prepare
fs/ext4: transition from deprecated .mmap hook to .mmap_prepare
fs/dax: make it possible to check dev dax support without a VMA
fs: consistently use can_mmap_file() helper
mm/nommu: use file_has_valid_mmap_hooks() helper
mm: rename call_mmap/mmap_prepare to vfs_mmap/mmap_prepare
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINAYAAKCRCRxhvAZXjc
opJiAQDXGs+gQcxJ+4BpV4QszT2OJC19oI/f5AQ4PWMJdHgr4AEA7fc6NbBrpmW7
L/tbdAwIiWp8bL1Q8Wy7Q2qldHtcggM=
=KbD9
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.17-rc1.coredump' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull coredump updates from Christian Brauner:
"This contains an extension to the coredump socket and a proper rework
of the coredump code.
- This extends the coredump socket to allow the coredump server to
tell the kernel how to process individual coredumps. This allows
for fine-grained coredump management. Userspace can decide to just
let the kernel write out the coredump, or generate the coredump
itself, or just reject it.
* COREDUMP_KERNEL
The kernel will write the coredump data to the socket.
* COREDUMP_USERSPACE
The kernel will not write coredump data but will indicate to the
parent that a coredump has been generated. This is used when
userspace generates its own coredumps.
* COREDUMP_REJECT
The kernel will skip generating a coredump for this task.
* COREDUMP_WAIT
The kernel will prevent the task from exiting until the coredump
server has shutdown the socket connection.
The flexible coredump socket can be enabled by using the "@@"
prefix instead of the single "@" prefix for the regular coredump
socket:
@@/run/systemd/coredump.socket
- Cleanup the coredump code properly while we have to touch it
anyway.
Split out each coredump mode in a separate helper so it's easy to
grasp what is going on and make the code easier to follow. The core
coredump function should now be very trivial to follow"
* tag 'vfs-6.17-rc1.coredump' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (31 commits)
cleanup: add a scoped version of CLASS()
coredump: add coredump_skip() helper
coredump: avoid pointless variable
coredump: order auto cleanup variables at the top
coredump: add coredump_cleanup()
coredump: auto cleanup prepare_creds()
cred: add auto cleanup method
coredump: directly return
coredump: auto cleanup argv
coredump: add coredump_write()
coredump: use a single helper for the socket
coredump: move pipe specific file check into coredump_pipe()
coredump: split pipe coredumping into coredump_pipe()
coredump: move core_pipe_count to global variable
coredump: prepare to simplify exit paths
coredump: split file coredumping into coredump_file()
coredump: rename do_coredump() to vfs_coredump()
selftests/coredump: make sure invalid paths are rejected
coredump: validate socket path in coredump_parse()
coredump: don't allow ".." in coredump socket path
...
Commit d7f0087381 ("bpf: try harder to deduce register bounds from
different numeric domains") added a second call to __reg_deduce_bounds
in reg_bounds_sync because a single call wasn't enough to converge to a
fixed point in terms of register bounds.
With patch "bpf: Improve bounds when s64 crosses sign boundary" from
this series, Eduard noticed that calling __reg_deduce_bounds twice isn't
enough anymore to converge. The first selftest added in "selftests/bpf:
Test cross-sign 64bits range refinement" highlights the need for a third
call to __reg_deduce_bounds. After instruction 7, reg_bounds_sync
performs the following bounds deduction:
reg_bounds_sync entry: scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146)
__update_reg_bounds: scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146)
__reg_deduce_bounds:
__reg32_deduce_bounds: scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146,umin32=0xfffffcf1,umax32=0xffffff6e)
__reg64_deduce_bounds: scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146,umin32=0xfffffcf1,umax32=0xffffff6e)
__reg_deduce_mixed_bounds: scalar(smin=-655,smax=0xeffffeee,umin=umin32=0xfffffcf1,umax=0xffffffffffffff6e,smin32=-783,smax32=-146,umax32=0xffffff6e)
__reg_deduce_bounds:
__reg32_deduce_bounds: scalar(smin=-655,smax=0xeffffeee,umin=umin32=0xfffffcf1,umax=0xffffffffffffff6e,smin32=-783,smax32=-146,umax32=0xffffff6e)
__reg64_deduce_bounds: scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e)
__reg_deduce_mixed_bounds: scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e)
__reg_bound_offset: scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e,var_off=(0xfffffffffffffc00; 0x3ff))
__update_reg_bounds: scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e,var_off=(0xfffffffffffffc00; 0x3ff))
In particular, notice how:
1. In the first call to __reg_deduce_bounds, __reg32_deduce_bounds
learns new u32 bounds.
2. __reg64_deduce_bounds is unable to improve bounds at this point.
3. __reg_deduce_mixed_bounds derives new u64 bounds from the u32 bounds.
4. In the second call to __reg_deduce_bounds, __reg64_deduce_bounds
improves the smax and umin bounds thanks to patch "bpf: Improve
bounds when s64 crosses sign boundary" from this series.
5. Subsequent functions are unable to improve the ranges further (only
tnums). Yet, a better smin32 bound could be learned from the smin
bound.
__reg32_deduce_bounds is able to improve smin32 from smin, but for that
we need a third call to __reg_deduce_bounds.
As discussed in [1], there may be a better way to organize the deduction
rules to learn the same information with less calls to the same
functions. Such an optimization requires further analysis and is
orthogonal to the present patchset.
Link: https://lore.kernel.org/bpf/aIKtSK9LjQXB8FLY@mail.gmail.com/ [1]
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Co-developed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/79619d3b42e5525e0e174ed534b75879a5ba15de.1753695655.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The improvement of the u64/s64 range refinement fixed the invariant
violation that was happening on this test for BPF_JSLT when crossing the
sign boundary.
After this patch, we have one test remaining with a known invariant
violation. It's the same test as fixed here but for 32 bits ranges.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/ad046fb0016428f1a33c3b81617aabf31b51183f.1753695655.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch adds coverage for the new cross-sign 64bits range refinement
logic. The three tests cover the cases when the u64 and s64 ranges
overlap (1) in the negative portion of s64, (2) in the positive portion
of s64, and (3) in both portions.
The first test is a simplified version of a BPF program generated by
syzkaller that caused an invariant violation [1]. It looks like
syzkaller could not extract the reproducer itself (and therefore didn't
report it to the mailing list), but I was able to extract it from the
console logs of a crash.
The principle is similar to the invariant violation described in
commit 6279846b9b ("bpf: Forget ranges when refining tnum after
JSET"): the verifier walks a dead branch, uses the condition to refine
ranges, and ends up with inconsistent ranges. In this case, the dead
branch is when we fallthrough on both jumps. The new refinement logic
improves the bounds such that the second jump is properly detected as
always-taken and the verifier doesn't end up walking a dead branch.
The second and third tests are inspired by the first, but rely on
condition jumps to prepare the bounds instead of ALU instructions. An
R10 write is used to trigger a verifier error when the bounds can't be
refined.
Link: https://syzkaller.appspot.com/bug?extid=c711ce17dd78e5d4fdcf [1]
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/a0e17b00dab8dabcfa6f8384e7e151186efedfdd.1753695655.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch updates the range refinement logic in the reg_bound test to
match the new logic from the previous commit. Without this change, tests
would fail because we end with more precise ranges than the tests
expect.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/b7f6b1fbe03373cca4e1bb6a113035a6cd2b3ff7.1753695655.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* kvm-arm64/vgic-v4-ctl:
: Userspace control of nASSGIcap, courtesy of Raghavendra Rao Ananta
:
: Allow userspace to decide if support for SGIs without an active state is
: advertised to the guest, allowing VMs from GICv3-only hardware to be
: migrated to to GICv4.1 capable machines.
Documentation: KVM: arm64: Describe VGICv3 registers writable pre-init
KVM: arm64: selftests: Add test for nASSGIcap attribute
KVM: arm64: vgic-v3: Allow userspace to write GICD_TYPER2.nASSGIcap
KVM: arm64: vgic-v3: Allow access to GICD_IIDR prior to initialization
KVM: arm64: vgic-v3: Consolidate MAINT_IRQ handling
KVM: arm64: Disambiguate support for vSGIs v. vLPIs
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
* kvm-arm64/el2-reg-visibility:
: Fixes to EL2 register visibility, courtesy of Marc Zyngier
:
: - Expose EL2 VGICv3 registers via the VGIC attributes accessor, not the
: KVM_{GET,SET}_ONE_REG ioctls
:
: - Condition visibility of FGT registers on the presence of FEAT_FGT in
: the VM
KVM: arm64: selftest: vgic-v3: Add basic GICv3 sysreg userspace access test
KVM: arm64: Enforce the sorting of the GICv3 system register table
KVM: arm64: Clarify the check for reset callback in check_sysreg_table()
KVM: arm64: vgic-v3: Fix ordering of ICH_HCR_EL2
KVM: arm64: Document registers exposed via KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS
KVM: arm64: selftests: get-reg-list: Add base EL2 registers
KVM: arm64: selftests: get-reg-list: Simplify feature dependency
KVM: arm64: Advertise FGT2 registers to userspace
KVM: arm64: Condition FGT registers on feature availability
KVM: arm64: Expose GICv3 EL2 registers via KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS
KVM: arm64: Let GICv3 save/restore honor visibility attribute
KVM: arm64: Define helper for ICH_VTR_EL2
KVM: arm64: Define constant value for ICC_SRE_EL2
KVM: arm64: Don't advertise ICH_*_EL2 registers through GET_ONE_REG
KVM: arm64: Make RVBAR_EL2 accesses UNDEF
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
The current test scripts contain duplicated root permission checks in
multiple locations. This patch consolidates these checks into _common.sh
to eliminate code redundancy.
Link: https://lkml.kernel.org/r/20250718064217.299300-1-lienze@kylinos.cn
Signed-off-by: Enze Li <lienze@kylinos.cn>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
sysfs.py is testing if non-default additional parameters can be committed.
Add a test case for further reducing the parameters to the default set.
Link: https://lkml.kernel.org/r/20250720171652.92309-23-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
sysfs.py is testing only the default and minimum DAMON parameters. Add
another test case for more non-default additional DAMON parameters
commitment on runtime.
Link: https://lkml.kernel.org/r/20250720171652.92309-22-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
DAMON context commitment assertion is hard-coded for a specific test case.
Split it out into a general version that can be reused for different test
cases.
Link: https://lkml.kernel.org/r/20250720171652.92309-21-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
DAMON monitoring attributes commitment assertion is hard-coded for a
specific test case. Split it out into a general version that can be
reused for different test cases.
Link: https://lkml.kernel.org/r/20250720171652.92309-20-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
DAMOS schemes commitment assertion is hard-coded for a specific test case.
Split it out into a general version that can be reused for different test
cases.
Link: https://lkml.kernel.org/r/20250720171652.92309-19-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Current DAMOS scheme commitment assertion is not testing DAMOS filters.
Add the test.
Link: https://lkml.kernel.org/r/20250720171652.92309-18-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
DAMOS scheme commitment assertion is hard-coded for a specific test case.
Split it out into a general version that can be reused for different test
cases.
Link: https://lkml.kernel.org/r/20250720171652.92309-17-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Current DAMOS commitment assertion is not testing quota destinations
commitment. Add the test.
Link: https://lkml.kernel.org/r/20250720171652.92309-16-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Current DAMOS quota commitment assertion is not testing quota goal
commitment. Add the test.
Link: https://lkml.kernel.org/r/20250720171652.92309-15-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
DamosQuota commitment assertion is hard-coded for a specific test case.
Split it out into a general version that can be reused for different test
cases.
Link: https://lkml.kernel.org/r/20250720171652.92309-14-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
DamosWatermarks commitment assertion is hard-coded for a specific test
case. Split it out into a general version that can be reused for
different test cases.
Link: https://lkml.kernel.org/r/20250720171652.92309-13-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
drgn_dump_damon_status.py is a script for dumping DAMON internal status in
json format. It is being used for seeing if DAMON parameters that are set
using _damon_sysfs.py are actually passed to DAMON in the kernel space.
It is, however, not dumping full DAMON internal status, and it makes
increasing test coverage difficult. Add damos filters dumping for more
tests.
Link: https://lkml.kernel.org/r/20250720171652.92309-12-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
drgn_dump_damon_status.py is a script for dumping DAMON internal status in
json format. It is being used for seeing if DAMON parameters that are set
using _damon_sysfs.py are actually passed to DAMON in the kernel space.
It is, however, not dumping full DAMON internal status, and it makes
increasing test coverage difficult. Add ctx->ops.id dumping for more
tests.
Link: https://lkml.kernel.org/r/20250720171652.92309-11-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
drgn_dump_damon_status.py is a script for dumping DAMON internal status in
json format. It is being used for seeing if DAMON parameters that are set
using _damon_sysfs.py are actually passed to DAMON in the kernel space.
It is, however, not dumping full DAMON internal status, and it makes
increasing test coverage difficult. Add damos->migrate_dests dumping for
more tests.
Link: https://lkml.kernel.org/r/20250720171652.92309-10-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
nr_accesses and age are unsigned int. Use the proper max value.
Link: https://lkml.kernel.org/r/20250720171652.92309-9-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
_damon_sysfs.py contains code for test-purpose DAMON sysfs interface
control. Add support of DAMOS action destination target_nid setup for
more tests.
Link: https://lkml.kernel.org/r/20250720171652.92309-8-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>