TEST_F() allocates and registers its struct __test_metadata via mmap()
inside its constructor, and only then assigns the
_##fixture_##test##_object pointer.
XFAIL_ADD() runs in a constructor too and reads
_##fixture_##test##_object to initialize xfail->test. If XFAIL_ADD runs
first, xfail->test can be NULL and the expected failure will be reported
as FAIL.
Use constructor priorities to ensure TEST_F registration runs before
XFAIL_ADD, without adding extra state or runtime lookups.
Fixes: 2709473c93 ("selftests: kselftest_harness: support using xfail")
Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
Link: https://patch.msgid.link/20260225111451.347923-1-sun.jian.kdev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This validates the previous commit: endpoints with both the signal and
subflow flags should always be marked as used even if it was not
possible to create new subflows due to the MPTCP PM limits.
For this test, an extra endpoint is created with both the signal and the
subflow flags, and limits are set not to create extra subflows. In this
case, an ADD_ADDR is sent, but no subflows are created. Still, the local
endpoint is marked as used, and no warning is fired when removing the
endpoint, after having sent a RM_ADDR.
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.
Fixes: 85df533a78 ("mptcp: pm: do not ignore 'subflow' if 'signal' flag is also set")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260303-net-mptcp-misc-fixes-7-0-rc2-v1-5-4b5462b6f016@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This validates the previous commit: RM_ADDR were sent over the first
found active subflow which could be the same as the one being removed.
It is more likely to loose this notification.
For this check, RM_ADDR are explicitly dropped when trying to send them
over the initial subflow, when removing the endpoint attached to it. If
it is dropped, the test will complain because some RM_ADDR have not been
received.
Note that only the RM_ADDR are dropped, to allow the linked subflow to
be quickly and cleanly closed. To only drop those RM_ADDR, a cBPF byte
code is used. If the IPTables commands fail, that's OK, the tests will
continue to pass, but not validate this part. This can be ignored:
another subtest fully depends on such command, and will be marked as
skipped.
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.
Fixes: 8dd5efb1f9 ("mptcp: send ack for rm_addr")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260303-net-mptcp-misc-fixes-7-0-rc2-v1-3-4b5462b6f016@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
By default, the netem qdisc can keep up to 1000 packets under its belly
to deal with the configured rate and delay. The simult flows test-case
simulates very low speed links, to avoid problems due to slow CPUs and
the TCP stack tend to transmit at a slightly higher rate than the
(virtual) link constraints.
All the above causes a relatively large amount of packets being enqueued
in the netem qdiscs - the longer the transfer, the longer the queue -
producing increasingly high TCP RTT samples and consequently increasingly
larger receive buffer size due to DRS.
When the receive buffer size becomes considerably larger than the needed
size, the tests results can flake, i.e. because minimal inaccuracy in the
pacing rate can lead to a single subflow usage towards the end of the
connection for a considerable amount of data.
Address the issue explicitly setting netem limits suitable for the
configured link speeds and unflake all the affected tests.
Fixes: 1a418cb8e8 ("mptcp: simult flow self-tests")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260303-net-mptcp-misc-fixes-7-0-rc2-v1-1-4b5462b6f016@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use %ld (not %lu) for signed long, and pass the actual string length
returned by sprintf() to write_text() instead of sizeof(buf).
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
list_categories() builds a set directly from the 'category'
field of each test case. Since 'category' is a list,
set(map(...)) attempts to insert lists into a set, which
raises:
TypeError: unhashable type: 'list'
Flatten category lists and collect unique category names
using set.update() instead.
Signed-off-by: Naveen Anandhan <mr.navi8680@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Fix circular locking dependency in cpuset partition code by deferring
housekeeping_update() calls to a workqueue instead of calling them
directly under cpus_read_lock.
- Fix null-ptr-deref in rebuild_sched_domains_cpuslocked() when
generate_sched_domains() returns NULL due to kmalloc failure.
- Fix incorrect cpuset behavior for effective_xcpus in
partition_xcpus_del() and cpuset_update_tasks_cpumask() in
update_cpumasks_hier().
- Fix race between task migration and cgroup iteration.
-----BEGIN PGP SIGNATURE-----
iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaadVVQ4cdGpAa2VybmVs
Lm9yZwAKCRCxYfJx3gVYGef0AQDLuJE3vzc2VeCBc4rGcj7ZSRmc3tc28lOqHRzi
XEx1iwD+PeFcb9wt1CTqA5hAiIY1LGR/5iO1kTH7paRd16DBRAc=
=S8WE
-----END PGP SIGNATURE-----
Merge tag 'cgroup-for-7.0-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
- Fix circular locking dependency in cpuset partition code by
deferring housekeeping_update() calls to a workqueue instead
of calling them directly under cpus_read_lock
- Fix null-ptr-deref in rebuild_sched_domains_cpuslocked() when
generate_sched_domains() returns NULL due to kmalloc failure
- Fix incorrect cpuset behavior for effective_xcpus in
partition_xcpus_del() and cpuset_update_tasks_cpumask()
in update_cpumasks_hier()
- Fix race between task migration and cgroup iteration
* tag 'cgroup-for-7.0-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup/cpuset: fix null-ptr-deref in rebuild_sched_domains_cpuslocked
cgroup/cpuset: Call housekeeping_update() without holding cpus_read_lock
cgroup/cpuset: Defer housekeeping_update() calls from CPU hotplug to workqueue
cgroup/cpuset: Move housekeeping_update()/rebuild_sched_domains() together
kselftest/cgroup: Simplify test_cpuset_prs.sh by removing "S+" command
cgroup/cpuset: Set isolated_cpus_updating only if isolated_cpus is changed
cgroup/cpuset: Clarify exclusion rules for cpuset internal variables
cgroup/cpuset: Fix incorrect use of cpuset_update_tasks_cpumask() in update_cpumasks_hier()
cgroup/cpuset: Fix incorrect change to effective_xcpus in partition_xcpus_del()
cgroup: fix race between task migration and iteration
- Fix starvation of scx_enable() under fair-class saturation by
offloading the enable path to an RT kthread.
- Fix out-of-bounds access in idle mask initialization on systems with
non-contiguous NUMA node IDs.
- Fix a preemption window during scheduler exit and a refcount underflow
in cgroup init error path.
- Fix SCX_EFLAG_INITIALIZED being a no-op flag.
- Add READ_ONCE() annotations for KCSAN-clean lockless accesses and
replace naked scx_root dereferences with container_of() in kobject
callbacks.
- Tooling and selftest fixes: compilation issues with clang 17,
strtoul() misuse, unused options cleanup, and Kconfig sync.
-----BEGIN PGP SIGNATURE-----
iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaadTZA4cdGpAa2VybmVs
Lm9yZwAKCRCxYfJx3gVYGdf9AQDmsZ8Y3uOJV/5K5RuEoo6SDPmCjr+JXPZu45kD
+UBj3wD9F8DPq+g+KnD7jILhqUdOTePhhNrVYbVw3e1x29EYBQ0=
=nRTC
-----END PGP SIGNATURE-----
Merge tag 'sched_ext-for-7.0-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:
- Fix starvation of scx_enable() under fair-class saturation by
offloading the enable path to an RT kthread
- Fix out-of-bounds access in idle mask initialization on systems with
non-contiguous NUMA node IDs
- Fix a preemption window during scheduler exit and a refcount
underflow in cgroup init error path
- Fix SCX_EFLAG_INITIALIZED being a no-op flag
- Add READ_ONCE() annotations for KCSAN-clean lockless accesses and
replace naked scx_root dereferences with container_of() in kobject
callbacks
- Tooling and selftest fixes: compilation issues with clang 17,
strtoul() misuse, unused options cleanup, and Kconfig sync
* tag 'sched_ext-for-7.0-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: Fix starvation of scx_enable() under fair-class saturation
sched_ext: Remove redundant css_put() in scx_cgroup_init()
selftests/sched_ext: Fix peek_dsq.bpf.c compile error for clang 17
selftests/sched_ext: Add -fms-extensions to bpf build flags
tools/sched_ext: Add -fms-extensions to bpf build flags
sched_ext: Use READ_ONCE() for plain reads of scx_watchdog_timeout
sched_ext: Replace naked scx_root dereferences in kobject callbacks
sched_ext: Use READ_ONCE() for the read side of dsq->nr update
tools/sched_ext: fix strtoul() misuse in scx_hotplug_seq()
sched_ext: Fix SCX_EFLAG_INITIALIZED being a no-op flag
sched_ext: Fix out-of-bounds access in scx_idle_init_masks()
sched_ext: Disable preemption between scx_claim_exit() and kicking helper work
tools/sched_ext: Add Kconfig to sync with upstream
tools/sched_ext: Sync README.md Kconfig with upstream scx
selftests/sched_ext: Remove duplicated unistd.h include in rt_stall.c
tools/sched_ext: scx_sdt: Remove unused '-f' option
tools/sched_ext: scx_central: Remove unused '-p' option
selftests/sched_ext: Fix unused-result warning for read()
selftests/sched_ext: Abort test loop on signal
Add a selftest to verify that changing xmit_hash_policy to vlan+srcmac
is rejected when a native XDP program is loaded on a bond in 802.3ad
mode. Without the fix in bond_option_xmit_hash_policy_set(), the change
succeeds silently, creating an inconsistent state that triggers a kernel
WARNING in dev_xdp_uninstall() when the bond is torn down.
The test attaches native XDP to a bond0 (802.3ad, layer2+3), then
attempts to switch xmit_hash_policy to vlan+srcmac and asserts the
operation fails. It also verifies the change succeeds after XDP is
detached, confirming the rejection is specific to the XDP-loaded state.
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Link: https://patch.msgid.link/20260226080306.98766-3-jiayuan.chen@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When compiling sched_ext selftests using clang 17.0.6, it raised
compiler crash and build error:
Error at line 68: Unsupport signed division for DAG: 0x55b2f9a60240:
i64 = sdiv 0x55b2f9a609b0, Constant:i64<100>, peek_dsq.bpf.c:68:25 @[
peek_dsq.bpf.c:95:4 @[ peek_dsq.bpf.c:169:8 @[ peek
_dsq.bpf.c:140:6 ] ] ]Please convert to unsigned div/mod
After digging, it's not a compiler error, clang supported Signed division
only when using -mcpu=v4, while we use -mcpu=v3 currently, the better way
is to use unsigned div, see [1] for details.
[1] https://github.com/llvm/llvm-project/issues/70433
Signed-off-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Similar to commit 835a507535 ("selftests/bpf: Add -fms-extensions to
bpf build flags") and commit 639f58a0f4 ("bpftool: Fix build warnings
due to MS extensions")
Fix "declaration does not declare anything" warning by using
-fms-extensions and -Wno-microsoft-anon-tag flags to build bpf programs
that #include "vmlinux.h"
Signed-off-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Commit 1cc93c48b5 ("selftests/net: packetdrill: remove tests for
tcp_rcv_*big") removed the test for the reverted commit 1d2fbaad7c
("tcp: stronger sk_rcvbuf checks") but also the one for commit
9ca48d616e ("tcp: do not accept packets beyond window").
Restore the test with the necessary adaptation: expect a delayed ACK
instead of an immediate one, since tcp_can_ingest() does not fail
anymore for the last data packet.
Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
Link: https://patch.msgid.link/20260301-tcp_rcv_big_endseq-v1-1-86ab7415ab58@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
run_kernel() appended KUnit flags directly to the caller-provided args
list. When exec_tests() calls run_kernel() repeatedly (e.g. with
--run_isolated), each call mutated the same list, causing later runs
to inherit stale filter_glob values and duplicate kunit.enable flags.
Fix this by copying args at the start of run_kernel(). Add a regression
test that calls run_kernel() twice with the same list and verifies the
original remains unchanged.
Fixes: ff9e09a376 ("kunit: tool: support running each suite/test separately")
Signed-off-by: Shuvam Pandey <shuvampandey1@gmail.com>
Reviewed-by: David Gow <david@davidgow.net>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmmjmDIACgkQ6rmadz2v
bTq3gg//QQLOT/FxP2/dDurliDTXvQRr1tUxmIw6s3P6hnz9j/LLEVKpLRVkqd8t
XEwbubPd1TXDRsJ4f26Ew01YUtf9xi6ZQoMe/BL1okxi0ZwQGGRVMkiKOQgRT+rj
qYSN5JMfPzA2AuM6FjBF/hhw24yVRdgKRYBam6D7XLfFf3s8TOhHHjJ925PqEo0t
uJOy4ddDYB9BcGmfoeyiFgUtpPqcYrKIUCLBdwFvT2fnPJvrFFoCF3t7NS9UJu/O
wd6ZPuGWSOl9A7vSheldP6cJUDX8L/5WEGO4/LjN7plkySF0HNv8uq/b1T3kKqoY
Y3unXerLGJUAA9D5wpYAekx9YmvRTPQ/o39oTbquEB4SSJVU/SPUpvFw7m2Moq10
51yuyXLcPlI3xtk0Bd8c/CESSmkRenjWzsuZQhDGhsR0I9mIaALrhf9LaatHtXI5
f5ct73e+beK7Fc0Ze+b0JxDeFvzA3CKfAF0/fvGt0r9VZjBaMD+a3NnscBlyKztW
UCXazcfndMhNfUUWanktbT5YhYPmY7hzVQEOl7HAMGn4yG6XbXXmzzY6BqEXIucM
etueW2msZJHGBHQGe2RK3lxtmiB7/FglJHd86xebkIU2gCzqt8fGUha8AIuJ4rLS
7wxC33DycCofRGWdseVu7PsTasdhSGsHKbXz2fOFOFESOczYRw8=
=fj3P
-----END PGP SIGNATURE-----
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:
- Fix alignment of arm64 JIT buffer to prevent atomic tearing (Fuad
Tabba)
- Fix invariant violation for single value tnums in the verifier
(Harishankar Vishwanathan, Paul Chaignon)
- Fix a bunch of issues found by ASAN in selftests/bpf (Ihor Solodrai)
- Fix race in devmpa and cpumap on PREEMPT_RT (Jiayuan Chen)
- Fix show_fdinfo of kprobe_multi when cookies are not present (Jiri
Olsa)
- Fix race in freeing special fields in BPF maps to prevent memory
leaks (Kumar Kartikeya Dwivedi)
- Fix OOB read in dmabuf_collector (T.J. Mercier)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (36 commits)
selftests/bpf: Avoid simplification of crafted bounds test
selftests/bpf: Test refinement of single-value tnum
bpf: Improve bounds when tnum has a single possible value
bpf: Introduce tnum_step to step through tnum's members
bpf: Fix race in devmap on PREEMPT_RT
bpf: Fix race in cpumap on PREEMPT_RT
selftests/bpf: Add tests for special fields races
bpf: Retire rcu_trace_implies_rcu_gp() from local storage
bpf: Delay freeing fields in local storage
bpf: Lose const-ness of map in map_check_btf()
bpf: Register dtor for freeing special fields
selftests/bpf: Fix OOB read in dmabuf_collector
selftests/bpf: Fix a memory leak in xdp_flowtable test
bpf: Fix stack-out-of-bounds write in devmap
bpf: Fix kprobe_multi cookies access in show_fdinfo callback
bpf, arm64: Force 8-byte alignment for JIT buffer to prevent atomic tearing
selftests/bpf: Don't override SIGSEGV handler with ASAN
selftests/bpf: Check BPFTOOL env var in detect_bpftool_path()
selftests/bpf: Fix out-of-bounds array access bugs reported by ASAN
selftests/bpf: Fix array bounds warning in jit_disasm_helpers
...
Add 4 test cases to exercise new act_ct binding restrictions:
- Try to attach act_ct to an ets qdisc
- Attach act_ct to an ingress qdisc
- Attach act_ct to a clsact/egress qdisc
- Attach act_ct to a shared block
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20260225134349.1287037-2-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub reports test flakes on debug kernels:
FAIL: test_udp_gro_ct: Expected software segmentation to occur, had 23 and 17
This test assumes that the kernels nfnetlink_queue module sees N GSO
packets, segments them into M skbs and queues them to userspace for
reinjection.
Hence, if M >= N, no segmentation occurred.
However, its possible that this happens:
- nfnetlink_queue gets GSO packet
- segments that into n skbs
- userspace buffer is full, kernel drops the segmented skbs
-> "toqueue" counter incremented by 1, "fromqueue" is unchanged.
If this happens often enough in a single run, M >= N check triggers
incorrectly.
To solve this, allow the nf_queue.c test program to set the FAIL_OPEN
flag so that the segmented skbs bypass the queueing step in the kernel
if the receive buffer is full.
Also, reduce number of sending socat instances, decrease their priority
and increase nice value for the nf_queue program itself to reduce the
probability of overruns happening in the first place.
Fixes: 59ecffa399 ("selftests: netfilter: nft_queue.sh: add udp fraglist gro test case")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20260218184114.0b405b72@kernel.org/
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20260226161920.1205-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The reg_bounds_crafted tests validate the verifier's range analysis
logic. They focus on the actual ranges and thus ignore the tnum. As a
consequence, they carry the assumption that the tested cases can be
reproduced in userspace without using the tnum information.
Unfortunately, the previous change the refinement logic breaks that
assumption for one test case:
(u64)2147483648 (u32)<op> [4294967294; 0x100000000]
The tested bytecode is shown below. Without our previous improvement, on
the false branch of the condition, R7 is only known to have u64 range
[0xfffffffe; 0x100000000]. With our improvement, and using the tnum
information, we can deduce that R7 equals 0x100000000.
19: (bc) w0 = w6 ; R6=0x80000000
20: (bc) w0 = w7 ; R7=scalar(smin=umin=0xfffffffe,smax=umax=0x100000000,smin32=-2,smax32=0,var_off=(0x0; 0x1ffffffff))
21: (be) if w6 <= w7 goto pc+3 ; R6=0x80000000 R7=0x100000000
R7's tnum is (0; 0x1ffffffff). On the false branch, regs_refine_cond_op
refines R7's u32 range to [0; 0x7fffffff]. Then, __reg32_deduce_bounds
refines the s32 range to 0 using u32 and finally also sets u32=0.
From this, __reg_bound_offset improves the tnum to (0; 0x100000000).
Finally, our previous patch uses this new tnum to deduce that it only
intersect with u64=[0xfffffffe; 0x100000000] in a single value:
0x100000000.
Because the verifier uses the tnum to reach this constant value, the
selftest is unable to reproduce it by only simulating ranges. The
solution implemented in this patch is to change the test case such that
there is more than one overlap value between u64 and the tnum. The max.
u64 value is thus changed from 0x100000000 to 0x300000000.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/50641c6a7ef39520595dcafa605692427c1006ec.1772225741.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch introduces selftests to cover the new bounds refinement
logic introduced in the previous patch. Without the previous patch,
the first two tests fail because of the invariant violation they
trigger. The last test fails because the R10 access is not detected as
dead code. In addition, all three tests fail because of R0 having a
non-constant value in the verifier logs.
In addition, the last two cases are covering the negative cases: when we
shouldn't refine the bounds because the u64 and tnum overlap in at least
two values.
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/90d880c8cf587b9f7dc715d8961cd1b8111d01a8.1772225741.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a couple of tests to ensure that the refcount drops to zero when we
exercise the race where creation of a special field succeeds the logical
bpf_obj_free_fields done when deleting an element. Prior to previous
changes, the fields would be freed eagerly and repopulate and end up
leaking, causing the reference to not drop down correctly. Running this
test on a kernel without fixes will cause a hang in delete_module, since
the module reference stays active due to the leaked kptr not dropping
it. After the fixes tests succeed as expected.
Reviewed-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260227224806.646888-6-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
- Fix cpufreq warning due to attempting a cross-call with interrupts
masked when reading local AMU counters.
- Fix DEBUG_PREEMPT warning from the delay loop when it tries to access
per-cpu errata workaround state for the virtual counter.
- Re-jig and optimise our TLB invalidation errata workarounds in
preparation for more hardware brokenness.
- Fix GCS mappings to interact properly with PROT_NONE and to avoid
corrupting the pte on CPUs with FEAT_LPA2.
- Fix ioremap_prot() to extract only the memory attributes from the
user pte and ignore all the other 'prot' bits.
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmmh458QHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNOA1B/9RhZoGSDEuTKVjue1UtVIaFXs5v0LFqgIf
c5eCtYzHU2djw0zVUv4DHZaPvxmxjzu4G4safzjB23IwMDkeGM2hmDBigw/HgWXN
Nm46UWFnmWrd/58w895r3QAe5wbTuhwzVdj9bUbYNGf9/Tqey/hbn8e2BJFdKC1H
Dt/uBYNuyRLSe94iYTzKcWPKH5CnSAkQ7lshxOhHSq5WF+ybmNywJnSqcTy+2kHb
cU9/TXWd6ck5MqK/pzu9nCLPPGpSmOYCjWPiEB2pWiHRPlykK3r7fhvYlqfySRtx
r14TlLIpevstCc8dr7NS2rj6lkjXKzggWOm0fD/rPusSj5Jn7qkk
=MmjE
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"The diffstat is dominated by changes to our TLB invalidation errata
handling and the introduction of a new GCS selftest to catch one of
the issues that is fixed here relating to PROT_NONE mappings.
- Fix cpufreq warning due to attempting a cross-call with interrupts
masked when reading local AMU counters
- Fix DEBUG_PREEMPT warning from the delay loop when it tries to
access per-cpu errata workaround state for the virtual counter
- Re-jig and optimise our TLB invalidation errata workarounds in
preparation for more hardware brokenness
- Fix GCS mappings to interact properly with PROT_NONE and to avoid
corrupting the pte on CPUs with FEAT_LPA2
- Fix ioremap_prot() to extract only the memory attributes from the
user pte and ignore all the other 'prot' bits"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: topology: Fix false warning in counters_read_on_cpu() for same-CPU reads
arm64: Fix sampling the "stable" virtual counter in preemptible section
arm64: tlb: Optimize ARM64_WORKAROUND_REPEAT_TLBI
arm64: tlb: Allow XZR argument to TLBI ops
kselftest: arm64: Check access to GCS after mprotect(PROT_NONE)
arm64: gcs: Honour mprotect(PROT_NONE) on shadow stack mappings
arm64: gcs: Do not set PTE_SHARED on GCS mappings if FEAT_LPA2 is enabled
arm64: io: Extract user memory type in ioremap_prot()
arm64: io: Rename ioremap_prot() to __ioremap_prot()
Add a new test file bridge_vlan_dump.sh with four test cases that verify
VLANs with different per-VLAN options are not incorrectly grouped into
ranges in the dump output.
The tests verify the kernel's br_vlan_opts_eq_range() function correctly
prevents VLAN range grouping when neigh_suppress, mcast_max_groups,
mcast_n_groups, or mcast_enabled options differ.
Each test verifies that VLANs with different option values appear as
individual entries rather than ranges, and that VLANs with matching
values are properly grouped together.
Example output:
$ ./bridge_vlan_dump.sh
TEST: VLAN range grouping with neigh_suppress [ OK ]
TEST: VLAN range grouping with mcast_max_groups [ OK ]
TEST: VLAN range grouping with mcast_n_groups [ OK ]
TEST: VLAN range grouping with mcast_enabled [ OK ]
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20260225143956.3995415-3-danieller@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dmabuf name allocations can be less than DMA_BUF_NAME_LEN characters,
but bpf_probe_read_kernel always tries to read exactly that many bytes.
If a name is less than DMA_BUF_NAME_LEN characters,
bpf_probe_read_kernel will read past the end. bpf_probe_read_kernel_str
stops at the first NUL terminator so use it instead, like
iter_dmabuf_for_each already does.
Fixes: ae5d2c59ec ("selftests/bpf: Add test for dmabuf_iter")
Reported-by: Jerome Lee <jaewookl@quicinc.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Link: https://lore.kernel.org/r/20260225003349.113746-1-tjmercier@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The child_ns_mode sysctl parameter becomes write-once in a future patch
in this series, which breaks existing tests. This patch updates the
tests to respect this new policy. No additional tests are added.
Add "global-parent" and "local-parent" namespaces as intermediaries to
spawn namespaces in the given modes. This avoids the need to change
"child_ns_mode" in the init_ns. nsenter must be used because ip netns
unshares the mount namespace so nested "ip netns add" breaks exec calls
from the init ns. Adds nsenter to the deps check.
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260223-vsock-ns-write-once-v3-1-c0cde6959923@meta.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add a test for the issue that was fixed in "team: avoid NETDEV_CHANGEMTU
event when unregistering slave".
The test hangs due to a reference count leak without the fix:
# make -C tools/testing/selftests TARGETS="drivers/net/team" TEST_PROGS=refleak.sh TEST_GEN_PROGS="" run_tests
[...]
TAP version 13
1..1
# timeout set to 45
# selftests: drivers/net/team: refleak.sh
[ 50.681299][ T496] unregister_netdevice: waiting for dummy1 to become free. Usage count = 3
[ 71.185325][ T496] unregister_netdevice: waiting for dummy1 to become free. Usage count = 3
And passes with the fix:
# make -C tools/testing/selftests TARGETS="drivers/net/team" TEST_PROGS=refleak.sh TEST_GEN_PROGS="" run_tests
[...]
TAP version 13
1..1
# timeout set to 45
# selftests: drivers/net/team: refleak.sh
ok 1 selftests: drivers/net/team: refleak.sh
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260224125709.317574-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a packetdrill test that verifies we accept bare FIN packets when
the advertised receive window is zero.
Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260224-fix_zero_wnd_fin-v2-2-a16677ea7cea@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A GCS mapping should not be accessible after mprotect(PROT_NONE). Add a
kselftest for this scenario.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
The bpftool_maps_access and bpftool_metadata tests may fail on BPF CI
with "command not found", depending on a workflow.
This happens because detect_bpftool_path() only checks two hardcoded
relative paths:
- ./tools/sbin/bpftool
- ../tools/sbin/bpftool
Add support for a BPFTOOL environment variable that allows specifying
the exact path to the bpftool binary.
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223191118.655185-2-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
- kmem_cache_iter: remove unnecessary debug output
- lwt_seg6local: change the type of foobar to char[]
- the sizeof(foobar) returned the pointer size and not a string
length as intended
- verifier_log: increase prog_name buffer size in verif_log_subtest()
- compiler has a conservative estimate of fixed_log_sz value, making
ASAN complain on snprint() call
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223191118.655185-1-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Compiler cannot infer upper bound for labels.cnt and warns about
potential buffer overflow in snprintf. Add an explicit bounds
check (... && i < MAX_LOCAL_LABELS) in the loop condition to fix the
warning.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-18-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
ASAN reported a resource leak due to the bpf_object not being tracked
in test_sysctl. Add obj field to struct sysctl_test to properly clean
it up.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-17-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
ASAN reported a "joining already joined thread" error. The
release_child() may be called multiple times for the same struct
child.
Fix by resetting child->thread to 0 after pthread_join.
Also memset(0) static child variable in test_attach_api().
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-15-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
ASAN reported a use-after-free in close_xsk().
The xsk->socket internally references xsk->umem via socket->ctx->umem,
so the socket must be deleted before the umem. Fix the order of
operations in close_xsk().
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-14-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The Close() macro uses the passed in expression three times, which
leads to repeated execution in case it has side effects. That is,
Close(i--) would decrement i three times.
ASAN caught a stack-buffer-undeflow error at a point where this was
overlooked. Fix it.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-12-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
ASAN reported a memory leak in bpf_get_ksyms(): it allocates a struct
ksyms internally and never frees it.
Move struct ksyms to trace_helpers.h and return it from the
bpf_get_ksyms(), giving ownership to the caller. Add filtered_syms and
filtered_cnt fields to the ksyms to hold the filtered array of
symbols, previously returned by bpf_get_ksyms().
Fixup the call sites: kprobe_multi_test and bench_trigger.
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260223190736.649171-10-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a denylist file for tests that should be skipped when built with
userspace ASAN:
$ make ... SAN_CFLAGS="-fsanitize=address -fno-omit-frame-pointer"
Skip the following tests:
- *arena*: userspace ASAN does not understand BPF arena maps and gets
confused particularly when map_extra is non-zero
- non-zero map_extra leads to mmap with MAP_FIXED, and ASAN treats
this as an unknown memory region
- task_local_data: ASAN complains about "incorrect" aligned_alloc()
usage, but it's intentional in the test
- uprobe_multi_test: very slow with ASAN enabled
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-9-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
EXTRA_* and SAN_* build flags were not correctly propagated to bpftool
and resolve_btids when building selftests/bpf. This led to various
build errors on attempt to build with SAN_CFLAGS="-fsanitize=address",
for example.
Fix the makefiles to address this:
- Pass SAN_CFLAGS/SAN_LDFLAGS to bpftool and resolve_btfids build
- Propagate EXTRA_LDFLAGS to resolve_btfids link command
- Use pkg-config to detect zlib and zstd for resolve_btfids, similar
libelf handling
Also check for ASAN flag in selftests/bpf/Makefile for convenience.
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-7-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Replace strncpy() with memcpy() in cases where the source is
non-NULL-terminated and the copy length is known.
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-6-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Verify Wacom devices set INPUT_PROP_DIRECT on display devices and
INPUT_PROP_POINTER on opaque devices. Verify INPUT_PROP_POINTER
is not set on display devices and INPUT_PROP_DIRECT is not set
on opaque devices.
Moved test_prop_pointer into TestOpaqueTablet. Created a
DirectTabletTest mixin class for test_prop_direct that can be
inherited by display tablet test classes.Used DirectTabletTest
for TestDTH2452Tablet case.
Signed-off-by: Alex Tran <alex.t.tran@gmail.com>
Tested-by: Erin Skomra <erin.skomra@wacom.com>
Reviewed-by: Erin Skomra <erin.skomra@wacom.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
The cpuset_handle_hotplug() may need to invoke housekeeping_update(),
for instance, when an isolated partition is invalidated because its
last active CPU has been put offline.
As we are going to enable dynamic update to the nozh_full housekeeping
cpumask (HK_TYPE_KERNEL_NOISE) soon with the help of CPU hotplug,
allowing the CPU hotplug path to call into housekeeping_update() directly
from update_isolation_cpumasks() will likely cause deadlock. So we
have to defer any call to housekeeping_update() after the CPU hotplug
operation has finished. This is now done via the workqueue where
the update_hk_sched_domains() function will be invoked via the
hk_sd_workfn().
An concurrent cpuset control file write may have executed the required
update_hk_sched_domains() function before the work function is called. So
the work function call may become a no-op when it is invoked.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The "S+" command is used in the test matrix to enable the cpuset
controller. However this can be done automatically and we never use the
"S-" command to disable cpuset controller. Simplify the test matrix and
reduce clutter by removing the command and doing that automatically.
There is no functional change to the test cases.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The header <unistd.h> is included twice in rt_stall.c. Remove the
redundant inclusion to clean up the code.
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The read() call in run_test() triggers a warn_unused_result compiler
warning, which breaks the build under -Werror.
Check the return value of read() and exit the child process on failure to
satisfy the compiler and handle pipe read errors.
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The runner sets exit_req on SIGINT/SIGTERM but ignores it during the
main loop. This prevents users from cleanly interrupting a test run.
Check exit_req each iteration to safely break out on exit signals.
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Acked-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
- Various bug fixes for the example schedulers and selftests.
-----BEGIN PGP SIGNATURE-----
iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaZkltg4cdGpAa2VybmVs
Lm9yZwAKCRCxYfJx3gVYGR+3AQC2h+P8tZXpQ3tzNRgQ10KvekO+uetee4d3vb4O
db5FDwEAh5Binq/tRSzLm7XEb/YmmP2XmHmhmnbRQL69dzMt5gM=
=agf0
-----END PGP SIGNATURE-----
Merge tag 'sched_ext-for-7.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:
- Various bug fixes for the example schedulers and selftests
* tag 'sched_ext-for-7.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
tools/sched_ext: fix getopt not re-parsed on restart
tools/sched_ext: scx_userland: fix data races on shared counters
tools/sched_ext: scx_pair: fix stride == 0 crash on single-CPU systems
tools/sched_ext: scx_central: fix CPU_SET and skeleton leak on early exit
tools/sched_ext: scx_userland: fix stale data on restart
tools/sched_ext: scx_flatcg: fix potential stack overflow from VLA in fcg_read_stats
selftests/sched_ext: Fix rt_stall flaky failure
tools/sched_ext: scx_userland: fix restart and stats thread lifecycle bugs
tools/sched_ext: scx_central: fix sched_setaffinity() call with the set size
tools/sched_ext: scx_flatcg: zero-initialize stats counter array
Add a test to verify that RSS contexts persist across interface
down/up along with their associated Ntuple filters. Another test
that creates contexts/rules keeping interface down and test their
persistence is also added.
Tested on bnxt_en:
TAP version 13
1..1
# timeout set to 0
# selftests: drivers/net/hw: rss_ctx.py
# TAP version 13
# 1..2
# ok 1 rss_ctx.test_rss_context_persist_create_and_ifdown
# ok 2 rss_ctx.test_rss_context_persist_ifdown_and_create # SKIP Create context not supported with interface down
# # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:1 error:0
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20260219185313.2682148-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Current release - new code bugs:
- net: fix backlog_unlock_irq_restore() vs CONFIG_PREEMPT_RT
- eth: mlx5e: XSK, Fix unintended ICOSQ change
- phy_port: correctly recompute the port's linkmodes
- vsock: prevent child netns mode switch from local to global
- couple of kconfig fixes for new symbols
Previous releases - regressions:
- nfc: nci: fix false-positive parameter validation for packet data
- net: do not delay zero-copy skbs in skb_attempt_defer_free()
Previous releases - always broken:
- mctp: ensure our nlmsg responses to user space are zero-initialised
- ipv6: ioam: fix heap buffer overflow in __ioam6_fill_trace_data()
- fixes for ICMP rate limiting
Misc:
- intel: fix PCI device ID conflict between i40e and ipw2200
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmmXUh8ACgkQMUZtbf5S
IrufYA//ZVj+4gvegqKwKZYXNBndVW00GGTYqaILbaenK1olUVUelVB91eV2Klc/
dXCeKG/MgEPuT89IjkPzVr2Yg4x6uhjcQL1rsahORn+GuQfSI/P8y7ysDOPnHVeM
Rtsg1m8z3EizJcHPeAJe7nEqFzfvZ2m+FCEGe++z8BYaUZUVApytgpIWOHO/aB+p
t13bCNzd05XxPphMl610T00Fncj2jCVDHILMgTB5rmFmkeJuQwNrRGXQSoQame46
+g+yCZjT0eVTrBaH1EUssWfrOT3VJj3BEee6gSp7k9mxMkbW18i8shBgmxS+EHjk
u19wwBzSrHK+JY1UExim+1E/rZisQVmEE1Gs0ALedxAu9zC/Julzfa2/+BFsc0j7
QTXd4jukG3aTPIX8v3TV2Igu0j+bAT4WdpzvnsXXBMVKy7wFYMd1+aSOLyFH2W9L
qRbg50oUATcsz77bZt6YUTJEgua4HXNYGtn15FMZOR7HJVR2L44Q5TK5mQxGp5iM
GabeKMzg6bsjE98STM3nbWks3pIb9ptIk++i0913eSqKgn84bDPtp3Gabfgle2SJ
8gjKS61K8rDt5x8StXVod7oGQ4asL8RJyOtE/avgbWUu9BNH8/oKqsE6TQrpXauv
1ndiyim/mPe4fBCxkVAi2+uq5/ph9z8XyleESz9VYwyL3Rl4nsg=
=qSCj
-----END PGP SIGNATURE-----
Merge tag 'net-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from Netfilter.
Current release - new code bugs:
- net: fix backlog_unlock_irq_restore() vs CONFIG_PREEMPT_RT
- eth: mlx5e: XSK, Fix unintended ICOSQ change
- phy_port: correctly recompute the port's linkmodes
- vsock: prevent child netns mode switch from local to global
- couple of kconfig fixes for new symbols
Previous releases - regressions:
- nfc: nci: fix false-positive parameter validation for packet data
- net: do not delay zero-copy skbs in skb_attempt_defer_free()
Previous releases - always broken:
- mctp: ensure our nlmsg responses to user space are zero-initialised
- ipv6: ioam: fix heap buffer overflow in __ioam6_fill_trace_data()
- fixes for ICMP rate limiting
Misc:
- intel: fix PCI device ID conflict between i40e and ipw2200"
* tag 'net-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits)
net: nfc: nci: Fix parameter validation for packet data
net/mlx5e: Use unsigned for mlx5e_get_max_num_channels
net/mlx5e: Fix deadlocks between devlink and netdev instance locks
net/mlx5e: MACsec, add ASO poll loop in macsec_aso_set_arm_event
net/mlx5: Fix misidentification of write combining CQE during poll loop
net/mlx5e: Fix misidentification of ASO CQE during poll loop
net/mlx5: Fix multiport device check over light SFs
bonding: alb: fix UAF in rlb_arp_recv during bond up/down
bnge: fix reserving resources from FW
eth: fbnic: Advertise supported XDP features.
rds: tcp: fix uninit-value in __inet_bind
net/rds: Fix NULL pointer dereference in rds_tcp_accept_one
octeontx2-af: Fix default entries mcam entry action
net/mlx5e: XSK, Fix unintended ICOSQ change
ipv6: icmp: icmpv6_xrlim_allow() optimization if net.ipv6.icmp.ratelimit is zero
ipv4: icmp: icmpv4_xrlim_allow() optimization if net.ipv4.icmp_ratelimit is zero
ipv6: icmp: remove obsolete code in icmpv6_xrlim_allow()
inet: move icmp_global_{credit,stamp} to a separate cache line
icmp: prevent possible overflow in icmp_global_allow()
selftests/net: packetdrill: add ipv4-mapped-ipv6 tests
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmmXR6QACgkQ6rmadz2v
bTqVjg/+PZPMKGBMfF5uWk74LWYQIt01ePfHH2QeA4DsOwNK+9Q1+jmCLRPa/diL
Ds//ZIEMatmtdd1eO5aHyGXE1sBsJ02LfKOhsPukQyzD/FtZ4BmQpzpG2mK5o1M5
NAH6wxY+6Tr8UlXQtoTF1FFXSa6Y0vQmkyXofOoUgSBAxTPMGVQnWw4bq7mUAX9A
G6/TnPDgGbNLejPCmu8mERCkqRjIGAgjBUItVeiHbdxymtzjHcrH7nwxuP59djR9
1AhMrJnyV+s7iEMkAKGkE6NOID73R/YQEqmvD1eX0AWvqdR8+4lOHT0KPU039JqT
RQV5JgXSfeEkdUtyvqQJZdiJinjFLOwp4CGcX+DKcvUpAKmLx8q3ihPiuWk8+JOV
fnosXQIeQ7B9EuTvoNoNTfvU/MuV8vWd3/1kQc+KGXhzk944Ypb29zywGEoGZarU
eb7YRtUIsXBo7H2K1juqTvj72jyhG83cbZxE5+pR2gv87yGgUvt0r0u0FvzkQf2c
Fq671n6UaOd+0ZYey7YG9bIc+WMTbsjqVY0f7+3Rcl3Q68we0HHdqiPoF637/a1r
lS6TZLRDJmykDPp03db97UcQml6RKJiyTnTNQVUF4Uk+VF1KTreXK/D8fD401gxZ
GWuLt/bjq8l/EVJOhzpaO0JDejmZVRaOQUX8t9DwstS+FFbSyBs=
=mb+y
-----END PGP SIGNATURE-----
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:
- Fix invalid write loop logic in libbpf's bpf_linker__add_buf() (Amery
Hung)
- Fix a potential use-after-free of BTF object (Anton Protopopov)
- Add feature detection to libbpf and avoid moving arena global
variables on older kernels (Emil Tsalapatis)
- Remove extern declaration of bpf_stream_vprintk() from libbpf headers
(Ihor Solodrai)
- Fix truncated netlink dumps in bpftool (Jakub Kicinski)
- Fix map_kptr grace period wait in bpf selftests (Kumar Kartikeya
Dwivedi)
- Remove hexdump dependency while building bpf selftests (Matthieu
Baerts)
- Complete fsession support in BPF trampolines on riscv (Menglong Dong)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Remove hexdump dependency
libbpf: Remove extern declaration of bpf_stream_vprintk()
selftests/bpf: Use vmlinux.h in test_xdp_meta
bpftool: Fix truncated netlink dumps
libbpf: Delay feature gate check until object prepare time
libbpf: Do not use PROG_TYPE_TRACEPOINT program for feature gating
bpf: Add a map/btf from a fd array more consistently
selftests/bpf: Fix map_kptr grace period wait
selftests/bpf: enable fsession_test on riscv64
selftests/bpf: Adjust selftest due to function rename
bpf, riscv: add fsession support for trampolines
bpf: Fix a potential use-after-free of BTF object
bpf, riscv: introduce emit_store_stack_imm64() for trampoline
libbpf: Fix invalid write loop logic in bpf_linker__add_buf()
libbpf: Add gating for arena globals relocation feature
Total patches: 7
Reviews/patch: 0.57
Reviewed rate: 42%
- The 2 patch series "two fixes in kho_populate()" from Ran Xiaokai
fixes a couple of not-major issues in the kexec handover code.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaZaKBAAKCRDdBJ7gKXxA
jpB1AP9UpNzT63aGDnB6G8pgekSdK/I2gypZI3cS7MpBPorRUgEAhcClc2//zWGK
0Wz1rxh3sWIE/pzd/yOEsv+7oQHeDQA=
=oUp2
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2026-02-18-19-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more non-MM updates from Andrew Morton:
- "two fixes in kho_populate()" fixes a couple of not-major issues in
the kexec handover code (Ran Xiaokai)
- misc singletons
* tag 'mm-nonmm-stable-2026-02-18-19-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
lib/group_cpus: handle const qualifier from clusters allocation type
kho: remove unnecessary WARN_ON(err) in kho_populate()
kho: fix missing early_memunmap() call in kho_populate()
scripts/gdb: implement x86_page_ops in mm.py
objpool: fix the overestimation of object pooling metadata size
selftests/memfd: use IPC semaphore instead of SIGSTOP/SIGCONT
delayacct: fix build regression on accounting tool
Total patches: 36
Reviews/patch: 1.77
Reviewed rate: 83%
- The 2 patch series "mm/vmscan: fix demotion targets checks in
reclaim/demotion" from Bing Jiao fixes a couple of issues in the
demotion code - pages were failed demotion and were finding themselves
demoted into disallowed nodes.
- The 11 patch series "Remove XA_ZERO from error recovery of dup_mmap()"
from Liam Howlett fixes a rare mapledtree race and performs a number of
cleanups.
- The 13 patch series "mm: add bitmap VMA flag helpers and convert all
mmap_prepare to use them" from Lorenzo Stoakes implements a lot of
cleanups following on from the conversion of the VMA flags into a
bitmap.
- The 5 patch series "support batch checking of references and unmapping
for large folios" from Baolin Wang implements batching to greatly
improve the performance of reclaiming clean file-backed large folios.
- The 3 patch series "selftests/mm: add memory failure selftests" from
Miaohe Lin does as claimed.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaZaIEQAKCRDdBJ7gKXxA
jj73AQCQDwLoipDiQRGyjB5BDYydymWuDoiB1tlDPHfYAP3b/QD/UQtVlOEXqwM3
naOKs3NQ1pwnfhDaQMirGw2eAnJ1SQY=
=6Iif
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2026-02-18-19-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:
- "mm/vmscan: fix demotion targets checks in reclaim/demotion" fixes a
couple of issues in the demotion code - pages were failed demotion
and were finding themselves demoted into disallowed nodes (Bing Jiao)
- "Remove XA_ZERO from error recovery of dup_mmap()" fixes a rare
mapledtree race and performs a number of cleanups (Liam Howlett)
- "mm: add bitmap VMA flag helpers and convert all mmap_prepare to use
them" implements a lot of cleanups following on from the conversion
of the VMA flags into a bitmap (Lorenzo Stoakes)
- "support batch checking of references and unmapping for large folios"
implements batching to greatly improve the performance of reclaiming
clean file-backed large folios (Baolin Wang)
- "selftests/mm: add memory failure selftests" does as claimed (Miaohe
Lin)
* tag 'mm-stable-2026-02-18-19-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (36 commits)
mm/page_alloc: clear page->private in free_pages_prepare()
selftests/mm: add memory failure dirty pagecache test
selftests/mm: add memory failure clean pagecache test
selftests/mm: add memory failure anonymous page test
mm: rmap: support batched unmapping for file large folios
arm64: mm: implement the architecture-specific clear_flush_young_ptes()
arm64: mm: support batch clearing of the young flag for large folios
arm64: mm: factor out the address and ptep alignment into a new helper
mm: rmap: support batched checks of the references for large folios
tools/testing/vma: add VMA userland tests for VMA flag functions
tools/testing/vma: separate out vma_internal.h into logical headers
tools/testing/vma: separate VMA userland tests into separate files
mm: make vm_area_desc utilise vma_flags_t only
mm: update all remaining mmap_prepare users to use vma_flags_t
mm: update shmem_[kernel]_file_*() functions to use vma_flags_t
mm: update secretmem to use VMA flags on mmap_prepare
mm: update hugetlbfs to use VMA flags on mmap_prepare
mm: add basic VMA flag operation helper functions
tools: bitmap: add missing bitmap_[subset(), andnot()]
mm: add mk_vma_flags() bitmap flag macro helper
...
Add ipv4-mapped-ipv6 case to ksft_runner.sh before
an upcoming TCP fix in this area.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260217142924.1853498-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The verification signature header generation requires converting a
binary certificate to a C array. Previously this only worked with xxd,
and a switch to hexdump has been done in commit b640d556a2
("selftests/bpf: Remove xxd util dependency").
hexdump is a more common utility program, yet it might not be installed
by default. When it is not installed, BPF selftests build without
errors, but tests_progs is unusable: it exits with the 255 code and
without any error messages. When manually reproducing the issue, it is
not too hard to find out that the generated verification_cert.h file is
incorrect, but that's time consuming. When digging the BPF selftests
build logs, this line can be seen amongst thousands others, but ignored:
/bin/sh: 2: hexdump: not found
Here, od is used instead of hexdump. od is coming from the coreutils
package, and this new od command produces the same output when using od
from GNU coreutils, uutils, and even busybox. This is more portable, and
it produces a similar results to what was done before with hexdump:
there is an extra comma at the end instead of trailing whitespaces,
but the C code is not impacted.
Fixes: b640d556a2 ("selftests/bpf: Remove xxd util dependency")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/r/20260218-bpf-sft-hexdump-od-v2-1-2f9b3ee5ab86@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
- Replace linux/* includes with vmlinux.h
- Include errno.h
- Include bpf_tracing_net.h for TC_ACT_* and ETH_*
- Use BPF_STDERR instead of BPF_STREAM_STDERR
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260218215651.2057673-2-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Since we started running selftests in NIPA we have been seeing
tc_actions.sh generate a soft lockup warning on ~20% of the runs.
On the pre-netdev foundation setup it was actually a missed irq
splat from the console. Now it's either that or a lockup.
I initially suspected a socket locking issue since the test
is exercising local loopback with act_mirred.
After hours of staring at this I noticed in strace that ncat
when -o $file is specified _both_ saves the output to the file
and still prints it to stdout. Because the file being sent
is constructed with:
dd conv=sparse status=none if=/dev/zero bs=1M count=2 of=$mirred
^^^^^^^^^
the data printed is all \0. Most terminals don't display nul
characters (and neither does vng output capture save them).
But QEMU's serial console still has to poke them thru which
is very slow and causes the lockup (if the file is >600kB).
Replace the '-o $file' with '> $file'. This speeds the test up
from 2m20s to 18s on debug kernels, and prevents the warnings.
Fixes: ca22da2fbd ("act_mirred: use the backlog for nested calls to mirred ingress")
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260214035159.2119699-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The tests use the tc pedit action to modify the IPv4 source address
("pedit ex munge ip src set"), but the IP header checksum is not
recalculated after the modification. As a result, the modified packet
fails sanity checks in br_netfilter after bridging and is dropped,
which causes the test to fail.
Fix this by ensuring net.bridge.bridge-nf-call-iptables is set to 0
during the test execution. This prevents the bridge from passing
L2 traffic to netfilter, bypassing the checksum validation that
causes the test failure.
Fixes: 92ad382894 ("selftests: forwarding: Add a test for pedit munge SIP and DIP")
Fixes: 226657ba23 ("selftests: forwarding: Add a forwarding test for pedit munge dsfield")
Signed-off-by: Aleksei Oladko <aleksey.oladko@virtuozzo.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260213131907.43351-4-aleksey.oladko@virtuozzo.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The test generates VXLAN traffic using mausezahn, where the encapsulated
inner IPv6 packet has an incorrect payload length set in the IPv6 header.
After VXLAN decapsulation, such packets do not pass sanity checks in
br_netfilter and are dropped, which causes the test to fail.
Fix this by setting the correct IPv6 payload length for the encapsulated
packet generated by mausezahn, so that the packet is accepted
by br_netfilter.
tools/testing/selftests/net/forwarding/vxlan_bridge_1d_ipv6.sh
lines 698-706
)"00:03:"$( : Payload length
)"3a:"$( : Next header
)"04:"$( : Hop limit
)"$saddr:"$( : IP saddr
)"$daddr:"$( : IP daddr
)"80:"$( : ICMPv6.type
)"00:"$( : ICMPv6.code
)"00:"$( : ICMPv6.checksum
)
Data after IPv6 header:
• 80: — 1 byte (ICMPv6 type)
• 00: — 1 byte (ICMPv6 code)
• 00: — 1 byte (ICMPv6 checksum, truncated)
Total: 3 bytes → 00:03 is correct. The old value 00:08 did not match
the actual payload size.
Fixes: b07e9957f2 ("selftests: forwarding: Add VxLAN tests with a VLAN-unaware bridge for IPv6")
Signed-off-by: Aleksei Oladko <aleksey.oladko@virtuozzo.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260213131907.43351-3-aleksey.oladko@virtuozzo.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The test generates VXLAN traffic using mausezahn, where the encapsulated
inner IPv4 packet contains a zero IP header checksum. After VXLAN
decapsulation, such packets do not pass sanity checks in br_netfilter
and are dropped, which causes the test to fail.
Fix this by calculating and setting a valid IPv4 header checksum for the
encapsulated packet generated by mausezahn, so that the packet is accepted
by br_netfilter. Fixed by using the payload_template_calc_checksum() /
payload_template_expand_checksum() helpers that are only available
in v6.3 and newer kernels.
Fixes: a0b61f3d8e ("selftests: forwarding: vxlan_bridge_1d: Add an ECN decap test")
Signed-off-by: Aleksei Oladko <aleksey.oladko@virtuozzo.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260213131907.43351-2-aleksey.oladko@virtuozzo.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Recently we were able to trigger a warning in the mdb_n_entries counting
code. Add tests that exercise different ways which used to trigger that
warning.
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Link: https://patch.msgid.link/20260213070031.1400003-3-nikolay@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add check_rx_hds test that verifies header/data split works across
payload sizes. The test sweeps payload sizes from 1 byte to 8KB, if any
data propagates up to userspace as SCM_DEVMEM_LINEAR, then the test
fails. This shows that regardless of payload size, ncdevmem's
configuration of hds-thresh to 0 is respected.
Add -L (--fail-on-linear) flag to ncdevmem that causes the receiver to
fail if any SCM_DEVMEM_LINEAR cmsg is received.
Use socat option for fixed block sizing and tcp nodelay to disable
nagle's algo to avoid buffering.
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260211-fbnic-tcp-hds-fixes-v1-4-55d050e6f606@meta.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* update tools/include/linux/mm.h to fix memblock tests compilation
* drop redundant struct page* parameter from memblock_free_pages() and get
struct page from the pfn
* add underflow detection for size calculation in memtest and warn about
underflow when VM_DEBUG is enabled
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEeOVYVaWZL5900a/pOQOGJssO/ZEFAmmQIhoQHHJwcHRAa2Vy
bmVsLm9yZwAKCRA5A4Ymyw79kWhYB/0aobkrfD4aW5Utfmzp08LdBwtfsOqEfKX6
AdBGPdG+WB90auW4qwDupspqj2lYDpJ4QvETNP0B84ek62VEN+8YEbvcC4W70l4H
nsrrnkTgwFGNXXxjr6tIQXu9hnC1o7eSuWhhYry4XG+JEKR3iah54JmbxcDrAEFj
lb4BzdocDtF6J3EkOv5alaDfdwUxgA3C6Idp2mpVb4m7DMraGZMq3lm7EPYm22zb
zo9v0nvXW9xtZfADQ6mRzp4uTjd/UAUH+YsU/u1S1f+JBN1bELXmFRf/X3CKBC6/
AIO9FcHsfA0i1MhbeBizT9eUEFaNIRxbMAtWbfdHrQhaLWNvyPOU
=Gz3z
-----END PGP SIGNATURE-----
Merge tag 'memblock-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock updates from Mike Rapoport:
- update tools/include/linux/mm.h to fix memblock tests compilation
- drop redundant struct page* parameter from memblock_free_pages() and
get struct page from the pfn
- add underflow detection for size calculation in memtest and warn
about underflow when VM_DEBUG is enabled
* tag 'memblock-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
mm/memtest: add underflow detection for size calculation
memblock: drop redundant 'struct page *' argument from memblock_free_pages()
memblock test: include <linux/sizes.h> from tools mm.h stub
Now that the RISC-V trampoline JIT supports BPF_TRACE_FSESSION, run
the fsession selftest on riscv64 as well as x86_64.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Tested-by: Björn Töpel <bjorn@kernel.org>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20260208053311.698352-4-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Commit c27cea4416 ("rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast")
broke map_kptr selftest since it removed the function we were kprobing.
Use a new kfunc that invokes call_rcu_tasks_trace and sets a program
provided pointer to an integer to 1. Technically this can be unsafe if
the memory being written to from the callback disappears, but this is
just for usage in a test where we ensure we spin until we see the value
to be set to 1, so it's ok.
Reported-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Fixes: c27cea4416 ("rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260211185747.3630539-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
wait_for_port() can wait up to 2 seconds with the sleep and the polling
in wait_local_port_listen() combined. So, in netcons_basic.sh, the socat
process could die before the test writes to the netconsole.
Increase the timeout to 3 seconds to make netcons_basic.sh pass
consistently.
Fixes: 3dc6c76391 ("selftests: net: Add IPv6 support to netconsole basic tests")
Signed-off-by: Pin-yen Lin <treapking@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260210005939.3230550-1-treapking@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Add more CPUCFG mask bits.
- Improve feature detection.
- Add lazy load support for FPU and binary translation (LBT) register state.
- Fix return value for memory reads from and writes to in-kernel devices.
- Add support for detecting preemption from within a guest.
- Add KVM steal time test case to tools/selftests.
ARM:
- Add support for FEAT_IDST, allowing ID registers that are not
implemented to be reported as a normal trap rather than as an UNDEF
exception.
- Add sanitisation of the VTCR_EL2 register, fixing a number of
UXN/PXN/XN bugs in the process.
- Full handling of RESx bits, instead of only RES0, and resulting in
SCTLR_EL2 being added to the list of sanitised registers.
- More pKVM fixes for features that are not supposed to be exposed to
guests.
- Make sure that MTE being disabled on the pKVM host doesn't give it
the ability to attack the hypervisor.
- Allow pKVM's host stage-2 mappings to use the Force Write Back
version of the memory attributes by using the "pass-through'
encoding.
- Fix trapping of ICC_DIR_EL1 on GICv5 hosts emulating GICv3 for the
guest.
- Preliminary work for guest GICv5 support.
- A bunch of debugfs fixes, removing pointless custom iterators stored
in guest data structures.
- A small set of FPSIMD cleanups.
- Selftest fixes addressing the incorrect alignment of page
allocation.
- Other assorted low-impact fixes and spelling fixes.
RISC-V:
- Fixes for issues discoverd by KVM API fuzzing in
kvm_riscv_aia_imsic_has_attr(), kvm_riscv_aia_imsic_rw_attr(),
and kvm_riscv_vcpu_aia_imsic_update()
- Allow Zalasr, Zilsd and Zclsd extensions for Guest/VM
- Transparent huge page support for hypervisor page tables
- Adjust the number of available guest irq files based on MMIO
register sizes found in the device tree or the ACPI tables
- Add RISC-V specific paging modes to KVM selftests
- Detect paging mode at runtime for selftests
s390:
- Performance improvement for vSIE (aka nested virtualization)
- Completely new memory management. s390 was a special snowflake that enlisted
help from the architecture's page table management to build hypervisor
page tables, in particular enabling sharing the last level of page
tables. This however was a lot of code (~3K lines) in order to support
KVM, and also blocked several features. The biggest advantages is
that the page size of userspace is completely independent of the
page size used by the guest: userspace can mix normal pages, THPs and
hugetlbfs as it sees fit, and in fact transparent hugepages were not
possible before. It's also now possible to have nested guests and
guests with huge pages running on the same host.
- Maintainership change for s390 vfio-pci
- Small quality of life improvement for protected guests
x86:
- Add support for giving the guest full ownership of PMU hardware (contexted
switched around the fastpath run loop) and allowing direct access to data
MSRs and PMCs (restricted by the vPMU model). KVM still intercepts
access to control registers, e.g. to enforce event filtering and to
prevent the guest from profiling sensitive host state. This is more
accurate, since it has no risk of contention and thus dropped events, and
also has significantly less overhead.
For more information, see the commit message for merge commit bf2c3138ae
("Merge tag 'kvm-x86-pmu-6.20' of https://github.com/kvm-x86/linux into HEAD").
- Disallow changing the virtual CPU model if L2 is active, for all the same
reasons KVM disallows change the model after the first KVM_RUN.
- Fix a bug where KVM would incorrectly reject host accesses to PV MSRs
when running with KVM_CAP_ENFORCE_PV_FEATURE_CPUID enabled, even if those
were advertised as supported to userspace,
- Fix a bug with protected guest state (SEV-ES/SNP and TDX) VMs, where KVM
would attempt to read CR3 configuring an async #PF entry.
- Fail the build if EXPORT_SYMBOL_GPL or EXPORT_SYMBOL is used in KVM (for x86
only) to enforce usage of EXPORT_SYMBOL_FOR_KVM_INTERNAL. Only a few exports
that are intended for external usage, and those are allowed explicitly.
- When checking nested events after a vCPU is unblocked, ignore -EBUSY instead
of WARNing. Userspace can sometimes put the vCPU into what should be an
impossible state, and spurious exit to userspace on -EBUSY does not really
do anything to solve the issue.
- Also throw in the towel and drop the WARN on INIT/SIPI being blocked when vCPU
is in Wait-For-SIPI, which also resulted in playing whack-a-mole with syzkaller
stuffing architecturally impossible states into KVM.
- Add support for new Intel instructions that don't require anything beyond
enumerating feature flags to userspace.
- Grab SRCU when reading PDPTRs in KVM_GET_SREGS2.
- Add WARNs to guard against modifying KVM's CPU caps outside of the intended
setup flow, as nested VMX in particular is sensitive to unexpected changes
in KVM's golden configuration.
- Add a quirk to allow userspace to opt-in to actually suppress EOI broadcasts
when the suppression feature is enabled by the guest (currently limited to
split IRQCHIP, i.e. userspace I/O APIC). Sadly, simply fixing KVM to honor
Suppress EOI Broadcasts isn't an option as some userspaces have come to rely
on KVM's buggy behavior (KVM advertises Supress EOI Broadcast irrespective
of whether or not userspace I/O APIC supports Directed EOIs).
- Clean up KVM's handling of marking mapped vCPU pages dirty.
- Drop a pile of *ancient* sanity checks hidden behind in KVM's unused
ASSERT() macro, most of which could be trivially triggered by the guest
and/or user, and all of which were useless.
- Fold "struct dest_map" into its sole user, "struct rtc_status", to make it
more obvious what the weird parameter is used for, and to allow fropping
these RTC shenanigans if CONFIG_KVM_IOAPIC=n.
- Bury all of ioapic.h, i8254.h and related ioctls (including
KVM_CREATE_IRQCHIP) behind CONFIG_KVM_IOAPIC=y.
- Add a regression test for recent APICv update fixes.
- Handle "hardware APIC ISR", a.k.a. SVI, updates in kvm_apic_update_apicv()
to consolidate the updates, and to co-locate SVI updates with the updates
for KVM's own cache of ISR information.
- Drop a dead function declaration.
- Minor cleanups.
x86 (Intel):
- Rework KVM's handling of VMCS updates while L2 is active to temporarily
switch to vmcs01 instead of deferring the update until the next nested
VM-Exit. The deferred updates approach directly contributed to several
bugs, was proving to be a maintenance burden due to the difficulty in
auditing the correctness of deferred updates, and was polluting
"struct nested_vmx" with a growing pile of booleans.
- Fix an SGX bug where KVM would incorrectly try to handle EPCM page faults,
and instead always reflect them into the guest. Since KVM doesn't shadow
EPCM entries, EPCM violations cannot be due to KVM interference and
can't be resolved by KVM.
- Fix a bug where KVM would register its posted interrupt wakeup handler even
if loading kvm-intel.ko ultimately failed.
- Disallow access to vmcb12 fields that aren't fully supported, mostly to
avoid weirdness and complexity for FRED and other features, where KVM wants
enable VMCS shadowing for fields that conditionally exist.
- Print out the "bad" offsets and values if kvm-intel.ko refuses to load (or
refuses to online a CPU) due to a VMCS config mismatch.
x86 (AMD):
- Drop a user-triggerable WARN on nested_svm_load_cr3() failure.
- Add support for virtualizing ERAPS. Note, correct virtualization of ERAPS
relies on an upcoming, publicly announced change in the APM to reduce the
set of conditions where hardware (i.e. KVM) *must* flush the RAP.
- Ignore nSVM intercepts for instructions that are not supported according to
L1's virtual CPU model.
- Add support for expedited writes to the fast MMIO bus, a la VMX's fastpath
for EPT Misconfig.
- Don't set GIF when clearing EFER.SVME, as GIF exists independently of SVM,
and allow userspace to restore nested state with GIF=0.
- Treat exit_code as an unsigned 64-bit value through all of KVM.
- Add support for fetching SNP certificates from userspace.
- Fix a bug where KVM would use vmcb02 instead of vmcb01 when emulating VMLOAD
or VMSAVE on behalf of L2.
- Misc fixes and cleanups.
x86 selftests:
- Add a regression test for TPR<=>CR8 synchronization and IRQ masking.
- Overhaul selftest's MMU infrastructure to genericize stage-2 MMU support,
and extend x86's infrastructure to support EPT and NPT (for L2 guests).
- Extend several nested VMX tests to also cover nested SVM.
- Add a selftest for nested VMLOAD/VMSAVE.
- Rework the nested dirty log test, originally added as a regression test for
PML where KVM logged L2 GPAs instead of L1 GPAs, to improve test coverage
and to hopefully make the test easier to understand and maintain.
guest_memfd:
- Remove kvm_gmem_populate()'s preparation tracking and half-baked hugepage
handling. SEV/SNP was the only user of the tracking and it can do it via
the RMP.
- Retroactively document and enforce (for SNP) that KVM_SEV_SNP_LAUNCH_UPDATE
and KVM_TDX_INIT_MEM_REGION require the source page to be 4KiB aligned, to
avoid non-trivial complexity for something that no known VMM seems to be
doing and to avoid an API special case for in-place conversion, which
simply can't support unaligned sources.
- When populating guest_memfd memory, GUP the source page in common code and
pass the refcounted page to the vendor callback, instead of letting vendor
code do the heavy lifting. Doing so avoids a looming deadlock bug with
in-place due an AB-BA conflict betwee mmap_lock and guest_memfd's filemap
invalidate lock.
Generic:
- Fix a bug where KVM would ignore the vCPU's selected address space when
creating a vCPU-specific mapping of guest memory. Actually this bug
could not be hit even on x86, the only architecture with multiple
address spaces, but it's a bug nevertheless.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmmNqwwUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPaZAf/cJx5B67lnST272esz0j29MIuT/Ti
jnf6PI9b7XubKYOtNvlu5ZW4Jsa5dqRG0qeO/JmcXDlwBf5/UkWOyvqIXyiuTl0l
KcSUlKPtTgKZSoZpJpTppuuDE8FSYqEdcCmjNvoYzcJoPjmaeJbK6aqO0AkBbb6e
L5InrLV7nV9iua6rFvA0s/G8/Eq2DG8M9hTRHe6NcI/z4hvslOudvpUXtC8Jygoo
cV8vFavUwc+atrmvhAOLvSitnrjfNa4zcG6XMOlwXPfIdvi3zqTlQTgUpwGKiAGQ
RIDUVZ/9bcWgJqbPRsdEWwaYRkNQWc5nmrAHRpEEaYV/NeBBNf4v6qfKSw==
=SkJ1
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"Loongarch:
- Add more CPUCFG mask bits
- Improve feature detection
- Add lazy load support for FPU and binary translation (LBT) register
state
- Fix return value for memory reads from and writes to in-kernel
devices
- Add support for detecting preemption from within a guest
- Add KVM steal time test case to tools/selftests
ARM:
- Add support for FEAT_IDST, allowing ID registers that are not
implemented to be reported as a normal trap rather than as an UNDEF
exception
- Add sanitisation of the VTCR_EL2 register, fixing a number of
UXN/PXN/XN bugs in the process
- Full handling of RESx bits, instead of only RES0, and resulting in
SCTLR_EL2 being added to the list of sanitised registers
- More pKVM fixes for features that are not supposed to be exposed to
guests
- Make sure that MTE being disabled on the pKVM host doesn't give it
the ability to attack the hypervisor
- Allow pKVM's host stage-2 mappings to use the Force Write Back
version of the memory attributes by using the "pass-through'
encoding
- Fix trapping of ICC_DIR_EL1 on GICv5 hosts emulating GICv3 for the
guest
- Preliminary work for guest GICv5 support
- A bunch of debugfs fixes, removing pointless custom iterators
stored in guest data structures
- A small set of FPSIMD cleanups
- Selftest fixes addressing the incorrect alignment of page
allocation
- Other assorted low-impact fixes and spelling fixes
RISC-V:
- Fixes for issues discoverd by KVM API fuzzing in
kvm_riscv_aia_imsic_has_attr(), kvm_riscv_aia_imsic_rw_attr(), and
kvm_riscv_vcpu_aia_imsic_update()
- Allow Zalasr, Zilsd and Zclsd extensions for Guest/VM
- Transparent huge page support for hypervisor page tables
- Adjust the number of available guest irq files based on MMIO
register sizes found in the device tree or the ACPI tables
- Add RISC-V specific paging modes to KVM selftests
- Detect paging mode at runtime for selftests
s390:
- Performance improvement for vSIE (aka nested virtualization)
- Completely new memory management. s390 was a special snowflake that
enlisted help from the architecture's page table management to
build hypervisor page tables, in particular enabling sharing the
last level of page tables. This however was a lot of code (~3K
lines) in order to support KVM, and also blocked several features.
The biggest advantages is that the page size of userspace is
completely independent of the page size used by the guest:
userspace can mix normal pages, THPs and hugetlbfs as it sees fit,
and in fact transparent hugepages were not possible before. It's
also now possible to have nested guests and guests with huge pages
running on the same host
- Maintainership change for s390 vfio-pci
- Small quality of life improvement for protected guests
x86:
- Add support for giving the guest full ownership of PMU hardware
(contexted switched around the fastpath run loop) and allowing
direct access to data MSRs and PMCs (restricted by the vPMU model).
KVM still intercepts access to control registers, e.g. to enforce
event filtering and to prevent the guest from profiling sensitive
host state. This is more accurate, since it has no risk of
contention and thus dropped events, and also has significantly less
overhead.
For more information, see the commit message for merge commit
bf2c3138ae ("Merge tag 'kvm-x86-pmu-6.20' ...")
- Disallow changing the virtual CPU model if L2 is active, for all
the same reasons KVM disallows change the model after the first
KVM_RUN
- Fix a bug where KVM would incorrectly reject host accesses to PV
MSRs when running with KVM_CAP_ENFORCE_PV_FEATURE_CPUID enabled,
even if those were advertised as supported to userspace,
- Fix a bug with protected guest state (SEV-ES/SNP and TDX) VMs,
where KVM would attempt to read CR3 configuring an async #PF entry
- Fail the build if EXPORT_SYMBOL_GPL or EXPORT_SYMBOL is used in KVM
(for x86 only) to enforce usage of EXPORT_SYMBOL_FOR_KVM_INTERNAL.
Only a few exports that are intended for external usage, and those
are allowed explicitly
- When checking nested events after a vCPU is unblocked, ignore
-EBUSY instead of WARNing. Userspace can sometimes put the vCPU
into what should be an impossible state, and spurious exit to
userspace on -EBUSY does not really do anything to solve the issue
- Also throw in the towel and drop the WARN on INIT/SIPI being
blocked when vCPU is in Wait-For-SIPI, which also resulted in
playing whack-a-mole with syzkaller stuffing architecturally
impossible states into KVM
- Add support for new Intel instructions that don't require anything
beyond enumerating feature flags to userspace
- Grab SRCU when reading PDPTRs in KVM_GET_SREGS2
- Add WARNs to guard against modifying KVM's CPU caps outside of the
intended setup flow, as nested VMX in particular is sensitive to
unexpected changes in KVM's golden configuration
- Add a quirk to allow userspace to opt-in to actually suppress EOI
broadcasts when the suppression feature is enabled by the guest
(currently limited to split IRQCHIP, i.e. userspace I/O APIC).
Sadly, simply fixing KVM to honor Suppress EOI Broadcasts isn't an
option as some userspaces have come to rely on KVM's buggy behavior
(KVM advertises Supress EOI Broadcast irrespective of whether or
not userspace I/O APIC supports Directed EOIs)
- Clean up KVM's handling of marking mapped vCPU pages dirty
- Drop a pile of *ancient* sanity checks hidden behind in KVM's
unused ASSERT() macro, most of which could be trivially triggered
by the guest and/or user, and all of which were useless
- Fold "struct dest_map" into its sole user, "struct rtc_status", to
make it more obvious what the weird parameter is used for, and to
allow fropping these RTC shenanigans if CONFIG_KVM_IOAPIC=n
- Bury all of ioapic.h, i8254.h and related ioctls (including
KVM_CREATE_IRQCHIP) behind CONFIG_KVM_IOAPIC=y
- Add a regression test for recent APICv update fixes
- Handle "hardware APIC ISR", a.k.a. SVI, updates in
kvm_apic_update_apicv() to consolidate the updates, and to
co-locate SVI updates with the updates for KVM's own cache of ISR
information
- Drop a dead function declaration
- Minor cleanups
x86 (Intel):
- Rework KVM's handling of VMCS updates while L2 is active to
temporarily switch to vmcs01 instead of deferring the update until
the next nested VM-Exit.
The deferred updates approach directly contributed to several bugs,
was proving to be a maintenance burden due to the difficulty in
auditing the correctness of deferred updates, and was polluting
"struct nested_vmx" with a growing pile of booleans
- Fix an SGX bug where KVM would incorrectly try to handle EPCM page
faults, and instead always reflect them into the guest. Since KVM
doesn't shadow EPCM entries, EPCM violations cannot be due to KVM
interference and can't be resolved by KVM
- Fix a bug where KVM would register its posted interrupt wakeup
handler even if loading kvm-intel.ko ultimately failed
- Disallow access to vmcb12 fields that aren't fully supported,
mostly to avoid weirdness and complexity for FRED and other
features, where KVM wants enable VMCS shadowing for fields that
conditionally exist
- Print out the "bad" offsets and values if kvm-intel.ko refuses to
load (or refuses to online a CPU) due to a VMCS config mismatch
x86 (AMD):
- Drop a user-triggerable WARN on nested_svm_load_cr3() failure
- Add support for virtualizing ERAPS. Note, correct virtualization of
ERAPS relies on an upcoming, publicly announced change in the APM
to reduce the set of conditions where hardware (i.e. KVM) *must*
flush the RAP
- Ignore nSVM intercepts for instructions that are not supported
according to L1's virtual CPU model
- Add support for expedited writes to the fast MMIO bus, a la VMX's
fastpath for EPT Misconfig
- Don't set GIF when clearing EFER.SVME, as GIF exists independently
of SVM, and allow userspace to restore nested state with GIF=0
- Treat exit_code as an unsigned 64-bit value through all of KVM
- Add support for fetching SNP certificates from userspace
- Fix a bug where KVM would use vmcb02 instead of vmcb01 when
emulating VMLOAD or VMSAVE on behalf of L2
- Misc fixes and cleanups
x86 selftests:
- Add a regression test for TPR<=>CR8 synchronization and IRQ masking
- Overhaul selftest's MMU infrastructure to genericize stage-2 MMU
support, and extend x86's infrastructure to support EPT and NPT
(for L2 guests)
- Extend several nested VMX tests to also cover nested SVM
- Add a selftest for nested VMLOAD/VMSAVE
- Rework the nested dirty log test, originally added as a regression
test for PML where KVM logged L2 GPAs instead of L1 GPAs, to
improve test coverage and to hopefully make the test easier to
understand and maintain
guest_memfd:
- Remove kvm_gmem_populate()'s preparation tracking and half-baked
hugepage handling. SEV/SNP was the only user of the tracking and it
can do it via the RMP
- Retroactively document and enforce (for SNP) that
KVM_SEV_SNP_LAUNCH_UPDATE and KVM_TDX_INIT_MEM_REGION require the
source page to be 4KiB aligned, to avoid non-trivial complexity for
something that no known VMM seems to be doing and to avoid an API
special case for in-place conversion, which simply can't support
unaligned sources
- When populating guest_memfd memory, GUP the source page in common
code and pass the refcounted page to the vendor callback, instead
of letting vendor code do the heavy lifting. Doing so avoids a
looming deadlock bug with in-place due an AB-BA conflict betwee
mmap_lock and guest_memfd's filemap invalidate lock
Generic:
- Fix a bug where KVM would ignore the vCPU's selected address space
when creating a vCPU-specific mapping of guest memory. Actually
this bug could not be hit even on x86, the only architecture with
multiple address spaces, but it's a bug nevertheless"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (267 commits)
KVM: s390: Increase permitted SE header size to 1 MiB
MAINTAINERS: Replace backup for s390 vfio-pci
KVM: s390: vsie: Fix race in acquire_gmap_shadow()
KVM: s390: vsie: Fix race in walk_guest_tables()
KVM: s390: Use guest address to mark guest page dirty
irqchip/riscv-imsic: Adjust the number of available guest irq files
RISC-V: KVM: Transparent huge page support
RISC-V: KVM: selftests: Add Zalasr extensions to get-reg-list test
RISC-V: KVM: Allow Zalasr extensions for Guest/VM
KVM: riscv: selftests: Add riscv vm satp modes
KVM: riscv: selftests: add Zilsd and Zclsd extension to get-reg-list test
riscv: KVM: allow Zilsd and Zclsd extensions for Guest/VM
RISC-V: KVM: Skip IMSIC update if vCPU IMSIC state is not initialized
RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_rw_attr()
RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_has_attr()
RISC-V: KVM: Remove unnecessary 'ret' assignment
KVM: s390: Add explicit padding to struct kvm_s390_keyop
KVM: LoongArch: selftests: Add steal time test case
LoongArch: KVM: Add paravirt vcpu_is_preempted() support in guest side
LoongArch: KVM: Add paravirt preempt feature in hypervisor side
...
The rt_stall test measures the runtime ratio between an EXT and an RT
task pinned to the same CPU, verifying that the deadline server prevents
RT tasks from starving SCHED_EXT tasks. It expects the EXT task to get
at least 4% of CPU time.
The test is flaky because sched_stress_test() calls sleep(RUN_TIME)
immediately after fork(), without waiting for the RT child to complete
its setup (set_affinity + set_sched). If the RT child experiences
scheduling latency before completing setup, that delay eats into the
measurement window: the RT child runs for less than RUN_TIME seconds,
and the EXT task's measured ratio drops below the 4% threshold.
For example, in the failing CI run [1]:
EXT=0.140s RT=4.750s total=4.890s (expected ~5.0s)
ratio=2.86% < 4% → FAIL
The 110ms gap (5.0 - 4.89) corresponds to the RT child's setup time
being counted inside the measurement window, during which fewer
deadline server ticks fire for the EXT task.
Fix by using pipes to synchronize: each child signals the parent after
completing its setup, and the parent waits for both signals before
starting sleep(RUN_TIME). This ensures the measurement window only
counts time when both tasks are fully configured and competing.
[1] https://github.com/kernel-patches/bpf/actions/runs/21961895809/job/63442490449
Fixes: be621a7634 ("selftests/sched_ext: Add test for sched_ext dl_server")
Assisted-by: claude-opus-4-6-v1
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
- Add support for control flow integrity for userspace processes.
This is based on the standard RISC-V ISA extensions Zicfiss and
Zicfilp
- Improve ptrace behavior regarding vector registers, and add some selftests
- Optimize our strlen() assembly
- Enable the ISO-8859-1 code page as built-in, similar to ARM64, for EFI
volume mounting
- Clean up some code slightly, including defining copy_user_page() as
copy_page() rather than memcpy(), aligning us with other
architectures; and using max3() to slightly simplify an expression
in riscv_iommu_init_check()
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEElRDoIDdEz9/svf2Kx4+xDQu9KksFAmmOYpYACgkQx4+xDQu9
KkvzOQ/9Fq8ZxWgYofhTPtw9/vps3avheOHlEoRrBWYfn1VkTRPAcbUULL4PGXwg
dnVFEl3AcrpOFikIthbukklLeLoOnUshZJBU25zY5h0My1jb63V1//gEwJR6I0dg
+V+GJmfzc4+YVaHK6UFdn7j3GgKUbTC7xXRMuGEriAzKPnm3AXAjh94wMNx6depv
Li3IXRoZT/HvqIAyfeAoM9STwOzJtE3Sc6fXABkzsIbNTjjdgIqoRSsQsKY10178
z6ox/sVStnLmVaMbOd/ZVN0J70JRDsvK0TC0/13K1ESUbnVia9a3bPIxLRmSapKC
wXnwAuSeevtFshGGyd5LZO0QQGxzG1H63Gky2GRoh8bTQbd2tQcfQzANdnPkBAQS
j2aOiSsiUQeNZqfZAfEBwRd27GXRYlKb/MpgCZKUH+ZO9VG6QaD3VGvg17/Caghy
nVdbBQ81ZV9tkz9EMN0vt2VJHmEqARh88w619laHjg+ioPTG4/UIDPzskt1I+Fgm
Y6NQLeFyfaO3RKKDYWGPcY7fmWQI9V8MECHOvyVI4xJcgqAbqnfsgytjuiFbrfRo
fTvpuB7kvltBZ180QSB79xj0sWGFTWR02MeWy3uOaLZz2eIm2ZTZbMUSgNYR0ldG
L3y7CEkTkoVF1ijYgAfuMgptk3Yf0dpa66D9HUo947wWkNrW5ds=
=4fTk
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-7.0-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Paul Walmsley:
- Add support for control flow integrity for userspace processes.
This is based on the standard RISC-V ISA extensions Zicfiss and
Zicfilp
- Improve ptrace behavior regarding vector registers, and add some
selftests
- Optimize our strlen() assembly
- Enable the ISO-8859-1 code page as built-in, similar to ARM64, for
EFI volume mounting
- Clean up some code slightly, including defining copy_user_page() as
copy_page() rather than memcpy(), aligning us with other
architectures; and using max3() to slightly simplify an expression
in riscv_iommu_init_check()
* tag 'riscv-for-linus-7.0-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (42 commits)
riscv: lib: optimize strlen loop efficiency
selftests: riscv: vstate_exec_nolibc: Use the regular prctl() function
selftests: riscv: verify ptrace accepts valid vector csr values
selftests: riscv: verify ptrace rejects invalid vector csr inputs
selftests: riscv: verify syscalls discard vector context
selftests: riscv: verify initial vector state with ptrace
selftests: riscv: test ptrace vector interface
riscv: ptrace: validate input vector csr registers
riscv: csr: define vtype register elements
riscv: vector: init vector context with proper vlenb
riscv: ptrace: return ENODATA for inactive vector extension
kselftest/riscv: add kselftest for user mode CFI
riscv: add documentation for shadow stack
riscv: add documentation for landing pad / indirect branch tracking
riscv: create a Kconfig fragment for shadow stack and landing pad support
arch/riscv: add dual vdso creation logic and select vdso based on hw
arch/riscv: compile vdso with landing pad and shadow stack note
riscv: enable kernel access to shadow stack memory via the FWFT SBI call
riscv: add kernel command line option to opt out of user CFI
riscv/hwprobe: add zicfilp / zicfiss enumeration in hwprobe
...
The _get_unused_cpus() function can return CPU numbers >= 16, which
exceeds RPS_MAX_CPUS in toeplitz.c. When this happens, the test fails
with a cryptic message:
# Exception| Traceback (most recent call last):
# Exception| File "/tmp/cur/linux/tools/testing/selftests/net/lib/py/ksft.py", line 319, in ksft_run
# Exception| func(*args)
# Exception| File "/tmp/cur/linux/tools/testing/selftests/drivers/net/hw/toeplitz.py", line 189, in test
# Exception| with bkg(" ".join(rx_cmd), ksft_ready=True, exit_wait=True) as rx_proc:
# Exception| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# Exception| File "/tmp/cur/linux/tools/testing/selftests/net/lib/py/utils.py", line 124, in __init__
# Exception| super().__init__(comm, background=True,
# Exception| File "/tmp/cur/linux/tools/testing/selftests/net/lib/py/utils.py", line 77, in __init__
# Exception| raise Exception("Did not receive ready message")
# Exception| Exception: Did not receive ready message
Rename _get_unused_cpus() to _get_unused_rps_cpus() and cap the CPU
search range to RPS_MAX_CPUS.
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260210093110.1935149-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The testcase failed as below:
$./vlan_bridge_binding.sh
...
+ adf_ip_link_set_up d1
+ local name=d1
+ shift
+ ip_link_is_up d1
+ ip_link_has_flag d1 UP
+ local name=d1
+ shift
+ local flag=UP
+ shift
++ ip -j link show d1
++ jq --arg flag UP 'any(.[].flags.[]; . == $flag)'
jq: error: syntax error, unexpected '[', expecting FORMAT or QQSTRING_START
(Unix shell quoting issues?) at <top-level>, line 1:
any(.[].flags.[]; . == $flag)
jq: 1 compile error
Remove the extra dot (.) after flags array to fix this.
Fixes: 4baa1d3a50 ("selftests: net: lib: Add ip_link_has_flag()")
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20260211022146.190948-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As explained in [1], iproute2 started rejecting tc-police burst sizes
that result in an overflow. This can happen when the burst size is high
enough and the rate is low enough.
A couple of test cases specify such configurations, resulting in
iproute2 errors and test failure.
Fix by reducing the burst size so that the test will pass with both new
and old iproute2 versions.
[1] https://lore.kernel.org/netdev/20250916215731.3431465-1-jay.vosburgh@canonical.com/
Fixes: cb12d17632 ("selftests: mlxsw: tc_restrictions: Test tc-police restrictions")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/88b00c6e85188aa6a065dc240206119b328c46e1.1770643998.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
People keep removing generated files from .gitignore files even when the
files stay around. Please don't do that: just because the file is no
longer being generated doesn't make it magically go away, and doesn't
make it suddenly be something that should now not be ignored any more.
Fixes: dd2c6ec24f ("selftests/mm: remove virtual_address_range test")
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- A set of commits that introduces cxl_memdev_attach and pave way for
soft reserved handling, type2 accelerator enabling, and LSA 2.0
enabling. All these series require the endpoint driver to settle
before continuing the memdev driver probe.
dax/hmem, e820, resource: Defer Soft Reserved insertion until hmem is ready
cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation
cxl/mem: Drop @host argument to devm_cxl_add_memdev()
cxl/mem: Convert devm_cxl_add_memdev() to scope-based-cleanup
cxl/port: Arrange for always synchronous endpoint attach
cxl/mem: Arrange for always-synchronous memdev attach
cxl/mem: Fix devm_cxl_memdev_edac_release() confusion
- A set to address CXL port error protocol handling and reporting. The
large patch series was split into 3 parts. Part 1 and 2 are included
here with part 3 coming later. Part 1 consists of a series of code
refactoring to PCI AER sub-system that addresses CXL and also CXL
RAS code to prepare for port error handling. Part 2 refactors the
CXL code to move management of component registers to cxl_port
objects to allow all CXL AER errors to be handled through the
cxl_port hierarchy.
Part 2:
cxl/port: Move endpoint component register management to cxl_port
cxl/port: Map Port RAS registers
cxl/port: Move dport RAS setup to dport add time
cxl/port: Move dport probe operations to a driver event
cxl/port: Move decoder setup before dport creation
cxl/port: Cleanup dport removal with a devres group
cxl/port: Reduce number of @dport variables in cxl_port_add_dport()
cxl/port: Cleanup handling of the nr_dports 0 -> 1 transition
Part 1:
cxl: Update RAS handler interfaces to also support CXL Ports
cxl/mem: Clarify @host for devm_cxl_add_nvdimm()
PCI/AER: Update struct aer_err_info with kernel-doc formatting
PCI/AER: Report CXL or PCIe bus type in AER trace logging
PCI/AER: Use guard() in cxl_rch_handle_error_iter()
PCI/AER: Move CXL RCH error handling to aer_cxl_rch.c
PCI/AER: Update is_internal_error() to be non-static is_aer_internal_error()
PCI/AER: Export pci_aer_unmask_internal_errors()
cxl/pci: Move CXL driver's RCH error handling into core/ras_rch.c
PCI/AER: Replace PCIEAER_CXL symbol with CXL_RAS
cxl/pci: Remove CXL VH handling in CONFIG_PCIEAER_CXL conditional blocks from core/pci.c
PCI: Replace cxl_error_is_native() with pcie_aer_is_native()
cxl/pci: Remove unnecessary CXL RCH handling helper functions
cxl/pci: Remove unnecessary CXL Endpoint handling helper functions
PCI: Introduce pcie_is_cxl()
PCI: Update CXL DVSEC definitions
PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h
- A set of patches to provide AMD Zen5 platform address translation for
CXL using ACPI PRMT. Set includes a conventions document to explain
why this is needed and how it's implemented.
cxl: Disable HPA/SPA translation handlers for Normalized Addressing
cxl/region: Factor out code into cxl_region_setup_poison()
cxl/atl: Lock decoders that need address translation
cxl: Enable AMD Zen5 address translation using ACPI PRMT
cxl/acpi: Prepare use of EFI runtime services
cxl: Introduce callback for HPA address ranges translation
cxl/region: Use region data to get the root decoder
cxl/region: Add @hpa_range argument to function cxl_calc_interleave_pos()
cxl/region: Separate region parameter setup and region construction
cxl: Simplify cxl_root_ops allocation and handling
cxl/region: Store HPA range in struct cxl_region
cxl/region: Store root decoder in struct cxl_region
cxl/region: Rename misleading variable name @hpa to @hpa_range
Documentation/driver-api/cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement
cxl, doc: Moving conventions in separate files
cxl, doc: Remove isonum.txt inclusion
- A set of misc CXL patches of fixes, cleanups, and updates. Including
CXL address translation for unaligned MOD3 regions.
cxl: Fix premature commit_end increment on decoder commit failure
cxl/region: Use do_div() for 64-bit modulo operation
cxl/region: Translate HPA to DPA and memdev in unaligned regions
cxl/region: Translate DPA->HPA in unaligned MOD3 regions
cxl/core: Fix cxl_dport debugfs EINJ entries
cxl/acpi: Remove cxl_acpi_set_cache_size()
cxl/hdm: Fix newline character in dev_err() messages
cxl/pci: Remove outdated FIXME comment and BUILD_BUG_ON
Documentation/driver-api/cxl: device hotplug section
Documentation/driver-api/cxl: BIOS/EFI expectation update
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5DAy15EJMCV1R6v9YGjFFmlTOEoFAmmOFXcACgkQYGjFFmlT
OEojaxAApQJFLyX1MkPbhtm6j6GRzzEAEWTBX2XsmliZf1JhfahsNMWI69kO33rm
LddF+nyZNEl/foyHgUaxVzlQwqWuihyp7Qk2djXnMzLsuCAsWhPbB9j0RgJUN8h5
N4U76AmOdmhLlXH4CCqoW2jNy0OjxNdgp1FtTHv7VO7RxgRE9MFJRkLulKxB03wy
t6lRZXPofEFcHen40DlYRtW26vy1BYUO0dng2f16DxWrb1ztdACH/zVqCJJtdoFc
FAT5EaQCeRYZ9Yz4dONw3DcUjYlG6NcRN9FWNiptBn1Pb7pUX55Le8lfD3qZg0an
m3lWRs1T/lGz7pWmz4GPUKDwGFCEqLqd4oSz5v+dFR3JJxjJpRzKa19y5TfqK/LF
diqNZsDD9gCXE1HXzNr1YcbllpU2cPRPf58gWG9bLmG5xUUmScib8LoTMfgcCJW5
SlC6kf7BFLkJfDTcFaILc/UANeZaLGhrV0vyJntfGyT5EqKOcfjQEvrZvofA8mef
bdxt0IRDW4D+7kkcuR33OipTVUFG3ban8yYq4zXD64dmeHF76gwdJm3nyXsqdtpc
IYIIhz0W6pbTKjJ2fy1rZcTac1ZaALstyaF4bYWIjyF3NylPM8tDi48DFr+DGgeX
xkFs2B9p5vY5Cq73gCmSWsi3PBPTjWzeRp7YZrV6VoBd9uqewUs=
=blFQ
-----END PGP SIGNATURE-----
Merge tag 'cxl-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull CXL updates from Dave Jiang:
- Introduce cxl_memdev_attach and pave way for soft reserved handling,
type2 accelerator enabling, and LSA 2.0 enabling. All these series
require the endpoint driver to settle before continuing the memdev
driver probe.
- Address CXL port error protocol handling and reporting.
The large patch series was split into three parts. The first two
parts are included here with the final part coming later.
The first part consists of a series of code refactoring to PCI AER
sub-system that addresses CXL and also CXL RAS code to prepare for
port error handling.
The second part refactors the CXL code to move management of
component registers to cxl_port objects to allow all CXL AER errors
to be handled through the cxl_port hierarchy.
- Provide AMD Zen5 platform address translation for CXL using ACPI
PRMT. This includes a conventions document to explain why this is
needed and how it's implemented.
- Misc CXL patches of fixes, cleanups, and updates. Including CXL
address translation for unaligned MOD3 regions.
[ TLA service: CXL is "Compute Express Link" ]
* tag 'cxl-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (59 commits)
cxl: Disable HPA/SPA translation handlers for Normalized Addressing
cxl/region: Factor out code into cxl_region_setup_poison()
cxl/atl: Lock decoders that need address translation
cxl: Enable AMD Zen5 address translation using ACPI PRMT
cxl/acpi: Prepare use of EFI runtime services
cxl: Introduce callback for HPA address ranges translation
cxl/region: Use region data to get the root decoder
cxl/region: Add @hpa_range argument to function cxl_calc_interleave_pos()
cxl/region: Separate region parameter setup and region construction
cxl: Simplify cxl_root_ops allocation and handling
cxl/region: Store HPA range in struct cxl_region
cxl/region: Store root decoder in struct cxl_region
cxl/region: Rename misleading variable name @hpa to @hpa_range
Documentation/driver-api/cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement
cxl, doc: Moving conventions in separate files
cxl, doc: Remove isonum.txt inclusion
cxl/port: Unify endpoint and switch port lookup
cxl/port: Move endpoint component register management to cxl_port
cxl/port: Map Port RAS registers
cxl/port: Move dport RAS setup to dport add time
...
- Update outdated mdev comment referencing the renamed
mdev_type_add() function. (Julia Lawall)
- Introduce selftest support for IOMMU mapping of PCI MMIO BARs.
(Alex Mastro)
- Relax selftest assertion relative to differences in huge page
handling between legacy (v1) TYPE1 IOMMU mapping behavior and
the compatibility mode supported by IOMMUFD. (David Matlack)
- Reintroduce memory poison handling support for non-struct-page-
backed memory in the nvgrace-gpu variant driver. (Ankit Agrawal)
- Replace dma_buf_phys_vec with phys_vec to avoid duplicate
structure and semantics. (Leon Romanovsky)
- Add missing upstream bridge locking across PCI function reset,
resolving an assertion failure when secondary bus reset is used
to provide that reset. (Anthony Pighin)
- Fixes to hisi_acc vfio-pci variant driver to resolve corner case
issues related to resets, repeated migration, and error injection
scenarios. (Longfang Liu, Weili Qian)
- Restrict vfio selftest builds to arm64 and x86_64, resolving
compiler warnings on 32-bit archs. (Ted Logan)
- Un-deprecate the fsl-mc vfio bus driver as a new maintainer has
stepped up. (Ioana Ciornei)
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmmNCcMRHGFsZXhAc2hh
emJvdC5vcmcACgkQI5ubbjuwiyLlvw/9FLOcpjKCcxyWFPGUMHV9L0N8dWMR5t75
Pu6cBuYdpqGgrUaa1NWHYEzFbMSkEJMb5jLj26lokn2l4VZ9BKwdehaE/7t978z2
J0FgnGUg3B4lYm5qoBStaJ26123XafTMnsBn+wKdXt/lN6ng6GXVBxnmGP+Fuuwd
HA3MSFB6HUFw4et8qDG3ziyboN/pSWyXaupy60zvVy9x39i4/ZzMm3PSrYPdUX4x
aPM+lWKRi5yFMwiksZyYb67XA717Js8xhmgNMeJ8Yz3ZUF0n3Z7ZpOzbU+hl8LNn
sAea6+lXXsvNjEXfet1mjg7A+RYmuQdcjk58J//ijRXn7zRijRM671Bzc40T2JcP
bfrajHhprMsE+u7VwiBuERACTtbemuaKSbi5iNLHAIqTFwPpb400PvbptkyQhkxh
IRXIxqgKb5G6/sd73m9dKR9HU7d5SL3mNCARrymgqT6kRxz8fqtaVsXbbsa1Tgah
iV8in7wjKJ/80rYQd7gNyj/RRpYTAJJemfnJtKGQ9LxGnej8AV6kUZ3np7hpspz7
TVtmn9RxlwbA5lWYXJ4VUzt9u2Riwd2W6jg6ZnUknSZN6B5j2Jd2bDtF/FKLauKG
DW/bN8UU7nzgC40ro92qJEFF2PC7GkfZUVRlgW0oq54QZjyCoAIpfYOXjLTSteYP
umnjcrWkgag=
=F+FV
-----END PGP SIGNATURE-----
Merge tag 'vfio-v7.0-rc1' of https://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:
"A small cycle with the bulk in selftests and reintroducing poison
handling in the nvgrace-gpu driver. The rest are fixes, cleanups, and
some dmabuf structure consolidation.
- Update outdated mdev comment referencing the renamed
mdev_type_add() function (Julia Lawall)
- Introduce selftest support for IOMMU mapping of PCI MMIO BARs (Alex
Mastro)
- Relax selftest assertion relative to differences in huge page
handling between legacy (v1) TYPE1 IOMMU mapping behavior and the
compatibility mode supported by IOMMUFD (David Matlack)
- Reintroduce memory poison handling support for non-struct-page-
backed memory in the nvgrace-gpu variant driver (Ankit Agrawal)
- Replace dma_buf_phys_vec with phys_vec to avoid duplicate structure
and semantics (Leon Romanovsky)
- Add missing upstream bridge locking across PCI function reset,
resolving an assertion failure when secondary bus reset is used to
provide that reset (Anthony Pighin)
- Fixes to hisi_acc vfio-pci variant driver to resolve corner case
issues related to resets, repeated migration, and error injection
scenarios (Longfang Liu, Weili Qian)
- Restrict vfio selftest builds to arm64 and x86_64, resolving
compiler warnings on 32-bit archs (Ted Logan)
- Un-deprecate the fsl-mc vfio bus driver as a new maintainer has
stepped up (Ioana Ciornei)"
* tag 'vfio-v7.0-rc1' of https://github.com/awilliam/linux-vfio:
vfio/fsl-mc: add myself as maintainer
vfio: selftests: only build tests on arm64 and x86_64
hisi_acc_vfio_pci: fix the queue parameter anomaly issue
hisi_acc_vfio_pci: resolve duplicate migration states
hisi_acc_vfio_pci: update status after RAS error
hisi_acc_vfio_pci: fix VF reset timeout issue
vfio/pci: Lock upstream bridge for vfio_pci_core_disable()
types: reuse common phys_vec type instead of DMABUF open‑coded variant
vfio/nvgrace-gpu: register device memory for poison handling
mm: add stubs for PFNMAP memory failure registration functions
vfio: selftests: Drop IOMMU mapping size assertions for VFIO_TYPE1_IOMMU
vfio: selftests: Add vfio_dma_mapping_mmio_test
vfio: selftests: Align BAR mmaps for efficient IOMMU mapping
vfio: selftests: Centralize IOMMU mode name definitions
vfio/mdev: update outdated comment
selftests/memfd: use IPC semaphore instead of SIGSTOP/SIGCONT
In order to synchronize new processes to test inheritance of memfd_noexec
sysctl, memfd_test sets up the sysctl with a value before creating the new
process. The new process then sends itself a SIGSTOP in order to wait for
the parent to flip the sysctl value and send a SIGCONT signal.
This would work as intended if it wasn't the fact that the new process is
being created with CLONE_NEWPID, which creates a new PID namespace and the
new process has PID 1 in this namespace. There're restrictions on sending
signals to PID 1 and, although it's relaxed for other than root PID
namespace, it's biting us here. In this specific case the SIGSTOP sent by
the new process is ignored (no error to kill() is returned) and it never
stops its execution. This is usually not noticiable as the parent usually
manages to set the new sysctl value before the child has a chance to run
and the test succeeds. But if you run the test in a loop, it eventually
reproduces:
while [ 1 ]; do ./memfd_test >log 2>&1 || break; done; cat log
So this patch replaces the SIGSTOP/SIGCONT synchronization with IPC
semaphore.
Link: https://lkml.kernel.org/r/a7776389-b3d6-4b18-b438-0b0e3ed1fd3b@work
Fixes: 6469b66e3f ("selftests: improve vm.memfd_noexec sysctl tests")
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: liuye <liuye@kylinos.cn>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This patch adds a new testcase to validate memory failure handling for
dirty pagecache. This performs similar operations as clean pagecaches
except fsync() is not used to keep pages dirty.
This test helps ensure that memory failure handling for dirty pagecache
works correctly, including proper SIGBUS delivery, page isolation, and
recovery paths.
Link: https://lkml.kernel.org/r/20260206031639.2707102-4-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: kernel test robot <lkp@intel.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This patch adds a new testcase to validate memory failure handling for
clean pagecache. This test performs similar operations as anonymous pages
except allocating memory using mmap() with a file fd.
This test helps ensure that memory failure handling for clean pagecache
works correctly, including unchanged page content, page isolation, and
recovery paths.
Link: https://lkml.kernel.org/r/20260206031639.2707102-3-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202601221142.mDWA1ucw-lkp@intel.com/
Cc: David Hildenbrand <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "selftests/mm: add memory failure selftests", v4.
Introduce selftests to validate the functionality of memory failure.
These tests help ensure that memory failure handling for anonymous pages,
pagecaches pages works correctly, including proper SIGBUS delivery to user
processes, page isolation, and recovery paths.
Currently madvise syscall is used to inject memory failures. And only
anonymous pages and pagecaches are tested. More test scenarios, e.g.
hugetlb, shmem, thp, will be added. Also more memory failure injecting
methods will be supported, e.g. APEI Error INJection, if required.
This patch (of 3):
This patch adds a new kselftest to validate memory failure handling for
anonymous pages. The test performs the following operations:
1. Allocates anonymous pages using mmap().
2. Injects memory failure via madvise syscall.
3. Verifies expected error handling behavior.
4. Unpoison memory.
This test helps ensure that memory failure handling for anonymous pages
works correctly, including proper SIGBUS delivery to user processes, page
isolation and recovery paths.
Link: https://lkml.kernel.org/r/20260206031639.2707102-1-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20260206031639.2707102-2-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Now we have the capability to test the new helpers for the bitmap VMA
flags in userland, do so.
We also update the Makefile such that both VMA (and while we're here)
mm_struct flag sizes can be customised on build. We default to 128-bit to
enable testing of flags above word size even on 64-bit systems.
We add userland tests to ensure that we do not regress VMA flag behaviour
with the introduction when using bitmap VMA flags, nor accidentally
introduce unexpected results due to for instance higher bit values not
being correctly cleared/set.
As part of this change, make __mk_vma_flags() a custom function so we can
handle specifying invalid VMA bits. This is purposeful so we can have the
VMA tests work at lower and higher number of VMA flags without having to
duplicate code too much.
Link: https://lkml.kernel.org/r/7fe6afe9c8c61e4d3cfc9a2d50a5d24da8528e68.1769097829.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Yury Norov <ynorov@nvidia.com>
Cc: Chris Mason <clm@fb.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The vma_internal.h file is becoming entirely unmanageable. It combines
duplicated kernel implementation logic that needs to be kept in-sync with
the kernel, stubbed out declarations that we simply ignore for testing
purposes and custom logic added to aid testing.
If we separate each of the three things into separate headers it makes
things far more manageable, so do so:
* include/stubs.h contains the stubbed declarations,
* include/dup.h contains the duplicated kernel declarations, and
* include/custom.h contains declarations customised for testing.
[lorenzo.stoakes@oracle.com: avoid a duplicate struct define]
Link: https://lkml.kernel.org/r/1e032732-61c3-485c-9aa7-6a09016fefc1@lucifer.local
Link: https://lkml.kernel.org/r/dd57baf5b5986cb96a167150ac712cbe804b63ee.1769097829.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Yury Norov <ynorov@nvidia.com>
Cc: Chris Mason <clm@fb.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
So far the userland VMA tests have been established as a rough expression
of what's been possible.
Adapt it into a more usable form by separating out tests and shared
helper functions.
Since we test functions that are declared statically in mm/vma.c, we make
use of the trick of #include'ing kernel C files directly.
In order for the tests to continue to function, we must therefore also
this way into the tests/ directory.
We try to keep as much shared logic actually modularised into a separate
compilation unit in shared.c, however the merge_existing() and
attach_vma() helpers rely on statically declared mm/vma.c functions so
these must be declared in main.c.
Link: https://lkml.kernel.org/r/a0455ccfe4fdcd1c962c64f76304f612e5662a4e.1769097829.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Yury Norov <ynorov@nvidia.com>
Cc: Chris Mason <clm@fb.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Now we have eliminated all uses of vm_area_desc->vm_flags, eliminate this
field, and have mmap_prepare users utilise the vma_flags_t
vm_area_desc->vma_flags field only.
As part of this change we alter is_shared_maywrite() to accept a
vma_flags_t parameter, and introduce is_shared_maywrite_vm_flags() for use
with legacy vm_flags_t flags.
We also update struct mmap_state to add a union between vma_flags and
vm_flags temporarily until the mmap logic is also converted to using
vma_flags_t.
Also update the VMA userland tests to reflect this change.
Link: https://lkml.kernel.org/r/fd2a2938b246b4505321954062b1caba7acfc77a.1769097829.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Yury Norov <ynorov@nvidia.com>
Cc: Chris Mason <clm@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Now we have the mk_vma_flags() macro helper which permits easy
specification of any number of VMA flags, add helper functions which
operate with vma_flags_t parameters.
This patch provides vma_flags_test[_mask](), vma_flags_set[_mask]() and
vma_flags_clear[_mask]() respectively testing, setting and clearing flags
with the _mask variants accepting vma_flag_t parameters, and the non-mask
variants implemented as macros which accept a list of flags.
This allows us to trivially test/set/clear aggregate VMA flag values as
necessary, for instance:
if (vma_flags_test(&flags, VMA_READ_BIT, VMA_WRITE_BIT))
goto readwrite;
vma_flags_set(&flags, VMA_READ_BIT, VMA_WRITE_BIT);
vma_flags_clear(&flags, VMA_READ_BIT, VMA_WRITE_BIT);
We also add a function for testing that ALL flags are set for convenience,
e.g.:
if (vma_flags_test_all(&flags, VMA_READ_BIT, VMA_MAYREAD_BIT)) {
/* Both READ and MAYREAD flags set */
...
}
The compiler generates optimal assembly for each such that they behave as
if the caller were setting the bitmap flags manually.
This is important for e.g. drivers which manipulate flag values rather
than a VMA's specific flag values.
We also add helpers for testing, setting and clearing flags for VMA's and
VMA descriptors to reduce boilerplate.
Also add the EMPTY_VMA_FLAGS define to aid initialisation of empty flags.
Finally, update the userland VMA tests to add the helpers there so they
can be utilised as part of userland testing.
Link: https://lkml.kernel.org/r/885d4897d67a6a57c0b07fa182a7055ad752df11.1769097829.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Yury Norov <ynorov@nvidia.com>
Cc: Chris Mason <clm@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Pass through the unmap_desc to free_pgtables() because it almost has
everything necessary and is already on the stack.
Updates testing code as necessary.
No functional changes intended.
[Liam.Howlett@oracle.com: fix up unmap desc use on exit_mmap()]
Link: https://lkml.kernel.org/r/20260210214214.364856-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20260121164946.2093480-12-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Chris Li <chrisl@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Convert vms_clear_ptes() to use unmap_desc to call unmap_vmas() instead of
the large argument list. The UNMAP_STATE() cannot be used because the vma
iterator in the vms does not point to the correct maple state
(mas_detach), and the tree_end will be set incorrectly. Setting up the
arguments manually avoids setting the struct up incorrectly and doing
extra work to get the correct pagetable range.
exit_mmap() also calls unmap_vmas() with many arguments. Using the
unmap_all_init() function to set the unmap descriptor for all vmas makes
this a bit easier to read.
Update to the vma test code is necessary to ensure testing continues to
function.
No functional changes intended.
Link: https://lkml.kernel.org/r/20260121164946.2093480-10-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Chris Li <chrisl@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: SeongJae Park <sj@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The ceiling and tree search limit need to be different arguments for the
future change in the failed fork attempt. The ceiling and floor variables
are not very descriptive, so change them to pg_start/pg_end.
Adding a new variable for the vma_end to the function as it will differ
from the pg_end in the later patches in the series.
Add a kernel doc about the free_pgtables() function.
Test code also updated.
No functional changes intended.
Link: https://lkml.kernel.org/r/20260121164946.2093480-6-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Chris Li <chrisl@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Total patches: 107
Reviews/patch: 1.07
Reviewed rate: 67%
- The 2 patch series "ocfs2: give ocfs2 the ability to reclaim
suballocator free bg" from Heming Zhao saves disk space by teaching
ocfs2 to reclaim suballocator block group space.
- The 4 patch series "Add ARRAY_END(), and use it to fix off-by-one
bugs" from Alejandro Colomar adds the ARRAY_END() macro and uses it in
various places.
- The 2 patch series "vmcoreinfo: support VMCOREINFO_BYTES larger than
PAGE_SIZE" from Pnina Feder makes the vmcore code future-safe, if
VMCOREINFO_BYTES ever exceeds the page size.
- The 7 patch series "kallsyms: Prevent invalid access when showing
module buildid" from Petr Mladek cleans up kallsyms code related to
module buildid and fixes an invalid access crash when printing
backtraces.
- The 3 patch series "Address page fault in
ima_restore_measurement_list()" from Harshit Mogalapalli fixes a
kexec-related crash that can occur when booting the second-stage kernel
on x86.
- The 6 patch series "kho: ABI headers and Documentation updates" from
Mike Rapoport updates the kexec handover ABI documentation.
- The 4 patch series "Align atomic storage" from Finn Thain adds the
__aligned attribute to atomic_t and atomic64_t definitions to get
natural alignment of both types on csky, m68k, microblaze, nios2,
openrisc and sh.
- The 2 patch series "kho: clean up page initialization logic" from
Pratyush Yadav simplifies the page initialization logic in
kho_restore_page().
- The 6 patch series "Unload linux/kernel.h" from Yury Norov moves
several things out of kernel.h and into more appropriate places.
- The 7 patch series "don't abuse task_struct.group_leader" from Oleg
Nesterov removes the usage of ->group_leader when it is "obviously
unnecessary".
- The 5 patch series "list private v2 & luo flb" from Pasha Tatashin
adds some infrastructure improvements to the live update orchestrator.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaY4giAAKCRDdBJ7gKXxA
jgusAQDnKkP8UWTqXPC1jI+OrDJGU5ciAx8lzLeBVqMKzoYk9AD/TlhT2Nlx+Ef6
0HCUHUD0FMvAw/7/Dfc6ZKxwBEIxyww=
=mmsH
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- "ocfs2: give ocfs2 the ability to reclaim suballocator free bg" saves
disk space by teaching ocfs2 to reclaim suballocator block group
space (Heming Zhao)
- "Add ARRAY_END(), and use it to fix off-by-one bugs" adds the
ARRAY_END() macro and uses it in various places (Alejandro Colomar)
- "vmcoreinfo: support VMCOREINFO_BYTES larger than PAGE_SIZE" makes
the vmcore code future-safe, if VMCOREINFO_BYTES ever exceeds the
page size (Pnina Feder)
- "kallsyms: Prevent invalid access when showing module buildid" cleans
up kallsyms code related to module buildid and fixes an invalid
access crash when printing backtraces (Petr Mladek)
- "Address page fault in ima_restore_measurement_list()" fixes a
kexec-related crash that can occur when booting the second-stage
kernel on x86 (Harshit Mogalapalli)
- "kho: ABI headers and Documentation updates" updates the kexec
handover ABI documentation (Mike Rapoport)
- "Align atomic storage" adds the __aligned attribute to atomic_t and
atomic64_t definitions to get natural alignment of both types on
csky, m68k, microblaze, nios2, openrisc and sh (Finn Thain)
- "kho: clean up page initialization logic" simplifies the page
initialization logic in kho_restore_page() (Pratyush Yadav)
- "Unload linux/kernel.h" moves several things out of kernel.h and into
more appropriate places (Yury Norov)
- "don't abuse task_struct.group_leader" removes the usage of
->group_leader when it is "obviously unnecessary" (Oleg Nesterov)
- "list private v2 & luo flb" adds some infrastructure improvements to
the live update orchestrator (Pasha Tatashin)
* tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (107 commits)
watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency
procfs: fix missing RCU protection when reading real_parent in do_task_stat()
watchdog/softlockup: fix sample ring index wrap in need_counting_irqs()
kcsan, compiler_types: avoid duplicate type issues in BPF Type Format
kho: fix doc for kho_restore_pages()
tests/liveupdate: add in-kernel liveupdate test
liveupdate: luo_flb: introduce File-Lifecycle-Bound global state
liveupdate: luo_file: Use private list
list: add kunit test for private list primitives
list: add primitives for private list manipulations
delayacct: fix uapi timespec64 definition
panic: add panic_force_cpu= parameter to redirect panic to a specific CPU
netclassid: use thread_group_leader(p) in update_classid_task()
RDMA/umem: don't abuse current->group_leader
drm/pan*: don't abuse current->group_leader
drm/amd: kill the outdated "Only the pthreads threading model is supported" checks
drm/amdgpu: don't abuse current->group_leader
android/binder: use same_thread_group(proc->tsk, current) in binder_mmap()
android/binder: don't abuse current->group_leader
kho: skip memoryless NUMA nodes when reserving scratch areas
...
Everything:
Total patches: 325
Reviews/patch: 1.39
Reviewed rate: 72%
Excluding DAMON:
Total patches: 262
Reviews/patch: 1.63
Reviewed rate: 82%
Excluding DAMON and zram:
Total patches: 248
Reviews/patch: 1.72
Reviewed rate: 86%
- The 14 patch series "powerpc/64s: do not re-activate batched TLB
flush" from Alexander Gordeev makes arch_{enter|leave}_lazy_mmu_mode()
nest properly.
It adds a generic enter/leave layer and switches architectures to use
it. Various hacks were removed in the process.
- The 7 patch series "zram: introduce compressed data writeback" from
Richard Chang and Sergey Senozhatsky implements data compression for
zram writeback.
- The 8 patch series "mm: folio_zero_user: clear page ranges" from David
Hildenbrand adds clearing of contiguous page ranges for hugepages.
Large improvements during demand faulting are demonstrated.
- The 2 patch series "memcg cleanups" from Chen Ridong tideis up some
memcg code.
- The 12 patch series "mm/damon: introduce {,max_}nr_snapshots and
tracepoint for damos stats" from SeongJae Park improves DAMOS stat's
provided information, deterministic control, and readability.
- The 3 patch series "selftests/mm: hugetlb cgroup charging: robustness
fixes" from Li Wang fixes a few issues in the hugetlb cgroup charging
selftests.
- The 5 patch series "Fix va_high_addr_switch.sh test failure - again"
from Chunyu Hu addresses several issues in the va_high_addr_switch test.
- The 5 patch series "mm/damon/tests/core-kunit: extend existing test
scenarios" from Shu Anzai improves the KUnit test coverage for DAMON.
- The 2 patch series "mm/khugepaged: fix dirty page handling for
MADV_COLLAPSE" from Shivank Garg fixes a glitch in khugepaged which was
causing madvise(MADV_COLLAPSE) to transiently return -EAGAIN.
- The 29 patch series "arch, mm: consolidate hugetlb early reservation"
from Mike Rapoport reworks and consolidates a pile of straggly code
related to reservation of hugetlb memory from bootmem and creation of
CMA areas for hugetlb.
- The 9 patch series "mm: clean up anon_vma implementation" from Lorenzo
Stoakes cleans up the anon_vma implementation in various ways.
- The 3 patch series "tweaks for __alloc_pages_slowpath()" from
Vlastimil Babka does a little streamlining of the page allocator's
slowpath code.
- The 8 patch series "memcg: separate private and public ID namespaces"
from Shakeel Butt cleans up the memcg ID code and prevents the
internal-only private IDs from being exposed to userspace.
- The 6 patch series "mm: hugetlb: allocate frozen gigantic folio" from
Kefeng Wang cleans up the allocation of frozen folios and avoids some
atomic refcount operations.
- The 11 patch series "mm/damon: advance DAMOS-based LRU sorting" from
SeongJae Park improves DAMOS's movement of memory betewwn the active and
inactive LRUs and adds auto-tuning of the ratio-based quotas and of
monitoring intervals.
- The 18 patch series "Support page table check on PowerPC" from Andrew
Donnellan makes CONFIG_PAGE_TABLE_CHECK_ENFORCED work on powerpc.
- The 3 patch series "nodemask: align nodes_and{,not} with underlying
bitmap ops" from Yury Norov makes nodes_and() and nodes_andnot()
propagate the return values from the underlying bit operations, enabling
some cleanup in calling code.
- The 5 patch series "mm/damon: hide kdamond and kdamond_lock from API
callers" from SeongJae Park cleans up some DAMON internal interfaces.
- The 4 patch series "mm/khugepaged: cleanups and scan limit fix" from
Shivank Garg does some cleanup work in khupaged and fixes a scan limit
accounting issue.
- The 24 patch series "mm: balloon infrastructure cleanups" from David
Hildenbrand goes to town on the balloon infrastructure and its page
migration function. Mainly cleanups, also some locking simplification.
- The 2 patch series "mm/vmscan: add tracepoint and reason for
kswapd_failures reset" from Jiayuan Chen adds additional tracepoints to
the page reclaim code.
- The 3 patch series "Replace wq users and add WQ_PERCPU to
alloc_workqueue() users" from Marco Crivellari is part of Marco's
kernel-wide migration from the legacy workqueue APIs over to the
preferred unbound workqueues.
- The 9 patch series "Various mm kselftests improvements/fixes" from
Kevin Brodsky provides various unrelated improvements/fixes for the mm
kselftests.
- The 5 patch series "mm: accelerate gigantic folio allocation" from
Kefeng Wang greatly speeds up gigantic folio allocation, mainly by
avoiding unnecessary work in pfn_range_valid_contig().
- The 5 patch series "selftests/damon: improve leak detection and wss
estimation reliability" from SeongJae Park improves the reliability of
two of the DAMON selftests.
- The 8 patch series "mm/damon: cleanup kdamond, damon_call(), damos
filter and DAMON_MIN_REGION" from SeongJae Park does some cleanup work
in the core DAMON code.
- The 8 patch series "Docs/mm/damon: update intro, modules, maintainer
profile, and misc" from SeongJae Park performs maintenance work on the
DAMON documentation.
- The 10 patch series "mm: add and use vma_assert_stabilised() helper"
from Lorenzo Stoakes refactors and cleans up the core VMA code. The
main aim here is to be able to use the mmap write lock's lockdep state
to perform various assertions regarding the locking which the VMA code
requires.
- The 19 patch series "mm, swap: swap table phase II: unify swapin use"
from Kairui Song removes some old swap code (swap cache bypassing and
swap synchronization) which wasn't working very well. Various other
cleanups and simplifications were made. The end result is a 20% speedup
in one benchmark.
- The 8 patch series "enable PT_RECLAIM on more 64-bit architectures"
from Qi Zheng makes PT_RECLAIM available on 64-bit alpha, loongarch,
mips, parisc, um, Various cleanups were performed along the way.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaY1HfAAKCRDdBJ7gKXxA
jqhZAP9H8ZlKKqCEgnr6U5XXmJ63Ep2FDQpl8p35yr9yVuU9+gEAgfyWiJ43l1fP
rT0yjsUW3KQFBi/SEA3R6aYarmoIBgI=
=+HLt
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2026-02-11-19-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- "powerpc/64s: do not re-activate batched TLB flush" makes
arch_{enter|leave}_lazy_mmu_mode() nest properly (Alexander Gordeev)
It adds a generic enter/leave layer and switches architectures to use
it. Various hacks were removed in the process.
- "zram: introduce compressed data writeback" implements data
compression for zram writeback (Richard Chang and Sergey Senozhatsky)
- "mm: folio_zero_user: clear page ranges" adds clearing of contiguous
page ranges for hugepages. Large improvements during demand faulting
are demonstrated (David Hildenbrand)
- "memcg cleanups" tidies up some memcg code (Chen Ridong)
- "mm/damon: introduce {,max_}nr_snapshots and tracepoint for damos
stats" improves DAMOS stat's provided information, deterministic
control, and readability (SeongJae Park)
- "selftests/mm: hugetlb cgroup charging: robustness fixes" fixes a few
issues in the hugetlb cgroup charging selftests (Li Wang)
- "Fix va_high_addr_switch.sh test failure - again" addresses several
issues in the va_high_addr_switch test (Chunyu Hu)
- "mm/damon/tests/core-kunit: extend existing test scenarios" improves
the KUnit test coverage for DAMON (Shu Anzai)
- "mm/khugepaged: fix dirty page handling for MADV_COLLAPSE" fixes a
glitch in khugepaged which was causing madvise(MADV_COLLAPSE) to
transiently return -EAGAIN (Shivank Garg)
- "arch, mm: consolidate hugetlb early reservation" reworks and
consolidates a pile of straggly code related to reservation of
hugetlb memory from bootmem and creation of CMA areas for hugetlb
(Mike Rapoport)
- "mm: clean up anon_vma implementation" cleans up the anon_vma
implementation in various ways (Lorenzo Stoakes)
- "tweaks for __alloc_pages_slowpath()" does a little streamlining of
the page allocator's slowpath code (Vlastimil Babka)
- "memcg: separate private and public ID namespaces" cleans up the
memcg ID code and prevents the internal-only private IDs from being
exposed to userspace (Shakeel Butt)
- "mm: hugetlb: allocate frozen gigantic folio" cleans up the
allocation of frozen folios and avoids some atomic refcount
operations (Kefeng Wang)
- "mm/damon: advance DAMOS-based LRU sorting" improves DAMOS's movement
of memory betewwn the active and inactive LRUs and adds auto-tuning
of the ratio-based quotas and of monitoring intervals (SeongJae Park)
- "Support page table check on PowerPC" makes
CONFIG_PAGE_TABLE_CHECK_ENFORCED work on powerpc (Andrew Donnellan)
- "nodemask: align nodes_and{,not} with underlying bitmap ops" makes
nodes_and() and nodes_andnot() propagate the return values from the
underlying bit operations, enabling some cleanup in calling code
(Yury Norov)
- "mm/damon: hide kdamond and kdamond_lock from API callers" cleans up
some DAMON internal interfaces (SeongJae Park)
- "mm/khugepaged: cleanups and scan limit fix" does some cleanup work
in khupaged and fixes a scan limit accounting issue (Shivank Garg)
- "mm: balloon infrastructure cleanups" goes to town on the balloon
infrastructure and its page migration function. Mainly cleanups, also
some locking simplification (David Hildenbrand)
- "mm/vmscan: add tracepoint and reason for kswapd_failures reset" adds
additional tracepoints to the page reclaim code (Jiayuan Chen)
- "Replace wq users and add WQ_PERCPU to alloc_workqueue() users" is
part of Marco's kernel-wide migration from the legacy workqueue APIs
over to the preferred unbound workqueues (Marco Crivellari)
- "Various mm kselftests improvements/fixes" provides various unrelated
improvements/fixes for the mm kselftests (Kevin Brodsky)
- "mm: accelerate gigantic folio allocation" greatly speeds up gigantic
folio allocation, mainly by avoiding unnecessary work in
pfn_range_valid_contig() (Kefeng Wang)
- "selftests/damon: improve leak detection and wss estimation
reliability" improves the reliability of two of the DAMON selftests
(SeongJae Park)
- "mm/damon: cleanup kdamond, damon_call(), damos filter and
DAMON_MIN_REGION" does some cleanup work in the core DAMON code
(SeongJae Park)
- "Docs/mm/damon: update intro, modules, maintainer profile, and misc"
performs maintenance work on the DAMON documentation (SeongJae Park)
- "mm: add and use vma_assert_stabilised() helper" refactors and cleans
up the core VMA code. The main aim here is to be able to use the mmap
write lock's lockdep state to perform various assertions regarding
the locking which the VMA code requires (Lorenzo Stoakes)
- "mm, swap: swap table phase II: unify swapin use" removes some old
swap code (swap cache bypassing and swap synchronization) which
wasn't working very well. Various other cleanups and simplifications
were made. The end result is a 20% speedup in one benchmark (Kairui
Song)
- "enable PT_RECLAIM on more 64-bit architectures" makes PT_RECLAIM
available on 64-bit alpha, loongarch, mips, parisc, and um. Various
cleanups were performed along the way (Qi Zheng)
* tag 'mm-stable-2026-02-11-19-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (325 commits)
mm/memory: handle non-split locks correctly in zap_empty_pte_table()
mm: move pte table reclaim code to memory.c
mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE
mm: convert __HAVE_ARCH_TLB_REMOVE_TABLE to CONFIG_HAVE_ARCH_TLB_REMOVE_TABLE config
um: mm: enable MMU_GATHER_RCU_TABLE_FREE
parisc: mm: enable MMU_GATHER_RCU_TABLE_FREE
mips: mm: enable MMU_GATHER_RCU_TABLE_FREE
LoongArch: mm: enable MMU_GATHER_RCU_TABLE_FREE
alpha: mm: enable MMU_GATHER_RCU_TABLE_FREE
mm: change mm/pt_reclaim.c to use asm/tlb.h instead of asm-generic/tlb.h
mm/damon/stat: remove __read_mostly from memory_idle_ms_percentiles
zsmalloc: make common caches global
mm: add SPDX id lines to some mm source files
mm/zswap: use %pe to print error pointers
mm/vmscan: use %pe to print error pointers
mm/readahead: fix typo in comment
mm: khugepaged: fix NR_FILE_PAGES and NR_SHMEM in collapse_file()
mm: refactor vma_map_pages to use vm_insert_pages
mm/damon: unify address range representation with damon_addr_range
mm/cma: replace snprintf with strscpy in cma_new_area
...