Commit Graph

1427896 Commits (d0e437b76bd3c979ddaa6205f5e9ad3e0f95faef)

Author SHA1 Message Date
Pavel Begunkov d0e437b76b io_uring/bpf-ops: implement loop_step with BPF struct_ops
Introduce io_uring BPF struct ops implementing the loop_step callback,
which will allow BPF to overwrite the default io_uring event loop logic.

The callback takes an io_uring context, the main role of which is to be
passed to io_uring kfuncs. The other argument is a struct iou_loop_params,
which BPF can use to request CQ waiting and communicate other parameters.
See the event loop description in the previous patch for more details.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/98db437651ce64e9cbeb611c60bf5887259db09f.1772109579.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-16 16:15:00 -06:00
Pavel Begunkov 033af2b3eb io_uring: introduce callback driven main loop
The io_uring_enter() has a fixed order of execution: it submits
requests, waits for completions, and returns to the user. Allow to
optionally replace it with a custom loop driven by a callback called
loop_step. The basic requirements to the callback is that it should be
able to submit requests, wait for completions, parse them and repeat.
Most of the communication including parameter passing can be implemented
via shared memory.

The callback should return IOU_LOOP_CONTINUE to continue execution or
IOU_LOOP_STOP to return to the user space. Note that the kernel may
decide to prematurely terminate it as well, e.g. in case the process was
signalled or killed.

The hook takes a structure with parameters. It can be used to ask the
kernel to wait for CQEs by setting cq_wait_idx to the CQE index it wants
to wait for. Spurious wake ups are possible and even likely, the callback
is expected to handle it. There will be more parameters in the future
like timeout.

It can be used with kernel callbacks, for example, as a slow path
deprecation mechanism overwiting SQEs and emulating the wanted
behaviour, however it's more useful together with BPF programs
implemented in following patches.

Note that keeping it separately from the normal io_uring wait loop
makes things much simpler and cleaner. It keeps it in one place instead
of spreading a bunch of checks in different places including disabling
the submission path. It holds the lock by default, which is a better fit
for BPF synchronisation and the loop execution model. It nicely avoids
existing quirks like forced wake ups on timeout request completion. And
it should be easier to implement new features.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/a2d369aa1c9dd23ad7edac9220cffc563abcaed6.1772109579.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-16 16:15:00 -06:00
Caleb Sander Mateos f144dbac4b nvme: remove nvme_dev_uring_cmd() IO_URING_F_IOPOLL check
nvme_dev_uring_cmd() is part of struct file_operations nvme_dev_fops,
which doesn't implement ->uring_cmd_iopoll(). So it won't be called with
issue_flags that include IO_URING_F_IOPOLL. Drop the unnecessary
IO_URING_F_IOPOLL check in nvme_dev_uring_cmd().

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Anuj Gupta <anuj20.g@samsung.com>
Link: https://patch.msgid.link/20260302172914.2488599-6-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-16 16:14:14 -06:00
Caleb Sander Mateos 23475637b0 io_uring/uring_cmd: allow non-iopoll cmds with IORING_SETUP_IOPOLL
Currently, creating an io_uring with IORING_SETUP_IOPOLL requires all
requests issued to it to support iopoll. This prevents, for example,
using ublk zero-copy together with IORING_SETUP_IOPOLL, as ublk
zero-copy buffer registrations are performed using a uring_cmd. There's
no technical reason why these non-iopoll uring_cmds can't be supported.
They will either complete synchronously or via an external mechanism
that calls io_uring_cmd_done(), io_uring_cmd_post_mshot_cqe32(), or
io_uring_mshot_cmd_post_cqe(), so they don't need to be polled.

Allow uring_cmd requests to be issued to IORING_SETUP_IOPOLL io_urings
even if their files don't implement ->uring_cmd_iopoll(). For these
uring_cmd requests, skip initializing struct io_kiocb's iopoll fields,
don't set REQ_F_IOPOLL, and don't set IO_URING_F_IOPOLL in issue_flags.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Anuj Gupta <anuj20.g@samsung.com>
Link: https://patch.msgid.link/20260302172914.2488599-5-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-16 16:14:14 -06:00
Caleb Sander Mateos 3a5e96d47f io_uring: count CQEs in io_iopoll_check()
A subsequent commit will allow uring_cmds that don't use iopoll on
IORING_SETUP_IOPOLL io_urings. As a result, CQEs can be posted without
setting the iopoll_completed flag for a request in iopoll_list or going
through task work. For example, a UBLK_U_IO_FETCH_IO_CMDS command could
call io_uring_mshot_cmd_post_cqe() to directly post a CQE. The
io_iopoll_check() loop currently only counts completions posted in
io_do_iopoll() when determining whether the min_events threshold has
been met. It also exits early if there are any existing CQEs before
polling, or if any CQEs are posted while running task work. CQEs posted
via io_uring_mshot_cmd_post_cqe() or other mechanisms won't be counted
against min_events.

Explicitly check the available CQEs in each io_iopoll_check() loop
iteration to account for CQEs posted in any fashion.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Link: https://patch.msgid.link/20260302172914.2488599-4-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-16 16:14:14 -06:00
Caleb Sander Mateos 7995be40de io_uring: remove iopoll_queue from struct io_issue_def
The opcode iopoll_queue flag is now redundant with REQ_F_IOPOLL. Only
io_{read,write}{,_fixed}() and io_uring_cmd() set the REQ_F_IOPOLL flag,
and the opcodes with these ->issue() implementations are precisely the
ones that set iopoll_queue. So don't bother checking the iopoll_queue
flag in io_issue_sqe(). Remove the unused flag from struct io_issue_def.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Anuj Gupta <anuj20.g@samsung.com>
Link: https://patch.msgid.link/20260302172914.2488599-3-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-16 16:14:14 -06:00
Caleb Sander Mateos 9165dc4fa9 io_uring: add REQ_F_IOPOLL
A subsequent commit will allow uring_cmds to files that don't implement
->uring_cmd_iopoll() to be issued to IORING_SETUP_IOPOLL io_urings. This
means the ctx's IORING_SETUP_IOPOLL flag isn't sufficient to determine
whether a given request needs to be iopolled.

Introduce a request flag REQ_F_IOPOLL set in ->issue() if a request
needs to be iopolled to completion. Set the flag in io_rw_init_file()
and io_uring_cmd() for requests issued to IORING_SETUP_IOPOLL ctxs. Use
the request flag instead of IORING_SETUP_IOPOLL in places dealing with a
specific request.

A future possibility would be to add an option to enable/disable iopoll
in the io_uring SQE instead of determining it from IORING_SETUP_IOPOLL.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Anuj Gupta <anuj20.g@samsung.com>
Link: https://patch.msgid.link/20260302172914.2488599-2-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-16 16:14:14 -06:00
Jens Axboe 8c55744919 io_uring: mark known and harmless racy ctx->int_flags uses
There are a few of these, where flags are read outside of the
uring_lock, yet it's harmless to race on them.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-16 15:33:10 -06:00
Jens Axboe f1a424e21c io_uring: switch struct io_ring_ctx internal bitfields to flags
Bitfields cannot be set and checked atomically, and this makes it more
clear that these are indeed in shared storage and must be checked and
set in a sane fashion. This is in preparation for annotating a few of
the known racy, but harmless, flags checking.

No intended functional changes in this patch.

Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-16 15:32:59 -06:00
Jens Axboe 0e46cb553f Merge branch 'io_uring-7.0' into for-7.1/io_uring
Merge upstream io_uring fixes to avoid conflicts in later patches.

* io_uring-7.0:
  io_uring/kbuf: check if target buffer list is still legacy on recycle
  io_uring: fix physical SQE bounds check for SQE_MIXED 128-byte ops
  io_uring/eventfd: use ctx->rings_rcu for flags checking
  io_uring: ensure ctx->rings is stable for task work flags manipulation
  io_uring/bpf_filter: use bpf_prog_run_pin_on_cpu() to prevent migration
  io_uring/register: fix comment about task_no_new_privs
2026-03-14 08:57:15 -06:00
Jens Axboe c2c185be5c io_uring/kbuf: check if target buffer list is still legacy on recycle
There's a gap between when the buffer was grabbed and when it
potentially gets recycled, where if the list is empty, someone could've
upgraded it to a ring provided type. This can happen if the request
is forced via io-wq. The legacy recycling is missing checking if the
buffer_list still exists, and if it's of the correct type. Add those
checks.

Cc: stable@vger.kernel.org
Fixes: c7fb19428d ("io_uring: add support for ring mapped supplied buffers")
Reported-by: Keenan Dong <keenanat2000@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-12 08:59:25 -06:00
Tom Ryan 6f02c6b196 io_uring: fix physical SQE bounds check for SQE_MIXED 128-byte ops
When IORING_SETUP_SQE_MIXED is used without IORING_SETUP_NO_SQARRAY,
the boundary check for 128-byte SQE operations in io_init_req()
validated the logical SQ head position rather than the physical SQE
index.

The existing check:

  !(ctx->cached_sq_head & (ctx->sq_entries - 1))

ensures the logical position isn't at the end of the ring, which is
correct for NO_SQARRAY rings where physical == logical. However, when
sq_array is present, an unprivileged user can remap any logical
position to an arbitrary physical index via sq_array. Setting
sq_array[N] = sq_entries - 1 places a 128-byte operation at the last
physical SQE slot, causing the 128-byte memcpy in
io_uring_cmd_sqe_copy() to read 64 bytes past the end of the SQE
array.

Replace the cached_sq_head alignment check with a direct validation
of the physical SQE index, which correctly handles both sq_array and
NO_SQARRAY cases.

Fixes: 1cba30bf9f ("io_uring: add support for IORING_SETUP_SQE_MIXED")
Signed-off-by: Tom Ryan <ryan36005@gmail.com>
Link: https://patch.msgid.link/20260310052003.72871-1-ryan36005@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-11 14:35:19 -06:00
Jens Axboe 177c694321 io_uring/eventfd: use ctx->rings_rcu for flags checking
Similarly to what commit e78f7b70e837 did for local task work additions,
use ->rings_rcu under RCU rather than dereference ->rings directly. See
that commit for more details.

Cc: stable@vger.kernel.org
Fixes: 79cfe9e59c ("io_uring/register: add IORING_REGISTER_RESIZE_RINGS")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-11 14:35:19 -06:00
Jens Axboe 9618908026 io_uring: ensure ctx->rings is stable for task work flags manipulation
If DEFER_TASKRUN | SETUP_TASKRUN is used and task work is added while
the ring is being resized, it's possible for the OR'ing of
IORING_SQ_TASKRUN to happen in the small window of swapping into the
new rings and the old rings being freed.

Prevent this by adding a 2nd ->rings pointer, ->rings_rcu, which is
protected by RCU. The task work flags manipulation is inside RCU
already, and if the resize ring freeing is done post an RCU synchronize,
then there's no need to add locking to the fast path of task work
additions.

Note: this is only done for DEFER_TASKRUN, as that's the only setup mode
that supports ring resizing. If this ever changes, then they too need to
use the io_ctx_mark_taskrun() helper.

Link: https://lore.kernel.org/io-uring/20260309062759.482210-1-naup96721@gmail.com/
Cc: stable@vger.kernel.org
Fixes: 79cfe9e59c ("io_uring/register: add IORING_REGISTER_RESIZE_RINGS")
Reported-by: Hao-Yu Yang <naup96721@gmail.com>
Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-11 14:35:16 -06:00
Jens Axboe 785d4625d3 io_uring/bpf_filter: use bpf_prog_run_pin_on_cpu() to prevent migration
Since the caller, __io_uring_run_bpf_filters(), doesn't prevent
migration, it should use the migration disabling variant for running
the BPF program.

Fixes: d42eb05e60 ("io_uring: add support for BPF filtering for opcode restrictions")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-09 14:20:14 -06:00
Jann Horn 3306a589e5 io_uring/register: fix comment about task_no_new_privs
The actual code is right, but the comment is the wrong way around.

Fixes: ed82f35b92 ("io_uring: allow registration of per-task restrictions")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-09 08:40:18 -06:00
Pavel Begunkov cb94873336 io_uring/zctx: separate notification user_data
People previously asked for the notification CQE to have a different
user_data value from the main request completion. It's useful to
separate buffer and request handling logic and avoid separately
refcounting the request.

Let the user pass the notification user_data in sqe->addr3. If zero,
it'll inherit sqe->user_data as before. It doesn't change the rules for
when the user can expect a notification CQE, and it should still check
the IORING_CQE_F_MORE flag.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-09 07:23:29 -06:00
Pavel Begunkov dce00c8303 io_uring/net: allow vectorised regbuf send zc
Enable IORING_SEND_VECTORIZED with registered buffers for
IORING_OP_SEND_ZC. Set IORING_SEND_VECTORIZED for all msg send requests
to differentiate if the vectorised version is expected.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-09 07:22:09 -06:00
Pavel Begunkov d8345a2190 io_uring/timeout: immediate timeout arg
One the things the user has always keep in mind is that any user
pointers they put into an SQE is not going to be read by the kernel
until submission happens, and the user has to ensure the pointee stays
alive until then. For example, snippet below will lead to UAF of the on
stack variable ts. Instead of passing the timeout value as a pointer
allow to store it immediately in the SQE. The user has to set a new flag
called IORING_TIMEOUT_IMMEDIATE_ARG, in which case sqe->addr for timeout
or sqe->addr2 for timeout update requests will be interpreted as a time
value in nanosecods.

void prep_timeout(struct io_uring_sqe *sqe) {
    struct __kernel_timespec ts = {...};
    prep_timeout(sqe, &ts);
}

void submit() {
    sqe = get_sqe();
    prep_timeout(sqe);
    io_uring_submit();
}

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-09 07:21:54 -06:00
Pavel Begunkov 0e78aa188c io_uring/timeout: migrate reqs from ts64 to ktime
It'll be more convenient for next patches to keep ktime in requests
instead of timespec64. Convert everything to ktime right after user
argument parsing at request prep time.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-09 07:21:54 -06:00
Pavel Begunkov 6e3f5943a4 io_uring/timeout: add helper for parsing user time
There is some duplication for timespec checks that can be deduplicated
with a new function, and it'll be extended in next patches.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-09 07:21:54 -06:00
Pavel Begunkov 484ae637a3 io_uring/timeout: check unused sqe fields
Zero check unused SQE fields addr3 and pad2 for timeout and timeout
update requests. They're not needed now, but could be used sometime
in the future.

Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-09 07:21:54 -06:00
Pavel Begunkov d9d2455e77 io_uring/zcrx: move zcrx uapi into separate header
Split out zcrx uapi into a separate file. It'll be easier to manage it
this way, and that reduces the size of a not so small io_uring.h. Since
there are users that expect that zcrx definitions come with io_uring.h,
it includes the new file.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-09 07:21:54 -06:00
Pavel Begunkov dc156e0f1a io_uring/zcrx: declare some constants for query
Add constants for zcrx features and supported registration flags that
can be reused by the query code. I was going to add another registration
flag, and this patch helps to avoid duplication and keeps changes
specific to zcrx files.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-09 07:21:54 -06:00
Pavel Begunkov 403fec55bf io_uring/zctx: unify zerocopy issue variants
io_send_zc and io_sendmsg_zc started different but now the only real
difference between them is how registered buffers are imported and
which net helper we use. Avoid duplication and combine them into a
single function.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-09 07:21:54 -06:00
Pavel Begunkov 2f9965f5d5 io_uring/zctx: move vec regbuf import into io_send_zc_import
Unify send and sendmsg zerocopy paths for importing registered buffers
and make io_send_zc_import() responsible for that. It's a preparation
patch making the next change simpler.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-09 07:21:53 -06:00
Pavel Begunkov c279fcd95a io_uring/zctx: rename flags var for more clarity
The name "flags" is too overloaded, so rename the variable in
io_sendmsg_zc() into msg_flags to stress that it contains MSG_*.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-09 07:21:53 -06:00
Asbjørn Sloth Tønnesen bdb489adca io_uring/cmd_net: split ioctl code out of io_uring_cmd_sock()
io_uring_cmd_sock() originally supported two ioctl-based cmd_op
operations. Over time, additional operations were added with tail calls
to their helpers.

This approach resulted in the new operations sharing an ioctl check
with the original operations.

io_uring_cmd_sock() now supports 6 operations, so let's move the
implementation of the original two into their own helper, reducing
io_uring_cmd_sock() to a simple dispatcher.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-09 07:21:53 -06:00
Linus Torvalds 1f318b96cc Linux 7.0-rc3 2026-03-08 16:56:54 -07:00
Linus Torvalds fc9f248d8c EFI fixes for v7.0 #2
Fix for the x86 EFI workaround keeping boot services code and data
 regions reserved until after SetVirtualAddressMap() completes: deferred
 struct page initialization may result in some of this memory to be lost
 permanently.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQQm/3uucuRGn1Dmh0wbglWLn0tXAUCaaxHtgAKCRAwbglWLn0t
 XIbNAPwNjw/TSgVD+Ur//yqY7TxZSBari8aheEkXNaYHFCPImwD6A1CzNGn6rcka
 JzeP+6HeOO9c0xCBudcR0aRfSma3cQI=
 =a+XF
 -----END PGP SIGNATURE-----

Merge tag 'efi-fixes-for-v7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI fix from Ard Biesheuvel:
 "Fix for the x86 EFI workaround keeping boot services code and data
  regions reserved until after SetVirtualAddressMap() completes:
  deferred struct page initialization may result in some of this memory
  being lost permanently"

* tag 'efi-fixes-for-v7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  x86/efi: defer freeing of boot services memory
2026-03-08 12:13:09 -07:00
Linus Torvalds 014441d1e4 i2c-for-7.0-rc3
A revert for the i801 driver restoring old locking behaviour.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEOZGx6rniZ1Gk92RdFA3kzBSgKbYFAmmtKb8ACgkQFA3kzBSg
 Kba5dRAAnoNtQC2qd3PneEUZs/pVK3kE+wEJ57iWPyxnSUEW0O7RcbQHoETb039N
 O0aSyAL/x2pI4+nMYnLOkJUwwaDjcpSdCFPpeUsmIhzZo/k19hyPaW3VmdpIF+uR
 K6snCwNzH4AbCh0ka9XOUH4YXINse4C2n7ZP9r5z5WZ6ANK3x8oKGC/QRM6UPaZw
 jXPl962lb9LQARqvG6YnUHjn+x3teHW/sD3/48IHfNeuvhKstzG9Bc+XDZD+Uc7X
 EGNAwI7/4tkm/0vRZXDWkuupJleqZSIUXVlb5awv0p50IqREjEnl2fdQdoR90vux
 oooTKv4inWw0W79VBwQeScGCHKFPV00HQkexiyePmtCGwSU3/k3BWalD+jY9CF8s
 W6yDR7M3gmIeNbQQXGZx6/04KVFugtQEQm9v9O7bmB7oEW01K4tAAGhfcFgmwOeN
 qKsmF1Dt+KefYQdtWPCZpMT/zdUTjFJs69J8omxtyo5SdU8RWaLGMegYfEwUrakH
 r9pt/nASAPcMTb31KAlgro2QmvHWzRVx6+Sir41tLFB5Ls4jxC/a/cH3DWIqgq8V
 PqZF5dvfxsa/KoXrotpQHnS9Nma3KqEJnjLwg/7LhSPxjCqhKvlTjIqDv1IP5R0e
 N56KYy2MRRfdWUCovnXmTViFc5fsmJk1agjjXtxHHdp8GO/njAg=
 =Nc8b
 -----END PGP SIGNATURE-----

Merge tag 'i2c-for-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fix from Wolfram Sang:
 "A revert for the i801 driver restoring old locking behaviour"

* tag 'i2c-for-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: i801: Revert "i2c: i801: replace acpi_lock with I2C bus lock"
2026-03-08 10:17:05 -07:00
Linus Torvalds c23719abc3 Miscellaneous x86 fixes:
- Fix SEV guest boot failures in certain circumstances, due
    to very early code relying on a BSS-zeroed variable that
    isn't actually zeroed yet an may contain non-zero bootup
    values. Move the variable into the .data section go gain
    even earlier zeroing.
 
  - Expose & allow the IBPB-on-Entry feature on SNP guests,
    which was not properly exposed to guests due to initial
    implementational caution.
 
  - Fix O= build failure when CONFIG_EFI_SBAT_FILE is using
    relative file paths.
 
  - 4 commits to fix the various SNC (Sub-NUMA Clustering)
    topology enumeration bugs/artifacts (sched-domain build
    errors mostly). SNC enumeration data got more complicated
    with Granite Rapids X (GNR) and Clearwater Forest X (CWF),
    which exposed these bugs and made their effects more
    serious.
 
  - Also use the now sane(r) SNC code to fix resctrl SNC
    detection bugs.
 
  - Work around a historic libgcc unwinder bug in the vdso32
    sigreturn code (again), which regressed during an overly
    aggressive recent cleanup of DWARF annotations.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmmsy9wRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1ieiQ/7B2Rfm5vR5rQLlAv26iEMypIwoCiCMgzA
 YD3nOMFl6aGhphKryiU0b4MDhAIASN9X6mZloryUKyol1oKP0evkWXSk/0J+k+V9
 lS7uIVL+8nPTSl3gQE7ARzJ9jakFN49VzDheZjsjIHC0+n+yvCJU6xSx8IKeiTSW
 axpX8R33M3Fj+u5anF3m37OdFTgiYxFO0t5VNFgWP4H9367yC/wnHPuDyidAdJ/N
 B7PL1L3rG3+w/4np81Xwi/rThwgsSWarVLNuMJuGM5wujMr8mQGhuWaeLiPgTx7G
 wze1iarWvp5uqamGztpy/4WMD1x0yBX9CCSocnwF48Fh1yTww5+uwOZn5e5fZxYr
 vDhCH6+DB8Rt3Wj+/3RBzHSFe7rNq+f86U84uxTwyOs5eC5sGUuyH15lCt4dP9ZO
 uQfW0dQRwvUXCGXJxxZdIR0nq/vEJUmQ+DLLL6zkCj24t9ND5IPAkBLVn7P5PO5s
 qv8dPpldSq57V4comqW8oDAqLL0OeS1qgggxlHzqAdrMmt+IVKWvteRXrkgy1m9Y
 Bt0EbdghUTZkn9+FcUTorVA/pZHL5sYCiuGQxNbaaLmMWrcX4I3XnEtpzgukHh8e
 BL1blJWAm/4cuhGXb4RF7AZMQgTU56greOU385Afc1Qz2lzohGO4lqgGOH8L0ZEh
 KqEX1IS0ZbI=
 =KlDX
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:

 - Fix SEV guest boot failures in certain circumstances, due to
   very early code relying on a BSS-zeroed variable that isn't
   actually zeroed yet an may contain non-zero bootup values

   Move the variable into the .data section go gain even earlier
   zeroing

 - Expose & allow the IBPB-on-Entry feature on SNP guests, which
   was not properly exposed to guests due to initial implementational
   caution

 - Fix O= build failure when CONFIG_EFI_SBAT_FILE is using relative
   file paths

 - Fix the various SNC (Sub-NUMA Clustering) topology enumeration
   bugs/artifacts (sched-domain build errors mostly).

   SNC enumeration data got more complicated with Granite Rapids X
   (GNR) and Clearwater Forest X (CWF), which exposed these bugs
   and made their effects more serious

 - Also use the now sane(r) SNC code to fix resctrl SNC detection bugs

 - Work around a historic libgcc unwinder bug in the vdso32 sigreturn
   code (again), which regressed during an overly aggressive recent
   cleanup of DWARF annotations

* tag 'x86-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/entry/vdso32: Work around libgcc unwinder bug
  x86/resctrl: Fix SNC detection
  x86/topo: Fix SNC topology mess
  x86/topo: Replace x86_has_numa_in_package
  x86/topo: Add topology_num_nodes_per_package()
  x86/numa: Store extra copy of numa_nodes_parsed
  x86/boot: Handle relative CONFIG_EFI_SBAT_FILE file paths
  x86/sev: Allow IBPB-on-Entry feature for SNP guests
  x86/boot/sev: Move SEV decompressor variables into the .data section
2026-03-07 17:12:06 -08:00
Linus Torvalds 6ff1020c2f Make clock_adjtime() syscall timex validation slightly more
permissive for auxiliary clocks, to not reject syscalls
 based on the status field that do not try to modify the
 status field. This makes ABI behavior in clock_adjtime()
 consistent with CLOCK_REALTIME.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmmsxzkRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hq8g//fRTp9p2pVfmRWUoxWELrT/bMK1r+D6F3
 6BYkwp68peRhchVrFxkI/Y37rjAIC8CXZSPuvkubqIROrH3gA7SCCQYCcZKdss+t
 i3lbpQF8IbagPIS5btpOAN2KRCu2S7aqjDdH0rWb9VhQdlW7fI71Z72Uz07YEA+q
 TWpy3gE531P/dgAqcvIAyMHnFZDCb1S6z8wZvT3SV4r4GkczfXpTFyNHHtETSu0V
 7isuOBfloM4HpDU50oUotlqBiwigH27J2Ad6aIrnCA7iaQPrzREysG+8E96ShhaB
 g6+qaQS5gTgFryA1bggA6LzGveLOI8bjy2kZ2SnZWuFPj46OReGIuwK4kyY07jz2
 xk0sd37alN16ETKhGVLfAgjmzVGoKVNnp4ak9J3VmMbxWEmXeObuOC8SmF9VImc1
 4bRaG9+Tlfd4DtOOz2+E4VcPE1D9A2tMw4esgUaXRrrp4GlEcKOJ5PRlWj0uGvrh
 xLPLbL0XIiWsjMsHdVs4Gq9Z0MvfRHc4VLOviIqLFtHox2DscZypPkyjKAv5inp0
 /VWyUYJkkr07RMQQ3nqHnP+lzAfO2aSeZ72D9NnHStL3RPbGC4jYvpoi8dnH0/TT
 PKJgj2jb7u3h+1cxKBi1RM0JbxUYD5+4N8zfJISa9uqkHZ3XY3VyuuT+2RHO6CQp
 d1BdX0V4oDA=
 =zjov
 -----END PGP SIGNATURE-----

Merge tag 'timers-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Ingo Molnar:
 "Make clock_adjtime() syscall timex validation slightly more permissive
  for auxiliary clocks, to not reject syscalls based on the status field
  that do not try to modify the status field.

  This makes the ABI behavior in clock_adjtime() consistent with
  CLOCK_REALTIME"

* tag 'timers-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timekeeping: Fix timex status validation for auxiliary clocks
2026-03-07 17:09:15 -08:00
Linus Torvalds b1b9a9d0b5 Fix a DL scheduler bug that may corrupt internal metrics during
PI and setscheduler() syscalls, resulting in kernel warnings
 and misbehavior. Found during stress-testing.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmmsxW0RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jzXRAAjqcTwaC72cd+6cnh+tE9/fcjXf1JtK5e
 TxdTygsgBAbXh63rD4y4cRPueqBR1ne52TAV0lI8Z1pBM/XthnaF4MJBue6B8EdX
 SQIE7hpOh6R81I6hnuhNoNsAy95jQvYXN5SFaKMuNacWNVX8k3vPzN5XPxa7yHLN
 MVUL+O9c7Xwg4v30Nz/QIv0mFoPosbh4PIdeVpD/ghJAXtXhsCg7EYOivEk9UsSy
 TAcq3qRnfDyroIOc5/dnSglEwX12LQqVFBba97nI/TCjaH23PsUIt2Dg2rpJbJ+k
 bLh4hGpOoyQvgE/PSEdoMl1F9pXw3XiUOzAGrFJdqn0iKL+7WzuTEQH+vAToGZQv
 4hF5BtMjLrAYY/MVsD8qJGm/pne5nTIo2gSsG7LZPwCmMj0rDUGXfO4G8N8LHhT7
 ExQ/t2+z0BczsKdvF3VKX+RweT51AOYOWcmLIdA9h1jdAy858GVmTzSWDveAEJ0L
 yToPQ0UMCz985g9il6Rdb5cIphD7DjuUeFNnYTCm63cVpZdA4j8Da74r4KfP2jNY
 tRcbiUy+A7MwqW5aERgwBtI6XCz6QZqW3svJW9yYghf40lgNGAcDCTTdf2r7g0Ho
 Q0pQVxEk9mXD5N1otjzSS4piLbzoMaPH1L4W6ceHN1RzBjfSJED3tmfGUHZUDqNE
 w33GhhQAFpA=
 =vP5l
 -----END PGP SIGNATURE-----

Merge tag 'sched-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fix from Ingo Molnar:
 "Fix a DL scheduler bug that may corrupt internal metrics during PI and
  setscheduler() syscalls, resulting in kernel warnings and misbehavior.

  Found during stress-testing"

* tag 'sched-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/deadline: Fix missing ENQUEUE_REPLENISH during PI de-boosting
2026-03-07 17:07:13 -08:00
Eric Dumazet 1954c4f012 eventpoll: Convert epoll_put_uevent() to scoped user access
Saves two function calls, and one stac/clac pair.

stac/clac is rather expensive on older cpus like Zen 2.

A synthetic network stress test gives a ~1.5% increase of pps
on AMD Zen 2.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-03-07 15:03:14 -08:00
Linus Torvalds 3b5d535c63 SCSI fixes on 20260307
Two core changes and the rest in drivers, one core change to quirk the
 behaviour of the Iomega Zip drive and one to fix a hang caused by tag
 reallocation problems, which has mostly been seen by the iscsi client.
 Note the latter fixes the problem but still has a slight sysfs memory
 leak, so will be amended in the next pull request (once we've run the
 fix for the fix through our testing).
 
 Signed-off-by: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
 -----BEGIN PGP SIGNATURE-----
 
 iLgEABMIAGAWIQTnYEDbdso9F2cI+arnQslM7pishQUCaaxT0hsUgAAAAAAEAA5t
 YW51MiwyLjUrMS4xMiwyLDImHGphbWVzLmJvdHRvbWxleUBoYW5zZW5wYXJ0bmVy
 c2hpcC5jb20ACgkQ50LJTO6YrIVmDwD+P17JCAk+Ju0aNSnjEmIjUC2oI1S+9GdO
 thbkK99vClABAOOkDvHopBBhfsilTpHBYjWFM34vC/iiaO/xfgd9YH2A
 =kIDx
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Two core changes and the rest in drivers, one core change to quirk the
  behaviour of the Iomega Zip drive and one to fix a hang caused by tag
  reallocation problems, which has mostly been seen by the iscsi client.

  Note the latter fixes the problem but still has a slight sysfs memory
  leak, so will be amended in the next pull request (once we've run the
  fix for the fix through our testing)"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: target: Fix recursive locking in __configfs_open_file()
  scsi: devinfo: Add BLIST_SKIP_IO_HINTS for Iomega ZIP
  scsi: mpi3mr: Clear reset history on ready and recheck state after timeout
  scsi: core: Fix refcount leak for tagset_refcnt
2026-03-07 14:04:50 -08:00
Linus Torvalds fb07430e6f fbdev fixes for kernel v7.0-rc3:
Silence build error in au1100fb driver found by kernel test robot
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCaayLiwAKCRD3ErUQojoP
 X1pVAP4/j6LjBX862nFgtxS5XC4YBkpGRLYwO2WJMec+4sO5fQD/ThrowpuzZfPl
 FhD/6WtMS4zPCDfNeqIKAo/JySez+w8=
 =2Tha
 -----END PGP SIGNATURE-----

Merge tag 'fbdev-for-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev

Pull fbdev fix from Helge Deller:
 "Silence build error in au1100fb driver found by kernel test robot"

* tag 'fbdev-for-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
  fbdev: au1100fb: Fix build on MIPS64
2026-03-07 13:21:43 -08:00
Linus Torvalds 6deccafcb4 parisc architecture fixes for kernel v7.0-rc3:
Three initial kernel mapping fixes
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCaayE4AAKCRD3ErUQojoP
 X4U4AQDtHPc9nlM3areu5yTQnOcPTExuEoIpvBm9ktwNCdrwCgEAt4tqv3hhxCvG
 /lwb6XBCHfyw3d/AsTRbOIH1MGCnaQQ=
 =itGt
 -----END PGP SIGNATURE-----

Merge tag 'parisc-for-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Pull parisc fixes from Helge Deller:
 "While testing Sasha Levin's 'kallsyms: embed source file:line info in
  kernel stack traces' patch series, which increases the typical kernel
  image size, I found some issues with the parisc initial kernel mapping
  which may prevent the kernel to boot.

  The three small patches here fix this"

* tag 'parisc-for-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Fix initial page table creation for boot
  parisc: Check kernel mapping earlier at bootup
  parisc: Increase initial mapping to 64 MB with KALLSYMS
2026-03-07 12:38:16 -08:00
Linus Torvalds 8b7f4cd3ac bpf-fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmmsahoACgkQ6rmadz2v
 bToD4Q/9Hr2lAELsTRZIENBTYYTfk48Rzrr+rJbZWLfX4nOu9RwhVboTw0gvJr8r
 QEteYO8+gP5hrIuifcC+vdzW12CpgziBK7/Qj2NB7MGSX0ZCLK0ShQK6TAbNE8uZ
 xA4qMcJp0BWJpgfU51z4Ur5nMzG0Th2XlY+SGwHYZ/53qRMmP7Tr8R9wPltKuEll
 J+tS6BE6bFKMQe1CuIYlYfYj9kO3j4yx9S48j1e8x9LXc2/2arfsOIZQ6L11rnxZ
 YTa9jbcRL0YdLy7d4hMLyPQquF9hiHDrJLJih+YNmbyOCPGD3HP0ib2c2g0ip+qk
 WlNFLc2K9w0eS02uuAytaxwArdOpDUHG+skWe+dGt95Bk3Xld0hW/JSI5hrohM1E
 0YKjuZeGGvTkI5FJ5kk+BWoApy+FveiN1w7VsHZgr1tix5NRZRgJP/5swM+0eSp/
 neTzp08fC10gi+GOnUpdF/lI4wEayoGMpo68BTGiM/hP6Qh268wNWgTHaau3Sk/z
 wfJb90VTcDHk+N8mnbi4970RZ6OdqX2n7aBy8D7hBW31kHhP27zofzYX51CFD8Ln
 5NQDU3wX/hUw9tNwoghCNe12//gfDBi6rO/GoEztNFfVVef4QuzsfwKKdxzUMc9a
 jZvyDyGh87rGLauDWylq4s23vgxgIG/p114uv+eFp3AMkOfT/Dk=
 =knWl
 -----END PGP SIGNATURE-----

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

 - Fix u32/s32 bounds when ranges cross min/max boundary (Eduard
   Zingerman)

 - Fix precision backtracking with linked registers (Eduard Zingerman)

 - Fix linker flags detection for resolve_btfids (Ihor Solodrai)

 - Fix race in update_ftrace_direct_add/del (Jiri Olsa)

 - Fix UAF in bpf_trampoline_link_cgroup_shim (Lang Xu)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  resolve_btfids: Fix linker flags detection
  selftests/bpf: add reproducer for spurious precision propagation through calls
  bpf: collect only live registers in linked regs
  Revert "selftests/bpf: Update reg_bound range refinement logic"
  selftests/bpf: test refining u32/s32 bounds when ranges cross min/max boundary
  bpf: Fix u32/s32 bounds when ranges cross min/max boundary
  bpf: Fix a UAF issue in bpf_trampoline_link_cgroup_shim
  ftrace: Add missing ftrace_lock to update_ftrace_direct_add/del
2026-03-07 12:20:37 -08:00
Linus Torvalds 03dcad79ee RCU fixes for v7.0
Fix a regression in RCU torture test pre-defined scenarios caused by
 commit 7dadeaa6e8 ("sched: Further restrict the preemption modes")
 which limits PREEMPT_NONE to architectures that do not support
 preemption at all and PREEMPT_VOLUNTARY to those architectures that do
 not yet have PREEMPT_LAZY support. Since major architectures (e.g. x86
 and arm64) no longer support CONFIG_PREEMPT_NONE and
 CONFIG_PREEMPT_VOLUNTARY, using them in rcutorture, rcuscale, refscale,
 and scftorture pre-defined scenarios causes config checking errors.
 Hence switch these kconfigs to PREEMPT_LAZY.
 -----BEGIN PGP SIGNATURE-----
 
 iQFFBAABCAAvFiEEj5IosQTPz8XU1wRHSXnow7UH+rgFAmmsZu0RHGJvcXVuQGtl
 cm5lbC5vcmcACgkQSXnow7UH+rgNWwgAn1bIWDIWQR9CkRQ0grhdO+pPfLusbVg/
 Y7H3SsdEX03meSunA/IVGejP6Qbanuab9nyHdv3WxxowpCGYBaLFPvklSfcBWYeZ
 4lxi6Fj+2rrzvOnQ54Pk5i4v5VayxEo12XBIgx6HDV7+5LgWk0gU0LpWPUhEWYYR
 z/zJ/XbcLr8e9tZVwAWZj/ShLUH301razC2SaR/OlS93zqRG9Sd251Knjqq/lEwI
 T6RZfhT2Wz3bgqU3QcjxDWw5dhB0/Y9wKoJjx4bB9m8lnSt+o96gH40TO+lnllnS
 T4OHjqK1J1fUXNLzQufyfKKAiwi/LBBc9H4pe4tiLFxg0fg5s21flQ==
 =NBys
 -----END PGP SIGNATURE-----

Merge tag 'rcu-fixes.v7.0-20260307a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux

Pull RCU selftest fixes from Boqun Feng:
 "Fix a regression in RCU torture test pre-defined scenarios caused by
  commit 7dadeaa6e8 ("sched: Further restrict the preemption modes")
  which limits PREEMPT_NONE to architectures that do not support
  preemption at all and PREEMPT_VOLUNTARY to those architectures that do
  not yet have PREEMPT_LAZY support.

  Since major architectures (e.g. x86 and arm64) no longer support
  CONFIG_PREEMPT_NONE and CONFIG_PREEMPT_VOLUNTARY, using them in
  rcutorture, rcuscale, refscale, and scftorture pre-defined scenarios
  causes config checking errors.

  Switch these kconfigs to PREEMPT_LAZY"

* tag 'rcu-fixes.v7.0-20260307a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux:
  scftorture: Update due to x86 not supporting none/voluntary preemption
  refscale: Update due to x86 not supporting none/voluntary preemption
  rcuscale: Update due to x86 not supporting none/voluntary preemption
  rcutorture: Update due to x86 not supporting none/voluntary preemption
2026-03-07 11:56:55 -08:00
Linus Torvalds aed0af05a8 tracing fixes for 7.0:
- Fix possible NULL pointer dereference in trace_data_alloc()
 
   On the error path in trace_data_alloc(), it can call trigger_data_free()
   with a NULL pointer. This use to be a kfree() but was changed to
   trigger_data_free() to clean up any partial initialization. The issue is
   that trigger_data_free() does not expect a NULL pointer. Have
   trigger_data_free() return safely on NULL pointer.
 
 - Fix multiple events on the command line and bootconfig
 
   If multiple events are enabled on the command line separately and not
   grouped, only the last event gets enabled. That is:
 
     trace_event=sched_switch trace_event=sched_waking
 
   Will only enable sched_waking where as:
 
     trace_event=sched_switch,sched_waking
 
   Will enable both.
 
   The bootconfig makes it even worse as the second way is the more common
   method.
 
   The issue is that a temporary buffer is used to store the events to enable
   later in boot. Each time the cmdline callback is called, it overwrites
   what was previously there.
 
   Have the callback append the next value (delimited by a comma) if the
   temporary buffer already has content.
 
 - Fix command line trace_buffer_size if >= 2G
 
   The logic to allocate the trace buffer uses "int" for the size parameter
   in the command line code causing overflow issues if more that 2G is
   specified.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaaxEIRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qn+QAQCM6aJm0ZqDD2dM262M1mQpkU7sW3Dz
 hZfBpo3YlH55fQEAklsaD96+yKN7PLl1Vh4c0zCelMHZA7kgck/3GqaFAgA=
 =rn/Z
 -----END PGP SIGNATURE-----

Merge tag 'trace-v7.0-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix possible NULL pointer dereference in trace_data_alloc()

   On the trace_data_alloc() error path, it can call trigger_data_free()
   with a NULL pointer. This used to be a kfree() but was changed to
   trigger_data_free() to clean up any partial initialization. The issue
   is that trigger_data_free() does not expect a NULL pointer. Have
   trigger_data_free() return safely on NULL pointer.

 - Fix multiple events on the command line and bootconfig

   If multiple events are enabled on the command line separately and not
   grouped, only the last event gets enabled. That is:

      trace_event=sched_switch trace_event=sched_waking

   will only enable sched_waking whereas:

      trace_event=sched_switch,sched_waking

   will enable both.

   The bootconfig makes it even worse as the second way is the more
   common method.

   The issue is that a temporary buffer is used to store the events to
   enable later in boot. Each time the cmdline callback is called, it
   overwrites what was previously there.

   Have the callback append the next value (delimited by a comma) if the
   temporary buffer already has content.

 - Fix command line trace_buffer_size if >= 2G

   The logic to allocate the trace buffer uses "int" for the size
   parameter in the command line code causing overflow issues if more
   that 2G is specified.

* tag 'trace-v7.0-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Fix trace_buf_size= cmdline parameter with sizes >= 2G
  tracing: Fix enabling multiple events on the kernel command line and bootconfig
  tracing: Add NULL pointer check to trigger_data_free()
2026-03-07 09:50:54 -08:00
Ihor Solodrai b0dcdcb9ae resolve_btfids: Fix linker flags detection
The "|| echo -lzstd" default makes zstd an unconditional link
dependency of resolve_btfids. On systems where libzstd-dev is not
installed and pkg-config fails, the linker fails:

  ld: cannot find -lzstd: No such file or directory

libzstd is a transitive dependency of libelf, so the -lzstd flag is
strictly necessary only for static builds [1].

Remove ZSTD_LIBS variable, and instead set LIBELF_LIBS depending on
whether the build is static or not. Use $(HOSTPKG_CONFIG) as primary
source of the flags list.

Also add a default value for HOSTPKG_CONFIG in case it's not built via
the toplevel Makefile. Pass it from selftests/bpf too.

[1] https://lore.kernel.org/bpf/4ff82800-2daa-4b9f-95a9-6f512859ee70@linux.dev/

Reported-by: BPF CI Bot (Claude Opus 4.6) <bot+bpf-ci@kernel.org>
Reported-by: Vitaly Chikunov <vt@altlinux.org>
Closes: https://lore.kernel.org/bpf/aaWqMcK-2AQw5dx8@altlinux.org/
Fixes: 4021848a90 ("selftests/bpf: Pass through build flags to bpftool and resolve_btfids")
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Reviewed-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/20260305014730.3123382-1-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-07 08:51:51 -08:00
Linus Torvalds 7b6e48df88 hwmon fixes for v7.0-rc3
- aht10: Fix initialization commands for AHT20
 
 - emc1403: correct a malformed email address
 
 - it87: Check the it87_lock() return value
 
 - max6639: fix inverted polarity
 
 - macsmc: Fix overflows, underflows, sign extension, and other problems
 
 - pmbus/q54sj108a2: fix stack overflow in debugfs read
 
 - Drop support for SMARC-sAM67 (discontinued and never released to market)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEiHPvMQj9QTOCiqgVyx8mb86fmYEFAmmsDycACgkQyx8mb86f
 mYE5PhAAjn5II2E0aAJhuulPIEtMQekjZd33q//XUGL5tcz5PVakVtK6MBkGQA7P
 a3ryouTH4TUlmfjlwEWHu3Dde3bElStfUoFWAvvQc74rkiZdmSbo3fyU4LWNGnIu
 2lxcOsjpMbIUC2xV0WUUmg/2r/v+Z8mFrLXbF0YEzhZfzpMin3kNxdCTjBqAa6p7
 E6/Wayxh13W1o0UNJNnCJ6jdxKQPQ8GkVgB9EyivJqmyiJjrPhbFN4KWVhUCHhGF
 mvoMzuC6inVbMwDa2cjU8Ykx9NqVMte4xqZ92f8F8ObH6+dxoRJeszrAvY5PDwy0
 p8OJcB9bZXzv7VYLBxIEsQg3Dm/VVk7YqRpPtChjrL7Z6VUtV170mq4r54YeLWfA
 6xNwoNZpxylOjbG57F+Tv9S2ogQtxyGmg+J5r14tB2IQKJEZQfmsYAIXLeJwgcwu
 gPJWvQsUiB0sMe5Uoxck9EQou5QCKfdjX/XrlRfJm94aZHicajgM/wEXBFtn4a0E
 U3k1WSWq9aVTZrto6mMaVrXnAu2dWrXY2TkfxI/3/6Q8NDdg2ckDaTKFaPC3w9Xy
 S31SMjI98EviNKpZ9K9aP4RpYQTQk/K50fisZ4cZtT+trDvnEcc161+USB7tYE6j
 YdOjYM+Eqh3cEIDGjiP4lLS8Tnc8+IzpqljGru99TbgK9Ma4hXI=
 =+l89
 -----END PGP SIGNATURE-----

Merge tag 'hwmon-for-v7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

 - Fix initialization commands for AHT20

 - Correct a malformed email address (emc1403)

 - Check the it87_lock() return value

 - Fix inverted polarity (max6639)

 - Fix overflows, underflows, sign extension, and other problems in
   macsmc

 - Fix stack overflow in debugfs read (pmbus/q54sj108a2)

 - Drop support for SMARC-sAM67 (discontinued and never released to
   market)

* tag 'hwmon-for-v7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (pmbus/q54sj108a2) fix stack overflow in debugfs read
  hwmon: (max6639) fix inverted polarity
  dt-bindings: hwmon: sl28cpld: Drop sa67mcu compatible
  hwmon: (it87) Check the it87_lock() return value
  Revert "hwmon: add SMARC-sAM67 support"
  hwmon: (aht10) Fix initialization commands for AHT20
  hwmon: (emc1403) correct a malformed email address
  hwmon: (macsmc) Fix overflows, underflows, and sign extension
  hwmon: (macsmc) Fix regressions in Apple Silicon SMC hwmon driver
2026-03-07 08:39:59 -08:00
Linus Torvalds e33aafac04 Driver core fixes for 7.0-rc3
- Revert "driver core: enforce device_lock for driver_match_device()":
 
   When a device is already present in the system and a driver is
   registered on the same bus, we iterate over all devices registered on
   this bus to see if one of them matches. If we come across an already
   bound one where the corresponding driver crashed while holding the
   device lock (e.g. in probe()) we can't make any progress anymore.
 
   Thus, revert and clarify that an implementer of struct bus_type must
   not expect match() to be called with the device lock held.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQS2q/xV6QjXAdC7k+1FlHeO1qrKLgUCaawGIAAKCRBFlHeO1qrK
 LmYsAP0XzV/dZVrEqU5AvchbcuZ7kfAKotj4wPUIAkoT3gzMcQEAqNm7Vaf2ulDs
 CS8XvRi0PX6inD1Oo3dqwb0rKjKfFwY=
 =GT+5
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core

Pull driver core fix from Danilo Krummrich:

 - Revert "driver core: enforce device_lock for driver_match_device()":

   When a device is already present in the system and a driver is
   registered on the same bus, we iterate over all devices registered on
   this bus to see if one of them matches. If we come across an already
   bound one where the corresponding driver crashed while holding the
   device lock (e.g. in probe()) we can't make any progress anymore.

   Thus, revert and clarify that an implementer of struct bus_type must
   not expect match() to be called with the device lock held.

* tag 'driver-core-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
  Revert "driver core: enforce device_lock for driver_match_device()"
2026-03-07 08:16:48 -08:00
Linus Torvalds 0f912c8917 xen: branch for v7.0-rc3
-----BEGIN PGP SIGNATURE-----
 
 iJEEABYKADkWIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCaav3pBsUgAAAAAAEAA5t
 YW51MiwyLjUrMS4xMSwyLDIACgkQgFxhu0/YY75xWwD+NO/7WX01zcYSFMHTjHRx
 okbOkBwFzcZK+p/L4iTtVv0BAIYPUpa+RBLR2RtYN7mQEw8KO5yVgiLP2nlQYwIf
 wZcH
 =DrUe
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-7.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

 - a cleanup of arch/x86/kernel/head_64.S removing the pre-built page
   tables for Xen guests

 - a small comment update

 - another cleanup for Xen PVH guests mode

 - fix an issue with Xen PV-devices backed by driver domains

* tag 'for-linus-7.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/xenbus: better handle backend crash
  xenbus: add xenbus_device parameter to xenbus_read_driver_state()
  x86/PVH: Use boot params to pass RSDP address in start_info page
  x86/xen: update outdated comment
  xen/acpi-processor: fix _CST detection using undersized evaluation buffer
  x86/xen: Build identity mapping page tables dynamically for XENPV
2026-03-07 07:44:32 -08:00
Alexei Starovoitov 325d1ba3ca Merge branch 'bpf-fix-precision-backtracking-bug-with-linked-registers'
Eduard Zingerman says:

====================
bpf: Fix precision backtracking bug with linked registers

Emil Tsalapatis reported a verifier bug hit by the scx_lavd sched_ext
scheduler. The essential part of the verifier log looks as follows:

  436: ...
  // checkpoint hit for 438: (1d) if r7 == r8 goto ...
  frame 3: propagating r2,r7,r8
  frame 2: propagating r6
  mark_precise: frame3: last_idx ...
  mark_precise: frame3: regs=r2,r7,r8 stack= before 436: ...
  mark_precise: frame3: regs=r2,r7 stack= before 435: ...
  mark_precise: frame3: regs=r2,r7 stack= before 434: (85) call bpf_trace_vprintk#177
  verifier bug: backtracking call unexpected regs 84

The log complains that registers r2 and r7 are tracked as precise
while processing the bpf_trace_vprintk() call in precision backtracking.
This can't be right, as r2 is reset by the call and there is nothing
to backtrack it to. The precision propagation is triggered when
a checkpoint is hit at instruction 438, r2 is dead at that instruction.

This happens because of the following sequence of events:
- Instruction 438 is first reached with registers r2 and r7 having
  the same id via a path that does not call bpf_trace_vprintk():
  - Checkpoint is created at 438.
  - The jump at 438 is predicted, hence r7 and registers linked to it
    (r2) are propagated as precise, marking r2 and r7 precise in the
    checkpoint.
- Instruction 438 is reached a second time with r2 undefined and via
  a path that calls bpf_trace_vprintk():
  - Checkpoint is hit.
  - propagate_precision() picks registers r2 and r7 and propagates
    precision marks for those up to the helper call.

The root cause is the fact that states_equal() and
propagate_precision() assume that the precision flag can't be set for a
dead register (as computed by compute_live_registers()).
However, this is not the case when linked registers are at play.
Fix this by accounting for live register flags in
collect_linked_regs().
---
====================

Link: https://patch.msgid.link/20260306-linked-regs-and-propagate-precision-v1-0-18e859be570d@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-06 21:50:05 -08:00
Eduard Zingerman 223ffb6a3d selftests/bpf: add reproducer for spurious precision propagation through calls
Add a test for the scenario described in the previous commit:
an iterator loop with two paths where one ties r2/r7 via
shared scalar id and skips a call, while the other goes
through the call. Precision marks from the linked registers
get spuriously propagated to the call path via
propagate_precision(), hitting "backtracking call unexpected
regs" in backtrack_insn().

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260306-linked-regs-and-propagate-precision-v1-2-18e859be570d@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-06 21:50:05 -08:00
Eduard Zingerman 2658a1720a bpf: collect only live registers in linked regs
Fix an inconsistency between func_states_equal() and
collect_linked_regs():
- regsafe() uses check_ids() to verify that cached and current states
  have identical register id mapping.
- func_states_equal() calls regsafe() only for registers computed as
  live by compute_live_registers().
- clean_live_states() is supposed to remove dead registers from cached
  states, but it can skip states belonging to an iterator-based loop.
- collect_linked_regs() collects all registers sharing the same id,
  ignoring the marks computed by compute_live_registers().
  Linked registers are stored in the state's jump history.
- backtrack_insn() marks all linked registers for an instruction
  as precise whenever one of the linked registers is precise.

The above might lead to a scenario:
- There is an instruction I with register rY known to be dead at I.
- Instruction I is reached via two paths: first A, then B.
- On path A:
  - There is an id link between registers rX and rY.
  - Checkpoint C is created at I.
  - Linked register set {rX, rY} is saved to the jump history.
  - rX is marked as precise at I, causing both rX and rY
    to be marked precise at C.
- On path B:
  - There is no id link between registers rX and rY,
    otherwise register states are sub-states of those in C.
  - Because rY is dead at I, check_ids() returns true.
  - Current state is considered equal to checkpoint C,
    propagate_precision() propagates spurious precision
    mark for register rY along the path B.
  - Depending on a program, this might hit verifier_bug()
    in the backtrack_insn(), e.g. if rY ∈  [r1..r5]
    and backtrack_insn() spots a function call.

The reproducer program is in the next patch.
This was hit by sched_ext scx_lavd scheduler code.

Changes in tests:
- verifier_scalar_ids.c selftests need modification to preserve
  some registers as live for __msg() checks.
- exceptions_assert.c adjusted to match changes in the verifier log,
  R0 is dead after conditional instruction and thus does not get
  range.
- precise.c adjusted to match changes in the verifier log, register r9
  is dead after comparison and it's range is not important for test.

Reported-by: Emil Tsalapatis <emil@etsalapatis.com>
Fixes: 0fb3cf6110 ("bpf: use register liveness information for func_states_equal")
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260306-linked-regs-and-propagate-precision-v1-1-18e859be570d@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-06 21:49:40 -08:00
Linus Torvalds 4ae12d8bd9 Second round of Kbuild fixes for 7.0
- Split out .modinfo section from ELF_DETAILS macro, as that macro may
   be used in other areas that expect to discard .modinfo, breaking
   certain image layouts
 
 - Adjust genksyms parser to handle optional attributes in certain
   declarations, necessary after commit 07919126ec ("netfilter:
   annotate NAT helper hook pointers with __rcu")
 
 - Include resolve_btfids in external module build created by
   scripts/package/install-extmod-build when it may be run on
   external modules
 
 - Avoid removing objtool binary with 'make clean', as it is required for
   external module builds
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR74yXHMTGczQHYypIdayaRccAalgUCaat33gAKCRAdayaRccAa
 lizMAQCxm0P5WsJf3ydYR+5ZzzM7wreNtpMVMXsCbwOKBGY3VwEAyvB7om1a00Ex
 Z6WFa9P4VKW+L4PWMnWoyxcnvl/CdgM=
 =mvIb
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-fixes-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux

Pull Kbuild fixes from Nathan Chancellor:

 - Split out .modinfo section from ELF_DETAILS macro, as that macro may
   be used in other areas that expect to discard .modinfo, breaking
   certain image layouts

 - Adjust genksyms parser to handle optional attributes in certain
   declarations, necessary after commit 07919126ec ("netfilter:
   annotate NAT helper hook pointers with __rcu")

 - Include resolve_btfids in external module build created by
   scripts/package/install-extmod-build when it may be run on external
   modules

 - Avoid removing objtool binary with 'make clean', as it is required
   for external module builds

* tag 'kbuild-fixes-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
  kbuild: Leave objtool binary around with 'make clean'
  kbuild: install-extmod-build: Package resolve_btfids if necessary
  genksyms: Fix parsing a declarator with a preceding attribute
  kbuild: Split .modinfo out from ELF_DETAILS
2026-03-06 20:27:13 -08:00
Linus Torvalds 591d8796b2 s390 updates for 7.0-rc3
- Fix stackleak and xor lib inline asm, constraints and clobbers to
   prevent miscompilations and incomplete stack poisoning
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmmrahkACgkQjYWKoQLX
 FBjrwQgAjN8E5Qqt7vuFqYw4HLwyWfQ/O6QRhHGcpeQdh1zlHhKbxLcytl6rrpwJ
 Dzp5G1aalHcIaefCJnePpLasf64YO30cnDhLtOS6n9n+2CZEz8v7kb44CeZJD/r7
 ytCu/PQjQr2LabfMtxjB3TfzuUfPxn65xmlHcx8LW+WjNyXKn0/gPRaBOmHxBxk7
 wx8NpYVApdqHhZO7K32KDxPF2TAmUjMNd8dnof2BMH0vz6qPM8Ib/kt+uOPMPTmd
 D1yPrwJdoVeAHsbAVA0sgUCZmrx5ERs3ZYAjvzgaRSsUygX0KtS/2Fyghul2SKMh
 JtUP905RIIGugIZ8naKdQO9iAZfLPA==
 =jkZC
 -----END PGP SIGNATURE-----

Merge tag 's390-7.0-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Vasily Gorbik:

 - Fix stackleak and xor lib inline asm, constraints and clobbers to
   prevent miscompilations and incomplete stack poisoning

* tag 's390-7.0-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/stackleak: Fix __stackleak_poison() inline assembly constraint
  s390/xor: Improve inline assembly constraints
  s390/xor: Fix xor_xc_2() inline assembly constraints
  s390/xor: Fix xor_xc_5() inline assembly
2026-03-06 20:20:17 -08:00