The only use of this helper is to iterate all of the workers, and
hence all callers will pass in a func that always returns false to do
that. As none of the callers use the return value, get rid of it.
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A previous commit added this helper, and had it terminate if false is
returned from the handler. However, that is completely opposite, it
should abort the loop if true is returned.
Fix this up by having io_wq_for_each_worker() keep iterating as long
as false is returned, and only abort if true is returned.
Cc: stable@vger.kernel.org
Fixes: 751eedc4b4 ("io_uring/io-wq: move worker lists to struct io_wq_acct")
Reported-by: Lewis Campbell <info@lewiscampbell.tech>
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_uring_validate_mmap_request() doesn't use its size_t sz argument, so
remove it.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
__io_openat_prep() allocates a struct filename using getname(). However,
for the condition of the file being installed in the fixed file table as
well as having O_CLOEXEC flag set, the function returns early. At that
point, the request doesn't have REQ_F_NEED_CLEANUP flag set. Due to this,
the memory for the newly allocated struct filename is not cleaned up,
causing a memory leak.
Fix this by setting the REQ_F_NEED_CLEANUP for the request just after the
successful getname() call, so that when the request is torn down, the
filename will be cleaned up, along with other resources needing cleanup.
Reported-by: syzbot+00e61c43eb5e4740438f@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=00e61c43eb5e4740438f
Tested-by: syzbot+00e61c43eb5e4740438f@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Prithvi Tambewagh <activprithvi@gmail.com>
Fixes: b9445598d8 ("io_uring: openat directly into fixed fd table")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_import_kbuf() calculates nr_segs incorrectly when iov_offset is
non-zero after iov_iter_advance(). It doesn't account for the partial
consumption of the first bvec.
The problem comes when meet the following conditions:
1. Use UBLK_F_AUTO_BUF_REG feature of ublk.
2. The kernel will help to register the buffer, into the io uring.
3. Later, the ublk server try to send IO request using the registered
buffer in the io uring, to read/write to fuse-based filesystem, with
O_DIRECT.
>From a userspace perspective, the ublk server thread is blocked in the
kernel, and will see "soft lockup" in the kernel dmesg.
When ublk registers a buffer with mixed-size bvecs like [4K]*6 + [12K]
and a request partially consumes a bvec, the next request's nr_segs
calculation uses bvec->bv_len instead of (bv_len - iov_offset).
This causes fuse_get_user_pages() to loop forever because nr_segs
indicates fewer pages than actually needed.
Specifically, the infinite loop happens at:
fuse_get_user_pages()
-> iov_iter_extract_pages()
-> iov_iter_extract_bvec_pages()
Since the nr_segs is miscalculated, the iov_iter_extract_bvec_pages
returns when finding that i->nr_segs is zero. Then
iov_iter_extract_pages returns zero. However, fuse_get_user_pages does
still not get enough data/pages, causing infinite loop.
Example:
- Bvecs: [4K, 4K, 4K, 4K, 4K, 4K, 12K, ...]
- Request 1: 32K at offset 0, uses 6*4K + 8K of the 12K bvec
- Request 2: 32K at offset 32K
- iov_offset = 8K (8K already consumed from 12K bvec)
- Bug: calculates using 12K, not (12K - 8K) = 4K
- Result: nr_segs too small, infinite loop in fuse_get_user_pages.
Fix by accounting for iov_offset when calculating the first segment's
available length.
Fixes: b419bed4f0 ("io_uring/rsrc: ensure segments counts are correct on kbuf buffers")
Signed-off-by: huang-jl <huang-jl@deepseek.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Using min_wait, two timeouts are given:
1) The min_wait timeout, within which up to 'wait_nr' events are
waited for.
2) The overall long timeout, which is entered if no events are generated
in the min_wait window.
If the min_wait has expired, any event being posted must wake the task.
For SQPOLL, that isn't the case, as it won't trigger the io_has_work()
condition, as it will have already processed the task_work that happened
when an event was posted. This causes any event to trigger post the
min_wait to not always cause the waiting application to wakeup, and
instead it will wait until the overall timeout has expired. This can be
shown in a test case that has a 1 second min_wait, with a 5 second
overall wait, even if an event triggers after 1.5 seconds:
axboe@m2max-kvm /d/iouring-mre (master)> zig-out/bin/iouring
info: MIN_TIMEOUT supported: true, features: 0x3ffff
info: Testing: min_wait=1000ms, timeout=5s, wait_nr=4
info: 1 cqes in 5000.2ms
where the expected result should be:
axboe@m2max-kvm /d/iouring-mre (master)> zig-out/bin/iouring
info: MIN_TIMEOUT supported: true, features: 0x3ffff
info: Testing: min_wait=1000ms, timeout=5s, wait_nr=4
info: 1 cqes in 1500.3ms
When the min_wait timeout triggers, reset the number of completions
needed to wake the task. This should ensure that any future events will
wake the task, regardless of how many events it originally wanted to
wait for.
Reported-by: Tip ten Brink <tip@tenbrinkmeijs.com>
Cc: stable@vger.kernel.org
Fixes: 1100c4a265 ("io_uring: add support for batch wait timeout")
Link: https://github.com/axboe/liburing/issues/1477
Signed-off-by: Jens Axboe <axboe@kernel.dk>
For some cases, the order in which the waitq entry list and head
writing happens is important, for others it doesn't really matter.
But it's somewhat confusing to have them spread out over the file.
Abstract out the nicely documented code in io_pollfree_wake() and
move it into a helper, and use that helper consistently rather than
having other call sites manually do the same thing. While at it,
correct a comment function name as well.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
buf->addr and buf->len reside in memory shared with userspace. They
should be written with WRITE_ONCE() to guarantee atomic stores and
prevent tearing or other unsafe compiler optimizations.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Cc: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The struct io_uring_buf elements in a buffer ring are in a memory region
accessible from userspace. A malicious/buggy userspace program could
therefore write to them at any time, so they should be accessed with
READ_ONCE() in the kernel. Commit 98b6fa62c8 ("io_uring/kbuf: always
use READ_ONCE() to read ring provided buffer lengths") already switched
the reads of the len field to READ_ONCE(). Do the same for bid and addr.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Fixes: c7fb19428d ("io_uring: add support for ring mapped supplied buffers")
Cc: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When cloning with node replacements (IORING_REGISTER_DST_REPLACE),
destination entries after the cloned range are not copied over.
Add logic to copy them over to the new destination table.
Fixes: c1329532d5 ("io_uring/rsrc: allow cloning with node replacements")
Cc: stable@vger.kernel.org
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The variable holds nodes from the destination ring's existing buffer
table. In io_clone_buffers(), the term "src" is used to refer to the
source ring.
Rename to node for clarity.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Get rid of some redundant checks and move the src arg validation to
before the buffer table allocation, which simplifies error handling.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The io_uring_queue_async_work tracepoint event stores an int rw field
that represents whether the work item is hashed. Rename it to "hashed"
and change its type to bool to more accurately reflect its value.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If a task has a pending signal when create_io_thread() is called,
copy_process() will return -ERESTARTNOINTR. io_should_retry_thread()
will request a retry of create_io_thread() up to WORKER_INIT_LIMIT = 3
times. If all retries fail, the io_uring request will fail with
ECANCELED.
Commit 3918315c5dc ("io-wq: backoff when retrying worker creation")
added a linear backoff to allow the thread to handle its signal before
the retry. However, a thread receiving frequent signals may get unlucky
and have a signal pending at every retry. Since the userspace task
doesn't control when it receives signals, there's no easy way for it to
prevent the create_io_thread() failure due to pending signals. The task
may also lack the information necessary to regenerate the canceled SQE.
So always retry the create_io_thread() on the ERESTART* errors,
analogous to what a fork() syscall would do. EAGAIN can occur due to
various persistent conditions such as exceeding RLIMIT_NPROC, so respect
the WORKER_INIT_LIMIT retry limit for EAGAIN errors.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When the core of io_uring was updated to handle completions
consistently and with fixed return codes, the POLL_REMOVE opcode
with updates got slightly broken. If a POLL_ADD is pending and
then POLL_REMOVE is used to update the events of that request, if that
update causes the POLL_ADD to now trigger, then that completion is lost
and a CQE is never posted.
Additionally, ensure that if an update does cause an existing POLL_ADD
to complete, that the completion value isn't always overwritten with
-ECANCELED. For that case, whatever io_poll_add() set the value to
should just be retained.
Cc: stable@vger.kernel.org
Fixes: 97b388d70b ("io_uring: handle completions in the core")
Reported-by: syzbot+641eec6b7af1f62f2b99@syzkaller.appspotmail.com
Tested-by: syzbot+641eec6b7af1f62f2b99@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Added:
support timestamps prior to epoch;
do not overwrite uptodate pages;
disable readahead for compressed files;
setting of dummy blocksize to read boot_block when mounting;
the run_lock initialization when loading $Extend;
initialization of allocated memory before use;
support for the NTFS3_IOC_SHUTDOWN ioctl;
check for minimum alignment when performing direct I/O reads;
check for shutdown in fsync.
Fixed:
mount failure for sparse runs in run_unpack();
use-after-free of sbi->options in cmp_fnames;
KMSAN uninit bug after failed mi_read in mi_format_new;
uninit error after buffer allocation by __getname();
KMSAN uninit-value in ni_create_attr_list;
double free of sbi->options->nls and ownership of fc->fs_private;
incorrect vcn adjustments in attr_collapse_range();
mode update when ACL can be reduced to mode;
memory leaks in add sub record.
Changed:
refactor code, updated terminology, spelling;
do not kmap pages in (de)compression code;
after ntfs_look_free_mft(), code that fails must put mi;
default mount options for "acl" and "prealloc".
Replaced:
use unsafe_memcpy() to avoid memcpy size warning;
ntfs_bio_pages with page cache for compressed files.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEh0DEKNP0I9IjwfWEqbAzH4MkB7YFAmkwF6UACgkQqbAzH4Mk
B7aG3w/9H3iUua41pQWH1g94PW7qhRxyN+2hVvnlJvNRB5bV/rhVHnEEkyxNHjau
I+pknP2KceUrHYQO032b3kuuwab7sMCfVhVQCIzQ1dsp0V3HFwx5IWz8hGiAMrAN
7pIQuUqqG5NxArVM028HWUXOq1myjRWxuwLTEaSYDOT0/Fl+a/nXKhlg+S5Js8K1
FlkorvWdmXKUp3Dp96yzlcmcbqNFCeyfAlUu57KDx1wLjSvzEnCol8VHpSl9myc3
FET6RcsL00jy3Tr3/6xBk6ef+9Eoas6hiafLaHay6jMkzW2hPzVwNgUsGOYQ0IwQ
6jIWiTsAGfi6/GZRVQrEMFr4pvlIlG//uTezfu9pE7ld6wMwYfT4ZvDuHMYvshiw
keRIEeF4oCakut9Cy8OVh0FtUCv/RYaT332rKJhtFZyCnewqwHd29/ihjB9yNnDx
qruFEpjBHOffsbcd0PLDsATutGi8sjd4eNyv1KuRf2xKL3fZ22d79LOzBHRGhAnR
1MyBNo6Q9eF3W8hbf1aMJ0X3O76EVwXxuAWc32EwHn/35XMjb9ajjmn0G1Xeix7+
0N/cdXZO4fiZhj2fOVUuLduHpoNCcaQya0GD6YJ+jgVnFSs0ms/pGDryQa2kooqu
ewav36eEmtU6WOXAxMKVYzok1A+/fMtpDK7cOvW61Vz91X6woCU=
=RGJu
-----END PGP SIGNATURE-----
Merge tag 'ntfs3_for_6.19' of https://github.com/Paragon-Software-Group/linux-ntfs3
Pull ntfs3 updates from Konstantin Komarov:
"New code:
- support timestamps prior to epoch
- do not overwrite uptodate pages
- disable readahead for compressed files
- setting of dummy blocksize to read boot_block when mounting
- the run_lock initialization when loading $Extend
- initialization of allocated memory before use
- support for the NTFS3_IOC_SHUTDOWN ioctl
- check for minimum alignment when performing direct I/O reads
- check for shutdown in fsync
Fixes:
- mount failure for sparse runs in run_unpack()
- use-after-free of sbi->options in cmp_fnames
- KMSAN uninit bug after failed mi_read in mi_format_new
- uninit error after buffer allocation by __getname()
- KMSAN uninit-value in ni_create_attr_list
- double free of sbi->options->nls and ownership of fc->fs_private
- incorrect vcn adjustments in attr_collapse_range()
- mode update when ACL can be reduced to mode
- memory leaks in add sub record
Changes:
- refactor code, updated terminology, spelling
- do not kmap pages in (de)compression code
- after ntfs_look_free_mft(), code that fails must put mft_inode
- default mount options for "acl" and "prealloc"
Replaced:
- use unsafe_memcpy() to avoid memcpy size warning
- ntfs_bio_pages with page cache for compressed files"
* tag 'ntfs3_for_6.19' of https://github.com/Paragon-Software-Group/linux-ntfs3: (26 commits)
fs/ntfs3: check for shutdown in fsync
fs/ntfs3: change the default mount options for "acl" and "prealloc"
fs/ntfs3: Prevent memory leaks in add sub record
fs/ntfs3: out1 also needs to put mi
fs/ntfs3: Fix spelling mistake "recommened" -> "recommended"
fs/ntfs3: update mode in xattr when ACL can be reduced to mode
fs/ntfs3: check minimum alignment for direct I/O
fs/ntfs3: implement NTFS3_IOC_SHUTDOWN ioctl
fs/ntfs3: correct attr_collapse_range when file is too fragmented
ntfs3: fix double free of sbi->options->nls and clarify ownership of fc->fs_private
fs/ntfs3: Initialize allocated memory before use
fs/ntfs3: remove ntfs_bio_pages and use page cache for compressed I/O
ntfs3: avoid memcpy size warning
fs/ntfs3: fix KMSAN uninit-value in ni_create_attr_list
ntfs3: init run lock for extend inode
ntfs: set dummy blocksize to read boot_block when mounting
fs/ntfs3: disable readahead for compressed files
ntfs3: Fix uninit buffer allocated by __getname()
ntfs3: fix uninit memory after failed mi_read in mi_format_new
ntfs3: fix use-after-free of sbi->options in cmp_fnames
...
* Optimize online defragmentation by using folios instead of individual
buffer heads
* Improve error codes stored in the superblock when the journal aborts
* Minor cleanups and clarifications in ext4_map_blocks()
* Add documentation of the casefold and encrypt flags
* Add support for file systems with a blocksize greater than the pagesize
* Improve performance by enabling the caching the fact that an inode does
not have a Posix ACL.
Various Bug Fixes
* Fix false positive compliants from smatch
* Fix error code which is returned by ext4fs_dirhash() when Siphash is
used without the encryption key
* Fix races when writing to inline data files which could trigger a BUG
* Fix potential NULL dereference when there is an corrupt file system with
an extended attribute value stored in a inode
* Fix false positive lockdep report when syzbot uses ext4 and ocfs2 together
* Fix false positive reported by DEPT by adjusting lock annotation
* Avoid a potential BUG_ON in jbd2 when a file system is massively corrupted
* Fix a WARN_ON when superblock is corrupted with a non-NULL terminated
mount options field
* Add check if the userspace passes in a non-NULL terminated mount options
field to EXT4_IOC_SET_TUNE_SB_PARAM
* Fix a potential journal checksum failure whena file system is copied while
it is mounted read-only
* Fix a potential potential orphan file tracking error which only showed
on 32-bit systems
* Fix assertion checks in mballoc (which have to be explicitly enbled by
manually enabling AGGRESSIVE_CHECKS and recompiling)
* Avoid complaining about overly large orphan files created by mke2fs with
with file systems with a 64k block size
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmktFUsACgkQ8vlZVpUN
gaPA/Af/WMIK5fHG687JOQUoAnxJ5A89aWrJM7LXIXhfyy/hQ2x/dp7rRHuis+WQ
1AcB7tRN4EuAx+tU8rBKsh7f+xQRkhdl3FHjxAyZdNaVTS/iYh121lMeSDqBVP0V
tRSk+9DoahueYBJdHwgtFBd7ZHSKF2haqDW1FIYvFZFWZR1NEzNaoB9O4NO5D1dH
RN7nKB/eggOnPelP8FtD83yY3lwCMmxanqZuFCTn9AFcn/o1yo0w+8L+nHKO0n0Z
BMbIMLaJ4oJv6G/4vA99btGJjMHKqBqwNNh+2Gq81eubpkutTpTgma6npUBSKdl3
pfWhHzVaa3vVj1Mxc0j4qb1fzlCH8w==
=abb+
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"New features and improvements for the ext4 file system:
- Optimize online defragmentation by using folios instead of
individual buffer heads
- Improve error codes stored in the superblock when the journal
aborts
- Minor cleanups and clarifications in ext4_map_blocks()
- Add documentation of the casefold and encrypt flags
- Add support for file systems with a blocksize greater than the
pagesize
- Improve performance by enabling the caching the fact that an inode
does not have a Posix ACL
Various Bug Fixes:
- Fix false positive complaints from smatch
- Fix error code which is returned by ext4fs_dirhash() when Siphash
is used without the encryption key
- Fix races when writing to inline data files which could trigger a
BUG
- Fix potential NULL dereference when there is an corrupt file system
with an extended attribute value stored in a inode
- Fix false positive lockdep report when syzbot uses ext4 and ocfs2
together
- Fix false positive reported by DEPT by adjusting lock annotation
- Avoid a potential BUG_ON in jbd2 when a file system is massively
corrupted
- Fix a WARN_ON when superblock is corrupted with a non-NULL
terminated mount options field
- Add check if the userspace passes in a non-NULL terminated mount
options field to EXT4_IOC_SET_TUNE_SB_PARAM
- Fix a potential journal checksum failure whena file system is
copied while it is mounted read-only
- Fix a potential potential orphan file tracking error which only
showed on 32-bit systems
- Fix assertion checks in mballoc (which have to be explicitly enbled
by manually enabling AGGRESSIVE_CHECKS and recompiling)
- Avoid complaining about overly large orphan files created by mke2fs
with with file systems with a 64k block size"
* tag 'ext4_for_linus-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (58 commits)
ext4: mark inodes without acls in __ext4_iget()
ext4: enable block size larger than page size
ext4: add checks for large folio incompatibilities when BS > PS
ext4: support verifying data from large folios with fs-verity
ext4: make data=journal support large block size
ext4: support large block size in __ext4_block_zero_page_range()
ext4: support large block size in mpage_prepare_extent_to_map()
ext4: support large block size in mpage_map_and_submit_buffers()
ext4: support large block size in ext4_block_write_begin()
ext4: support large block size in ext4_mpage_readpages()
ext4: rename 'page' references to 'folio' in multi-block allocator
ext4: prepare buddy cache inode for BS > PS with large folios
ext4: support large block size in ext4_mb_init_cache()
ext4: support large block size in ext4_mb_get_buddy_page_lock()
ext4: support large block size in ext4_mb_load_buddy_gfp()
ext4: add EXT4_LBLK_TO_PG and EXT4_PG_TO_LBLK for block/page conversion
ext4: add EXT4_LBLK_TO_B macro for logical block to bytes conversion
ext4: support large block size in ext4_readdir()
ext4: support large block size in ext4_calculate_overhead()
ext4: introduce s_min_folio_order for future BS > PS support
...
- Major withdraw / error handling overhaul based on dlm's new
DLM_RELEASE_RECOVER feature: this allows gfs to treat withdraws like
node failures. Make withdraws asynchronous.
- Fix a bug in commit e4a8b5481c that caused 'df' to remain out of
sync. ('df' is still allowed to go slightly out of sync for short
periods of time.)
- Prevent recusive memory reclaim in gfs2_unstuff_dinode().
- Clean up SDF_JOURNAL_LIVE flag handling.
- Fix remote evict for read-only filesystems.
- Fix a misuse of bio_chain().
- Various other minor cleanups.
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmkvM/oUHGFncnVlbmJh
QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTqFLQ//Ra+FUIMYRrU83xd8OB+rXaFixymk
riPnoXZHrccoRAARc6d/p/nTYmV7FSnTa2jlsUFshsO2jl/1OIRI6xqScL+W5dp9
ToxHKnezOsRmy5tXvomYy9Teo/ngz6dpv6ixVrn7qTGJJ5zpir0ji2fO+xUvCUQq
IVGgvQN0O7ECUmmRXxjkRLhoEdus2Ye4jQkpJmqY9I1+H/0E43HlF2zDgoisTkiH
baNtFkTgE65Hba2xhoTCt3NO17POgqGcRaLVOLv5dqGhLAQ33+jXeYduM6ujbqwk
1FznGf09NRBJVQkyJvbgZELdX+mWXvWf/2kPmW6TXd7phFiO7cByxN2aHHcu/wBp
kh0ZgLFtvjyuR7wiyKf0tAKKPh250YiSkBuJU64dlHha2ZQuME6Z4dlYqBKwvPXf
AwxAu0tB2bAjc2OXvPaHhLHjdpySEWqNZIITbfQPVPonRKed10IqCwlTd+j5/64H
mnZCCxr2p2akrSYbQjl7h1k4LO7eX3cYJ23JoKIjwQ2hJE+Pd4li9+xhQCEQUVyE
ou/ezUKc19Xe7uxG1412PrbmOtIMgKfpq2KW151b1KA3pfCUt7GMvpYlvmnf9Wzs
XGjrQ1p4XdlxzQqe9XTrdO0IrNUNGDcky1Kqishje9SPa8FthwrtTuBSYL8nkcX6
3KRRzlFCRHfM5ng=
=AWK2
-----END PGP SIGNATURE-----
Merge tag 'gfs2-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:
- Major withdraw / error handling overhaul based on dlm's new
DLM_RELEASE_RECOVER feature: this allows gfs to treat withdraws like
node failures. Make withdraws asynchronous
- Fix a bug in commit e4a8b5481c that caused 'df' to remain out of
sync. ('df' is still allowed to go slightly out of sync for short
periods of time)
- Prevent recusive memory reclaim in gfs2_unstuff_dinode()
- Clean up SDF_JOURNAL_LIVE flag handling
- Fix remote evict for read-only filesystems
- Fix a misuse of bio_chain()
- Various other minor cleanups
* tag 'gfs2-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (35 commits)
gfs2: Fix use of bio_chain
gfs2: Clean up SDF_JOURNAL_LIVE flag handling
gfs2: No longer thaw filesystems during a withdraw
gfs2: Withdraw immediately in gfs2_trans_add_meta
gfs2: New gfs2_withdraw_helper
gfs2: Clean up properly during a withdraw
gfs2: Rename gfs2_{gl_dq_holders => withdraw_glocks}
Revert "gfs2: fix infinite loop when checking ail item count before go_inval"
Revert "gfs2: Allow some glocks to be used during withdraw"
Revert "gfs2: Check for log write errors before telling dlm to unlock"
Revert "gfs2: fix a deadlock on withdraw-during-mount"
Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (6/6)
Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (5/6)
Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (4/6)
Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (3/6)
Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (2/6)
Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (1/6)
Revert "gfs2: don't stop reads while withdraw in progress"
gfs2: Rename LM_FLAG_{NOEXP -> RECOVER}
gfs2: Kill gfs2_io_error_bh_wd
...
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmktxl0ACgkQiiy9cAdy
T1HI7AwAtnKyyIWfKwSnsGNTTc3LrT6ucx0DpCw1tqU7TBDLJtNPbQlgzL2rNCiv
cL6TTxn+qWTLfgJChQnypsCpoiQXwtqNBpfxQaoCOcwG+fUaa9G1JXxEmgv/Ob7j
KTrYHrnPS6YCQMDLMrEJh8tnI9jHcTgLtYCtDC+psYRzcbs/8Wfgs1Ek/vKW5Ui1
u0nKYf189KZA9UErTY7NhizyYNkifKD0n+DQPzR1FH6JWusWNRtj9MZPgoEsexXK
NxfDoc8krWwM7O+6X10ER7Snzd9EUtnbgwIqCFhDS6igfz+w+7Hm/XevOhK+DQqX
ApmFZL3n6yEUCnOZTQCqHutlChu1j8gWmEAnoGSmBJ3y99NvNqxBfedGDA8DGgFM
wkX6bxCJjpBMrMeLqVv0QXaXLkRycZASz5zVgb/ly3Zu6DnDlFzRYFUBNvXHb+M/
WFMtzV2Jzd4I09J460b0qU18iuSVxxyJNGVp6xzo++zdNUSt07UYXK/R7NCIaCXO
buCHlQll
=QmyI
-----END PGP SIGNATURE-----
Merge tag 'v6.19-rc-smb-fixes' of git://git.samba.org/ksmbd
Pull smb client and server updates from Steve French:
- server fixes:
- IPC use after free locking fix
- fix locking bug in delete paths
- fix use after free in disconnect
- fix underflow in locking check
- error mapping improvement
- socket listening improvement
- return code mapping fixes
- crypto improvements (use default libraries)
- cleanup patches:
- netfs
- client checkpatch cleanup
- server cleanup
- move server/client duplicate code to common code
- fix some defines to better match protocol specification
- smbdirect (RDMA) fixes
- client debugging improvements for leases
* tag 'v6.19-rc-smb-fixes' of git://git.samba.org/ksmbd: (44 commits)
cifs: Use netfs_alloc/free_folioq_buffer()
smb: client: show smb lease key in open_dirs output
smb: client: show smb lease key in open_files output
ksmbd: ipc: fix use-after-free in ipc_msg_send_request
smb: client: relax WARN_ON_ONCE(SMBDIRECT_SOCKET_*) checks in recv_done() and smbd_conn_upcall()
smb: server: relax WARN_ON_ONCE(SMBDIRECT_SOCKET_*) checks in recv_done() and smb_direct_cm_handler()
smb: smbdirect: introduce SMBDIRECT_CHECK_STATUS_{WARN,DISCONNECT}()
smb: smbdirect: introduce SMBDIRECT_DEBUG_ERR_PTR() helper
ksmbd: vfs: fix race on m_flags in vfs_cache
ksmbd: Replace strcpy + strcat to improve convert_to_nt_pathname
smb: move FILE_SYSTEM_ATTRIBUTE_INFO to common/fscc.h
ksmbd: implement error handling for STATUS_INFO_LENGTH_MISMATCH in smb server
ksmbd: fix use-after-free in ksmbd_tree_connect_put under concurrency
ksmbd: server: avoid busy polling in accept loop
smb: move create_durable_reconn to common/smb2pdu.h
smb: fix some warnings reported by scripts/checkpatch.pl
smb: do some cleanups
smb: move FILE_SYSTEM_SIZE_INFO to common/fscc.h
smb: move some duplicate struct definitions to common/fscc.h
smb: move list of FileSystemAttributes to common/fscc.h
...
Signed-off-by: Carlos Maiolino <cem@kernel.org>
-----BEGIN PGP SIGNATURE-----
iJUEABMJAB0WIQSmtYVZ/MfVMGUq1GNcsMJ8RxYuYwUCaS1/jAAKCRBcsMJ8RxYu
Y+jfAYCF6akuDB6lwjCCIhiRk2tJjVeaTF+NVdztSBwFOnbU7uVOznMUooo4PCN4
fshIslYBgP0s9DBa5DAP//7PE8IwIbuftrly6UmQMvhJFLg5ZRZaQCSKJK77JhjJ
REn0g8+Ifg==
=wHN9
-----END PGP SIGNATURE-----
Merge tag 'xfs-merge-6.19' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs updates from Carlos Maiolino:
"There are no major changes in xfs. This contains mostly some code
cleanups, a few bug fixes and documentation update. Highlights are:
- Quota locking cleanup
- Getting rid of old xlog_in_core_2_t type"
* tag 'xfs-merge-6.19' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (33 commits)
docs: remove obsolete links in the xfs online repair documentation
xfs: move some code out of xfs_iget_recycle
xfs: use zi more in xfs_zone_gc_mount
xfs: remove the unused bv field in struct xfs_gc_bio
xfs: remove xarray mark for reclaimable zones
xfs: remove the xlog_in_core_t typedef
xfs: remove l_iclog_heads
xfs: remove the xlog_rec_header_t typedef
xfs: remove xlog_in_core_2_t
xfs: remove a very outdated comment from xlog_alloc_log
xfs: cleanup xlog_alloc_log a bit
xfs: don't use xlog_in_core_2_t in struct xlog_in_core
xfs: add a on-disk log header cycle array accessor
xfs: add a XLOG_CYCLE_DATA_SIZE constant
xfs: reduce ilock roundtrips in xfs_qm_vop_dqalloc
xfs: move xfs_dquot_tree calls into xfs_qm_dqget_cache_{lookup,insert}
xfs: move quota locking into xrep_quota_item
xfs: move quota locking into xqcheck_commit_dquot
xfs: move q_qlock locking into xqcheck_compare_dquot
xfs: move q_qlock locking into xchk_quota_item
...
- Fix a WARNING caused by a recent FSDAX misdetection regression
- Fix the filesystem stacking limit for file-backed mounts
- Print more informative diagnostics on decompression errors
- Switch the on-disk definition `erofs_fs.h` to the MIT license
- Minor cleanups
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEQ0A6bDUS9Y+83NPFUXZn5Zlu5qoFAmktfSQRHHhpYW5nQGtl
cm5lbC5vcmcACgkQUXZn5Zlu5qoNlw/+K4F9wH6re9fXqrvkyCB96W66LtBMvbiU
VYObkonp/SJuqwBzg+V9HpJvR+rMVhaWvpWFc56sCi3IHGyYxQXh5kMfb0AZlbjQ
VoeiYqGzyyLoj0HAGOt2dwEfRR51ZNvhNU9WdOjDj6pJFAnQfPyAJTb3MFZdKmJv
fqHpr16AK5duMjK5fS+bTA5khq6teyB7RD91cq3aZPW9CUTnltA6N4nIMes2J378
LvDnXebx+BCsKZaNv39LzLlMoCpCZsQeJVMxaka4d55+BHvLfIjHInisWOAS4faC
wcHU5NVrEdvfU+CVJ+bg7KJfvAWYxWJkm9QBLAlSi6AfBlHllhOvLOwxjk2O/cHZ
95F/zcZ2rA0iFHHAljMeIY/01+UbWlAeksvtQEm9TrS8B1sRGCagOHaQD/UylOq0
o+xwmUAP98JxClLPttChWC/wpu7800O0jC4gpu/VGMdFcsbSTLIk0ZujMl0GS5os
UpwIYD0WHuyt9e16+p602HBbdtx1ayEWW45c1F5FBy7j3jbG02BIFqKiVCf9wSgQ
ONGZoCQRpxftLjc9SL3Z+bX2PfaZwYHNf6DuFjQbxthro978qa5350dkBLVAFoaq
DgJ6NZ+Zun9NqInIBMTqJmlj0pr2ElCWPFbftAb2SyT9G2j9zIuq6XjXtFU9+4I6
SrGCKxOlzRM=
=vdh2
-----END PGP SIGNATURE-----
Merge tag 'erofs-for-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
- Fix a WARNING caused by a recent FSDAX misdetection regression
- Fix the filesystem stacking limit for file-backed mounts
- Print more informative diagnostics on decompression errors
- Switch the on-disk definition `erofs_fs.h` to the MIT license
- Minor cleanups
* tag 'erofs-for-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: switch on-disk header `erofs_fs.h` to MIT license
erofs: get rid of raw bi_end_io() usage
erofs: enable error reporting for z_erofs_fixup_insize()
erofs: enable error reporting for z_erofs_stream_switch_bufs()
erofs: improve Zstd, LZMA and DEFLATE error strings
erofs: improve decompression error reporting
erofs: tidy up z_erofs_lz4_handle_overlap()
erofs: limit the level of fs stacking for file-backed mounts
erofs: correct FSDAX detection
- hfs/hfsplus: move on-disk layout declarations into hfs_common.h
- hfsplus: fix volume corruption issue for generic/101
- hfsplus: introduce KUnit tests for HFS+ string operations
- hfs: introduce KUnit tests for HFS string operations
- hfsplus: fix volume corruption issue for generic/073
- hfsplus: Verify inode mode when loading from disk
- hfsplus: fix volume corruption issue for generic/070
- hfs/hfsplus: prevent getting negative values of offset/length
- hfsplus: fix missing hfs_bnode_get() in __hfs_bnode_create
- hfs: fix potential use after free in hfs_correct_next_unused_CNID()
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQT4wVoLCG92poNnMFAhI4xTh21NnQUCaSnmHAAKCRAhI4xTh21N
nWt0AQDQ4hDGj4VkHNzWWGfh6GL+RhSwKgEzf897tJlUZDewogD/TE9bZnzOKjOw
YhWPXHEH4xy9+QaDXRgXk2DnWS+YKwg=
=mAL6
-----END PGP SIGNATURE-----
Merge tag 'hfs-v6.19-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfs
Pull hfs/hfsplus updates from Viacheslav Dubeyko:
"Several fixes for syzbot reported issues, HFS/HFS+ fixes of xfstests
failures, Kunit-based unit-tests introduction, and code cleanup:
- Dan Carpenter fixed a potential use-after-free issue in
hfs_correct_next_unused_CNID() method. Tetsuo Handa has made nice
fix of syzbot reported issue related to incorrect inode->i_mode
management if volume has been corrupted somehow. Yang Chenzhi has
made really good fix of potential race condition in
__hfs_bnode_create() method for HFS+ file system.
- Several fixes to xfstests failures. Particularly, generic/070,
generic/073, and generic/101 test-cases finish successfully for the
case of HFS+ file system right now.
- HFS and HFS+ drivers share multiple structures of on-disk layout
declarations. Some structures are used without any change. However,
we had two independent declarations of the same structures in HFS
and HFS+ drivers.
The on-disk layout declarations have been moved into
include/linux/hfs_common.h with the goal to exclude the
declarations duplication and to keep the HFS/HFS+ on-disk layout
declarations in one place.
Also, this patch prepares the basis for creating a hfslib that can
aggregate common functionality without necessity to duplicate the
same code in HFS and HFS+ drivers.
- HFS/HFS+ really need unit-tests because of multiple xfstests
failures. The first two patches introduce Kunit-based unit-tests
for the case string operations in HFS/HFS+ file system drivers"
* tag 'hfs-v6.19-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfs:
hfs/hfsplus: move on-disk layout declarations into hfs_common.h
hfsplus: fix volume corruption issue for generic/101
hfsplus: introduce KUnit tests for HFS+ string operations
hfs: introduce KUnit tests for HFS string operations
hfsplus: fix volume corruption issue for generic/073
hfsplus: Verify inode mode when loading from disk
hfsplus: fix volume corruption issue for generic/070
hfs/hfsplus: prevent getting negative values of offset/length
hfsplus: fix missing hfs_bnode_get() in __hfs_bnode_create
hfs: fix potential use after free in hfs_correct_next_unused_CNID()
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmko/DUACgkQxWXV+ddt
WDtyCw//UaFOTX/k72HgA1n2MWfegWbWyD+OGNbGosoZljKOrAe/mRnjXTyF9lyW
8GDzGvJzF4Tkl5lyuGyiequlrO2F3veTpwHo94xnBTOYCeiTpMqTN/e/5SkasBpN
4YlWq7OGYR4hwghRvZpaW7nsmVCKDLIlZVkH77x9Bmvx0NLO24EJlEZusQT4zYew
ntC/i9x3DW0ZxYyfRhFIFvk9JUUdgXfxJ6dNexz0zi3dKUSUIR9hI0J9Nwl++1cF
SgjAzbtO064htWoCvsKykgA6YGbJCZjw8XO8D2eJonkN24VbqSMaY44TPXmCMLVs
ZXw871jV2E/urfWhRNdxv/kJdCFudPk0qXG5ZtfHO4UUwS/nZ+qAig+LHawgAOCJ
9CgWy4zrfiYCqULRuqF1wzWu/z22++zIlZC552VAZd1RQ+JjqJY/aje4xhY5nUF4
n1uVBReZaI9sH3jJOsMWpwLMptbhpH9RZp3QPgqZlUHo6GtPJJmNKfw8KgMAhZ7L
wf7iy6v9yo+7VZ2ACwu2qJ+lZRxbZ0yvCnFatN3O5G1O0kkIrZFUM3MwdKtufZ0u
LHWkGfoaq7zR6E6DhIaxIhiTTXMlOfLTikNKgBUO3NEdrRZwrDhr7K07S25jFxSx
ZCNV6OdSCeziShPqT0ntcwecnJ41/kOcm13732NHF+QgzMK5LrI=
=rO4x
-----END PGP SIGNATURE-----
Merge tag 'for-6.19-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"Features:
- shutdown ioctl support (needs CONFIG_BTRFS_EXPERIMENTAL for now):
- set filesystem state as being shut down (also named going down
in other filesystems), where all active operations return EIO
and this cannot be changed until unmount
- pending operations are attempted to be finished but error
messages may still show up depending on where exactly the
shutdown happened
- scrub (and device replace) vs suspend/hibernate:
- a running scrub will prevent suspend, which can be annoying as
suspend is an immediate request and scrub is not critical
- filesystem freezing before suspend was not sufficient as the
problem was in process freezing
- behaviour change: on suspend scrub and device replace are
cancelled, where scrub can record the last state and continue
from there; the device replace has to be restarted from the
beginning
- zone stats exported in sysfs, from the perspective of the
filesystem this includes active, reclaimable, relocation etc zones
Performance:
- improvements when processing space reservation tickets by
optimizing locking and shrinking critical sections, cumulative
improvements in lockstat numbers show +15%
Notable fixes:
- use vmalloc fallback when allocating bios as high order allocations
can happen with wide checksums (like sha256)
- scrub will always track the last position of progress so it's not
starting from zero after an error
Core:
- under experimental config, checksum calculations are offloaded to
process context, simplifies locking and allows to remove
compression write worker kthread(s):
- speed improvement in direct IO throughput with buffered IO
fallback is +15% when not offloaded but this is more related to
internal crypto subsystem improvements
- this will be probably default in the future removing the sysfs
tunable
- (experimental) block size > page size updates:
- support more operations when not using large folios (encoded
read/write and send)
- raid56
- more preparations for fscrypt support
Other:
- more conversions to auto-cleaned variables
- parameter cleanups and removals
- extended warning fixes
- improved printing of structured values like keys
- lots of other cleanups and refactoring"
* tag 'for-6.19-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (147 commits)
btrfs: remove unnecessary inode key in btrfs_log_all_parents()
btrfs: remove redundant zero/NULL initializations in btrfs_alloc_root()
btrfs: remaining BTRFS_PATH_AUTO_FREE conversions
btrfs: send: do not allocate memory for xattr data when checking it exists
btrfs: send: add unlikely to all unexpected overflow checks
btrfs: reduce arguments to btrfs_del_inode_ref_in_log()
btrfs: remove root argument from btrfs_del_dir_entries_in_log()
btrfs: use test_and_set_bit() in btrfs_delayed_delete_inode_ref()
btrfs: don't search back for dir inode item in INO_LOOKUP_USER
btrfs: don't rewrite ret from inode_permission
btrfs: add orig_logical to btrfs_bio for encryption
btrfs: disable verity on encrypted inodes
btrfs: disable various operations on encrypted inodes
btrfs: remove redundant level reset in btrfs_del_items()
btrfs: simplify leaf traversal after path release in btrfs_next_old_leaf()
btrfs: optimize balance_level() path reference handling
btrfs: factor out root promotion logic into promote_child_to_root()
btrfs: raid56: remove the "_step" infix
btrfs: raid56: enable bs > ps support
btrfs: raid56: prepare finish_parity_scrub() to support bs > ps cases
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmktsoMQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpuiUD/92eivL+HmOh10o8trvxajB0yuyqfSjHHrL
g+xUbF4s9bgAg/v+Upx7sTY8jdrTcMjKov+G9T6uPvBMqVmeVdZckA1PSAKQaIX1
Zb7nS2LnO7F6JKbwpwVrrIaqVbcz8MfGIIMbN4yNNEOMCwdIVMp4fo7trPBknJNx
WddNSGUFlIF3NqSI8AflSS/pYnGm+McfBHXBpJAKipI3iquKKubHv+FX9kLp7Tn4
x27ZoCWOHglIBTJXU0mmXCVsLF8b5BA8DQcGtT62azb8+l0cRTkaHY0DFAv5BvhG
TqcjrKdmR0cGSNt+nEmFrujE3atBRl0G0kiHA80YgA1MTtYzdPaUVOUtM9k/rEem
gpiGMDpBypdxyJAyijPSaVJdfcg0psOlYbhIR4N2wbj/dq8268h+cWzXlF1spgVt
/7ygoaCmfMNbTy9rKThTjH+es787AVXUAXXaPHhIFsnCKUj8xQl4pT7XltmgYeWx
1/XD1NEJeLHHog5upAVlGX3H5tbvP1nIICxbZa9mDOJX1rwxxI7/s/RucPjbNXuY
AiaKPTfxtB9+Ihd2HrJ/76RVMkckcOBc4GIKoFfwuKDbcdLXQ5FcZCmVRoI1V9SV
KsH7JBgihLwR9XWKE1vp9+CBNe1Qlu3K4IjG/E7CNLeuDntIBu73ihqGP/DqV6Bq
RX1Dc0OyAQ==
=m22w
-----END PGP SIGNATURE-----
Merge tag 'for-6.19/block-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block updates from Jens Axboe:
- Fix head insertion for mq-deadline, a regression from when priority
support was added
- Series simplifying and improving the ublk user copy code
- Various ublk related cleanups
- Fixup REQ_NOWAIT handling in loop/zloop, clearing NOWAIT when the
request is punted to a thread for handling
- Merge and then later revert loop dio nowait support, as it ended up
causing excessive stack usage for when the inline issue code needs to
dip back into the full file system code
- Improve auto integrity code, making it less deadlock prone
- Speedup polled IO handling, but manually managing the hctx lookups
- Fixes for blk-throttle for SSD devices
- Small series with fixes for the S390 dasd driver
- Add support for caching zones, avoiding unnecessary report zone
queries
- MD pull requests via Yu:
- fix null-ptr-dereference regression for dm-raid0
- fix IO hang for raid5 when array is broken with IO inflight
- remove legacy 1s delay to speed up system shutdown
- change maintainer's email address
- data can be lost if array is created with different lbs devices,
fix this problem and record lbs of the array in metadata
- fix rcu protection for md_thread
- fix mddev kobject lifetime regression
- enable atomic writes for md-linear
- some cleanups
- bcache updates via Coly
- remove useless discard and cache device code
- improve usage of per-cpu workqueues
- Reorganize the IO scheduler switching code, fixing some lockdep
reports as well
- Improve the block layer P2P DMA support
- Add support to the block tracing code for zoned devices
- Segment calculation improves, and memory alignment flexibility
improvements
- Set of prep and cleanups patches for ublk batching support. The
actual batching hasn't been added yet, but helps shrink down the
workload of getting that patchset ready for 6.20
- Fix for how the ps3 block driver handles segments offsets
- Improve how block plugging handles batch tag allocations
- nbd fixes for use-after-free of the configuration on device clear/put
- Set of improvements and fixes for zloop
- Add Damien as maintainer of the block zoned device code handling
- Various other fixes and cleanups
* tag 'for-6.19/block-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (162 commits)
block/rnbd: correct all kernel-doc complaints
blk-mq: use queue_hctx in blk_mq_map_queue_type
md: remove legacy 1s delay in md_notify_reboot
md/raid5: fix IO hang when array is broken with IO inflight
md: warn about updating super block failure
md/raid0: fix NULL pointer dereference in create_strip_zones() for dm-raid
sbitmap: fix all kernel-doc warnings
ublk: add helper of __ublk_fetch()
ublk: pass const pointer to ublk_queue_is_zoned()
ublk: refactor auto buffer register in ublk_dispatch_req()
ublk: add `union ublk_io_buf` with improved naming
ublk: add parameter `struct io_uring_cmd *` to ublk_prep_auto_buf_reg()
kfifo: add kfifo_alloc_node() helper for NUMA awareness
blk-mq: fix potential uaf for 'queue_hw_ctx'
blk-mq: use array manage hctx map instead of xarray
ublk: prevent invalid access with DEBUG
s390/dasd: Use scnprintf() instead of sprintf()
s390/dasd: Move device name formatting into separate function
s390/dasd: Remove unnecessary debugfs_create() return checks
s390/dasd: Fix gendisk parent after copy pair swap
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmktsm0QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpiLvD/0dptgeJyLHKchOtRHzi/UvtM/EuNFKJrvI
LBWCyIMjygxsVfPR41Lave9SE3UpcavF8Mg/EddasTci8VlMcDF8zPxWLb289Lz2
tkp/wOVuyYmDhNXKmKNW59NOPTd0NosEJFTZI4VhMudwx+UtAHELJGfBWW5hRyQB
Md+UwZ2+J9HbYd19mToaDFxz7jpIPLEE4BYUGtljveRUdpnxhyFGGUS2+CQXZt/5
lnRvJmmEv4nSGH9ZRksix1xnV6KvJM0UwYQhrWvXhgwyiKu47zG7ONpd39KqoaRw
Fw+6zZd0t7nyyuZkk15cKNnBLnjilnsCzmdcPq0Cuvkmbf6y1hlhEQQTGWXTKfJx
zCZxEZcnCC4wL0CBQjZjS38AEMfH2p76M/36+NTWtlYCibY7qUtd9ndpUr49BYGo
o4qfT0HMpI1PHuUvpZwpMcf4OX5qvtLmavT9vt78uqmtM+Aryzzuy3bI3S2SGjNe
if/cNHnZc8Z06hUqdEit5NW+lYzj642AoF/j7qH9ADDH+VXRWaCdK/iI8tPaEpDV
Rw6j442eVugS5tDPoTjdO8jsJ9+OCNNV1t/Jxy+Or+zrGdq7lfg4mnzEia1/izy5
8MnSubRy6LEd+I5PnK/9y9mPIwFMIFgULi+mUjucAhJjRj5beiG74eR6+jBAdyp1
GhFvN6fwdw==
=4g/f
-----END PGP SIGNATURE-----
Merge tag 'for-6.19/io_uring-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring updates from Jens Axboe:
- Unify how task_work cancelations are detected, placing it in the
task_work running state rather than needing to check the task state
- Series cleaning up and moving the cancelation code to where it
belongs, in cancel.c
- Cleanup of waitid and futex argument handling
- Add support for mixed sized SQEs. 6.18 added support for mixed sized
CQEs, improving flexibility and efficiency of workloads that need big
CQEs. This adds similar support for SQEs, where the occasional need
for a 128b SQE doesn't necessitate having all SQEs be 128b in size
- Introduce zcrx and SQ/CQ layout queries. The former returns what zcrx
features are available. And both return the ring size information to
help with allocation size calculation for user provided rings like
IORING_SETUP_NO_MMAP and IORING_MEM_REGION_TYPE_USER
- Zcrx updates for 6.19. It includes a bunch of small patches,
IORING_REGISTER_ZCRX_CTRL and RQ flushing and David's work on sharing
zcrx b/w multiple io_uring instances
- Series cleaning up ring initializations, notable deduplicating ring
size and offset calculations. It also moves most of the checking
before doing any allocations, making the code simpler
- Add support for getsockname and getpeername, which is mostly a
trivial hookup after a bit of refactoring on the networking side
- Various fixes and cleanups
* tag 'for-6.19/io_uring-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (68 commits)
io_uring: Introduce getsockname io_uring cmd
socket: Split out a getsockname helper for io_uring
socket: Unify getsockname and getpeername implementation
io_uring/query: drop unused io_handle_query_entry() ctx arg
io_uring/kbuf: remove obsolete buf_nr_pages and update comments
io_uring/register: use correct location for io_rings_layout
io_uring/zcrx: share an ifq between rings
io_uring/zcrx: add io_fill_zcrx_offsets()
io_uring/zcrx: export zcrx via a file
io_uring/zcrx: move io_zcrx_scrub() and dependencies up
io_uring/zcrx: count zcrx users
io_uring/zcrx: add sync refill queue flushing
io_uring/zcrx: introduce IORING_REGISTER_ZCRX_CTRL
io_uring/zcrx: elide passing msg flags
io_uring/zcrx: use folio_nr_pages() instead of shift operation
io_uring/zcrx: convert to use netmem_desc
io_uring/query: introduce rings info query
io_uring/query: introduce zcrx query
io_uring: move cq/sq user offset init around
io_uring: pre-calculate scq layout
...
Core & protocols
----------------
- Replace busylock at the Tx queuing layer with a lockless list. Resulting
in a 300% (4x) improvement on heavy TX workloads, sending twice the
number of packets per second, for half the cpu cycles.
- Allow constantly busy flows to migrate to a more suitable CPU/NIC
queue. Normally we perform queue re-selection when flow comes out
of idle, but under extreme circumstances the flows may be constantly
busy. Add sysctl to allow periodic rehashing even if it'd risk packet
reordering.
- Optimize the NAPI skb cache, make it larger, use it in more paths.
- Attempt returning Tx skbs to the originating CPU (like we already did
for Rx skbs).
- Various data structure layout and prefetch optimizations from Eric.
- Remove ktime_get() from the recvmsg() fast path, ktime_get() is sadly
quite expensive on recent AMD machines.
- Extend threaded NAPI polling to allow the kthread busy poll for packets.
- Make MPTCP use Rx backlog processing. This lowers the lock pressure,
improving the Rx performance.
- Support memcg accounting of MPTCP socket memory.
- Allow admin to opt sockets out of global protocol memory accounting
(using a sysctl or BPF-based policy). The global limits are a poor fit
for modern container workloads, where limits are imposed using cgroups.
- Improve heuristics for when to kick off AF_UNIX garbage collection.
- Allow users to control TCP SACK compression, and default to 33% of RTT.
- Add tcp_rcvbuf_low_rtt sysctl to let datacenter users avoid unnecessarily
aggressive rcvbuf growth and overshot when the connection RTT is low.
- Preserve skb metadata space across skb_push / skb_pull operations.
- Support for IPIP encapsulation in the nftables flowtable offload.
- Support appending IP interface information to ICMP messages (RFC 5837).
- Support setting max record size in TLS (RFC 8449).
- Remove taking rtnl_lock from RTM_GETNEIGHTBL and RTM_SETNEIGHTBL.
- Use a dedicated lock (and RCU) in MPLS, instead of rtnl_lock.
- Let users configure the number of write buffers in SMC.
- Add new struct sockaddr_unsized for sockaddr of unknown length,
from Kees.
- Some conversions away from the crypto_ahash API, from Eric Biggers.
- Some preparations for slimming down struct page.
- YAML Netlink protocol spec for WireGuard.
- Add a tool on top of YAML Netlink specs/lib for reporting commonly
computed derived statistics and summarized system state.
Driver API
----------
- Add CAN XL support to the CAN Netlink interface.
- Add uAPI for reporting PHY Mean Square Error (MSE) diagnostics,
as defined by the OPEN Alliance's "Advanced diagnostic features
for 100BASE-T1 automotive Ethernet PHYs" specification.
- Add DPLL phase-adjust-gran pin attribute (and implement it in zl3073x).
- Refactor xfrm_input lock to reduce contention when NIC offloads IPsec
and performs RSS.
- Add info to devlink params whether the current setting is the default
or a user override. Allow resetting back to default.
- Add standard device stats for PSP crypto offload.
- Leverage DSA frame broadcast to implement simple HSR frame duplication
for a lot of switches without dedicated HSR offload.
- Add uAPI defines for 1.6Tbps link modes.
Device drivers
--------------
- Add Motorcomm YT921x gigabit Ethernet switch support.
- Add MUCSE driver for N500/N210 1GbE NIC series.
- Convert drivers to support dedicated ops for timestamping control,
and away from the direct IOCTL handling. While at it support GET
operations for PHY timestamping.
- Add (and convert most drivers to) a dedicated ethtool callback
for reading the Rx ring count.
- Significant refactoring efforts in the STMMAC driver, which supports
Synopsys turn-key MAC IP integrated into a ton of SoCs.
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support PPS in/out on all pins
- Intel (100G, ice, idpf):
- ice: implement standard ethtool and timestamping stats
- i40e: support setting the max number of MAC addresses per VF
- iavf: support RSS of GTP tunnels for 5G and LTE deployments
- nVidia/Mellanox (mlx5):
- reduce downtime on interface reconfiguration
- disable being an XDP redirect target by default (same as other
drivers) to avoid wasting resources if feature is unused
- Meta (fbnic):
- add support for Linux-managed PCS on 25G, 50G, and 100G links
- Wangxun:
- support Rx descriptor merge, and Tx head writeback
- support Rx coalescing offload
- support 25G SPF and 40G QSFP modules
- Ethernet virtual:
- Google (gve):
- allow ethtool to configure rx_buf_len
- implement XDP HW RX Timestamping support for DQ descriptor format
- Microsoft vNIC (mana):
- support HW link state events
- handle hardware recovery events when probing the device
- Ethernet NICs consumer, and embedded:
- usbnet: add support for Byte Queue Limits (BQL)
- AMD (amd-xgbe):
- add device selftests
- NXP (enetc):
- add i.MX94 support
- Broadcom integrated MACs (bcmgenet, bcmasp):
- bcmasp: add support for PHY-based Wake-on-LAN
- Broadcom switches (b53):
- support port isolation
- support BCM5389/97/98 and BCM63XX ARL formats
- Lantiq/MaxLinear switches:
- support bridge FDB entries on the CPU port
- use regmap for register access
- allow user to enable/disable learning
- support Energy Efficient Ethernet
- support configuring RMII clock delays
- add tagging driver for MaxLinear GSW1xx switches
- Synopsys (stmmac):
- support using the HW clock in free running mode
- add Eswin EIC7700 support
- add Rockchip RK3506 support
- add Altera Agilex5 support
- Cadence (macb):
- cleanup and consolidate descriptor and DMA address handling
- add EyeQ5 support
- TI:
- icssg-prueth: support AF_XDP
- Airoha access points:
- add missing Ethernet stats and link state callback
- add AN7583 support
- support out-of-order Tx completion processing
- Power over Ethernet:
- pd692x0: preserve PSE configuration across reboots
- add support for TPS23881B devices
- Ethernet PHYs:
- Open Alliance OATC14 10BASE-T1S PHY cable diagnostic support
- Support 50G SerDes and 100G interfaces in Linux-managed PHYs
- micrel:
- support for non PTP SKUs of lan8814
- enable in-band auto-negotiation on lan8814
- realtek:
- cable testing support on RTL8224
- interrupt support on RTL8221B
- motorcomm: support for PHY LEDs on YT853
- microchip: support for LAN867X Rev.D0 PHYs w/ SQI and cable diag
- mscc: support for PHY LED control
- CAN drivers:
- m_can: add support for optional reset and system wake up
- remove can_change_mtu() obsoleted by core handling
- mcp251xfd: support GPIO controller functionality
- Bluetooth:
- add initial support for PASTa
- WiFi:
- split ieee80211.h file, it's way too big
- improvements in VHT radiotap reporting, S1G, Channel Switch
Announcement handling, rate tracking in mesh networks
- improve multi-radio monitor mode support, and add a cfg80211 debugfs
interface for it
- HT action frame handling on 6 GHz
- initial chanctx work towards NAN
- MU-MIMO sniffer improvements
- WiFi drivers:
- RealTek (rtw89):
- support USB devices RTL8852AU and RTL8852CU
- initial work for RTL8922DE
- improved injection support
- Intel:
- iwlwifi: new sniffer API support
- MediaTek (mt76):
- WED support for >32-bit DMA
- airoha NPU support
- regdomain improvements
- continued WiFi7/MLO work
- Qualcomm/Atheros:
- ath10k: factory test support
- ath11k: TX power insertion support
- ath12k: BSS color change support
- ath12k: statistics improvements
- brcmfmac: Acer A1 840 tablet quirk
- rtl8xxxu: 40 MHz connection fixes/support
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmkveRQACgkQMUZtbf5S
IrvY7A/+Nb0o4BxLHjPkAl1m3t3q2d0Y29B7SNkwnwEtxAV8EkNeZ3GWrdtDnTQY
MYhmc7LEzvz8/lihapr7UJkcokzSASUV54hbez5jDBKC8EEoyUk8FdWDPerwlcRI
zmCFNAVFyh9GX8i7wcrzKbDTHT5+GZLbSlGl9U5mhLsDdRlJgH7d8PJ7vWcmtLFY
XN0paDyaeHfCl8wReWNAYx4C/I0ODOvlscpO0tnAKhB0ngJbQCKY2t6tn3rOYdif
ZSQ5KwVRnJtQ4fYOFMOy9+FSCjVXtyrxF8KLxD+mqom2ZhmO00UpOMl09tqhq3uT
WnvwoHUVBt6F+iITHwg5kMgIDPUq1kpUvL4S4UbVSuUm9ZKD+4KRU2ZHRBYMx+MU
bsqmtY8/IULClUoRz+tZhltA8eb0NEqNZE2JPOFDiJHn1YiCCkFwxibhir893oM3
sB7x65D7LQI2ty2BBGVGYnwYDPtyaxOA/s3WTwPvLEi3+Y/TGNIIrS9lBLA4U+Yr
Gi93WQGVjttMmVyaHgXBUGmi3L52hvolm0AZ8zSRGrnIEpecjhly2KfYuaOzuxXC
IHEQ6AFLdRh6JzafXGb/mQwGCHNmhwsY8A49i94fakWQamaL/L6A+1dyPu4LXMqi
NwqCmlVb/LKGlfNG+V4wT27srJ+yBA2Vk3tpR1sZQQytFh0LKHI=
=UoDR
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Replace busylock at the Tx queuing layer with a lockless list.
Resulting in a 300% (4x) improvement on heavy TX workloads, sending
twice the number of packets per second, for half the cpu cycles.
- Allow constantly busy flows to migrate to a more suitable CPU/NIC
queue.
Normally we perform queue re-selection when flow comes out of idle,
but under extreme circumstances the flows may be constantly busy.
Add sysctl to allow periodic rehashing even if it'd risk packet
reordering.
- Optimize the NAPI skb cache, make it larger, use it in more paths.
- Attempt returning Tx skbs to the originating CPU (like we already
did for Rx skbs).
- Various data structure layout and prefetch optimizations from Eric.
- Remove ktime_get() from the recvmsg() fast path, ktime_get() is
sadly quite expensive on recent AMD machines.
- Extend threaded NAPI polling to allow the kthread busy poll for
packets.
- Make MPTCP use Rx backlog processing. This lowers the lock
pressure, improving the Rx performance.
- Support memcg accounting of MPTCP socket memory.
- Allow admin to opt sockets out of global protocol memory accounting
(using a sysctl or BPF-based policy). The global limits are a poor
fit for modern container workloads, where limits are imposed using
cgroups.
- Improve heuristics for when to kick off AF_UNIX garbage collection.
- Allow users to control TCP SACK compression, and default to 33% of
RTT.
- Add tcp_rcvbuf_low_rtt sysctl to let datacenter users avoid
unnecessarily aggressive rcvbuf growth and overshot when the
connection RTT is low.
- Preserve skb metadata space across skb_push / skb_pull operations.
- Support for IPIP encapsulation in the nftables flowtable offload.
- Support appending IP interface information to ICMP messages (RFC
5837).
- Support setting max record size in TLS (RFC 8449).
- Remove taking rtnl_lock from RTM_GETNEIGHTBL and RTM_SETNEIGHTBL.
- Use a dedicated lock (and RCU) in MPLS, instead of rtnl_lock.
- Let users configure the number of write buffers in SMC.
- Add new struct sockaddr_unsized for sockaddr of unknown length,
from Kees.
- Some conversions away from the crypto_ahash API, from Eric Biggers.
- Some preparations for slimming down struct page.
- YAML Netlink protocol spec for WireGuard.
- Add a tool on top of YAML Netlink specs/lib for reporting commonly
computed derived statistics and summarized system state.
Driver API:
- Add CAN XL support to the CAN Netlink interface.
- Add uAPI for reporting PHY Mean Square Error (MSE) diagnostics, as
defined by the OPEN Alliance's "Advanced diagnostic features for
100BASE-T1 automotive Ethernet PHYs" specification.
- Add DPLL phase-adjust-gran pin attribute (and implement it in
zl3073x).
- Refactor xfrm_input lock to reduce contention when NIC offloads
IPsec and performs RSS.
- Add info to devlink params whether the current setting is the
default or a user override. Allow resetting back to default.
- Add standard device stats for PSP crypto offload.
- Leverage DSA frame broadcast to implement simple HSR frame
duplication for a lot of switches without dedicated HSR offload.
- Add uAPI defines for 1.6Tbps link modes.
Device drivers:
- Add Motorcomm YT921x gigabit Ethernet switch support.
- Add MUCSE driver for N500/N210 1GbE NIC series.
- Convert drivers to support dedicated ops for timestamping control,
and away from the direct IOCTL handling. While at it support GET
operations for PHY timestamping.
- Add (and convert most drivers to) a dedicated ethtool callback for
reading the Rx ring count.
- Significant refactoring efforts in the STMMAC driver, which
supports Synopsys turn-key MAC IP integrated into a ton of SoCs.
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support PPS in/out on all pins
- Intel (100G, ice, idpf):
- ice: implement standard ethtool and timestamping stats
- i40e: support setting the max number of MAC addresses per VF
- iavf: support RSS of GTP tunnels for 5G and LTE deployments
- nVidia/Mellanox (mlx5):
- reduce downtime on interface reconfiguration
- disable being an XDP redirect target by default (same as
other drivers) to avoid wasting resources if feature is
unused
- Meta (fbnic):
- add support for Linux-managed PCS on 25G, 50G, and 100G links
- Wangxun:
- support Rx descriptor merge, and Tx head writeback
- support Rx coalescing offload
- support 25G SPF and 40G QSFP modules
- Ethernet virtual:
- Google (gve):
- allow ethtool to configure rx_buf_len
- implement XDP HW RX Timestamping support for DQ descriptor
format
- Microsoft vNIC (mana):
- support HW link state events
- handle hardware recovery events when probing the device
- Ethernet NICs consumer, and embedded:
- usbnet: add support for Byte Queue Limits (BQL)
- AMD (amd-xgbe):
- add device selftests
- NXP (enetc):
- add i.MX94 support
- Broadcom integrated MACs (bcmgenet, bcmasp):
- bcmasp: add support for PHY-based Wake-on-LAN
- Broadcom switches (b53):
- support port isolation
- support BCM5389/97/98 and BCM63XX ARL formats
- Lantiq/MaxLinear switches:
- support bridge FDB entries on the CPU port
- use regmap for register access
- allow user to enable/disable learning
- support Energy Efficient Ethernet
- support configuring RMII clock delays
- add tagging driver for MaxLinear GSW1xx switches
- Synopsys (stmmac):
- support using the HW clock in free running mode
- add Eswin EIC7700 support
- add Rockchip RK3506 support
- add Altera Agilex5 support
- Cadence (macb):
- cleanup and consolidate descriptor and DMA address handling
- add EyeQ5 support
- TI:
- icssg-prueth: support AF_XDP
- Airoha access points:
- add missing Ethernet stats and link state callback
- add AN7583 support
- support out-of-order Tx completion processing
- Power over Ethernet:
- pd692x0: preserve PSE configuration across reboots
- add support for TPS23881B devices
- Ethernet PHYs:
- Open Alliance OATC14 10BASE-T1S PHY cable diagnostic support
- Support 50G SerDes and 100G interfaces in Linux-managed PHYs
- micrel:
- support for non PTP SKUs of lan8814
- enable in-band auto-negotiation on lan8814
- realtek:
- cable testing support on RTL8224
- interrupt support on RTL8221B
- motorcomm: support for PHY LEDs on YT853
- microchip: support for LAN867X Rev.D0 PHYs w/ SQI and cable diag
- mscc: support for PHY LED control
- CAN drivers:
- m_can: add support for optional reset and system wake up
- remove can_change_mtu() obsoleted by core handling
- mcp251xfd: support GPIO controller functionality
- Bluetooth:
- add initial support for PASTa
- WiFi:
- split ieee80211.h file, it's way too big
- improvements in VHT radiotap reporting, S1G, Channel Switch
Announcement handling, rate tracking in mesh networks
- improve multi-radio monitor mode support, and add a cfg80211
debugfs interface for it
- HT action frame handling on 6 GHz
- initial chanctx work towards NAN
- MU-MIMO sniffer improvements
- WiFi drivers:
- RealTek (rtw89):
- support USB devices RTL8852AU and RTL8852CU
- initial work for RTL8922DE
- improved injection support
- Intel:
- iwlwifi: new sniffer API support
- MediaTek (mt76):
- WED support for >32-bit DMA
- airoha NPU support
- regdomain improvements
- continued WiFi7/MLO work
- Qualcomm/Atheros:
- ath10k: factory test support
- ath11k: TX power insertion support
- ath12k: BSS color change support
- ath12k: statistics improvements
- brcmfmac: Acer A1 840 tablet quirk
- rtl8xxxu: 40 MHz connection fixes/support"
* tag 'net-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1381 commits)
net: page_pool: sanitise allocation order
net: page pool: xa init with destroy on pp init
net/mlx5e: Support XDP target xmit with dummy program
net/mlx5e: Update XDP features in switch channels
selftests/tc-testing: Test CAKE scheduler when enqueue drops packets
net/sched: sch_cake: Fix incorrect qlen reduction in cake_drop
wireguard: netlink: generate netlink code
wireguard: uapi: generate header with ynl-gen
wireguard: uapi: move flag enums
wireguard: uapi: move enum wg_cmd
wireguard: netlink: add YNL specification
selftests: drv-net: Fix tolerance calculation in devlink_rate_tc_bw.py
selftests: drv-net: Fix and clarify TC bandwidth split in devlink_rate_tc_bw.py
selftests: drv-net: Set shell=True for sysfs writes in devlink_rate_tc_bw.py
selftests: drv-net: Use Iperf3Runner in devlink_rate_tc_bw.py
selftests: drv-net: introduce Iperf3Runner for measurement use cases
selftests: drv-net: Add devlink_rate_tc_bw.py to TEST_PROGS
net: ps3_gelic_net: Use napi_alloc_skb() and napi_gro_receive()
Documentation: net: dsa: mention simple HSR offload helpers
Documentation: net: dsa: mention availability of RedBox
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmktzC4ACgkQ6rmadz2v
bTpA1w/+PZ45N3y6O+NQVIpBlpnHG7DEMK7Lw19On0xVLwH+XPHz6J5PEfzjyJR1
SCbsV30qkJ1YCtgRHHf+ZCuWPWm58hY8dXYwSDyjNavdQyVGOdf17aBu9pvH45NW
K20OhwQHpCHWIfDlijjPkDdiHnYf5S7Xy6ctt/3ztF0pMDHIaghGxJymG4wULcDT
iLKnT37kwO8b2ihmw/HbcZPQYMWfHRye7X009K+wCv0dnhJ6q/Ny1m+Pg4kF92e6
ON/RY26ep2dq7LpaNWa1rI1yOgFlI7uUlVojqrAuAb+xrg+64wUDBxeijvE37EN1
s/+PuEKAR6xwz1dbY2cWAI0D633saz24UdV6kCBW9HrjHKVRQ7ZSsBF9ENkS4DTK
nowx4wOe1ZHc/6YgTktZp9LEn/0YrmQtFxjqEAJiYUgD18FrBrSjmhHpBiL+HghP
sTqy41qDQGoKtg3bRu42Co9wmNeeLsnxT8NQExCmTYQ4ufpdA/VMQux9cBVX3GBq
EchJb465+AcvvCJUiKbnHLxDsHCQz1YYytz3RqyFLgGDFZnHOE0FjwPJmM8I5kkK
gvDB3ZYdO3Halm8BZfZZBnKv5uK7myuAWwqRLgMRanuZcRgmIV1oUP5EP88HdH75
fB20vZSVcfzB17SLyhiM20ivEWodJa9VCLEw9WDOmDoml+33Pks=
=kaCJ
-----END PGP SIGNATURE-----
Merge tag 'bpf-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
- Convert selftests/bpf/test_tc_edt and test_tc_tunnel from .sh to
test_progs runner (Alexis Lothoré)
- Convert selftests/bpf/test_xsk to test_progs runner (Bastien
Curutchet)
- Replace bpf memory allocator with kmalloc_nolock() in
bpf_local_storage (Amery Hung), and in bpf streams and range tree
(Puranjay Mohan)
- Introduce support for indirect jumps in BPF verifier and x86 JIT
(Anton Protopopov) and arm64 JIT (Puranjay Mohan)
- Remove runqslower bpf tool (Hoyeon Lee)
- Fix corner cases in the verifier to close several syzbot reports
(Eduard Zingerman, KaFai Wan)
- Several improvements in deadlock detection in rqspinlock (Kumar
Kartikeya Dwivedi)
- Implement "jmp" mode for BPF trampoline and corresponding
DYNAMIC_FTRACE_WITH_JMP. It improves "fexit" program type performance
from 80 M/s to 136 M/s. With Steven's Ack. (Menglong Dong)
- Add ability to test non-linear skbs in BPF_PROG_TEST_RUN (Paul
Chaignon)
- Do not let BPF_PROG_TEST_RUN emit invalid GSO types to stack (Daniel
Borkmann)
- Generalize buildid reader into bpf_dynptr (Mykyta Yatsenko)
- Optimize bpf_map_update_elem() for map-in-map types (Ritesh
Oedayrajsingh Varma)
- Introduce overwrite mode for BPF ring buffer (Xu Kuohai)
* tag 'bpf-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (169 commits)
bpf: optimize bpf_map_update_elem() for map-in-map types
bpf: make kprobe_multi_link_prog_run always_inline
selftests/bpf: do not hardcode target rate in test_tc_edt BPF program
selftests/bpf: remove test_tc_edt.sh
selftests/bpf: integrate test_tc_edt into test_progs
selftests/bpf: rename test_tc_edt.bpf.c section to expose program type
selftests/bpf: Add success stats to rqspinlock stress test
rqspinlock: Precede non-head waiter queueing with AA check
rqspinlock: Disable spinning for trylock fallback
rqspinlock: Use trylock fallback when per-CPU rqnode is busy
rqspinlock: Perform AA checks immediately
rqspinlock: Enclose lock/unlock within lock entry acquisitions
bpf: Remove runqslower tool
selftests/bpf: Remove usage of lsm/file_alloc_security in selftest
bpf: Disable file_alloc_security hook
bpf: check for insn arrays in check_ptr_alignment
bpf: force BPF_F_RDONLY_PROG on insn array creation
bpf: Fix exclusive map memory leak
selftests/bpf: Make CS length configurable for rqspinlock stress test
selftests/bpf: Add lock wait time stats to rqspinlock stress test
...
- Enable -fms-extensions, allowing anonymous use of tagged struct or
union in struct/union (tag kbuild-ms-extensions-6.19). An exemplary
conversion patch is added here, too (btrfs).
- Introduce architecture-specific CC_CAN_LINK and flags for userprogs
- Add new packaging target 'modules-cpio-pkg' for building a initramfs
cpio w/ kmods
- Handle included .c files in gen_compile_commands
- Minor kbuild changes:
- Use objtree for module signing key path, fixing oot kmod signing
- Improve documentation of KBUILD_BUILD_TIMESTAMP
- Reuse KBUILD_USERCFLAGS for UAPI, instead of defining twice
- Rename scripts/Makefile.extrawarn to Makefile.warn
- Drop obsolete types.h check from headers_check.pl
- Remove outdated config leak ignore entries
Signed-off-by: Nicolas Schier <nsc@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEh0E3p4c3JKeBvsLGB1IKcBYmEmkFAmkttSoACgkQB1IKcBYm
Emkt6g/+NLW1rJBCcONXIe3UlfHvo4MBziA+ItuLkYx7FwhhTTOivQgav68wXSQK
/JqQ5N8O0nO4Gznv5aJWoW+GUCKFjqaDbJDXW9pRdDNGQjXJfL82OHu9gPiUAiDJ
ffZwyeRfXtBaglWv6ovM5z6817OqUCLErqjESAMHZv4nXbXjFNi2YKwgJGmAjfNM
kiNUU1ieO2/dxgI/mMCW0UuK0jjiC0EsY1as3IkxBdVZi3dRPobLNaL4JiKHGFlZ
BlrtAoGHrqwqEQgFACVJE2BzWhW+Hwz7SyfBvF4c8T0eM/02ogcL43COh6scq0pV
2om7hFwvE/NBdbCvz0KB/q3khx9jJRagCm0o/+YEexHb1rn+LcCbomQZYRjIEcHU
mBc3DTD+B+jwiN6mow6e6E1uuKvpTrmPGBNVyQ5BMFpdlhfga7zsttLO5kMIXqAH
Fc8MsSx06YggrwGevc/39C+3uzI501Rcxu9BWdyHXfeZKeq0gZGNtkXVN2edzzKm
c4uFeDbbF3M0oeGXmtlm8I//jqeJuc0wK8d8tSXjKodo+vgppXj0s2okAwALY2jW
NVAlxqKwS7CnjBbXOsHIXSCDDL5IQ+np/gIaOkSbl6DXU50BvFV+KGvqW9hdjmwR
AEQXemqUhSvGFmYG92mLwewXWAivnxUvGTCifOhXipyikXhX/wI=
=Gn4V
-----END PGP SIGNATURE-----
Merge tag 'kbuild-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux
Pull Kbuild updates from Nicolas Schier:
- Enable -fms-extensions, allowing anonymous use of tagged struct or
union in struct/union (tag kbuild-ms-extensions-6.19). An exemplary
conversion patch is added here, too (btrfs).
[ Editor's note: the core of this actually came in early through a
shared branch and a few other trees - Linus ]
- Introduce architecture-specific CC_CAN_LINK and flags for userprogs
- Add new packaging target 'modules-cpio-pkg' for building a initramfs
cpio w/ kmods
- Handle included .c files in gen_compile_commands
- Minor kbuild changes:
- Use objtree for module signing key path, fixing oot kmod signing
- Improve documentation of KBUILD_BUILD_TIMESTAMP
- Reuse KBUILD_USERCFLAGS for UAPI, instead of defining twice
- Rename scripts/Makefile.extrawarn to Makefile.warn
- Drop obsolete types.h check from headers_check.pl
- Remove outdated config leak ignore entries
* tag 'kbuild-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
kbuild: add target to build a cpio containing modules
initramfs: add gen_init_cpio to hostprogs unconditionally
kbuild: allow architectures to override CC_CAN_LINK
init: deduplicate cc-can-link.sh invocations
kbuild: don't enable CC_CAN_LINK if the dummy program generates warnings
scripts: headers_install.sh: Remove two outdated config leak ignore entries
scripts/clang-tools: Handle included .c files in gen_compile_commands
kbuild: uapi: Drop types.h check from headers_check.pl
kbuild: Rename Makefile.extrawarn to Makefile.warn
MAINTAINERS, .mailmap: Update mail address for Nicolas Schier
kbuild: uapi: reuse KBUILD_USERCFLAGS
kbuild: doc: improve KBUILD_BUILD_TIMESTAMP documentation
kbuild: Use objtree for module signing key path
btrfs: send: make use of -fms-extensions for defining struct fs_path
Toolchain and infrastructure:
- Add support for 'syn'.
Syn is a parsing library for parsing a stream of Rust tokens into a
syntax tree of Rust source code.
Currently this library is geared toward use in Rust procedural
macros, but contains some APIs that may be useful more generally.
'syn' allows us to greatly simplify writing complex macros such as
'pin-init' (Benno has already prepared the 'syn'-based version). We
will use it in the 'macros' crate too.
'syn' is the most downloaded Rust crate (according to crates.io), and
it is also used by the Rust compiler itself. While the amount of code
is substantial, there should not be many updates needed for these
crates, and even if there are, they should not be too big, e.g. +7k
-3k lines across the 3 crates in the last year.
'syn' requires two smaller dependencies: 'quote' and 'proc-macro2'.
I only modified their code to remove a third dependency
('unicode-ident') and to add the SPDX identifiers. The code can be
easily verified to exactly match upstream with the provided scripts.
They are all licensed under "Apache-2.0 OR MIT", like the other
vendored 'alloc' crate we had for a while.
Please see the merge commit with the cover letter for more context.
- Allow 'unreachable_pub' and 'clippy::disallowed_names' for doctests.
Examples (i.e. doctests) may want to do things like show public items
and use names such as 'foo'.
Nevertheless, we still try to keep examples as close to real code as
possible (this is part of why running Clippy on doctests is important
for us, e.g. for safety comments, which userspace Rust does not
support yet but we are stricter).
'kernel' crate:
- Replace our custom 'CStr' type with 'core::ffi::CStr'.
Using the standard library type reduces our custom code footprint,
and we retain needed custom functionality through an extension trait
and a new 'fmt!' macro which replaces the previous 'core' import.
This started in 6.17 and continued in 6.18, and we finally land the
replacement now. This required quite some stamina from Tamir, who
split the changes in steps to prepare for the flag day change here.
- Replace 'kernel::c_str!' with C string literals.
C string literals were added in Rust 1.77, which produce '&CStr's
(the 'core' one), so now we can write:
c"hi"
instead of:
c_str!("hi")
- Add 'num' module for numerical features.
It includes the 'Integer' trait, implemented for all primitive
integer types.
It also includes the 'Bounded' integer wrapping type: an integer
value that requires only the 'N' less significant bits of the wrapped
type to be encoded:
// An unsigned 8-bit integer, of which only the 4 LSBs are used.
let v = Bounded::<u8, 4>:🆕:<15>();
assert_eq!(v.get(), 15);
'Bounded' is useful to e.g. enforce guarantees when working with
bitfields that have an arbitrary number of bits.
Values can be constructed from simple non-constant expressions or,
for more complex ones, validated at runtime.
'Bounded' also comes with comparison and arithmetic operations (with
both their backing type and other 'Bounded's with a compatible
backing type), casts to change the backing type, extending/shrinking
and infallible/fallible conversions from/to primitives as applicable.
- 'rbtree' module: add immutable cursor ('Cursor').
It enables to use just an immutable tree reference where appropriate.
The existing fully-featured mutable cursor is renamed to 'CursorMut'.
kallsyms:
- Fix wrong "big" kernel symbol type read from procfs.
'pin-init' crate:
- A couple minor fixes (Benno asked me to pick these patches up for
him this cycle).
Documentation:
- Quick Start guide: add Debian 13 (Trixie).
Debian Stable is now able to build Linux, since Debian 13 (released
2025-08-09) packages Rust 1.85.0, which is recent enough.
We are planning to propose that the minimum supported Rust version in
Linux follows Debian Stable releases, with Debian 13 being the first
one we upgrade to, i.e. Rust 1.85.
MAINTAINERS:
- Add entry for the new 'num' module.
- Remove Alex as Rust maintainer: he hasn't had the time to contribute
for a few years now, so it is a no-op change in practice.
And a few other cleanups and improvements.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAmks0WkACgkQGXyLc2ht
IW3QQg/+LpBmrz0ZSKH24kcX3x/hpfA2Erd4cmn+qjXev9RSAM1bt9xf3dsAhhFd
BStUpf0aglHOSEuWAvNHqEb5+yu+6qy6EFXXqH0ASexHK93t77Jbztjtf3SMlykT
N/lSJ+LWw2WiRT0NRWoTfKaEWzZQ8j9fi9Jb/IGNZGdNMryisVUYWqzLwNupPuK+
RMcEitHdO2NWjyodk2GGRyYQ7+XxQgbXZoxtgeubPSrrmGuGTXV42RlQKC2KHPx3
gz6CwcO3Xd0bGHHSgP32QDtGRJtniO8iXBKxiooT+ys+M83fTKbwNrIrW3tHdheY
765qsd/NvUmAkcgTCoLqj5biU6LCsepyimNg1vf4pYFohBoTaGeN+UqzbXBrSjy2
pmrgxwMRVHsYz+zoSKAVKJl7ASba5BXFdI4Whgfqwwc9So/X7uyujIYXGbRoznCV
W5vu7OUboBy26NvcsPrf6BqWcsJEpGV/M4z2UBRjAoJTRGQMcm/ckuo/GfYm3yW+
bUW62UmVCdY5crpo7XPH/G4ZGBR/k3p9dLVt8OJxEoTlfw4KDE5BszJoXmejZqdi
9LEMhzTWwoFp9NspQuEGdYdfGRlfG6XXqrwGZtQI+dlc4RvFEgBBu2Lxotq+Ods0
EfCVCJQjWmyCodVdJ/QqbCRFuXtOFLr/hPdWnvlrRxVkPtF2CDw=
=9nM+
-----END PGP SIGNATURE-----
Merge tag 'rust-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux
Pull Rust updates from Miguel Ojeda:
"Toolchain and infrastructure:
- Add support for 'syn'.
Syn is a parsing library for parsing a stream of Rust tokens into a
syntax tree of Rust source code.
Currently this library is geared toward use in Rust procedural
macros, but contains some APIs that may be useful more generally.
'syn' allows us to greatly simplify writing complex macros such as
'pin-init' (Benno has already prepared the 'syn'-based version). We
will use it in the 'macros' crate too.
'syn' is the most downloaded Rust crate (according to crates.io),
and it is also used by the Rust compiler itself. While the amount
of code is substantial, there should not be many updates needed for
these crates, and even if there are, they should not be too big,
e.g. +7k -3k lines across the 3 crates in the last year.
'syn' requires two smaller dependencies: 'quote' and 'proc-macro2'.
I only modified their code to remove a third dependency
('unicode-ident') and to add the SPDX identifiers. The code can be
easily verified to exactly match upstream with the provided
scripts.
They are all licensed under "Apache-2.0 OR MIT", like the other
vendored 'alloc' crate we had for a while.
Please see the merge commit with the cover letter for more context.
- Allow 'unreachable_pub' and 'clippy::disallowed_names' for
doctests.
Examples (i.e. doctests) may want to do things like show public
items and use names such as 'foo'.
Nevertheless, we still try to keep examples as close to real code
as possible (this is part of why running Clippy on doctests is
important for us, e.g. for safety comments, which userspace Rust
does not support yet but we are stricter).
'kernel' crate:
- Replace our custom 'CStr' type with 'core::ffi::CStr'.
Using the standard library type reduces our custom code footprint,
and we retain needed custom functionality through an extension
trait and a new 'fmt!' macro which replaces the previous 'core'
import.
This started in 6.17 and continued in 6.18, and we finally land the
replacement now. This required quite some stamina from Tamir, who
split the changes in steps to prepare for the flag day change here.
- Replace 'kernel::c_str!' with C string literals.
C string literals were added in Rust 1.77, which produce '&CStr's
(the 'core' one), so now we can write:
c"hi"
instead of:
c_str!("hi")
- Add 'num' module for numerical features.
It includes the 'Integer' trait, implemented for all primitive
integer types.
It also includes the 'Bounded' integer wrapping type: an integer
value that requires only the 'N' least significant bits of the
wrapped type to be encoded:
// An unsigned 8-bit integer, of which only the 4 LSBs are used.
let v = Bounded::<u8, 4>:🆕:<15>();
assert_eq!(v.get(), 15);
'Bounded' is useful to e.g. enforce guarantees when working with
bitfields that have an arbitrary number of bits.
Values can also be constructed from simple non-constant expressions
or, for more complex ones, validated at runtime.
'Bounded' also comes with comparison and arithmetic operations
(with both their backing type and other 'Bounded's with a
compatible backing type), casts to change the backing type,
extending/shrinking and infallible/fallible conversions from/to
primitives as applicable.
- 'rbtree' module: add immutable cursor ('Cursor').
It enables to use just an immutable tree reference where
appropriate. The existing fully-featured mutable cursor is renamed
to 'CursorMut'.
kallsyms:
- Fix wrong "big" kernel symbol type read from procfs.
'pin-init' crate:
- A couple minor fixes (Benno asked me to pick these patches up for
him this cycle).
Documentation:
- Quick Start guide: add Debian 13 (Trixie).
Debian Stable is now able to build Linux, since Debian 13 (released
2025-08-09) packages Rust 1.85.0, which is recent enough.
We are planning to propose that the minimum supported Rust version
in Linux follows Debian Stable releases, with Debian 13 being the
first one we upgrade to, i.e. Rust 1.85.
MAINTAINERS:
- Add entry for the new 'num' module.
- Remove Alex as Rust maintainer: he hasn't had the time to
contribute for a few years now, so it is a no-op change in
practice.
And a few other cleanups and improvements"
* tag 'rust-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: (53 commits)
rust: macros: support `proc-macro2`, `quote` and `syn`
rust: syn: enable support in kbuild
rust: syn: add `README.md`
rust: syn: remove `unicode-ident` dependency
rust: syn: add SPDX License Identifiers
rust: syn: import crate
rust: quote: enable support in kbuild
rust: quote: add `README.md`
rust: quote: add SPDX License Identifiers
rust: quote: import crate
rust: proc-macro2: enable support in kbuild
rust: proc-macro2: add `README.md`
rust: proc-macro2: remove `unicode_ident` dependency
rust: proc-macro2: add SPDX License Identifiers
rust: proc-macro2: import crate
rust: kbuild: support using libraries in `rustc_procmacro`
rust: kbuild: support skipping flags in `rustc_test_library`
rust: kbuild: add proc macro library support
rust: kbuild: simplify `--cfg` handling
rust: kbuild: introduce `core-flags` and `core-skip_flags`
...
-----BEGIN PGP SIGNATURE-----
iQJPBAABCAA5FiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmkumVobFIAAAAAABAAO
bWFudTIsMi41KzEuMTEsMiwyAAoJEFKgDEdIgJTypHMQALImoj39UZiLLVMn2d4P
5droOQw+qMCfsVVY5m9Rqw3vrCsSds+7eC5Qa2GaXR480l4ZeMGyoYCr+2gDWcM3
vwTWuj8UuUOxQlMXubIFyL9r/SUtYSiqfAPK5tGtzHuNNUEHWGed9r5RrOGKZOnj
SDe6ZL5G2Kekuua1mtSZhw8qpW/nK2IuFTRPxN6O243dUGtX0wkzWr1tVRpDDzoT
UNWZ2xoE1lzb/EAviyuSgSIc5YsT50bM+fMzOUHsY/E9u8fI42N1z5B+P3ZlTvGx
Z5nJ79ZMtbF3/4o6OaUqvR9Y4zZ73TtkMoV4sjrjYg28q+n4/tqVlCsjy0Hys/dP
MlaKbLSI54+RldTWx02Yu0jcn2Artua3N0bURyEenqkhDLW9RXVfx/h29QQYPpE9
7FSlpUAQnFzFbGUGJcB02oClkxUSUcOuzVqC0n30wfdo+bVzDn19rr+Vi2bHe8TD
MsASva96n2FhRVPkm8eBxnxpaYQj9Q6G1/7B1F30Et7PLzwD9qYUvXPXceyPATnd
r3hdUiKgkPvUaDhApJRHwv/ZGnNDKujGBDc/YgBQrl3cPfG0ShGzJzWz892JjX0+
NB+O+eabHVs6tFRFNvxS2fu/ROZxrGfBzZUKoaZjZh3JEgOhlLKVlL7aRqtctFvP
Dz0QYiOqlkjho7FykCIEKXxQ
=6o9X
-----END PGP SIGNATURE-----
Merge tag 'livepatching-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching
Pull livepatching updates from Petr Mladek:
- Support both paths where tracefs is typically mounted in selftests
- Make old_sympos 0 and 1 equal. They both are valid when there is only
one symbol with the given name.
* tag 'livepatching-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
selftests: livepatch: use canonical ftrace path
livepatch: Match old_sympos 0 and 1 in klp_find_func()
- Improve recovery from misbehaving BPF schedulers. When a scheduler puts many
tasks with varying affinity restrictions on a shared DSQ, CPUs scanning
through tasks they cannot run can overwhelm the system, causing lockups.
Bypass mode now uses per-CPU DSQs with a load balancer to avoid this, and
hooks into the hardlockup detector to attempt recovery. Add scx_cpu0 example
scheduler to demonstrate this scenario.
- Add lockless peek operation for DSQs to reduce lock contention for schedulers
that need to query queue state during load balancing.
- Allow scx_bpf_reenqueue_local() to be called from anywhere in preparation for
deprecating cpu_acquire/release() callbacks in favor of generic BPF hooks.
- Prepare for hierarchical scheduler support: add scx_bpf_task_set_slice() and
scx_bpf_task_set_dsq_vtime() kfuncs, make scx_bpf_dsq_insert*() return bool,
and wrap kfunc args in structs for future aux__prog parameter.
- Implement cgroup_set_idle() callback to notify BPF schedulers when a cgroup's
idle state changes.
- Fix migration tasks being incorrectly downgraded from stop_sched_class to
rt_sched_class across sched_ext enable/disable. Applied late as the fix is
low risk and the bug subtle but needs stable backporting.
- Various fixes and cleanups including cgroup exit ordering, SCX_KICK_WAIT
reliability, and backward compatibility improvements.
-----BEGIN PGP SIGNATURE-----
iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaS4h1A4cdGpAa2VybmVs
Lm9yZwAKCRCxYfJx3gVYGe/MAP9EZ0pLiTpmMtt6mI/11Fmi+aWfL84j1zt13cz9
W4vb4gEA9eVEH6n9xyC4nhcOk9AQwSDuCWMOzLsnhW8TbEHVTww=
=8W/B
-----END PGP SIGNATURE-----
Merge tag 'sched_ext-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext updates from Tejun Heo:
- Improve recovery from misbehaving BPF schedulers.
When a scheduler puts many tasks with varying affinity restrictions
on a shared DSQ, CPUs scanning through tasks they cannot run can
overwhelm the system, causing lockups.
Bypass mode now uses per-CPU DSQs with a load balancer to avoid this,
and hooks into the hardlockup detector to attempt recovery.
Add scx_cpu0 example scheduler to demonstrate this scenario.
- Add lockless peek operation for DSQs to reduce lock contention for
schedulers that need to query queue state during load balancing.
- Allow scx_bpf_reenqueue_local() to be called from anywhere in
preparation for deprecating cpu_acquire/release() callbacks in favor
of generic BPF hooks.
- Prepare for hierarchical scheduler support: add
scx_bpf_task_set_slice() and scx_bpf_task_set_dsq_vtime() kfuncs,
make scx_bpf_dsq_insert*() return bool, and wrap kfunc args in
structs for future aux__prog parameter.
- Implement cgroup_set_idle() callback to notify BPF schedulers when a
cgroup's idle state changes.
- Fix migration tasks being incorrectly downgraded from
stop_sched_class to rt_sched_class across sched_ext enable/disable.
Applied late as the fix is low risk and the bug subtle but needs
stable backporting.
- Various fixes and cleanups including cgroup exit ordering,
SCX_KICK_WAIT reliability, and backward compatibility improvements.
* tag 'sched_ext-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (44 commits)
sched_ext: Fix incorrect sched_class settings for per-cpu migration tasks
sched_ext: tools: Removing duplicate targets during non-cross compilation
sched_ext: Use kvfree_rcu() to release per-cpu ksyncs object
sched_ext: Pass locked CPU parameter to scx_hardlockup() and add docs
sched_ext: Update comments replacing breather with aborting mechanism
sched_ext: Implement load balancer for bypass mode
sched_ext: Factor out abbreviated dispatch dequeue into dispatch_dequeue_locked()
sched_ext: Factor out scx_dsq_list_node cursor initialization into INIT_DSQ_LIST_CURSOR
sched_ext: Add scx_cpu0 example scheduler
sched_ext: Hook up hardlockup detector
sched_ext: Make handle_lockup() propagate scx_verror() result
sched_ext: Refactor lockup handlers into handle_lockup()
sched_ext: Make scx_exit() and scx_vexit() return bool
sched_ext: Exit dispatch and move operations immediately when aborting
sched_ext: Simplify breather mechanism with scx_aborting flag
sched_ext: Use per-CPU DSQs instead of per-node global DSQs in bypass mode
sched_ext: Refactor do_enqueue_task() local and global DSQ paths
sched_ext: Use shorter slice in bypass mode
sched_ext: Mark racy bitfields to prevent adding fields that can't tolerate races
sched_ext: Minor cleanups to scx_task_iter
...
- Defer task cgroup unlink until after the dying task's final context switch
so that controllers see the cgroup properly populated until the task is
truly gone.
- cpuset cleanups and simplifications. Enforce that domain isolated CPUs
stay in root or isolated partitions and fail if isolated+nohz_full would
leave no housekeeping CPU. Fix sched/deadline root domain handling during
CPU hot-unplug and race for tasks in attaching cpusets.
- Misc fixes including memory reclaim protection documentation and selftest
KTAP conformance.
-----BEGIN PGP SIGNATURE-----
iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaS3pEQ4cdGpAa2VybmVs
Lm9yZwAKCRCxYfJx3gVYGYbrAP9H0kVyWH5tK9VhjSZyqidic8NuvtmNOyhIRrg0
8S8K0wD/YG9xlh2JUyRmS4B23ggc59+9y5xM2/sctrho51Pvsgg=
=0MB+
-----END PGP SIGNATURE-----
Merge tag 'cgroup-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- Defer task cgroup unlink until after the dying task's final context
switch so that controllers see the cgroup properly populated until
the task is truly gone
- cpuset cleanups and simplifications.
Enforce that domain isolated CPUs stay in root or isolated partitions
and fail if isolated+nohz_full would leave no housekeeping CPU. Fix
sched/deadline root domain handling during CPU hot-unplug and race
for tasks in attaching cpusets
- Misc fixes including memory reclaim protection documentation and
selftest KTAP conformance
* tag 'cgroup-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
cpuset: Treat cpusets in attaching as populated
sched/deadline: Walk up cpuset hierarchy to decide root domain when hot-unplug
cgroup/cpuset: Introduce cpuset_cpus_allowed_locked()
docs: cgroup: No special handling of unpopulated memcgs
docs: cgroup: Note about sibling relative reclaim protection
docs: cgroup: Explain reclaim protection target
selftests/cgroup: conform test to KTAP format output
cpuset: remove need_rebuild_sched_domains
cpuset: remove global remote_children list
cpuset: simplify node setting on error
cgroup: include missing header for struct irq_work
cgroup: Fix sleeping from invalid context warning on PREEMPT_RT
cgroup/cpuset: Globally track isolated_cpus update
cgroup/cpuset: Ensure domain isolated CPUs stay in root or isolated partition
cgroup/cpuset: Move up prstate_housekeeping_conflict() helper
cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping
cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_isolation_cpumasks()
cgroup: Defer task cgroup unlink until after the task is done switching out
cgroup: Move dying_tasks cleanup from cgroup_task_release() to cgroup_task_free()
cgroup: Rename cgroup lifecycle hooks to cgroup_task_*()
...
- Rescuer affinity management: Affinity now updated only when detached using
wq_unbound_cpumask consistently. DISASSOCIATED workers also follow unbound
cpumask changes to avoid breaking CPU isolation.
- Rescuer cleanups preparing for fetching work items one by one from pool list:
Work assignment factored out, optimized to skip pwqs no longer needing
rescue, and shutdown logic simplified.
- Unused assert_rcu_or_wq_mutex_or_pool_mutex() removed.
-----BEGIN PGP SIGNATURE-----
iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaS3kYg4cdGpAa2VybmVs
Lm9yZwAKCRCxYfJx3gVYGWkwAQC6zcYAbGTcEkfiDsg3GM6dCsHDk3hIXVguE7Em
mOpnVQD/QxiYHZxt/SOY5LIIX9ZvTpBYeuW7MR5bCncSBdCLTw4=
=VkvX
-----END PGP SIGNATURE-----
Merge tag 'wq-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
- Rescuer affinity management: Affinity is now updated only when
detached using wq_unbound_cpumask consistently. DISASSOCIATED workers
also follow unbound cpumask changes to avoid breaking CPU isolation
- Rescuer cleanups preparing for fetching work items one by one from
pool list: Work assignment factored out, optimized to skip pwqs no
longer needing rescue, and shutdown logic simplified
- Unused assert_rcu_or_wq_mutex_or_pool_mutex() removed
* tag 'wq-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Don't rely on wq->rescuer to stop rescuer
workqueue: Only assign rescuer work when really needed
workqueue: Factor out assign_rescuer_work()
workqueue: Init rescuer's affinities as wq_unbound_cpumask
workqueue: Let DISASSOCIATED workers follow unbound wq cpumask changes
workqueue: Update the rescuer's affinity only when it is detached
workqueue: Remove unused assert_rcu_or_wq_mutex_or_pool_mutex
-----BEGIN PGP SIGNATURE-----
iQJPBAABCAA5FiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmktlbUbFIAAAAAABAAO
bWFudTIsMi41KzEuMTEsMiwyAAoJEFKgDEdIgJTyevsP/1z98/wfCaSCquIq4H8S
OTqFGybGgYQt1NmMj2cGPpbAE3LJNYORT0A4tcoqOTy1Z5xbQz63rO3clSI/e7Mf
n4ZZ7NvkE40i8et1BjqtZa9dSkAv4QLYH73KrtNeuTr5tqvHo1x8FakUH6gQnb1k
QOOebvbVXnOb+rh89j1GZShrLFcCil0psjp165WHAYE/3PyFBgYGLMCgwLqS+W3H
re5Q4sl/ySXpMFF/XN1Kww48FWxy/h+YQFCxZwuWlUcXtVjqZ+BN+keb7AqaFQ7R
dC2exV2W0RBoupEJR/FWHoXrm/bDDLhzqRaMvoggLJrMJ9L6V0WdIhaFA4qzoG63
paJGFjUfmDX3dpPsAddq7kKeevCz4a2/HwFKhiBqqq4tdHuely7wZgnoFO7ovgmu
DYDCXHtpJuWZR3WJ5I/V/sJ9i9KFXhhyWcKVf13QTAFiCaA09aeSAcUWNYNaaxbn
nu6IkUxdIVnWIEBgcYH6jz1DrPGreYLYuD4bVb2gdZoP0r3tnMpG6xfSNIUueSGd
VFAKW9PJYaj7Id+jgACH6V+gQ22L600xJDdL1bPjRbGE0LD7vlz2F1MZTq3BFJFn
hUxJeOZplHX+TPophdvH4MO9VLmydWLUyJiDBP1yA8M9XZms/5s7IJJ1RYXqUCcf
qEB4L7W1+Qy1R/lzf2PU9X4R
=FnfO
-----END PGP SIGNATURE-----
Merge tag 'printk-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
- Allow creaing nbcon console drivers with an unsafe write_atomic()
callback that can only be called by the final nbcon_atomic_flush_unsafe().
Otherwise, the driver would rely on the kthread.
It is going to be used as the-best-effort approach for an
experimental nbcon netconsole driver, see
https://lore.kernel.org/r/20251121-nbcon-v1-2-503d17b2b4af@debian.org
Note that a safe .write_atomic() callback is supposed to work in NMI
context. But some networking drivers are not safe even in IRQ
context:
https://lore.kernel.org/r/oc46gdpmmlly5o44obvmoatfqo5bhpgv7pabpvb6sjuqioymcg@gjsma3ghoz35
In an ideal world, all networking drivers would be fixed first and
the atomic flush would be blocked only in NMI context. But it brings
the question how reliable networking drivers are when the system is
in a bad state. They might block flushing more reliable serial
consoles which are more suitable for serious debugging anyway.
- Allow to use the last 4 bytes of the printk ring buffer.
- Prevent queuing IRQ work and block printk kthreads when consoles are
suspended. Otherwise, they create non-necessary churn or even block
the suspend.
- Release console_lock() between each record in the kthread used for
legacy consoles on RT. It might significantly speed up the boot.
- Release nbcon context between each record in the atomic flush. It
prevents stalls of the related printk kthread after it has lost the
ownership in the middle of a record
- Add support for NBCON consoles into KDB
- Add %ptsP modifier for printing struct timespec64 and use it where
possible
- Misc code clean up
* tag 'printk-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: (48 commits)
printk: Use console_is_usable on console_unblank
arch: um: kmsg_dump: Use console_is_usable
drivers: serial: kgdboc: Drop checks for CON_ENABLED and CON_BOOT
lib/vsprintf: Unify FORMAT_STATE_NUM handlers
printk: Avoid irq_work for printk_deferred() on suspend
printk: Avoid scheduling irq_work on suspend
printk: Allow printk_trigger_flush() to flush all types
tracing: Switch to use %ptSp
scsi: snic: Switch to use %ptSp
scsi: fnic: Switch to use %ptSp
s390/dasd: Switch to use %ptSp
ptp: ocp: Switch to use %ptSp
pps: Switch to use %ptSp
PCI: epf-test: Switch to use %ptSp
net: dsa: sja1105: Switch to use %ptSp
mmc: mmc_test: Switch to use %ptSp
media: av7110: Switch to use %ptSp
ipmi: Switch to use %ptSp
igb: Switch to use %ptSp
e1000e: Switch to use %ptSp
...
Changes
-------
* Sort the memory-barriers.txt file's wait_event* and wait_on_bit*
list alphabetically.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmktwu8THHBhdWxtY2tA
a2VybmVsLm9yZwAKCRCevxLzctn7jEvKD/919GXHqRiUeDSXGySi/q8oCPLFEJcJ
f2y0NAR5C7A0oHTs91aBJ8rmszMzqzKKYvOiwlS4UMRVzUP1uI1T1bPsoY1FJNZN
B4U6BThyhgOR5bSzEkbXIwaF2Vil4IqDXP2pPgab7IHs2XK6Iei3OzlGcezgK/IB
XRG2d7Zb4OAchuaxt9kZzM7Yl55++ksBWE14TERjpi2co8QzQVaZPRB2gr18x2+C
T7nh7x0pg8hYKRrWHHB9jcjoQId4S6B/5iWCWSMv2Yt/mk0QYEXn7pa1CrUoRLuX
RJVOlObT4ruiE4hShjDmq9wgHph8aqrO5O2L6i5U0LKUINMkpvvtWtWKSJhmBaoF
NsaGChne9GZdFeQk3lzWlA6DPrr4JlBkI8WqOiQ2N8S4xv/Bh5XW3Z61vICG8itd
IOsGaRnTF3qoZBBz6u/LxF6vbR4hNK4D86PM/Fjv2vv8bfqKaM4WDSdGWkPRh/Yt
JD1xsuO7qkFIYwJmFM7MBvjICrK1Fp7zVcmLIWgxW9TBp/BYSsLC3KX/GHiBPUY9
kGhxqynFIKdMNVrAU2DMj9mPCvefYPDj8CSYKTuW+HD9xlk2Raz5hRmS/OvAlktt
NtcYN1xrhBy/pEi0Jo22tdeUMWeHF5EvmJFJLrKrSQJi+MMNkwkif8zeiMhCGfOm
mvch0DBkUmD2sw==
=wByu
-----END PGP SIGNATURE-----
Merge tag 'lkmm.2025.12.01a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull lkmm documentation update from Paul McKenney:
- Sort the memory-barriers.txt file's wait_event* and wait_on_bit* list
alphabetically
* tag 'lkmm.2025.12.01a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
memory-barriers.txt: Sort wait_event* and wait_on_bit* list alphabetically
SRCU
----
_ Properly handle SRCU readers within IRQ disabled sections in tiny SRCU
- Preparation to reimplement RCU Tasks Trace on top of SRCU fast:
- Introduce API to expedite a grace period and test it through
rcutorture.
- Split srcu-fast in two flavours: SRCU-fast and SRCU-fast-updown.
Both are still targeted toward faster readers (without full
barriers on LOCK and UNLOCK) at the expense of heavier write side
(using full RCU grace period ordering instead of simply full
ordering) as compared to "traditional" non-fast SRCU. But those
srcu-fast flavours are going to be optimized in two different
ways:
- SRCU-fast will become the reimplementation basis for
RCU-TASK-TRACE for consolidation. Since RCU-TASK-TRACE must
be NMI safe, SRCU-fast must be as well.
- SRCU-fast-updown will be needed for uretprobes code in order
to get rid of the read-side memory barriers while still
allowing entering the reader at task level while exiting it
in a timer handler. It is considered semaphore-like in that
it can have different owners between LOCK and UNLOCK.
However it is not NMI-safe.
The actual optimizations are work in progress for the next cycle.
Only the new interfaces are added for now, along with related
torture and scalability test code.
- Create/document/debug/torture new proper initializers for RCU fast:
DEFINE_SRCU_FAST() and init_srcu_struct_fast()
This allows for using right away the proper ordering on the write
side (either full ordering or full RCU grace period ordering) without
waiting for the read side to tell which to use. Also this optimizes
the read side altogether with moving flavour debug checks under debug
config and with removing a costly RmW operation on their first call.
- Make some diagnostic functions tracing safe.
REFSCALE
-------
Add performance testing for common context synchronizations
(Preemption, IRQ, Softirq) and per-cpu increments. Those are
relevant comparisons against SRCU-fast read side APIs, especially
as they are planned to synchronize further tracing fast-path code.
MISCELLANOUS
------
- In order to prepare the layout for nohz_full work deferral to
user exit, the context tracking state must shrink the counter
of transitions to/from RCU not watching. The only possible hazard
is to trigger wrap-around more easily, delaying a bit grace periods
when that happens. This should be a rare event though. Yet add
debugging and torture code to test that assumption.
- Fix memory leak on locktorture module
- Annotate accesses in rculist_nulls.h to prevent from KCSAN warnings.
On recent discussions, we also concluded that all those WRITE_ONCE()
and READ_ONCE() on list APIs deserve appropriate comments. Something
to be expected for the next cycle.
- Provide a script to apply several configs to several commits with torture.
- Allow torture to reuse a build directory in order to save needless
rebuild time.
- Various cleanups.
-----BEGIN PGP SIGNATURE-----
iQJPBAABCAA5FiEEd76+gtGM8MbftQlOhSRUR1COjHcFAmksubgbFIAAAAAABAAO
bWFudTIsMi41KzEuMTEsMiwyAAoJEIUkVEdQjox3MJ0P/Aq+AOPWuaE70B8SxVYY
VfucjjzZwSyI30Vw4GE8skNVMOnyY2wPIrNRgmckghi9LM5luureMcks9esrdFfv
csWj3AhnBu+/Lyd4EhP8AxMH2OPIX50FUrAQaY7LAcmc4nIng1MqRm35eT776XzR
tN07+nsXjNumPcdN8JAv/VzInUf5DdMvhy+Qw0Krpvan/sUpzzpFpNUWdHsyZ2Re
UCQr//AooQLfCIIIHUYlviThqigJETZlv0NKYrD0Zaxx5MCOwwWmz5otQzacSRk2
Zgn6Bayp98s0972sOghiMRXEoSvqBSJe5eVTWcj1oNLruO3flpyREgE5Wuo8T8Bg
BHvOiYaPNC5FW+btCWOQcKcSx1mxrgsX3/9OAth+kmtPqZdJSQEeY+zqWzdN7ZlV
q5+oipsFONvWXL6+U6uZf44MG1pscrzn6QvXajjg98iaLZgY0+mkAhvtpjcfuTwZ
2CXYQKacfwftpOXgM2QQGB4QE8CKXchlma381KbTHv0rX8zfK0BnWoe/oGPz22GL
3kldmhfIC/87i8t9HHngkU6SZ+Mkvf9vC28eclnuXB1PKCRsBHePGvi3I671kbLj
hycBwnwWriULEwKYTVQ8gmdBHSANB4IpmMYflADVqLuvp+7U6W5mNjWNTcXoSwFl
NoR7AViEUNzOEoa8u6WMKoec
=XnUt
-----END PGP SIGNATURE-----
Merge tag 'rcu.release.v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU updates from Frederic Weisbecker:
"SRCU:
- Properly handle SRCU readers within IRQ disabled sections in tiny
SRCU
- Preparation to reimplement RCU Tasks Trace on top of SRCU fast:
- Introduce API to expedite a grace period and test it through
rcutorture
- Split srcu-fast in two flavours: SRCU-fast and SRCU-fast-updown.
Both are still targeted toward faster readers (without full
barriers on LOCK and UNLOCK) at the expense of heavier write
side (using full RCU grace period ordering instead of simply
full ordering) as compared to "traditional" non-fast SRCU. But
those srcu-fast flavours are going to be optimized in two
different ways:
- SRCU-fast will become the reimplementation basis for
RCU-TASK-TRACE for consolidation. Since RCU-TASK-TRACE must
be NMI safe, SRCU-fast must be as well.
- SRCU-fast-updown will be needed for uretprobes code in order
to get rid of the read-side memory barriers while still
allowing entering the reader at task level while exiting it
in a timer handler. It is considered semaphore-like in that
it can have different owners between LOCK and UNLOCK.
However it is not NMI-safe.
The actual optimizations are work in progress for the next
cycle. Only the new interfaces are added for now, along with
related torture and scalability test code.
- Create/document/debug/torture new proper initializers for RCU fast:
DEFINE_SRCU_FAST() and init_srcu_struct_fast()
This allows for using right away the proper ordering on the write
side (either full ordering or full RCU grace period ordering)
without waiting for the read side to tell which to use.
This also optimizes the read side altogether with moving flavour
debug checks under debug config and with removing a costly RmW
operation on their first call.
- Make some diagnostic functions tracing safe
Refscale:
- Add performance testing for common context synchronizations
(Preemption, IRQ, Softirq) and per-cpu increments. Those are
relevant comparisons against SRCU-fast read side APIs, especially
as they are planned to synchronize further tracing fast-path code
Miscellanous:
- In order to prepare the layout for nohz_full work deferral to user
exit, the context tracking state must shrink the counter of
transitions to/from RCU not watching. The only possible hazard is
to trigger wrap-around more easily, delaying a bit grace periods
when that happens. This should be a rare event though. Yet add
debugging and torture code to test that assumption
- Fix memory leak on locktorture module
- Annotate accesses in rculist_nulls.h to prevent from KCSAN
warnings. On recent discussions, we also concluded that all those
WRITE_ONCE() and READ_ONCE() on list APIs deserve appropriate
comments. Something to be expected for the next cycle
- Provide a script to apply several configs to several commits with
torture
- Allow torture to reuse a build directory in order to save needless
rebuild time
- Various cleanups"
* tag 'rcu.release.v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (29 commits)
refscale: Add SRCU-fast-updown readers
refscale: Exercise DEFINE_STATIC_SRCU_FAST() and init_srcu_struct_fast()
rcutorture: Make srcu{,d}_torture_init() announce the SRCU type
srcu: Create an SRCU-fast-updown API
refscale: Do not disable interrupts for tests involving local_bh_enable()
refscale: Add non-atomic per-CPU increment readers
refscale: Add this_cpu_inc() readers
refscale: Add preempt_disable() readers
refscale: Add local_bh_disable() readers
refscale: Add local_irq_disable() and local_irq_save() readers
torture: Permit negative kvm.sh --kconfig numberic arguments
srcu: Add SRCU_READ_FLAVOR_FAST_UPDOWN CPP macro
rcu: Mark diagnostic functions as notrace
rcutorture: Make TREE04 use CONFIG_RCU_DYNTICKS_TORTURE
rcutorture: Remove redundant rcutorture_one_extend() from rcu_torture_one_read()
rcutorture: Permit kvm-again.sh to re-use the build directory
torture: Add kvm-series.sh to test commit/scenario combination
rcu: use WRITE_ONCE() for ->next and ->pprev of hlist_nulls
locktorture: Fix memory leak in param_set_cpumask()
doc: Update for SRCU-fast definitions and initialization
...
-----BEGIN PGP SIGNATURE-----
iQFPBAABCAA5FiEEe7vIQRWZI0iWSE3xu+CwddJFiJoFAmksibgbFIAAAAAABAAO
bWFudTIsMi41KzEuMTEsMiwyAAoJELvgsHXSRYiavR8H/jTNKlb8jZtre1Q2xIGJ
PgU8+fc4PGX8C6XuKRgb4KYL+zn3VSnTyxLUc3ObKIRTrGOJOBw3YT8R0LvrMOJx
Ibx/6o0o+vjnDxmq6QGcuYdytDdL/rL6Gh8PR1dyWAqPz6jGtraP0nCJu7Y9jRZ0
JHbyRTfpC8I6fTZv/WHocTsUDUu/+M4jQx3kMAMgSSTc7IAF+El5GqhpwEaWv7u/
6D0px1lXI3rGimzmHeLy+CEjW041MTkxPH3GNzkiZwi2WUwI+ZEteMcs29KHcCOe
/sdqmlzn2CPxzqG3TkJ4LbJE3XThYkqxe56LmBVJnhHFe+vCF8urEX9UUTtMn1dh
3zs=
=iQ4N
-----END PGP SIGNATURE-----
Merge tag 'slab-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:
- mempool_alloc_bulk() support for upcoming users in the block layer
that need to allocate multiple objects at once with the mempool's
guaranteed progress semantics, which is not achievable with an
allocation single objects in a loop. Along with refactoring and
various improvements (Christoph Hellwig)
- Preparations for the upcoming separation of struct slab from struct
page, mostly by removing the struct folio layer, as the purpose of
struct folio has shifted since it became used in slab code (Matthew
Wilcox)
- Modernisation of slab's boot param API usage, which removes some
unexpected parsing corner cases (Petr Tesarik)
- Refactoring of freelist_aba_t (now struct freelist_counters) and
associated functions for double cmpxchg, enabled by -fms-extensions
(Vlastimil Babka)
- Cleanups and improvements related to sheaves caching layer, that were
part of the full conversion to sheaves, which is planned for the next
release (Vlastimil Babka)
* tag 'slab-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: (42 commits)
slab: Remove unnecessary call to compound_head() in alloc_from_pcs()
mempool: clarify behavior of mempool_alloc_preallocated()
mempool: drop the file name in the top of file comment
mempool: de-typedef
mempool: remove mempool_{init,create}_kvmalloc_pool
mempool: legitimize the io_schedule_timeout in mempool_alloc_from_pool
mempool: add mempool_{alloc,free}_bulk
mempool: factor out a mempool_alloc_from_pool helper
slab: Remove references to folios from virt_to_slab()
kasan: Remove references to folio in __kasan_mempool_poison_object()
memcg: Convert mem_cgroup_from_obj_folio() to mem_cgroup_from_obj_slab()
mempool: factor out a mempool_adjust_gfp helper
mempool: add error injection support
mempool: improve kerneldoc comments
mm: improve kerneldoc comments for __alloc_pages_bulk
fault-inject: make enum fault_flags available unconditionally
usercopy: Remove folio references from check_heap_object()
slab: Remove folio references from kfree_nolock()
slab: Remove folio references from kfree_rcu_sheaf()
slab: Remove folio references from build_detached_freelist()
...
build-system thrashing. That work should slow down from here on out.
- The various scripts and tools for documentation were spread out in
several directories; now they are (almost) all coalesced under
tools/docs/. The holdout is the kernel-doc script, which cannot be
easily moved without some further thought.
- As the amount of Python code increases, we are accumulating modules that
are imported by multiple programs. These modules have been pulled
together under tools/lib/python/ -- at least, for documentation-related
programs. There is other Python code in the tree that might eventually
want to move toward this organization.
- The Perl kernel-doc.pl script has been removed. It is no longer used by
default, and nobody has missed it, least of all anybody who actually had
to look at it.
- The docs build was controlled by a complex mess of makefilese that few
dared to touch. Mauro has moved that logic into a new program
(tools/docs/sphinx-build-wrapper) that, with any luck at all, will be far
easier to understand and maintain.
- The get_feat.pl program, used to access information under
Documentation/features/, has been rewritten in Python, bringing an end to
the use of Perl in the docs subsystem.
- The top-level README file has been reorganized into a more
reader-friendly presentation.
- A lot of Chinese translation additions
- Typo fixes and documentation updates as usual
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmkuHzEACgkQF0NaE2wM
flgiqAf/aFUA3zCMvOSjbOpX8EO/4rs0ISkhhb01rLSMsRs3P+v9SlGVJls734BE
0ZvVmBo0p7mNakZD4tFCryFn8Gntn28smCEmpDu/FRDMOEcXFUqxQ9st9OhRlar2
tETdFIKIF+yncFJ83Mjr7F5Yeqg38m82g5JdTxvh6FmvDhPLiSXDEeBV2L7hU+St
EX8D8KOZH74XM8LMr8eg3GbUXx72A7WELndlF7DfGIAC8rFC3C9wa0CUSx8wz2Zh
CoCOYsrmd7WdB+c30SUmQbtYLVyWraiqVQVUlCdZDeYPBT/mTo6zJI5L0WxgXuHz
6o+fQdfLA7zlMHTelVcM6y4Qwkmayg==
=1TPY
-----END PGP SIGNATURE-----
Merge tag 'docs-6.19' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"This has been another busy cycle for documentation, with a lot of
build-system thrashing. That work should slow down from here on out.
- The various scripts and tools for documentation were spread out in
several directories; now they are (almost) all coalesced under
tools/docs/. The holdout is the kernel-doc script, which cannot be
easily moved without some further thought.
- As the amount of Python code increases, we are accumulating modules
that are imported by multiple programs. These modules have been
pulled together under tools/lib/python/ -- at least, for
documentation-related programs. There is other Python code in the
tree that might eventually want to move toward this organization.
- The Perl kernel-doc.pl script has been removed. It is no longer
used by default, and nobody has missed it, least of all anybody who
actually had to look at it.
- The docs build was controlled by a complex mess of makefilese that
few dared to touch. Mauro has moved that logic into a new program
(tools/docs/sphinx-build-wrapper) that, with any luck at all, will
be far easier to understand and maintain.
- The get_feat.pl program, used to access information under
Documentation/features/, has been rewritten in Python, bringing an
end to the use of Perl in the docs subsystem.
- The top-level README file has been reorganized into a more
reader-friendly presentation.
- A lot of Chinese translation additions
- Typo fixes and documentation updates as usual"
* tag 'docs-6.19' of git://git.lwn.net/linux: (164 commits)
docs: makefile: move rustdoc check to the build wrapper
README: restructure with role-based documentation and guidelines
docs: kdoc: various fixes for grammar, spelling, punctuation
docs: kdoc_parser: use '@' for Excess enum value
docs: submitting-patches: Clarify that removal of Acks needs explanation too
docs: kdoc_parser: add data/function attributes to ignore
docs: MAINTAINERS: update Mauro's files/paths
docs/zh_CN: Add wd719x.rst translation
docs/zh_CN: Add libsas.rst translation
get_feat.pl: remove it, as it got replaced by get_feat.py
Documentation/sphinx/kernel_feat.py: use class directly
tools/docs/get_feat.py: convert get_feat.pl to Python
Documentation/admin-guide: fix typo and comment in cscope example
docs/zh_CN: Add data-integrity.rst translation
docs/zh_CN: Add blk-mq.rst translation
docs/zh_CN: Add block/index.rst translation
docs/zh_CN: Update the Chinese translation of kbuild.rst
docs: bring some order to our Python module hierarchy
docs: Move the python libraries to tools/lib/python
Documentation/kernel-parameters: Move the kernel build options
...
API:
- Rewrite memcpy_sglist from scratch.
- Add on-stack AEAD request allocation.
- Fix partial block processing in ahash.
Algorithms:
- Remove ansi_cprng.
- Remove tcrypt tests for poly1305.
- Fix EINPROGRESS processing in authenc.
- Fix double-free in zstd.
Drivers:
- Use drbg ctr helper when reseeding xilinx-trng.
- Add support for PCI device 0x115A to ccp.
- Add support of paes in caam.
- Add support for aes-xts in dthev2.
Others:
- Use likely in rhashtable lookup.
- Fix lockdep false-positive in padata by removing a helper.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmktaHwACgkQxycdCkmx
i6duthAAl4ZjsuSgt0P9ZPJXWgSH+QbNT/6fL1QzLEuzLVGn8Mt99LTQpaYU8HRh
fced8+R7UpqA/FgZTYbRKopZJVJJqhmTf2zqjbe47CroRm2Wf5UO+6ZXBsiqbMwa
6fNLilhcrq5G3DrIHepCpIQ7NM2+ucTMnPRIWP3cvzLwX0JzPtYIpYUSiVPAtkjh
9g24oPz6LR/xZfyk+wPbHOSYeqz4sSXnGJkL+Vn33AtU5KJZLum9zMP4Lleim7HP
XaNnUL/S/PYCspycrvfrnq6+YMLPw2USguttuZe0Dg0qhq/jPMyzdEkTAjcTD5LG
NZavVUbQsf6BW+YjXgaE/ybcSs6WR3ySs8aza1Ev8QqsmpbJj9xdpF9fn4RsffGR
mbhc5plJCKWzfiaparea8yY9n5vHwbOK4zoyF9P6kI5ykkoA+GmwRwTW73M9KCfa
i1R6g97O+t4Yaq9JI9GG7dkm9bxJpY+XaKouW7rqv/MX0iND1ExDYaqdcA+Xa61c
TNfdlVcGyX7Dolm2xnpvRv8EqF9NzeK4Vw1QslrdCijXfe7eJymabNKhLBlV4li0
tVfmh4vyQFgruyiR7r7AkXIKzsLZbji030UoOsQqiMW7ualBUQ0dCDbBa8J6kUcX
/vjbSmxV3LKgVgYvUBRRGIi9CJbKfs29RkS6RFtdqcq/YT4KsJU=
=DHes
-----END PGP SIGNATURE-----
Merge tag 'v6.19-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"API:
- Rewrite memcpy_sglist from scratch
- Add on-stack AEAD request allocation
- Fix partial block processing in ahash
Algorithms:
- Remove ansi_cprng
- Remove tcrypt tests for poly1305
- Fix EINPROGRESS processing in authenc
- Fix double-free in zstd
Drivers:
- Use drbg ctr helper when reseeding xilinx-trng
- Add support for PCI device 0x115A to ccp
- Add support of paes in caam
- Add support for aes-xts in dthev2
Others:
- Use likely in rhashtable lookup
- Fix lockdep false-positive in padata by removing a helper"
* tag 'v6.19-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (71 commits)
crypto: zstd - fix double-free in per-CPU stream cleanup
crypto: ahash - Zero positive err value in ahash_update_finish
crypto: ahash - Fix crypto_ahash_import with partial block data
crypto: lib/mpi - use min() instead of min_t()
crypto: ccp - use min() instead of min_t()
hwrng: core - use min3() instead of nested min_t()
crypto: aesni - ctr_crypt() use min() instead of min_t()
crypto: drbg - Delete unused ctx from struct sdesc
crypto: testmgr - Add missing DES weak and semi-weak key tests
Revert "crypto: scatterwalk - Move skcipher walk and use it for memcpy_sglist"
crypto: scatterwalk - Fix memcpy_sglist() to always succeed
crypto: iaa - Request to add Kanchana P Sridhar to Maintainers.
crypto: tcrypt - Remove unused poly1305 support
crypto: ansi_cprng - Remove unused ansi_cprng algorithm
crypto: asymmetric_keys - fix uninitialized pointers with free attribute
KEYS: Avoid -Wflex-array-member-not-at-end warning
crypto: ccree - Correctly handle return of sg_nents_for_len
crypto: starfive - Correctly handle return of sg_nents_for_len
crypto: iaa - Fix incorrect return value in save_iaa_wq()
crypto: zstd - Remove unnecessary size_t cast
...
-----BEGIN PGP SIGNATURE-----
iIcEABYIAC8WIQQzmBmZPBN6m/hUJmnyomI6a/yO7QUCaS+zQhEcd3VmYW5Aa2Vy
bmVsLm9yZwAKCRDyomI6a/yO7TfdAP4ngYyNKMwefqmrwG7akL9sRCWEH4Y/ZM/Z
ZwFw0waDkAEA5gV5LH6DJme9rBsXjC8wkOiiUOerqopIVKPMeYKCmAc=
=sOI5
-----END PGP SIGNATURE-----
Merge tag 'ipe-pr-20251202' of git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe
Pull IPE udates from Fan Wu:
"The primary change is the addition of support for the AT_EXECVE_CHECK
flag. This allows interpreters to signal the kernel to perform IPE
security checks on script files before execution, extending IPE
enforcement to indirectly executed scripts.
Update documentation for it, and also fix a comment"
* tag 'ipe-pr-20251202' of git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe:
ipe: Update documentation for script enforcement
ipe: Add AT_EXECVE_CHECK support for script enforcement
ipe: Drop a duplicated CONFIG_ prefix in the ifdeffery
-----BEGIN PGP SIGNATURE-----
iIoEABYKADIWIQQdXVVFGN5XqKr1Hj7LwZzRsCrn5QUCaS896BQcem9oYXJAbGlu
dXguaWJtLmNvbQAKCRDLwZzRsCrn5RDuAQDx4fmvctP8kc9PeRjd5X/UV1ip1pPD
beMKt8ghEThQiAEAzjFJbNGUDKhfR8yWODifAvYRurU5YQJZZI9wJ8skNw0=
=3Vc4
-----END PGP SIGNATURE-----
Merge tag 'integrity-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity updates from Mimi Zohar:
"Bug fixes:
- defer credentials checking from the bprm_check_security hook to the
bprm_creds_from_file security hook
- properly ignore IMA policy rules based on undefined SELinux labels
IMA policy rule extensions:
- extend IMA to limit including file hashes in the audit logs
(dont_audit action)
- define a new filesystem subtype policy option (fs_subtype)
Misc:
- extend IMA to support in-kernel module decompression by deferring
the IMA signature verification in kernel_read_file() to after the
kernel module is decompressed"
* tag 'integrity-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: Handle error code returned by ima_filter_rule_match()
ima: Access decompressed kernel module to verify appended signature
ima: add fs_subtype condition for distinguishing FUSE instances
ima: add dont_audit action to suppress audit actions
ima: Attach CREDS_CHECK IMA hook to bprm_creds_from_file LSM hook
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmkuAIcUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXMWlhAAgydMY1/oYPKbP3xdJTPJU6uc4GLk
o1lRaO59FkwogA2iupU6rltKiqhPYH7lIzebMcOuRLzaHpO0sGk0anyCMjNW2Agl
P/Zu+xkF+uGnzpO3gpaOLYFbCkpT98hNbyZu33l6ftxr37+S+DWHuThUhmOnxcgQ
z7Kguq0jaruiW8oc219HjNI/VCWW0F1W/+PVjFUSZogty2K2UttsabZQFMJ8MHg6
9C2jP/f+tN2KD55u7oEA5QiucC/8BdNdyLGke4TvjhG38FG4bh71Q59LknHa5yMa
6+NeftpE76+Inb8e+ze7iNv1InRccBXurm0p6lZ/lU5nYrjRT245CleQ7nq9ppD1
hyhuGQP/fvvYExKdTl1VWXA0zGLb6+1rIn6f//MpDSbXShGj/5vK82Qo/ug1ZEGH
QEAr6g2/S6xgudl2ui5OHSDb87nDWxzNo1t9TxGPBoQ6TG3ryPcm1ccTB0tb59+3
Poej26MOuZUrTpiQl3SLnfwjN0WnkiqGX5y/Cjh9tHFVk2OzOe/VqImZk9oeFgxt
O+IEB2cUO/U0D/3tdECqiRVS/RFe3jSn/qYCzP9fQaLhD0IpqlhfJ5TXbWiQam8Y
vGjNCx17a7mwvfxCIofEANw0wu3ooajq68UjRVhTpoKka0USKvktAoFirAB8r54I
oNSBfWKl4polguo=
=3hKh
-----END PGP SIGNATURE-----
Merge tag 'audit-pr-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
- Consolidate the loops in __audit_inode_child() to improve performance
When logging a child inode in __audit_inode_child(), we first run
through the list of recorded inodes looking for the parent and then
we repeat the search looking for a matching child entry. This pull
request consolidates both searches into one pass through the recorded
inodes, resuling in approximately a 50% reduction in audit overhead.
See the commit description for the testing details.
- Combine kmalloc()/memset() into kzalloc() in audit_krule_to_data()
- Comment fixes
* tag 'audit-pr-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: merge loops in __audit_inode_child()
audit: Use kzalloc() instead of kmalloc()/memset() in audit_krule_to_data()
audit: fix comment misindentation in audit.h
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmkuAKEUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXPKeA/8DSW+sTkQ9BMGGnyuH1uU/r84qtVh
Ft6pnIPzrogE/GKcQeFgFA9D7gQbB8J39PSxZLS3lp0UiuPCuq+D09L+uzDKzDCD
Avfe84dwsI5OiplPKyHiG3bF9W2+A1zkwH2j+5uC6yF8v9J9vglo4u5vAYeE2wxA
X4b2r9jMm7WJ/KFNiSiiLGEhOSjVVUrJULcmWMRPPruplPDC4dLnqYTWTbkrfF8h
/oXv/+ssqbj6FqfL4WaRnjN8GgZcwaWy1qu9LVlZ40iphpbVAyPBJPLJS6Q4hhOl
mMHUbYkxALPyW7riQxoXAegQjJyGgKn8Bli9U6bkiKFA2yeIhJFX+OyV1SlOAs/J
g6s5XfeCzqY0Tw3eqvT1YRhp10GcA7EtBYvhAe5ARq7PkMoqxmiI587piVX9hbos
a0AH9CDNoOw+8QXx27sOoD1YIaiYD9fikXKymrzRRaW/GX6i43XIKiELBMuKoIVZ
iwualvQiGBLLczzm5rdqPcLgp09Agn4AHfvFWXKFgS4+IJGKjeeXNOjsp9oFEivq
RnXmDpa+nBud5zeTSeSpOY2L0pvuIG5N25N6U9bTsDe+4Y6p0qIAUy8e4sQ0PA8P
xyp9/fcNr9jwHeLTjDbxZqZ+MU3GLIIVPdl0zq4z2J8nhkW3wD3pQX6B4qPIuXLx
YP3nwhAT9T+hU7w=
=IvVa
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
- Improve the granularity of SELinux labeling for memfd files
Currently when creating a memfd file, SELinux treats it the same as
any other tmpfs, or hugetlbfs, file. While simple, the drawback is
that it is not possible to differentiate between memfd and tmpfs
files.
This adds a call to the security_inode_init_security_anon() LSM hook
and wires up SELinux to provide a set of memfd specific access
controls, including the ability to control the execution of memfds.
As usual, the commit message has more information.
- Improve the SELinux AVC lookup performance
Adopt MurmurHash3 for the SELinux AVC hash function instead of the
custom hash function currently used. MurmurHash3 is already used for
the SELinux access vector table so the impact to the code is minimal,
and performance tests have shown improvements in both hash
distribution and latency.
See the commit message for the performance measurments.
- Introduce a Kconfig option for the SELinux AVC bucket/slot size
While we have the ability to grow the number of AVC hash buckets
today, the size of the buckets (slot size) is fixed at 512. This pull
request makes that slot size configurable at build time through a new
Kconfig knob, CONFIG_SECURITY_SELINUX_AVC_HASH_BITS.
* tag 'selinux-pr-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: improve bucket distribution uniformity of avc_hash()
selinux: Move avtab_hash() to a shared location for future reuse
selinux: Introduce a new config to make avc cache slot size adjustable
memfd,selinux: call security_inode_init_security_anon()
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmkuALUUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXOtDg/8DMxvN2XKZrryP31zdknUEHLJOTfz
eFCaNKQJK9GpJ1Q/Z4P/q/dH4QUKZHEM7E18N/hjA4Nx6Z7I1eVPK6hvvySkRa9l
b5j+GTLteMcANV04i04B8VTn2mtEW5SZp0Y280EFOMoVGvav72zAt4HHWVytDzyy
tVzvuC6iPNbe7rw+eUzTjHAq3WWWYe42QmiDfnAttdjWloSnfMx6AIvEoeo6jryc
aLGeZQsrgk2wL/ovXXD5kvDo1EQnETGuxQRh8P3W2DzLwEtt6d+BpfAm9PE0FE4k
oE5YrqOhvIpmcBm/8DdkvZ0o0gdfe0IrACvoEqJVpWs6w6T6zusiTzwWp7tBzET/
ygqYabUpz+BrAsGNVtXlDD4va37e5OI500PjDntuT4GMwKBGe5JKXLeki0sQeu6d
AcZd8hu6sVpYDLWJoWDXplxq1ndJTfafVtONQ5Cw8BHM5j6CIAaZM13KG9rJSOYa
uyNOfHxndsjV7dzuQ9S763l4djixiw0oU/PF+XQP4dC/Dyf60yb47mCOlZndRaJj
/FqR0Rbp2KonOSrkmzPTteGJOLMgM5bquZsSHNClxC/qeHTv8xKWf0HRWN61ZUe2
/NLcSjL+CIcN6q0c8jx/k7I9N/yQcmQLQIVTnUY6YOi0TkhUUdqSaq0rp8rSDW9z
AUvHpfPpC92klcM=
=u7yQ
-----END PGP SIGNATURE-----
Merge tag 'lsm-pr-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull LSM updates from Paul Moore:
- Rework the LSM initialization code
What started as a "quick" patch to enable a notification event once
all of the individual LSMs were initialized, snowballed a bit into a
30+ patch patchset when everything was done. Most of the patches, and
diffstat, is due to splitting out the initialization code into
security/lsm_init.c and cleaning up some of the mess that was there.
While not strictly necessary, it does cleanup the code signficantly,
and hopefully makes the upkeep a bit easier in the future.
Aside from the new LSM_STARTED_ALL notification, these changes also
ensure that individual LSM initcalls are only called when the LSM is
enabled at boot time. There should be a minor reduction in boot times
for those who build multiple LSMs into their kernels, but only enable
a subset at boot.
It is worth mentioning that nothing at present makes use of the
LSM_STARTED_ALL notification, but there is work in progress which is
dependent upon LSM_STARTED_ALL.
- Make better use of the seq_put*() helpers in device_cgroup
* tag 'lsm-pr-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (36 commits)
lsm: use unrcu_pointer() for current->cred in security_init()
device_cgroup: Refactor devcgroup_seq_show to use seq_put* helpers
lsm: add a LSM_STARTED_ALL notification event
lsm: consolidate all of the LSM framework initcalls
selinux: move initcalls to the LSM framework
ima,evm: move initcalls to the LSM framework
lockdown: move initcalls to the LSM framework
apparmor: move initcalls to the LSM framework
safesetid: move initcalls to the LSM framework
tomoyo: move initcalls to the LSM framework
smack: move initcalls to the LSM framework
ipe: move initcalls to the LSM framework
loadpin: move initcalls to the LSM framework
lsm: introduce an initcall mechanism into the LSM framework
lsm: group lsm_order_parse() with the other lsm_order_*() functions
lsm: output available LSMs when debugging
lsm: cleanup the debug and console output in lsm_init.c
lsm: add/tweak function header comment blocks in lsm_init.c
lsm: fold lsm_init_ordered() into security_init()
lsm: cleanup initialize_lsm() and rename to lsm_init_single()
...
This pull request includes couple of updates for trusted keys:
1. Remove duplicate 'tpm2_hash_map' and use the one in the drive via new
function 'tpm2_find_hash_alg'.
2. Fix a memory leak on failure paths of 'tpm2_load_cmd'.
BR, Jarkko
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRE6pSOnaBC00OEHEIaerohdGur0gUCaStg0gAKCRAaerohdGur
0gdUAQCit+7Vwpc2oArfJoCFI1ILhLArPSUlBI0ZUB+BpBNZbQD/T1TwdT3Ytekf
jwkKrJbavuFQ+u5pUb2WShQJ7WOcBg4=
=jor5
-----END PGP SIGNATURE-----
Merge tag 'keys-trusted-next-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull trusted key updates from Jarkko Sakkinen:
- Remove duplicate 'tpm2_hash_map' in favor of 'tpm2_find_hash_alg()'
- Fix a memory leak on failure paths of 'tpm2_load_cmd'
* tag 'keys-trusted-next-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
KEYS: trusted: Fix a memory leak in tpm2_load_cmd
KEYS: trusted: Replace a redundant instance of tpm2_hash_map