Please consider pulling these changes from the signed vfs-7.0-rc1.namespace tag.
Thanks!
Christian
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaYX49gAKCRCRxhvAZXjc
ovzgAP9BpqMQhMy2VCurru8/T5VAd6eJdgXzEfXqMksL5BNm8gEAsLx666KJNKgm
Sh/yVA2KBjf51gvcLZ4gHOISaMU8bAI=
=RGLf
-----END PGP SIGNATURE-----
Merge tag 'vfs-7.0-rc1.namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount updates from Christian Brauner:
- statmount: accept fd as a parameter
Extend struct mnt_id_req with a file descriptor field and a new
STATMOUNT_BY_FD flag. When set, statmount() returns mount information
for the mount the fd resides on — including detached mounts
(unmounted via umount2(MNT_DETACH)).
For detached mounts the STATMOUNT_MNT_POINT and STATMOUNT_MNT_NS_ID
mask bits are cleared since neither is meaningful. The capability
check is skipped for STATMOUNT_BY_FD since holding an fd already
implies prior access to the mount and equivalent information is
available through fstatfs() and /proc/pid/mountinfo without
privilege. Includes comprehensive selftests covering both attached
and detached mount cases.
- fs: Remove internal old mount API code (1 patch)
Now that every in-tree filesystem has been converted to the new
mount API, remove all the legacy shim code in fs_context.c that
handled unconverted filesystems. This deletes ~280 lines including
legacy_init_fs_context(), the legacy_fs_context struct, and
associated wrappers. The mount(2) syscall path for userspace remains
untouched. Documentation references to the legacy callbacks are
cleaned up.
- mount: add OPEN_TREE_NAMESPACE to open_tree()
Container runtimes currently use CLONE_NEWNS to copy the caller's
entire mount namespace — only to then pivot_root() and recursively
unmount everything they just copied. With large mount tables and
thousands of parallel container launches this creates significant
contention on the namespace semaphore.
OPEN_TREE_NAMESPACE copies only the specified mount tree (like
OPEN_TREE_CLONE) but returns a mount namespace fd instead of a
detached mount fd. The new namespace contains the copied tree mounted
on top of a clone of the real rootfs.
This functions as a combined unshare(CLONE_NEWNS) + pivot_root() in a
single syscall. Works with user namespaces: an unshare(CLONE_NEWUSER)
followed by OPEN_TREE_NAMESPACE creates a mount namespace owned by
the new user namespace. Mount namespace file mounts are excluded from
the copy to prevent cycles. Includes ~1000 lines of selftests"
* tag 'vfs-7.0-rc1.namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
selftests/open_tree: add OPEN_TREE_NAMESPACE tests
mount: add OPEN_TREE_NAMESPACE
fs: Remove internal old mount API code
selftests: statmount: tests for STATMOUNT_BY_FD
statmount: accept fd as a parameter
statmount: permission check should return EPERM
Create some wrapper code around struct super_block so that filesystems
have a standard way to queue filesystem metadata and file I/O error
reports to have them sent to fsnotify.
If a filesystem wants to provide an error number, it must supply only
negative error numbers. These are stored internally as negative
numbers, but they are converted to positive error numbers before being
passed to fanotify, per the fanotify(7) manpage. Implementations of
super_operations::report_error are passed the raw internal event data.
Note that we have to play some shenanigans with mempools and queue_work
so that the error handling doesn't happen outside of process context,
and the event handler functions (both ->report_error and fsnotify) can
handle file I/O error messages without having to worry about whatever
locks might be held. This asynchronicity requires that unmount wait for
pending events to clear.
Add a new callback to the superblock operations structure so that
filesystem drivers can themselves respond to file I/O errors if they so
desire. This will be used for an upcoming self-healing patchset for
XFS.
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Link: https://patch.msgid.link/176826402610.3490369.4378391061533403171.stgit@frogsfrogsfrogs
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Now that the last in-tree filesystem has been converted to the new mount
API, remove all legacy mount API code designed to handle un-converted
filesystems, and remove associated documentation as well.
(The code to handle the legacy mount(2) syscall from userspace is still
in place, of course.)
Tested with an allmodconfig build on x86_64, and a sanity check of an
old mount(2) syscall mount.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Link: https://patch.msgid.link/20251212174403.2882183-1-sandeen@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZgAKCRCRxhvAZXjc
opxBAQCjNjr0yTSoaGRM0CJXg79Of3DLIlBdB7TygibTN16WhwEA+VKWoHL5eRjg
PZlwZD4Ei2ymeQYxi+6owTF8G806tAs=
=m/Bt
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.19-rc1.guards' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull superblock lock guard updates from Christian Brauner:
"This starts the work of introducing guards for superblock related
locks.
Introduce super_write_guard for scoped superblock write protection.
This provides a guard-based alternative to the manual sb_start_write()
and sb_end_write() pattern, allowing the compiler to automatically
handle the cleanup"
* tag 'vfs-6.19-rc1.guards' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
xfs: use super write guard in xfs_file_ioctl()
open: use super write guard in do_ftruncate()
btrfs: use super write guard in relocating_repair_kthread()
ext4: use super write guard in write_mmp_block()
btrfs: use super write guard in sb_start_write()
btrfs: use super write guard btrfs_run_defrag_inode()
btrfs: use super write guard in btrfs_reclaim_bgs_work()
fs: add super_write_guard
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZgAKCRCRxhvAZXjc
oq2EAQD09y/qVU81E7Qg7Cn4n5/3WTlnQjx0aSvhb4p6dFUcFwD+K9uVJNP8x8tA
xTaPt59nZbEX9BIAwtLChSPa4CZsnwM=
=XrvE
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.19-rc1.fs_header' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull fs header updates from Christian Brauner:
"This contains initial work to start splitting up fs.h.
Begin the long-overdue work of splitting up the monolithic fs.h
header. The header has grown to over 3000 lines and includes types and
functions for many different subsystems, making it difficult to
navigate and causing excessive compilation dependencies.
This series introduces new focused headers for superblock-related
code:
- Rename fs_types.h to fs_dirent.h to better reflect its actual
content (directory entry types)
- Add fs/super_types.h containing superblock type definitions
- Add fs/super.h containing superblock function declarations
This is the first step in a longer effort to modularize the VFS
headers.
Cleanups:
- Inode Field Layout Optimization (Mateusz Guzik)
Move inode fields used during fast path lookup closer together to
improve cache locality during path resolution.
- current_umask() Optimization (Mateusz Guzik)
Inline current_umask() and move it to fs_struct.h. This improves
performance by avoiding function call overhead for this
frequently-used function, and places it in a more appropriate
header since it operates on fs_struct"
* tag 'vfs-6.19-rc1.fs_header' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: move inode fields used during fast path lookup closer together
fs: inline current_umask() and move it to fs_struct.h
fs: add fs/super.h header
fs: add fs/super_types.h header
fs: rename fs_types.h to fs_dirent.h