mirror-linux/fs
Christian Brauner 576ee5dfd4 fs: add immutable rootfs
Currently pivot_root() doesn't work on the real rootfs because it
cannot be unmounted. Userspace has to do a recursive removal of the
initramfs contents manually before continuing the boot.

Really all we want from the real rootfs is to serve as the parent mount
for anything that is actually useful such as the tmpfs or ramfs for
initramfs unpacking or the rootfs itself. There's no need for the real
rootfs to actually be anything meaningful or useful. Add an immutable
rootfs called "nullfs" that can be selected via the "nullfs_rootfs"
kernel command line option.

The kernel will mount a tmpfs/ramfs on top of it, unpack the initramfs
and fire up userspace which mounts the rootfs and can then just do:

  chdir(rootfs);
  pivot_root(".", ".");
  umount2(".", MNT_DETACH);

and be done with it. (Of course, userspace can also choose to retain the
initramfs contents by using something like pivot_root(".", "/initramfs")
without unmounting it.)

Technically this also means that the rootfs mount in unprivileged
namespaces doesn't need to become MNT_LOCKED anymore as it's guaranteed
that the immutable rootfs remains permanently empty so there cannot be
anything revealed by unmounting the covering mount.

In the future this will also allow us to create completely empty mount
namespaces without the risk of leaking anything.

systemd already handles this all correctly as it tries to pivot_root()
first and falls back to MS_MOVE only when that fails.

This goes back to various discussions in previous years and an LPC 2024
presentation about this very topic.

Link: https://patch.msgid.link/20260112-work-immutable-rootfs-v2-3-88dd1c34a204@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12 16:52:09 +01:00
9p - fix a bug with O_APPEND in cached mode causing data to be written multiple times on server 2025-12-07 08:29:09 -08:00
adfs
affs Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
afs Networking changes for 6.19. 2025-12-03 17:24:33 -08:00
autofs vfs-6.19-rc1.fixes 2025-12-05 15:52:30 -08:00
befs Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
bfs vfs-6.19-rc1.inode 2025-12-01 09:02:34 -08:00
btrfs ARM: 2025-12-05 17:01:20 -08:00
cachefiles vfs-6.19-rc1.directory.locking 2025-12-01 16:13:46 -08:00
ceph We have a patch that adds an initial set of tracepoints to the MDS 2025-12-14 15:24:10 +12:00
coda Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
configfs Some filesystems use a kinda-sorta controlled dentry refcount leak to pin 2025-12-05 14:36:21 -08:00
cramfs Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
crypto Significant patch series in this pull request: 2025-12-06 14:01:20 -08:00
debugfs Driver core changes for 6.19-rc1 2025-12-05 21:29:02 -08:00
devpts convert devpts 2025-11-16 01:35:04 -05:00
dlm net: Convert proto callbacks from sockaddr to sockaddr_unsized 2025-11-04 19:10:33 -08:00
ecryptfs vfs-6.19-rc1.directory.locking 2025-12-01 16:13:46 -08:00
efivarfs Some filesystems use a kinda-sorta controlled dentry refcount leak to pin 2025-12-05 14:36:21 -08:00
efs Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
erofs ARM: 2025-12-05 17:01:20 -08:00
exfat exfat: fix remount failure in different process environments 2025-12-03 10:00:17 +09:00
exportfs
ext2 Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
ext4 New features and improvements for the ext4 file system 2025-12-03 20:37:15 -08:00
f2fs f2fs-for-6.19-rc1 2025-12-09 12:06:20 +09:00
fat There are no significant series in this small merge. Please see the 2025-12-13 20:55:12 +12:00
freevxfs Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
fuse fuse update for 6.19 2025-12-05 15:25:13 -08:00
gfs2 gfs2 changes 2025-12-03 20:28:50 -08:00
hfs hfs/hfsplus updates for v6.19 2025-12-03 20:08:32 -08:00
hfsplus hfs/hfsplus updates for v6.19 2025-12-03 20:08:32 -08:00
hostfs Apart from the usual small churn, we have 2025-12-05 16:30:56 -08:00
hpfs vfs-6.19-rc1.fs_header 2025-12-01 14:18:01 -08:00
hugetlbfs Some filesystems use a kinda-sorta controlled dentry refcount leak to pin 2025-12-05 14:36:21 -08:00
iomap vfs-6.19-rc1.folio 2025-12-01 10:26:38 -08:00
isofs vfs-6.19-rc1.inode 2025-12-01 09:02:34 -08:00
jbd2 jbd2: fix the inconsistency between checksum and data in memory for journal sb 2025-11-26 17:05:47 -05:00
jffs2 Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
jfs vfs-6.19-rc1.inode 2025-12-01 09:02:34 -08:00
kernfs Driver core changes for 6.19-rc1 2025-12-05 21:29:02 -08:00
lockd NFSD 6.19 Release Notes 2025-12-06 10:57:02 -08:00
minix vfs-6.19-rc1.minix 2025-12-01 15:22:40 -08:00
netfs vfs-6.19-rc1.folio 2025-12-01 10:26:38 -08:00
nfs NFS client updates for Linux 6.19 2025-12-12 21:52:42 +12:00
nfs_common
nfsd NFSD 6.19 Release Notes 2025-12-06 10:57:02 -08:00
nilfs2 Significant patch series in this pull request: 2025-12-06 14:01:20 -08:00
nls fs/nls: Fix inconsistency between utf8_to_utf32() and utf32_to_utf8() 2025-12-01 11:58:06 +02:00
notify vfs-6.19-rc1.fd_prepare.fs 2025-12-01 17:32:07 -08:00
ntfs3 Significant patch series in this merge are as follows: 2025-12-05 13:52:43 -08:00
ocfs2 There are no significant series in this small merge. Please see the 2025-12-13 20:55:12 +12:00
omfs vfs-6.19-rc1.fs_header 2025-12-01 14:18:01 -08:00
openpromfs Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
orangefs vfs-6.19-rc1.inode 2025-12-01 09:02:34 -08:00
overlayfs ovl: pass original credentials, not mounter credentials during create 2025-12-05 16:16:20 -08:00
proc fs/proc: replace "__auto_type" with "const auto" 2025-12-08 15:32:15 -08:00
pstore Some filesystems use a kinda-sorta controlled dentry refcount leak to pin 2025-12-05 14:36:21 -08:00
qnx4 Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
qnx6 Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
quota Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
ramfs Some filesystems use a kinda-sorta controlled dentry refcount leak to pin 2025-12-05 14:36:21 -08:00
resctrl Significant patch series in this merge are as follows: 2025-12-05 13:52:43 -08:00
romfs Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
smb Three ksmbd server fixes 2025-12-12 21:59:19 +12:00
squashfs Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
sysfs sysfs: attribute_group: enable const variants of is_visible() 2025-11-26 15:16:35 +01:00
tests
tracefs convert tracefs 2025-11-16 01:35:03 -05:00
ubifs This pull request contains the following changes for UBI and UBIFS: 2025-12-09 08:50:27 +09:00
udf Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
ufs Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
unicode
vboxsf
verity Optimize fsverity with 2-way interleaved hashing 2025-09-29 15:55:20 -07:00
xfs xfs: new code for v6.19 2025-12-03 20:19:38 -08:00
zonefs vfs-6.19-rc1.inode 2025-12-01 09:02:34 -08:00
Kconfig Summary of significant series in this pull request: 2025-10-02 18:18:33 -07:00
Kconfig.binfmt
Makefile fs: add immutable rootfs 2026-01-12 16:52:09 +01:00
aio.c aio: use credential guards 2025-11-04 12:36:33 +01:00
anon_inodes.c anon_inodes: convert to FD_ADD() 2025-11-28 12:42:31 +01:00
attr.c filelock: add struct delegated_inode 2025-11-12 09:38:34 +01:00
backing-file.c kernel-6.19-rc1.cred 2025-12-01 13:45:41 -08:00
bad_inode.c
binfmt_elf.c rseq: Provide and use rseq_set_ids() 2025-11-04 08:33:33 +01:00
binfmt_elf_fdpic.c
binfmt_flat.c
binfmt_misc.c Some filesystems use a kinda-sorta controlled dentry refcount leak to pin 2025-12-05 14:36:21 -08:00
binfmt_script.c
bpf_fs_kfuncs.c
buffer.c vfs-6.19-rc1.folio 2025-12-01 10:26:38 -08:00
char_dev.c
compat_binfmt_elf.c
coredump.c Networking changes for 6.19. 2025-12-03 17:24:33 -08:00
d_path.c
dax.c Significant patch series in this merge are as follows: 2025-12-05 13:52:43 -08:00
dcache.c fuse update for 6.19 2025-12-05 15:25:13 -08:00
direct-io.c
drop_caches.c Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
eventfd.c eventfd: convert do_eventfd() to FD_PREPARE() 2025-11-28 12:42:31 +01:00
eventpoll.c eventpoll: convert do_epoll_create() to FD_PREPARE() 2025-11-28 12:42:32 +01:00
exec.c A large overhaul of the restartable sequences and CID management: 2025-12-02 08:48:53 -08:00
fcntl.c vfs: expose delegation support to userland 2025-11-12 09:38:37 +01:00
fhandle.c fhandle: convert do_handle_open() to FD_ADD() 2025-11-28 12:42:31 +01:00
file.c vfs-6.19-rc1.fd_prepare.fs 2025-12-01 17:32:07 -08:00
file_attr.c fs: remove spurious exports in fs/file_attr.c 2025-11-19 12:17:31 +01:00
file_table.c fs: update comment in init_file() 2025-10-07 12:48:33 +02:00
filesystems.c
fs-writeback.c vfs-6.19-rc1.writeback 2025-12-01 09:20:51 -08:00
fs_context.c
fs_dirent.c fs: rename fs_types.h to fs_dirent.h 2025-11-05 09:51:30 +01:00
fs_parser.c
fs_pin.c
fs_struct.c fs: inline current_umask() and move it to fs_struct.h 2025-11-05 22:51:23 +01:00
fsopen.c
init.c fs: add init_pivot_root() 2026-01-12 16:52:09 +01:00
inode.c fs: assert on I_FREEING not being set in iput() and iput_not_last() 2025-12-03 11:14:50 +01:00
internal.h fs: add init_pivot_root() 2026-01-12 16:52:09 +01:00
ioctl.c
kernel_read_file.c
libfs.c Some filesystems use a kinda-sorta controlled dentry refcount leak to pin 2025-12-05 14:36:21 -08:00
locks.c filelock: __fcntl_getlease: fix kernel-doc warnings 2025-11-28 10:30:41 +01:00
mbcache.c
mnt_idmapping.c
mount.h fs: add immutable rootfs 2026-01-12 16:52:09 +01:00
mpage.c
namei.c vfs-6.19-rc1.directory.locking 2025-12-01 16:13:46 -08:00
namespace.c fs: add immutable rootfs 2026-01-12 16:52:09 +01:00
nsfs.c vfs-6.19-rc1.fd_prepare.fs 2025-12-01 17:32:07 -08:00
nullfs.c fs: add immutable rootfs 2026-01-12 16:52:09 +01:00
open.c vfs-6.19-rc1.fd_prepare.fs 2025-12-01 17:32:07 -08:00
pidfs.c vfs-6.19-rc1.coredump 2025-12-01 10:17:39 -08:00
pipe.c Summary 2025-12-05 11:15:37 -08:00
pnode.c
pnode.h
posix_acl.c filelock: add struct delegated_inode 2025-11-12 09:38:34 +01:00
proc_namespace.c
read_write.c
readdir.c
remap_range.c
select.c select: Convert to scoped user access 2025-11-04 08:28:34 +01:00
seq_file.c
signalfd.c signalfd: convert do_signalfd4() to FD_ADD() 2025-11-28 12:42:32 +01:00
splice.c fs/splice.c: trivial fix: pipes -> pipe's 2025-11-25 10:11:16 +01:00
stack.c
stat.c
statfs.c
super.c vfs-6.19-rc1.fixes 2025-12-05 15:52:30 -08:00
sync.c vfs-6.19-rc1.writeback 2025-12-01 09:20:51 -08:00
sysctls.c
timerfd.c timerfd: convert timerfd_create() to FD_ADD() 2025-11-28 12:42:32 +01:00
userfaultfd.c Significant patch series in this merge are as follows: 2025-12-05 13:52:43 -08:00
utimes.c vfs-6.19-rc1.directory.delegations 2025-12-01 15:34:41 -08:00
xattr.c filelock: add struct delegated_inode 2025-11-12 09:38:34 +01:00