mirror-linux/include
Christian Brauner c8134b5f13
pidfd: add CLONE_PIDFD_AUTOKILL
Add a new clone3() flag CLONE_PIDFD_AUTOKILL that ties a child's
lifetime to the pidfd returned from clone3(). When the last reference to
the struct file created by clone3() is closed the kernel sends SIGKILL
to the child. A pidfd obtained via pidfd_open() for the same process
does not keep the child alive and does not trigger autokill - only the
specific struct file from clone3() has this property.

This is useful for container runtimes, service managers, and sandboxed
subprocess execution - any scenario where the child must die if the
parent crashes or abandons the pidfd.

CLONE_PIDFD_AUTOKILL requires both CLONE_PIDFD (the whole point is tying
lifetime to the pidfd file) and CLONE_AUTOREAP (a killed child with no
one to reap it would become a zombie). CLONE_THREAD is rejected because
autokill targets a process not a thread.

The clone3 pidfd is identified by the PIDFD_AUTOKILL file flag set on
the struct file at clone3() time. The pidfs .release handler checks this
flag and sends SIGKILL via do_send_sig_info(SIGKILL, SEND_SIG_PRIV, ...)
only when it is set. Files from pidfd_open() or open_by_handle_at() are
distinct struct files that do not carry this flag. dup()/fork() share the
same struct file so they extend the child's lifetime until the last
reference drops.

CLONE_PIDFD_AUTOKILL uses a privilege model based on CLONE_NNP: without
CLONE_NNP the child could escalate privileges via setuid/setgid exec
after being spawned, so the caller must have CAP_SYS_ADMIN in its user
namespace. With CLONE_NNP the child can never gain new privileges so
unprivileged usage is allowed. This is a deliberate departure from the
pdeath_signal model which is reset during secureexec and commit_creds()
rendering it useless for container runtimes that need to deprivilege
themselves.

Link: https://patch.msgid.link/20260226-work-pidfs-autoreap-v5-3-d148b984a989@kernel.org
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-03-11 23:15:40 +01:00
..
acpi mailbox: platform and core updates 2026-02-14 11:13:32 -08:00
asm-generic hyperv-next for v7.0 2026-02-20 08:48:31 -08:00
clocksource
crypto Networking changes for 7.0 2026-02-11 19:31:52 -08:00
cxl
drm drm/pagemap: pass pagemap_addr by reference 2026-02-17 19:39:44 -05:00
dt-bindings phy-for-7.0 2026-02-17 11:40:04 -08:00
hyperv hyperv-next for v7.0 2026-02-20 08:48:31 -08:00
keys
kunit treewide: Replace kmalloc with kmalloc_obj for non-scalar types 2026-02-21 01:02:28 -08:00
kvm
linux clone: add CLONE_AUTOREAP 2026-03-11 23:14:02 +01:00
math-emu
media [GIT PULL for v7.0] media updates 2026-02-11 12:20:25 -08:00
memory
misc
net Convert more 'alloc_obj' cases to default GFP_KERNEL arguments 2026-02-21 20:03:00 -08:00
pcmcia
ras
rdma RDMA v7.0 merge window 2026-02-12 17:05:20 -08:00
rv rv: Fix multiple definition of __pcpu_unique_da_mon_this 2026-02-20 13:12:00 +01:00
scsi SCSI misc on 20260212 2026-02-12 15:43:02 -08:00
soc
sound
target
trace vfs-7.0-rc1.misc.2 2026-02-16 13:00:36 -08:00
uapi pidfd: add CLONE_PIDFD_AUTOKILL 2026-03-11 23:15:40 +01:00
ufs
vdso
video
xen
Kbuild