mirror-linux/fs
Linus Torvalds 8c2e52ebbe eventpoll: don't decrement ep refcount while still holding the ep mutex
Jann Horn points out that epoll is decrementing the ep refcount and then
doing a

    mutex_unlock(&ep->mtx);

afterwards. That's very wrong, because it can lead to a use-after-free.

That pattern is actually fine for the very last reference, because the
code in question will delay the actual call to "ep_free(ep)" until after
it has unlocked the mutex.

But it's wrong for the much subtler "next to last" case when somebody
*else* may also be dropping their reference and freeing the ep while
we're still using the mutex.

Note that this is true even if that other user is also using the same ep
mutex: mutexes, unlike spinlocks, cannot be used for object ownership,
even if they guarantee mutual exclusion.

A mutex "unlock" operation is not atomic, and as one user is still
accessing the mutex as part of unlocking it, another user can come in
and get the now released mutex and free the data structure while the
first user is still cleaning up.

See our mutex documentation in Documentation/locking/mutex-design.rst,
in particular the section [1] about semantics:

	"mutex_unlock() may access the mutex structure even after it has
	 internally released the lock already - so it's not safe for
	 another context to acquire the mutex and assume that the
	 mutex_unlock() context is not using the structure anymore"

So if we drop our ep ref before the mutex unlock, but we weren't the
last one, we may then unlock the mutex, another user comes in, drops
_their_ reference and releases the 'ep' as it now has no users - all
while the mutex_unlock() is still accessing it.

Fix this by simply moving the ep refcount dropping to outside the mutex:
the refcount itself is atomic, and doesn't need mutex protection (that's
the whole _point_ of refcounts: unlike mutexes, they are inherently
about object lifetimes).
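
The ordering described above can be sketched in userspace C. This is only
an illustration, not the eventpoll code: 'struct obj', obj_alloc(),
obj_get(), obj_put(), obj_free() and obj_update_and_put() are hypothetical
names, and a pthread mutex plus a C11 atomic counter stand in for the
kernel mutex and the ep refcount. The hazard itself is the one documented
for the kernel mutex; the pthread version only shows where the put has to
go relative to the unlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a refcounted object like 'ep'. */
struct obj {
	pthread_mutex_t mtx;	/* protects payload, NOT the object's lifetime */
	atomic_int refcount;	/* governs the object's lifetime */
	int payload;
};

/* Allocate an object holding one reference; returns NULL on failure. */
static struct obj *obj_alloc(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (!o)
		return NULL;
	if (pthread_mutex_init(&o->mtx, NULL) != 0) {
		free(o);
		return NULL;
	}
	atomic_init(&o->refcount, 1);
	o->payload = 0;
	return o;
}

/* Take an additional reference. */
static void obj_get(struct obj *o)
{
	atomic_fetch_add(&o->refcount, 1);
}

/* Drop one reference; returns true if the caller dropped the last one. */
static bool obj_put(struct obj *o)
{
	return atomic_fetch_sub(&o->refcount, 1) == 1;
}

/* Release the object; only valid once the refcount has reached zero. */
static void obj_free(struct obj *o)
{
	assert(atomic_load(&o->refcount) == 0);
	pthread_mutex_destroy(&o->mtx);
	free(o);
}

/*
 * Safe ordering: complete the unlock first, then drop the reference.
 * Our own reference keeps 'o' alive for the whole unlock, so no other
 * holder can free it while the unlock is still touching o->mtx.
 * Returns true if this call freed the object.
 */
static bool obj_update_and_put(struct obj *o, int value)
{
	bool last;

	pthread_mutex_lock(&o->mtx);
	o->payload = value;
	pthread_mutex_unlock(&o->mtx);

	/* After this put, another holder may free 'o' at any time. */
	last = obj_put(o);
	if (last)
		obj_free(o);
	return last;
}
```

The buggy variant would call obj_put() before pthread_mutex_unlock(): if
that put was not the last one, a second holder could take the mutex, drop
the final reference and free the object while the first unlock is still
in progress.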

Reported-by: Jann Horn <jannh@google.com>
Link: https://docs.kernel.org/locking/mutex-design.html#semantics [1]
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-07-09 10:38:29 -07:00
9p vfs-6.16-rc1.netfs 2025-06-02 15:04:06 -07:00
adfs
affs vfs-6.15-rc1.async.dir 2025-03-24 10:47:14 -07:00
afs vfs-6.16-rc1.netfs 2025-06-02 15:04:06 -07:00
autofs VFS: rename lookup_one_len family to lookup_noperm and remove permission check 2025-04-08 11:24:36 +02:00
bcachefs bcachefs: opts.casefold_disabled 2025-07-01 19:33:46 -04:00
befs
bfs bfs: convert bfs to use the new mount api 2025-04-07 09:36:20 +02:00
btrfs for-6.16-rc4-tag 2025-07-03 13:29:56 -07:00
cachefiles vfs-6.16-rc1.netfs 2025-06-02 15:04:06 -07:00
ceph A one-liner that leads to a startling (but also very much rational) 2025-06-06 17:56:19 -07:00
coda Change inode_operations.mkdir to return struct dentry * 2025-02-27 20:00:17 +01:00
configfs - The 3 patch series "hung_task: extend blocking task stacktrace dump to 2025-05-31 19:12:53 -07:00
cramfs
crypto fscrypt: add support for hardware-wrapped keys 2025-04-08 19:32:11 -07:00
debugfs VFS: rename lookup_one_len family to lookup_noperm and remove permission check 2025-04-08 11:24:36 +02:00
devpts devpts: Fix type for uid and gid params 2025-04-07 15:22:12 +02:00
dlm treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
ecryptfs VFS: rename lookup_one_len family to lookup_noperm and remove permission check 2025-04-08 11:24:36 +02:00
efivarfs vfs-6.16-rc1.super 2025-05-26 09:33:44 -07:00
efs
erofs erofs: remove a superfluous check for encoded extents 2025-06-20 23:41:12 +08:00
exfat exfat: do not clear volume dirty flag during sync 2025-05-26 20:25:23 +09:00
exportfs readdir: supply dir_context.count as readdir buffer size hint 2025-05-29 12:31:23 +02:00
ext2 ext2: Deprecate DAX 2025-04-29 13:08:20 +02:00
ext4 treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
f2fs f2fs: fix to zero post-eof page 2025-06-18 21:35:29 +00:00
fat Change inode_operations.mkdir to return struct dentry * 2025-02-27 20:00:17 +01:00
freevxfs
fuse vfs-6.16-rc5.fixes 2025-07-04 09:06:49 -07:00
gfs2 gfs2: Don't clear sb->s_fs_info in gfs2_sys_fs_add 2025-05-30 19:20:20 +02:00
hfs Revert "hfs{plus}: add deprecation warning" 2025-04-19 22:48:59 +02:00
hfsplus hfsplus: use bdev_rw_virt in hfsplus_submit_bio 2025-05-07 07:31:08 -06:00
hostfs Updates for UML for this cycle, notably: 2025-04-02 12:25:03 -07:00
hpfs Change inode_operations.mkdir to return struct dentry * 2025-02-27 20:00:17 +01:00
hugetlbfs - The 6 patch series "Enable strict percpu address space checks" from 2025-04-01 09:29:18 -07:00
iomap iomap: don't lose folio dropbehind state for overwrites 2025-05-28 09:26:07 +02:00
isofs isofs: fix Y2038 and Y2156 issues in Rock Ridge TF entry 2025-04-15 11:56:57 +02:00
jbd2 treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
jffs2 jffs2: check jffs2_prealloc_raw_node_refs() result in few other places 2025-05-22 20:54:38 +02:00
jfs - The 11 patch series "Add folio_mk_pte()" from Matthew Wilcox 2025-05-31 15:44:16 -07:00
kernfs Driver core changes for 6.16-rc1 2025-05-29 09:11:39 -07:00
lockd sysctl: Fixes nsm_local_state bounds 2025-03-10 09:11:13 -04:00
minix Change inode_operations.mkdir to return struct dentry * 2025-02-27 20:00:17 +01:00
netfs netfs: Update tracepoints in a number of ways 2025-07-01 22:37:14 +02:00
nfs NFSv4/flexfiles: Fix handling of NFS level errors in I/O 2025-06-26 13:46:44 -04:00
nfs_common nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer 2025-05-28 17:17:14 -04:00
nfsd nfsd-6.16 fixes: 2025-06-21 09:20:15 -07:00
nilfs2 treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
nls
notify \n 2025-05-29 10:34:26 -07:00
ntfs3 - The 2 patch series "zram: support algorithm-specific parameters" from 2025-06-02 16:00:26 -07:00
ocfs2 treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
omfs omfs: convert to new mount API 2025-04-28 10:54:39 +02:00
openpromfs
orangefs orangefs: Convert to use the new mount API 2025-05-28 12:05:30 -07:00
overlayfs vfs-6.16-rc3.fixes 2025-06-16 08:18:43 -07:00
proc fix proc_sys_compare() handling of in-lookup dentries 2025-07-03 20:59:09 -04:00
pstore treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
qnx4
qnx6
quota VFS: rename lookup_one_len family to lookup_noperm and remove permission check 2025-04-08 11:24:36 +02:00
ramfs Change inode_operations.mkdir to return struct dentry * 2025-02-27 20:00:17 +01:00
resctrl x86,fs/resctrl: Remove inappropriate references to cacheinfo in the resctrl subsystem 2025-06-16 21:06:12 +02:00
romfs
smb five smb3 client fixes 2025-07-05 13:05:28 -07:00
squashfs squashfs: add optional full compressed block caching 2025-05-27 19:40:33 -07:00
sysfs sysfs: constify attribute_group::bin_attrs 2025-04-15 18:46:10 +02:00
tests
tracefs VFS: rename lookup_one_len family to lookup_noperm and remove permission check 2025-04-08 11:24:36 +02:00
ubifs This pull request contains the following fixes for JFFS2 and UBIFS: 2025-06-07 07:24:07 -07:00
udf udf: Make sure i_lenExtents is uptodate on inode eviction 2025-05-07 12:04:07 +02:00
ufs ufs: convert ufs to the new mount API 2025-05-14 22:40:55 -04:00
unicode unicode: kunit: change tests filename and path 2025-02-12 14:00:11 -08:00
vboxsf vboxsf: Convert to writepages 2025-04-07 09:36:48 +02:00
verity Revert "fsverity: relax build time dependency on CRYPTO_SHA256" 2025-02-17 11:34:15 -08:00
xfs xfs: add FALLOC_FL_ALLOCATE_RANGE to supported flags mask 2025-06-30 14:16:13 +02:00
zonefs zonefs: use bdev_rw_virt in zonefs_read_super 2025-05-07 07:31:07 -06:00
Kconfig fs/resctrl: Add boiler plate for external resctrl code 2025-05-16 11:05:40 +02:00
Kconfig.binfmt
Makefile fs/resctrl: Add boiler plate for external resctrl code 2025-05-16 11:05:40 +02:00
aio.c fs: aio: initialize .ki_write_stream of read-write request 2025-05-07 08:00:11 -06:00
anon_inodes.c fs: export anon_inode_make_secure_inode() and fix secretmem LSM bypass 2025-06-23 12:41:17 +02:00
attr.c
backing-file.c
bad_inode.c Change inode_operations.mkdir to return struct dentry * 2025-02-27 20:00:17 +01:00
binfmt_elf.c vfs-6.16-rc1.misc 2025-05-26 09:02:39 -07:00
binfmt_elf_fdpic.c binfmt_elf_fdpic: fix variable set but not used warning 2025-03-07 20:07:33 -08:00
binfmt_flat.c
binfmt_misc.c VFS: rename lookup_one_len family to lookup_noperm and remove permission check 2025-04-08 11:24:36 +02:00
binfmt_script.c
bpf_fs_kfuncs.c bpf: fs/xattr: Add BPF kfuncs to set and remove xattrs 2025-02-13 19:35:32 -08:00
buffer.c vfs-6.16-rc1.writepage 2025-05-26 08:23:09 -07:00
char_dev.c
compat_binfmt_elf.c
coredump.c coredump: validate socket name as it is written 2025-05-21 13:59:12 +02:00
d_path.c
dax.c vfs-6.16-rc2.fixes 2025-06-02 12:49:16 -07:00
dcache.c vfs-6.16-rc1.misc 2025-05-26 09:02:39 -07:00
direct-io.c
drop_caches.c fs: drop_caches: move sysctl to fs/drop_caches.c 2025-02-07 16:53:04 +01:00
eventfd.c make use of anon_inode_getfile_fmode() 2025-02-21 10:25:31 +01:00
eventpoll.c eventpoll: don't decrement ep refcount while still holding the ep mutex 2025-07-09 10:38:29 -07:00
exec.c anon_inode: rework assertions 2025-07-02 14:41:39 +02:00
fcntl.c
fhandle.c
file.c fs: drop assert in file_seek_cur_needs_f_lock 2025-06-16 09:59:24 +02:00
file_table.c fs: Make file-nr output the total allocated file handles 2025-04-21 10:27:58 +02:00
filesystems.c fs/filesystems: Fix potential unsigned integer underflow in fs_name() 2025-04-14 13:05:59 +02:00
fs-writeback.c fs: fs-writeback: move sysctl to fs/fs-writeback.c 2025-02-07 16:53:04 +01:00
fs_context.c fs/fs_context: Mark an unlikely if condition with unlikely() in vfs_parse_monolithic_sep() 2025-04-14 13:05:59 +02:00
fs_parser.c fs/fs_parse: Remove unused and problematic validate_constant_table() 2025-04-21 10:27:59 +02:00
fs_pin.c
fs_struct.c
fs_types.c
fsopen.c fs: support O_PATH fds with FSCONFIG_SET_FD 2025-02-12 10:02:10 +01:00
init.c VFS: Change vfs_mkdir() to return the dentry. 2025-03-05 11:52:50 +01:00
inode.c fs: call inode_sb_list_add() outside of inode hash lock 2025-03-20 13:06:51 +01:00
internal.h vfs-6.16-rc1.super 2025-05-26 09:33:44 -07:00
ioctl.c vfs-6.16-rc1.super 2025-05-26 09:33:44 -07:00
kernel_read_file.c
libfs.c anon_inode: rework assertions 2025-07-02 14:41:39 +02:00
locks.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
mbcache.c
mnt_idmapping.c statmount: allow to retrieve idmappings 2025-02-12 12:12:27 +01:00
mount.h Don't propagate mounts into detached trees 2025-05-26 17:35:32 -04:00
mpage.c fs: use writeback_iter directly in mpage_writepages 2025-05-09 12:37:48 +02:00
namei.c vfs-6.16-rc5.fixes 2025-07-04 09:06:49 -07:00
namespace.c userns and mnt_idmap leak in open_tree_attr(2) 2025-06-24 10:25:04 -04:00
nsfs.c vfs-6.15-rc1.nsfs 2025-03-24 11:38:12 -07:00
open.c fs/open: make do_truncate() killable 2025-05-15 12:03:12 +02:00
pidfs.c pidfs: never refuse ppid == 0 in PIDFD_GET_INFO 2025-06-04 22:48:32 +02:00
pipe.c sort.h: hoist cmp_int() into generic header file 2025-05-11 17:54:12 -07:00
pnode.c Don't propagate mounts into detached trees 2025-05-26 17:35:32 -04:00
pnode.h replace collect_mounts()/drop_collected_mounts() with a safer variant 2025-06-23 14:01:49 -04:00
posix_acl.c
proc_namespace.c ->mnt_devname is never NULL 2025-05-23 14:20:44 +02:00
read_write.c fs/read_write: make default_llseek() killable 2025-05-15 12:03:12 +02:00
readdir.c readdir: supply dir_context.count as readdir buffer size hint 2025-05-29 12:31:23 +02:00
remap_range.c
select.c select: core_sys_select add unlikely branch hint on return path 2025-04-21 10:27:58 +02:00
seq_file.c
signalfd.c make use of anon_inode_getfile_fmode() 2025-02-21 10:25:31 +01:00
splice.c splice: remove duplicate noinline from pipe_clear_nowait 2025-04-25 12:11:56 +02:00
stack.c
stat.c xfs: New code for 6.16 2025-05-26 12:56:01 -07:00
statfs.c
super.c fs: unlock the superblock during iterate_supers_type 2025-06-12 14:27:39 +02:00
sync.c
sysctls.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
timerfd.c A treewide hrtimer timer cleanup 2025-03-25 10:54:15 -07:00
userfaultfd.c mm/userfaultfd: fix uninitialized output field for -EAGAIN race 2025-05-07 23:39:39 -07:00
utimes.c
xattr.c fs/xattr.c: fix simple_xattr_list() 2025-06-06 10:00:17 +02:00