mirror-linux/fs
Darrick J. Wong 9222068bc8 xfs: fix xfs_inodegc_stop racing with mod_delayed_work
commit 2254a7396a upstream.

syzbot reported this warning from the faux inodegc shrinker that tries
to kick off inodegc work:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 102 at kernel/workqueue.c:1445 __queue_work+0xd44/0x1120 kernel/workqueue.c:1444
RIP: 0010:__queue_work+0xd44/0x1120 kernel/workqueue.c:1444
Call Trace:
 __queue_delayed_work+0x1c8/0x270 kernel/workqueue.c:1672
 mod_delayed_work_on+0xe1/0x220 kernel/workqueue.c:1746
 xfs_inodegc_shrinker_scan fs/xfs/xfs_icache.c:2212 [inline]
 xfs_inodegc_shrinker_scan+0x250/0x4f0 fs/xfs/xfs_icache.c:2191
 do_shrink_slab+0x428/0xaa0 mm/vmscan.c:853
 shrink_slab+0x175/0x660 mm/vmscan.c:1013
 shrink_one+0x502/0x810 mm/vmscan.c:5343
 shrink_many mm/vmscan.c:5394 [inline]
 lru_gen_shrink_node mm/vmscan.c:5511 [inline]
 shrink_node+0x2064/0x35f0 mm/vmscan.c:6459
 kswapd_shrink_node mm/vmscan.c:7262 [inline]
 balance_pgdat+0xa02/0x1ac0 mm/vmscan.c:7452
 kswapd+0x677/0xd60 mm/vmscan.c:7712
 kthread+0x2e8/0x3a0 kernel/kthread.c:376
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308

This warning corresponds to this code in __queue_work:

	/*
	 * For a draining wq, only works from the same workqueue are
	 * allowed. The __WQ_DESTROYING helps to spot the issue that
	 * queues a new work item to a wq after destroy_workqueue(wq).
	 */
	if (unlikely(wq->flags & (__WQ_DESTROYING | __WQ_DRAINING) &&
		     WARN_ON_ONCE(!is_chained_work(wq))))
		return;

For this to trip, we must have a thread draining the inodegc workqueue
and a second thread trying to queue inodegc work to that workqueue.
This can happen if freezing or a ro remount races with reclaim poking our
faux inodegc shrinker and another thread dropping an unlinked O_RDONLY
file:

Thread 0	Thread 1	Thread 2

xfs_inodegc_stop

				xfs_inodegc_shrinker_scan
				xfs_is_inodegc_enabled
				<yes, will continue>

xfs_clear_inodegc_enabled
xfs_inodegc_queue_all
<list empty, do not queue inodegc worker>

		xfs_inodegc_queue
		<add to list>
		xfs_is_inodegc_enabled
		<no, returns>

drain_workqueue
<set WQ_DRAINING>

				llist_empty
				<no, will queue list>
				mod_delayed_work_on(..., 0)
				__queue_work
				<sees WQ_DRAINING, kaboom>

In other words, everything between the access to inodegc_enabled state
and the decision to poke the inodegc workqueue requires some kind of
coordination to avoid the WQ_DRAINING state.  We could perhaps introduce
a lock here, but we could also try to eliminate WQ_DRAINING from the
picture.
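
As an illustration only, the interleaving in the diagram can be replayed
step by step in a single-threaded userspace model.  Everything named
model_* below is hypothetical and exists nowhere in the kernel; the
struct merely stands in for the inodegc_enabled state, the per-cpu
llists, and the WQ_DRAINING flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_mount {
	bool	gc_enabled;	/* stands in for the inodegc_enabled state */
	bool	wq_draining;	/* stands in for WQ_DRAINING */
	int	list_len;	/* inodes sitting on the inodegc lists */
	int	warnings;	/* times the modelled warning fired */
};

/* Models __queue_work refusing new work on a draining workqueue. */
static bool model_queue_work(struct model_mount *mp)
{
	if (mp->wq_draining) {
		mp->warnings++;
		return false;
	}
	return true;
}

/* Replays the three-thread interleaving from the diagram, in order. */
static int model_replay_race(void)
{
	struct model_mount mp = { .gc_enabled = true };
	bool t2_saw_enabled;

	/* Thread 2: shrinker samples the enabled state before it is cleared. */
	t2_saw_enabled = mp.gc_enabled;

	/* Thread 0: disable inodegc; the list is empty, so nothing is queued. */
	mp.gc_enabled = false;
	if (mp.list_len > 0)
		model_queue_work(&mp);

	/* Thread 1: adds an inode, then sees inodegc disabled and returns. */
	mp.list_len++;
	if (mp.gc_enabled)
		model_queue_work(&mp);

	/* Nothing has gone wrong up to this point. */
	assert(mp.warnings == 0);

	/* Thread 0: starting the drain marks the workqueue as draining. */
	mp.wq_draining = true;

	/* Thread 2: acts on its stale check, finds a non-empty list, queues. */
	if (t2_saw_enabled && mp.list_len > 0)
		model_queue_work(&mp);

	return mp.warnings;
}
```

In this model, model_replay_race() returns 1: the only refused queue
attempt is the one Thread 2 makes on the strength of a check it performed
before Thread 0 cleared the enabled state.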

We could replace the drain_workqueue call with a loop that flushes the
workqueue and queues workers as long as there is at least one inode
present in the per-cpu inodegc llists.  We've disabled inodegc at this
point, so we know that the number of queued inodes will eventually hit
zero as long as xfs_inodegc_start cannot reactivate the workers.
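
A minimal userspace sketch of such a loop follows.  It is a model under
stated assumptions, not the patch itself: the model_* names, the fixed
CPU count, and the "stragglers" counter (threads that sampled the enabled
state before it was cleared and still add an inode) are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Hypothetical userspace model; names are illustrative, not kernel APIs. */
struct model_inodegc {
	int	pending[MODEL_NR_CPUS];	/* inodes on each per-cpu list */
	bool	queued[MODEL_NR_CPUS];	/* worker scheduled for that cpu */
	bool	enabled;		/* stands in for inodegc_enabled */
	int	stragglers;		/* racing threads still to add an inode */
};

/* Schedule a worker for every non-empty list; report whether any was queued. */
static bool model_queue_all(struct model_inodegc *gc)
{
	bool ret = false;
	int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (gc->pending[cpu] > 0) {
			gc->queued[cpu] = true;
			ret = true;
		}
	}
	return ret;
}

/* Run every scheduled worker to completion, as a flush would wait for. */
static void model_flush(struct model_inodegc *gc)
{
	int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (!gc->queued[cpu])
			continue;
		gc->pending[cpu] = 0;
		gc->queued[cpu] = false;
	}

	/* One straggler that saw the old state adds an inode behind us. */
	if (gc->stragglers > 0) {
		gc->stragglers--;
		gc->pending[0]++;
	}
}

/* Disable inodegc, then flush and requeue until every list is empty. */
static int model_inodegc_stop(struct model_inodegc *gc)
{
	int passes = 0;
	int cpu;
	bool rerun;

	gc->enabled = false;
	model_queue_all(gc);
	do {
		model_flush(gc);
		passes++;
		rerun = model_queue_all(gc);
	} while (rerun);

	/* The loop only exits once no list holds an inode. */
	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		assert(gc->pending[cpu] == 0);
	return passes;
}
```

The loop terminates because the straggler count is finite: once inodegc
is disabled, no new thread decides to queue, so each pass can only be
followed by another if a thread that had already sampled the old state
added an inode in the meantime.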

There are four callers of xfs_inodegc_start.  Three of them come from the
VFS with s_umount held: filesystem thawing, failed filesystem freezing,
and the rw remount transition.  The fourth caller is mounting rw (no
remount or freezing possible).

There are three callers of xfs_inodegc_stop.  One is unmounting (no
remount or thaw possible).  Two of them come from the VFS with s_umount
held: fs freezing and ro remount transition.

Hence, it is correct to replace the drain_workqueue call with a loop
that drains the inodegc llists.

Fixes: 6191cf3ad5 ("xfs: flush inodegc workqueue tasks before cancel")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19 16:22:15 +02:00
9p use less confusing names for iov_iter direction initializers 2023-02-09 11:28:04 +01:00
adfs
affs affs: initialize fsdata in affs_truncate() 2023-02-01 08:34:08 +01:00
afs afs: Fix accidental truncation when storing data 2023-07-19 16:22:06 +02:00
autofs
befs
bfs
btrfs btrfs: do not BUG_ON() on tree mod log failure at __btrfs_cow_block() 2023-07-19 16:22:14 +02:00
cachefiles cachefiles: use vfs_tmpfile_open() helper 2022-09-24 07:00:00 +02:00
ceph ceph: fix use-after-free bug for inodes when flushing capsnaps 2023-06-14 11:15:27 +02:00
coda coda: Avoid partial allocation of sig_inputArgs 2023-03-10 09:33:52 +01:00
configfs configfs: fix possible memory leak in configfs_create_dir() 2022-12-31 13:32:22 +01:00
cramfs fs/cramfs/inode.c: initialize file_ra_state 2023-03-10 09:34:09 +01:00
crypto blk-crypto: add a blk_crypto_config_supported_natively helper 2023-05-11 23:03:00 +09:00
debugfs debugfs: fix error when writing negative value to atomic_t debugfs file 2022-12-31 13:31:58 +01:00
devpts
dlm fs: dlm: fix race setting stop tx flag 2023-03-17 08:50:19 +01:00
ecryptfs whack-a-mole: constifying struct path * 2022-10-06 17:31:02 -07:00
efivarfs efi: efivars: Fix variable writes without query_variable_store() 2022-10-21 11:09:40 +02:00
efs
erofs erofs: fix compact 4B support for 16k block size 2023-07-19 16:20:59 +02:00
exfat exfat: fix inode->i_blocks for non-512 byte sector size device 2023-03-10 09:34:08 +01:00
exportfs
ext2 ext2: Check block size validity during mount 2023-05-24 17:32:36 +01:00
ext4 ext4: Remove ext4 locking of moved directory 2023-07-19 16:22:12 +02:00
f2fs Revert "f2fs: fix potential corruption when moving a directory" 2023-07-19 16:22:12 +02:00
fat treewide: use get_random_u32() when possible 2022-10-11 17:42:58 -06:00
freevxfs
fscache fscache: Use clear_and_wake_up_bit() in fscache_create_volume_work() 2023-02-22 12:59:43 +01:00
fuse fuse: always revalidate rename target dentry 2023-04-26 14:28:42 +02:00
gfs2 gfs2: Fix duplicate should_fault_in_pages() call 2023-07-19 16:21:54 +02:00
hfs hfs: fix missing hfs_bnode_get() in __hfs_bnode_create 2023-03-10 09:34:07 +01:00
hfsplus fs: hfsplus: remove WARN_ON() from hfsplus_cat_{read,write}_inode() 2023-05-24 17:32:34 +01:00
hostfs hostfs: move from strlcpy with unused retval to strscpy 2022-09-19 22:46:25 +02:00
hpfs
hugetlbfs hugetlbfs: fix null-ptr-deref in hugetlbfs_parse_param() 2022-12-31 13:33:05 +01:00
iomap iomap: add a tracepoint for mappings returned by map_blocks 2022-10-02 11:42:19 -07:00
isofs - hfs and hfsplus kmap API modernization from Fabio Francesco 2022-10-12 11:00:22 -07:00
jbd2 jdb2: Don't refuse invalidation of already invalidated buffers 2023-05-11 23:03:23 +09:00
jffs2 jffs2: reduce stack usage in jffs2_build_xattr_subsystem() 2023-07-19 16:22:11 +02:00
jfs fs/jfs: fix shift exponent db_agl2size negative 2023-03-11 13:55:16 +01:00
kernfs kernfs: fix missing kernfs_idr_lock to remove an ID from the IDR 2023-07-19 16:21:53 +02:00
lockd lockd: drop inappropriate svc_get() from locked_get() 2023-07-19 16:20:56 +02:00
minix vfs: open inside ->tmpfile() 2022-09-24 07:00:00 +02:00
netfs use less confusing names for iov_iter direction initializers 2023-02-09 11:28:04 +01:00
nfs NFSv4.1: freeze the session table upon receiving NFS4ERR_BADSESSION 2023-07-19 16:21:43 +02:00
nfs_common
nfsd NFSD: add encoding of op_recall flag for write delegation 2023-07-19 16:22:08 +02:00
nilfs2 nilfs2: prevent general protection fault in nilfs_clear_dirty_page() 2023-06-28 11:12:27 +02:00
nls
notify fanotify: disallow mount/sb marks on kernel internal pseudo fs 2023-07-19 16:22:05 +02:00
ntfs - hfs and hfsplus kmap API modernization from Fabio Francesco 2022-10-12 11:00:22 -07:00
ntfs3 ntfs: Fix panic about slab-out-of-bounds caused by ntfs_listxattr() 2023-07-19 16:22:04 +02:00
ocfs2 ocfs2: Fix use of slab data with sendpage 2023-07-19 16:21:13 +02:00
omfs
openpromfs
orangefs use less confusing names for iov_iter direction initializers 2023-02-09 11:28:04 +01:00
overlayfs ovl: update of dentry revalidate flags after copy up 2023-07-19 16:21:33 +02:00
proc sysctl: clarify register_sysctl_init() base directory order 2023-05-17 11:53:46 +02:00
pstore pstore/ram: Add check for kstrdup 2023-07-19 16:21:03 +02:00
qnx4
qnx6 fs/qnx6: delete unnecessary checks before brelse() 2022-09-11 21:55:07 -07:00
quota ext4: fix bug_on in __es_tree_search caused by bad quota inode 2023-01-07 11:11:59 +01:00
ramfs shmem: use ramfs_kill_sb() for kill_sb method of ramfs-based tmpfs 2023-07-19 16:22:11 +02:00
reiserfs reiserfs: Add security prefix to xattr name in reiserfs_security_write() 2023-05-11 23:03:02 +09:00
romfs
smb ksmbd: avoid field overflow warning 2023-07-19 16:21:44 +02:00
squashfs revert "squashfs: harden sanity check in squashfs_read_xattr_id_table" 2023-02-22 12:59:50 +01:00
sysfs
sysv fs: sysv: Fix sysv_nblocks() returns wrong value 2022-12-31 13:32:00 +01:00
tracefs tracefs: Only clobber mode/uid/gid on remount if asked 2022-09-08 17:10:54 -04:00
ubifs ubifs: Fix memory leak in do_rename 2023-05-11 23:03:05 +09:00
udf udf: Fix off-by-one error when discarding preallocation 2023-03-17 08:50:19 +01:00
ufs ufs: replace ll_rw_block() 2022-09-11 20:26:07 -07:00
unicode
vboxsf
verity fsverity: don't drop pagecache at end of FS_IOC_ENABLE_VERITY 2023-04-06 12:10:34 +02:00
xfs xfs: fix xfs_inodegc_stop racing with mod_delayed_work 2023-07-19 16:22:15 +02:00
zonefs zonefs: Always invalidate last cached page on append write 2023-04-06 12:10:52 +02:00
Kconfig smb: move client and server files to common directory fs/smb 2023-06-28 11:12:40 +02:00
Kconfig.binfmt Xtensa updates for v6.1 2022-10-10 14:21:11 -07:00
Makefile smb: move client and server files to common directory fs/smb 2023-06-28 11:12:40 +02:00
aio.c aio: fix mremap after fork null-deref 2023-02-22 12:59:46 +01:00
anon_inodes.c
attr.c attr: use consistent sgid stripping checks 2023-03-03 11:52:25 +01:00
bad_inode.c vfs: open inside ->tmpfile() 2022-09-24 07:00:00 +02:00
binfmt_elf.c mm: always expand the stack with the mmap write lock held 2023-07-01 13:16:25 +02:00
binfmt_elf_fdpic.c elfcore: Add a cprm parameter to elf_core_extra_{phdrs,data_size} 2023-01-18 11:58:12 +01:00
binfmt_elf_test.c
binfmt_flat.c
binfmt_misc.c binfmt_misc: fix shift-out-of-bounds in check_special_flags 2022-12-31 13:32:57 +01:00
binfmt_script.c
buffer.c - hfs and hfsplus kmap API modernization from Fabio Francesco 2022-10-12 11:00:22 -07:00
char_dev.c chardev: fix error handling in cdev_device_add() 2022-12-31 13:32:41 +01:00
compat_binfmt_elf.c
coredump.c coredump: Move dump_emit_page() to kill unused warning 2023-02-22 12:59:50 +01:00
d_path.c
dax.c Merge branch 'for-6.0/dax' into libnvdimm-fixes 2022-09-24 18:14:12 -07:00
dcache.c tmpfile API change 2022-10-10 19:45:17 -07:00
direct-io.c block: remove PSI accounting from the bio layer 2022-09-20 08:24:38 -06:00
drop_caches.c
eventfd.c eventfd: provide a eventfd_signal_mask() helper 2023-01-04 11:28:48 +01:00
eventpoll.c epoll: ep_autoremove_wake_function should use list_del_init_careful 2023-06-21 16:00:54 +02:00
exec.c mm: always expand the stack with the mmap write lock held 2023-07-01 13:16:25 +02:00
fcntl.c
fhandle.c
file.c fs: prevent out-of-bounds array speculation when closing a file descriptor 2023-03-17 08:50:13 +01:00
file_table.c
filesystems.c
fs-writeback.c writeback: fix call of incorrect macro 2023-05-17 11:53:33 +02:00
fs_context.c fs: avoid empty option when generating legacy mount string 2023-07-19 16:22:11 +02:00
fs_parser.c ext4: journal_path mount options should follow links 2023-01-07 11:11:59 +01:00
fs_pin.c
fs_struct.c
fs_types.c
fsopen.c
init.c
inode.c fs: Establish locking order for unrelated directories 2023-07-19 16:22:12 +02:00
internal.h fs: Establish locking order for unrelated directories 2023-07-19 16:22:12 +02:00
ioctl.c
kernel_read_file.c
libfs.c libfs: add DEFINE_SIMPLE_ATTRIBUTE_SIGNED for signed value 2022-12-31 13:31:58 +01:00
locks.c filelocks: use mount idmapping for setlease permission check 2023-03-17 08:50:32 +01:00
mbcache.c ext4: fix deadlock due to mbcache entry corruption 2023-01-07 11:12:02 +01:00
mount.h
mpage.c
namei.c fs: no need to check source 2023-07-19 16:22:15 +02:00
namespace.c fs: drop peer group ids under namespace lock 2023-04-13 16:55:33 +02:00
no-block.c
nsfs.c
open.c open: return EINVAL for O_DIRECTORY | O_CREAT 2023-05-24 17:32:34 +01:00
pipe.c
pnode.c pnode: terminate at peers of source 2023-01-04 11:29:01 +01:00
pnode.h
posix_acl.c - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in 2022-10-10 17:53:04 -07:00
proc_namespace.c
read_write.c use less confusing names for iov_iter direction initializers 2023-02-09 11:28:04 +01:00
readdir.c
remap_range.c
select.c
seq_file.c use less confusing names for iov_iter direction initializers 2023-02-09 11:28:04 +01:00
signalfd.c
splice.c use less confusing names for iov_iter direction initializers 2023-02-09 11:28:04 +01:00
stack.c
stat.c vfs: support STATX_DIOALIGN on block devices 2022-09-11 19:47:12 -05:00
statfs.c statfs: enforce statfs[64] structure initialization 2023-05-24 17:32:51 +01:00
super.c fscrypt: destroy keyring after security_sb_delete() 2023-03-30 12:49:23 +02:00
sync.c
sysctls.c
timerfd.c
userfaultfd.c Revert "userfaultfd: don't fail on unrecognized features" 2023-04-26 14:28:37 +02:00
utimes.c
xattr.c fs: don't audit the capability check in simple_xattr_list() 2022-12-31 13:31:55 +01:00