mirror-linux/block
Yu Kuai 5d726c4dbe blk-cgroup: fix possible deadlock while configuring policy
The following possible deadlock is easily reported by lockdep:

WARNING: possible circular locking dependency detected
6.17.0-rc3-00124-ga12c2658ced0 #1665 Not tainted
------------------------------------------------------
check/1334 is trying to acquire lock:
ff1100011d9d0678 (&q->sysfs_lock){+.+.}-{4:4}, at: blk_unregister_queue+0x53/0x180

but task is already holding lock:
ff1100011d9d00e0 (&q->q_usage_counter(queue)#3){++++}-{0:0}, at: del_gendisk+0xba/0x110

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&q->q_usage_counter(queue)#3){++++}-{0:0}:
       blk_queue_enter+0x40b/0x470
       blkg_conf_prep+0x7b/0x3c0
       tg_set_limit+0x10a/0x3e0
       cgroup_file_write+0xc6/0x420
       kernfs_fop_write_iter+0x189/0x280
       vfs_write+0x256/0x490
       ksys_write+0x83/0x190
       __x64_sys_write+0x21/0x30
       x64_sys_call+0x4608/0x4630
       do_syscall_64+0xdb/0x6b0
       entry_SYSCALL_64_after_hwframe+0x76/0x7e

-> #1 (&q->rq_qos_mutex){+.+.}-{4:4}:
       __mutex_lock+0xd8/0xf50
       mutex_lock_nested+0x2b/0x40
       wbt_init+0x17e/0x280
       wbt_enable_default+0xe9/0x140
       blk_register_queue+0x1da/0x2e0
       __add_disk+0x38c/0x5d0
       add_disk_fwnode+0x89/0x250
       device_add_disk+0x18/0x30
       virtblk_probe+0x13a3/0x1800
       virtio_dev_probe+0x389/0x610
       really_probe+0x136/0x620
       __driver_probe_device+0xb3/0x230
       driver_probe_device+0x2f/0xe0
       __driver_attach+0x158/0x250
       bus_for_each_dev+0xa9/0x130
       driver_attach+0x26/0x40
       bus_add_driver+0x178/0x3d0
       driver_register+0x7d/0x1c0
       __register_virtio_driver+0x2c/0x60
       virtio_blk_init+0x6f/0xe0
       do_one_initcall+0x94/0x540
       kernel_init_freeable+0x56a/0x7b0
       kernel_init+0x2b/0x270
       ret_from_fork+0x268/0x4c0
       ret_from_fork_asm+0x1a/0x30

-> #0 (&q->sysfs_lock){+.+.}-{4:4}:
       __lock_acquire+0x1835/0x2940
       lock_acquire+0xf9/0x450
       __mutex_lock+0xd8/0xf50
       mutex_lock_nested+0x2b/0x40
       blk_unregister_queue+0x53/0x180
       __del_gendisk+0x226/0x690
       del_gendisk+0xba/0x110
       sd_remove+0x49/0xb0 [sd_mod]
       device_remove+0x87/0xb0
       device_release_driver_internal+0x11e/0x230
       device_release_driver+0x1a/0x30
       bus_remove_device+0x14d/0x220
       device_del+0x1e1/0x5a0
       __scsi_remove_device+0x1ff/0x2f0
       scsi_remove_device+0x37/0x60
       sdev_store_delete+0x77/0x100
       dev_attr_store+0x1f/0x40
       sysfs_kf_write+0x65/0x90
       kernfs_fop_write_iter+0x189/0x280
       vfs_write+0x256/0x490
       ksys_write+0x83/0x190
       __x64_sys_write+0x21/0x30
       x64_sys_call+0x4608/0x4630
       do_syscall_64+0xdb/0x6b0
       entry_SYSCALL_64_after_hwframe+0x76/0x7e

other info that might help us debug this:

Chain exists of:
  &q->sysfs_lock --> &q->rq_qos_mutex --> &q->q_usage_counter(queue)#3

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&q->q_usage_counter(queue)#3);
                               lock(&q->rq_qos_mutex);
                               lock(&q->q_usage_counter(queue)#3);
  lock(&q->sysfs_lock);

The root cause is that q_usage_counter is grabbed with rq_qos_mutex
held in blkg_conf_prep(), while in other contexts the queue must be
frozen before rq_qos_mutex is taken.

The blk_queue_enter() call in blkg_conf_prep() protects against policy
deactivation, which is already serialized by blkcg_mutex, so replace
blk_queue_enter() with blkcg_mutex to fix this problem. Also, because
policy deactivation takes blkcg_mutex after the queue is frozen,
convert blkg_alloc() to use GFP_NOIO.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-23 05:22:14 -06:00
partitions block: switch ->getgeo() to struct gendisk 2025-08-13 02:59:29 -04:00
Kconfig block: Remove obsolete configs BLK_MQ_{PCI,VIRTIO} 2025-05-14 05:43:56 -06:00
Kconfig.iosched
Makefile blk-mq: move the DMA mapping code to a separate file 2025-05-16 08:43:41 -06:00
badblocks.c badblocks: Fix a nonsense WARN_ON() which checks whether a u64 variable < 0 2025-03-10 07:41:58 -06:00
bdev.c xfs: New code for 6.16 2025-05-26 12:56:01 -07:00
bfq-cgroup.c Revert "block, bfq: merge bfq_release_process_ref() into bfq_put_cooperator()" 2024-11-19 19:05:32 -07:00
bfq-iosched.c blk-mq: fix elevator depth_updated method 2025-09-05 13:52:52 -06:00
bfq-iosched.h lib/sbitmap: convert shallow_depth from one word to the whole sbitmap 2025-08-07 06:30:17 -06:00
bfq-wf2q.c
bio-integrity-auto.c block: rename tuple_size field in blk_integrity to metadata_size 2025-07-01 14:00:14 +02:00
bio-integrity.c blk-integrity: enable p2p source and destination 2025-09-09 10:33:27 -06:00
bio.c block: cleanup bio_issue 2025-09-10 05:23:45 -06:00
blk-cgroup-fc-appid.c block: Replace all non-returning strlcpy with strscpy 2023-06-01 09:13:31 -06:00
blk-cgroup-rwstat.c blk-cgroup: use group allocation/free of per-cpu counters API 2024-04-03 09:10:17 -06:00
blk-cgroup-rwstat.h blk-cgroup: rwstat: fix kernel-doc warnings in header file 2025-01-13 07:47:09 -07:00
blk-cgroup.c blk-cgroup: fix possible deadlock while configuring policy 2025-09-23 05:22:14 -06:00
blk-cgroup.h block: initialize bio issue time in blk_mq_submit_bio() 2025-09-10 05:23:45 -06:00
blk-core.c block: fix ordering of recursive split IO 2025-09-10 05:23:46 -06:00
blk-crypto-fallback.c blk-crypto: convert to use bio_submit_split_bioset() 2025-09-10 05:23:46 -06:00
blk-crypto-internal.h blk-crypto: add ioctls to create and prepare hardware-wrapped keys 2025-02-10 09:54:19 -07:00
blk-crypto-profile.c blk-crypto: export wrapped key functions 2025-05-06 19:08:08 +02:00
blk-crypto-sysfs.c blk-crypto: show supported key types in sysfs 2025-02-10 09:54:19 -07:00
blk-crypto.c blk-crypto: add ioctls to create and prepare hardware-wrapped keys 2025-02-10 09:54:19 -07:00
blk-flush.c block: remove unused parameter 2025-03-12 08:25:28 -06:00
blk-ia-ranges.c block: get rid of request queue ->sysfs_dir_lock 2025-01-29 07:16:47 -07:00
blk-integrity.c blk-integrity: use iterator for mapping sg 2025-08-25 07:44:39 -06:00
blk-ioc.c blk-ioc: don't hold queue_lock for ioc_lookup_icq() 2025-07-29 06:26:34 -06:00
blk-iocost.c for-6.15/block-20250322 2025-03-26 18:08:55 -07:00
blk-iolatency.c blk-mq: add QUEUE_FLAG_BIO_ISSUE_TIME 2025-09-10 05:23:45 -06:00
blk-ioprio.c blk-cgroup: Simplify policy files registration 2025-03-11 09:22:55 -10:00
blk-ioprio.h blk-ioprio: remove per-disk structure 2024-07-28 16:47:51 -06:00
blk-lib.c block: fix detection of unsupported WRITE SAME in blkdev_issue_write_zeroes 2024-08-28 08:49:25 -06:00
blk-map.c blk-map: provide the bdev to bio if one exists 2025-09-09 10:35:28 -06:00
blk-merge.c block: fix ordering of recursive split IO 2025-09-10 05:23:46 -06:00
blk-mq-cpumap.c blk-mq: add number of queue calc helper 2025-07-01 10:24:19 -06:00
blk-mq-debugfs.c blk-mq: add QUEUE_FLAG_BIO_ISSUE_TIME 2025-09-10 05:23:45 -06:00
blk-mq-debugfs.h block: Replace zone_wlock debugfs entry with zone_wplugs entry 2024-04-17 08:44:03 -06:00
blk-mq-dma.c blk-mq-dma: bring back p2p request flags 2025-09-09 10:33:35 -06:00
blk-mq-sched.c blk-mq-sched: add new parameter nr_requests in blk_mq_alloc_sched_tags() 2025-09-10 05:25:56 -06:00
blk-mq-sched.h blk-mq-sched: add new parameter nr_requests in blk_mq_alloc_sched_tags() 2025-09-10 05:25:56 -06:00
blk-mq-sysfs.c blk-mq: Move flush queue allocation into blk_mq_init_hctx() 2025-09-08 08:05:32 -06:00
blk-mq-tag.c blk-mq: fix null-ptr-deref in blk_mq_free_tags() from error path 2025-09-23 01:35:52 -06:00
blk-mq.c blk-mq: fix null-ptr-deref in blk_mq_free_tags() from error path 2025-09-23 01:35:52 -06:00
blk-mq.h blk-mq: remove blk_mq_tag_update_depth() 2025-09-10 05:25:56 -06:00
blk-pm.c block: force noio scope in blk_mq_freeze_queue 2025-01-31 07:20:08 -07:00
blk-pm.h
blk-rq-qos.c block: avoid cpu_hotplug_lock depedency on freeze_lock 2025-08-21 07:11:11 -06:00
blk-rq-qos.h block: avoid cpu_hotplug_lock depedency on freeze_lock 2025-08-21 07:11:11 -06:00
blk-settings.c block: relax atomic write boundary vs chunk size check 2025-09-16 12:29:10 -06:00
blk-stat.c treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
blk-stat.h treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
blk-sysfs.c blk-mq: fix potential deadlock while nr_requests grown 2025-09-10 05:25:56 -06:00
blk-throttle.c blk-throttle: fix throtl_data leak during disk release 2025-09-17 07:27:29 -06:00
blk-throttle.h blk-throttle: fix access race during throttle policy activation 2025-09-08 08:24:44 -06:00
blk-timeout.c
blk-wbt.c blk-wbt: Eliminate ambiguity in the comments of struct rq_wb 2025-08-11 10:21:38 -06:00
blk-wbt.h blk-wbt: remove the separate write cache tracking 2023-12-26 09:28:10 -07:00
blk-zoned.c block: add trace messages to zone write plugging 2025-07-15 08:03:49 -06:00
blk.h block: fix ordering of recursive split IO 2025-09-10 05:23:46 -06:00
bsg-lib.c block: remove unused parameter 'q' parameter in __blk_rq_map_sg() 2025-03-13 05:46:19 -06:00
bsg.c SCSI misc on 20230629 2023-06-30 11:57:07 -07:00
disk-events.c block: move bdev_mark_dead out of disk_check_media_change 2023-10-28 13:29:23 +02:00
early-lookup.c wrapper for access to ->bd_partno 2024-05-02 17:48:09 -04:00
elevator.c blk-mq-sched: add new parameter nr_requests in blk_mq_alloc_sched_tags() 2025-09-10 05:25:56 -06:00
elevator.h blk-mq: fix elevator depth_updated method 2025-09-05 13:52:52 -06:00
fops.c block: simplify direct io validity check 2025-09-09 10:27:01 -06:00
genhd.c block: fix kobject double initialization in add_disk 2025-08-11 08:00:49 -06:00
holder.c block: fix deadlock between bd_link_disk_holder and partition scan 2024-02-23 07:44:19 -07:00
ioctl.c block: switch ->getgeo() to struct gendisk 2025-08-13 02:59:29 -04:00
ioprio.c block: remove test of incorrect io priority level 2025-05-08 09:04:12 -06:00
kyber-iosched.c blk-mq: fix elevator depth_updated method 2025-09-05 13:52:52 -06:00
mq-deadline.c block/mq-deadline: Remove the redundant rb_entry_rq in the deadline_from_pos(). 2025-09-15 13:00:05 -06:00
opal_proto.h block: sed-opal: handle empty atoms when parsing response 2024-02-16 15:52:45 -07:00
sed-opal.c block: sed-opal: add ioctl IOC_OPAL_SET_SID_PW 2024-10-22 08:16:40 -06:00
t10-pi.c block: rename tuple_size field in blk_integrity to metadata_size 2025-07-01 14:00:14 +02:00