mirror-linux/drivers
Rob Clark ee42bfc791 interconnect: Fix locking for runpm vs reclaim
[ Upstream commit af42269c35 ]

For cases where icc_set_bw() can be called in callpaths that could
deadlock against shrinker/reclaim, such as runpm resume, we need to
decouple the icc locking.  Introduce a new icc_bw_lock for cases where
we need to serialize bw aggregation and update to decouple that from
paths that require memory allocation such as node/link creation/
destruction.
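
The split described above can be illustrated with a small userspace
analogy. This is only a sketch under stated assumptions, not the kernel
change itself: the names topo_lock, bw_lock, struct node, node_create()
and set_bw() are hypothetical, and pthread mutexes stand in for the
kernel mutexes. The point it shows is that the allocating path holds
the outer lock while allocating, and the bandwidth path takes only the
inner lock and never allocates while holding it.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical names: a userspace analogy, not the interconnect code. */
struct node {
	struct node *next;
	unsigned int avg_bw;	/* requested bandwidth for this node */
};

/* Outer lock: node creation/destruction, allocation is allowed under it. */
static pthread_mutex_t topo_lock = PTHREAD_MUTEX_INITIALIZER;
/* Inner lock: bandwidth aggregation only, never allocate while holding it. */
static pthread_mutex_t bw_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node *nodes;

/* Slow path (probe-like): the allocation happens under topo_lock only. */
struct node *node_create(void)
{
	struct node *n;

	pthread_mutex_lock(&topo_lock);
	n = calloc(1, sizeof(*n));
	if (n) {
		/* Publish under bw_lock so set_bw() sees a consistent list. */
		pthread_mutex_lock(&bw_lock);
		n->next = nodes;
		nodes = n;
		pthread_mutex_unlock(&bw_lock);
	}
	pthread_mutex_unlock(&topo_lock);
	return n;
}

/* Hot path (resume-like): takes bw_lock only and performs no allocation. */
unsigned int set_bw(struct node *n, unsigned int bw)
{
	unsigned int total = 0;
	struct node *it;

	pthread_mutex_lock(&bw_lock);
	n->avg_bw = bw;
	for (it = nodes; it; it = it->next)
		total += it->avg_bw;	/* aggregate across all nodes */
	pthread_mutex_unlock(&bw_lock);
	return total;
}
```

Because set_bw() never acquires topo_lock, a caller that must not
recurse into reclaim no longer inherits a dependency on the lock that
is held across allocations.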

Fixes this lockdep splat:

   ======================================================
   WARNING: possible circular locking dependency detected
   6.2.0-rc8-debug+ #554 Not tainted
   ------------------------------------------------------
   ring0/132 is trying to acquire lock:
   ffffff80871916d0 (&gmu->lock){+.+.}-{3:3}, at: a6xx_pm_resume+0xf0/0x234

   but task is already holding lock:
   ffffffdb5aee57e8 (dma_fence_map){++++}-{0:0}, at: msm_job_run+0x68/0x150

   which lock already depends on the new lock.

   the existing dependency chain (in reverse order) is:

   -> #4 (dma_fence_map){++++}-{0:0}:
          __dma_fence_might_wait+0x74/0xc0
          dma_resv_lockdep+0x1f4/0x2f4
          do_one_initcall+0x104/0x2bc
          kernel_init_freeable+0x344/0x34c
          kernel_init+0x30/0x134
          ret_from_fork+0x10/0x20

   -> #3 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
          fs_reclaim_acquire+0x80/0xa8
          slab_pre_alloc_hook.constprop.0+0x40/0x25c
          __kmem_cache_alloc_node+0x60/0x1cc
          __kmalloc+0xd8/0x100
          topology_parse_cpu_capacity+0x8c/0x178
          get_cpu_for_node+0x88/0xc4
          parse_cluster+0x1b0/0x28c
          parse_cluster+0x8c/0x28c
          init_cpu_topology+0x168/0x188
          smp_prepare_cpus+0x24/0xf8
          kernel_init_freeable+0x18c/0x34c
          kernel_init+0x30/0x134
          ret_from_fork+0x10/0x20

   -> #2 (fs_reclaim){+.+.}-{0:0}:
          __fs_reclaim_acquire+0x3c/0x48
          fs_reclaim_acquire+0x54/0xa8
          slab_pre_alloc_hook.constprop.0+0x40/0x25c
          __kmem_cache_alloc_node+0x60/0x1cc
          __kmalloc+0xd8/0x100
          kzalloc.constprop.0+0x14/0x20
          icc_node_create_nolock+0x4c/0xc4
          icc_node_create+0x38/0x58
          qcom_icc_rpmh_probe+0x1b8/0x248
          platform_probe+0x70/0xc4
          really_probe+0x158/0x290
          __driver_probe_device+0xc8/0xe0
          driver_probe_device+0x44/0x100
          __driver_attach+0xf8/0x108
          bus_for_each_dev+0x78/0xc4
          driver_attach+0x2c/0x38
          bus_add_driver+0xd0/0x1d8
          driver_register+0xbc/0xf8
          __platform_driver_register+0x30/0x3c
          qnoc_driver_init+0x24/0x30
          do_one_initcall+0x104/0x2bc
          kernel_init_freeable+0x344/0x34c
          kernel_init+0x30/0x134
          ret_from_fork+0x10/0x20

   -> #1 (icc_lock){+.+.}-{3:3}:
          __mutex_lock+0xcc/0x3c8
          mutex_lock_nested+0x30/0x44
          icc_set_bw+0x88/0x2b4
          _set_opp_bw+0x8c/0xd8
          _set_opp+0x19c/0x300
          dev_pm_opp_set_opp+0x84/0x94
          a6xx_gmu_resume+0x18c/0x804
          a6xx_pm_resume+0xf8/0x234
          adreno_runtime_resume+0x2c/0x38
          pm_generic_runtime_resume+0x30/0x44
          __rpm_callback+0x15c/0x174
          rpm_callback+0x78/0x7c
          rpm_resume+0x318/0x524
          __pm_runtime_resume+0x78/0xbc
          adreno_load_gpu+0xc4/0x17c
          msm_open+0x50/0x120
          drm_file_alloc+0x17c/0x228
          drm_open_helper+0x74/0x118
          drm_open+0xa0/0x144
          drm_stub_open+0xd4/0xe4
          chrdev_open+0x1b8/0x1e4
          do_dentry_open+0x2f8/0x38c
          vfs_open+0x34/0x40
          path_openat+0x64c/0x7b4
          do_filp_open+0x54/0xc4
          do_sys_openat2+0x9c/0x100
          do_sys_open+0x50/0x7c
          __arm64_sys_openat+0x28/0x34
          invoke_syscall+0x8c/0x128
          el0_svc_common.constprop.0+0xa0/0x11c
          do_el0_svc+0xac/0xbc
          el0_svc+0x48/0xa0
          el0t_64_sync_handler+0xac/0x13c
          el0t_64_sync+0x190/0x194

   -> #0 (&gmu->lock){+.+.}-{3:3}:
          __lock_acquire+0xe00/0x1060
          lock_acquire+0x1e0/0x2f8
          __mutex_lock+0xcc/0x3c8
          mutex_lock_nested+0x30/0x44
          a6xx_pm_resume+0xf0/0x234
          adreno_runtime_resume+0x2c/0x38
          pm_generic_runtime_resume+0x30/0x44
          __rpm_callback+0x15c/0x174
          rpm_callback+0x78/0x7c
          rpm_resume+0x318/0x524
          __pm_runtime_resume+0x78/0xbc
          pm_runtime_get_sync.isra.0+0x14/0x20
          msm_gpu_submit+0x58/0x178
          msm_job_run+0x78/0x150
          drm_sched_main+0x290/0x370
          kthread+0xf0/0x100
          ret_from_fork+0x10/0x20

   other info that might help us debug this:

   Chain exists of:
     &gmu->lock --> mmu_notifier_invalidate_range_start --> dma_fence_map

    Possible unsafe locking scenario:

          CPU0                    CPU1
          ----                    ----
     lock(dma_fence_map);
                                  lock(mmu_notifier_invalidate_range_start);
                                  lock(dma_fence_map);
     lock(&gmu->lock);

    *** DEADLOCK ***

   2 locks held by ring0/132:
    #0: ffffff8087191170 (&gpu->lock){+.+.}-{3:3}, at: msm_job_run+0x64/0x150
    #1: ffffffdb5aee57e8 (dma_fence_map){++++}-{0:0}, at: msm_job_run+0x68/0x150

   stack backtrace:
   CPU: 7 PID: 132 Comm: ring0 Not tainted 6.2.0-rc8-debug+ #554
   Hardware name: Google Lazor (rev1 - 2) with LTE (DT)
   Call trace:
    dump_backtrace.part.0+0xb4/0xf8
    show_stack+0x20/0x38
    dump_stack_lvl+0x9c/0xd0
    dump_stack+0x18/0x34
    print_circular_bug+0x1b4/0x1f0
    check_noncircular+0x78/0xac
    __lock_acquire+0xe00/0x1060
    lock_acquire+0x1e0/0x2f8
    __mutex_lock+0xcc/0x3c8
    mutex_lock_nested+0x30/0x44
    a6xx_pm_resume+0xf0/0x234
    adreno_runtime_resume+0x2c/0x38
    pm_generic_runtime_resume+0x30/0x44
    __rpm_callback+0x15c/0x174
    rpm_callback+0x78/0x7c
    rpm_resume+0x318/0x524
    __pm_runtime_resume+0x78/0xbc
    pm_runtime_get_sync.isra.0+0x14/0x20
    msm_gpu_submit+0x58/0x178
    msm_job_run+0x78/0x150
    drm_sched_main+0x290/0x370
    kthread+0xf0/0x100
    ret_from_fork+0x10/0x20

Signed-off-by: Rob Clark <robdclark@chromium.org>
Link: https://lore.kernel.org/r/20230807171148.210181-7-robdclark@gmail.com
Signed-off-by: Georgi Djakov <djakov@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-23 11:11:07 +02:00
accessibility
acpi ACPI: x86: s2idle: Catch multiple ACPI_TYPE_PACKAGE objects 2023-09-23 11:11:00 +02:00
amba amba: bus: fix refcount leak 2023-09-13 09:42:56 +02:00
android binder: fix memory leak in binder_init() 2023-08-16 18:27:24 +02:00
ata ata: pata_ftide010: Add missing MODULE_DESCRIPTION 2023-09-19 12:28:05 +02:00
atm
auxdisplay
base drivers: base: Free devm resources when unregistering a device 2023-09-13 09:42:54 +02:00
bcma
block null_blk: fix poll request timeout handling 2023-09-19 12:27:56 +02:00
bluetooth Bluetooth: btusb: Do not call kfree_skb() under spin_lock_irqsave() 2023-09-13 09:42:34 +02:00
bus bus: ti-sysc: Configure uart quirks for k3 SoC 2023-09-23 11:11:05 +02:00
cdrom
char tpm_tis: Resend command to recover from data transfer errors 2023-09-23 11:11:02 +02:00
clk clk: qcom: mss-sc7180: fix missing resume during probe 2023-09-19 12:27:57 +02:00
clocksource clocksource/drivers/arm_arch_timer: Disable timer before programming CVAL 2023-09-19 12:28:04 +02:00
comedi
connector
counter
cpufreq cpufreq: brcmstb-avs-cpufreq: Fix -Warray-bounds bug 2023-09-13 09:43:04 +02:00
cpuidle powerpc/pseries: Rework lppaca_shared_proc() to avoid DEBUG_PREEMPT 2023-09-13 09:42:48 +02:00
crypto crypto: stm32 - fix loop iterating through scatterlist for DMA 2023-09-13 09:43:04 +02:00
cxl cxl/acpi: Return 'rc' instead of '0' in cxl_parse_cfmws() 2023-08-03 10:24:04 +02:00
dax dax/kmem: Pass valid argument to memory_group_register_static 2023-07-19 16:21:43 +02:00
dca
devfreq PM / devfreq: Fix leak in devfreq_dev_release() 2023-09-13 09:42:59 +02:00
dio
dma dmaengine: sh: rz-dmac: Fix destination and source data size setting 2023-09-19 12:28:04 +02:00
dma-buf dma-buf/sw_sync: Avoid recursive lock during fence signal 2023-08-30 16:11:12 +02:00
edac EDAC/igen6: Fix the issue of no error events 2023-09-13 09:42:45 +02:00
eisa
extcon extcon: cht_wc: add POWER_SUPPLY dependency 2023-09-13 09:42:53 +02:00
firewire firewire: net: fix use after free in fwnet_finish_incoming_packet() 2023-08-23 17:52:24 +02:00
firmware arm64: sdei: abort running SDEI handlers during crash 2023-09-13 09:43:03 +02:00
fpga
fsi fsi: aspeed: Reset master errors after CFAM reset 2023-09-13 09:42:54 +02:00
gnss
gpio gpio: sim: pass the GPIO device's software node to irq domain 2023-08-30 16:11:13 +02:00
gpu drm/mediatek: dp: Change logging to dev for mtk_dp_aux_transfer() 2023-09-23 11:11:04 +02:00
greybus
hid HID: multitouch: Correct devm device reference for hidinput input_dev name 2023-09-13 09:42:57 +02:00
hsi
hte
hv Drivers: hv: vmbus: Don't dereference ACPI root object handle 2023-09-13 09:42:59 +02:00
hwmon hwmon: (tmp513) Fix the channel number in tmp51x_is_visible() 2023-09-13 09:42:35 +02:00
hwspinlock hwspinlock: qcom: add missing regmap config for SFPB MMIO implementation 2023-09-19 12:28:05 +02:00
hwtracing coresight: trbe: Fix TRBE potential sleep in atomic context 2023-09-13 09:42:56 +02:00
i2c treewide: Fix probing of devices in DT overlays 2023-09-13 09:43:05 +02:00
i3c i3c: master: svc: fix probe failure when no i3c device exist 2023-09-13 09:43:01 +02:00
idle
iio iio: accel: adxl313: Fix adxl313_i2c_id[] table 2023-09-13 09:42:52 +02:00
infiniband RDMA/efa: Fix wrong resources deallocation order 2023-09-13 09:42:57 +02:00
input Input: tca6416-keypad - fix interrupt enable disbalance 2023-09-19 12:27:59 +02:00
interconnect interconnect: Fix locking for runpm vs reclaim 2023-09-23 11:11:07 +02:00
iommu iommu/vt-d: Fix to flush cache of PASID directory table 2023-09-13 09:42:54 +02:00
ipack
irqchip irqchip/loongson-eiointc: Fix return value checking of eiointc_index 2023-09-13 09:42:29 +02:00
isdn mISDN: Update parameter type of dsp_cmx_send() 2023-08-16 18:27:26 +02:00
leds leds: trigger: tty: Do not use LED_ON/OFF constants, use led_blink_set_oneshot instead 2023-09-13 09:42:58 +02:00
macintosh
mailbox mailbox: qcom-ipcc: fix incorrect num_chans counting 2023-09-19 12:27:58 +02:00
mcb mcb-pci: Reallocate memory region to avoid memory overlapping 2023-05-24 17:32:41 +01:00
md md: raid1: fix potential OOB in raid1_remove_disk() 2023-09-23 11:11:05 +02:00
media media: pci: ipu3-cio2: Initialise timing struct to avoid a compiler warning 2023-09-23 11:11:07 +02:00
memory memory: brcmstb_dpfe: fix testing array offset after use 2023-07-19 16:21:24 +02:00
memstick memstick r592: make memstick_debug_get_tpc_name() static 2023-07-19 16:21:08 +02:00
message scsi: message: mptlan: Fix use after free bug in mptlan_remove() due to race condition 2023-05-24 17:32:37 +01:00
mfd mfd: pm8008: Fix module autoloading 2023-07-23 13:49:37 +02:00
misc misc: open-dice: make OPEN_DICE depend on HAS_IOMEM 2023-09-23 11:11:07 +02:00
mmc mmc: sdhci-esdhc-imx: improve ESDHC_FLAG_ERR010450 2023-09-23 11:11:02 +02:00
most
mtd mtd: rawnand: brcmnand: Fix ECC level field setting for v7.2 controller 2023-09-19 12:28:06 +02:00
mux
net wifi: mac80211_hwsim: drop short frames 2023-09-23 11:11:03 +02:00
nfc nfcsim.c: Fix error checking for debugfs_create_dir 2023-06-28 11:12:36 +02:00
ntb ntb: Fix calculation ntb_transport_tx_free_entry() 2023-09-13 09:43:02 +02:00
nubus nubus: Partially revert proc_create_single_data() conversion 2023-07-05 18:27:37 +01:00
nvdimm nvdimm: Fix dereference after free in register_nvdimm_pmu() 2023-09-13 09:42:47 +02:00
nvme nvme-rdma: fix potential unbalanced freeze & unfreeze 2023-08-16 18:27:30 +02:00
nvmem nvmem: rmem: Use NVMEM_DEVID_AUTO 2023-07-19 16:21:57 +02:00
of treewide: Fix probing of devices in DT overlays 2023-09-13 09:43:05 +02:00
opp OPP: Fix passing 0 to PTR_ERR in _opp_attach_genpd() 2023-09-13 09:42:28 +02:00
parisc parisc: led: Reduce CPU overhead for disk & lan LED computation 2023-09-19 12:27:57 +02:00
parport
pci PCI: fu740: Set the number of MSI vectors 2023-09-23 11:11:05 +02:00
pcmcia pcmcia: rsrc_nonstatic: Fix memory leak in nonstatic_release_resource_db() 2023-08-23 17:52:24 +02:00
peci
perf perf/imx_ddr: speed up overflow frequency of cycle 2023-09-23 11:11:00 +02:00
phy phy/rockchip: inno-hdmi: do not power on rk3328 post pll on reg write 2023-09-13 09:42:58 +02:00
pinctrl pinctrl: cherryview: fix address_space_handler() argument 2023-09-19 12:27:57 +02:00
platform platform/mellanox: NVSW_SN2201 should depend on ACPI 2023-09-19 12:28:09 +02:00
pnp
power power: supply: Fix logic checking if system is running from battery 2023-06-21 16:00:52 +02:00
powercap powercap: RAPL: Fix CONFIG_IOSF_MBI dependency 2023-07-19 16:21:00 +02:00
pps
ps3
ptp
pwm pwm: lpc32xx: Remove handling of PWM channels 2023-09-19 12:28:00 +02:00
rapidio
ras
regulator regulator: tps65219: Fix matching interrupts for their regulators 2023-07-19 16:22:14 +02:00
remoteproc remoteproc: imx_dsp_rproc: Fix kernel test robot sparse warning 2023-05-24 17:32:53 +01:00
reset
rpmsg rpmsg: glink: Add check for kstrdup 2023-09-13 09:42:58 +02:00
rtc rtc: ds1685: use EXPORT_SYMBOL_GPL for ds1685_rtc_poweroff 2023-09-06 21:27:00 +01:00
s390 s390/zcrypt: don't leak memory if dev_set_name() fails 2023-09-19 12:28:03 +02:00
sbus
scsi scsi: lpfc: Abort outstanding ELS cmds when mailbox timeout error is detected 2023-09-23 11:11:06 +02:00
sh
siox
slimbus
soc soc: qcom: qmi_encdec: Restrict string length in decode 2023-09-19 12:27:57 +02:00
soundwire soundwire: fix enumeration completion 2023-08-03 10:24:15 +02:00
spi treewide: Fix probing of devices in DT overlays 2023-09-13 09:43:05 +02:00
spmi
ssb
staging media: rkvdec: increase max supported height for H.264 2023-09-13 09:42:50 +02:00
target scsi: target: iscsi: Fix buffer overflow in lio_target_nacl_info_show() 2023-09-23 11:11:07 +02:00
tc
tee tee: amdtee: Add return_origin to 'struct tee_cmd_load_ta' 2023-06-14 11:15:28 +02:00
thermal thermal/of: Fix potential uninitialized value access 2023-09-13 09:42:29 +02:00
thunderbolt thunderbolt: Fix a backport error for display flickering issue 2023-09-02 09:16:20 +02:00
tty serial: cpm_uart: Avoid suspicious locking 2023-09-23 11:11:07 +02:00
ufs scsi: ufs: Try harder to change the power mode 2023-09-13 09:42:20 +02:00
uio
usb usb: chipidea: add workaround for chipidea PEC bug 2023-09-23 11:11:07 +02:00
vdpa vdpa: Enable strict validation for netlinks ops 2023-08-23 17:52:31 +02:00
vfio vfio/type1: fix cap_migration information leak 2023-09-13 09:42:47 +02:00
vhost vhost_net: revert upend_idx only on retriable error 2023-06-28 11:12:40 +02:00
video backlight: gpio_backlight: Drop output GPIO direction check for initial power state 2023-09-19 12:27:59 +02:00
virt virt: sevguest: Add CONFIG_CRYPTO dependency 2023-07-19 16:20:55 +02:00
virtio virtio_ring: fix avail_wrap_counter in virtqueue_add_packed 2023-09-13 09:42:59 +02:00
vlynq
w1 w1: fix loop in w1_fini() 2023-07-19 16:21:48 +02:00
watchdog watchdog: intel-mid_wdt: add MODULE_ALIAS() to allow auto-load 2023-09-19 12:28:00 +02:00
xen xen: speed up grant-table reclaim 2023-08-03 10:24:14 +02:00
zorro
Kconfig
Makefile