mirror-linux/drivers
Ankit Agrawal a23b10608d vfio/nvgrace-gpu: wait for the GPU mem to be ready
Speculative prefetches from the CPU to GPU memory before the GPU
is ready after a reset can cause harmless corrected RAS events to
be logged on Grace systems. It is thus preferred that the
mapping not be re-established until the GPU is ready post reset.

GPU readiness can be checked through BAR0 registers, similar
to the check done at device probe time.

It can take several seconds for the GPU to be ready. So it is
desirable that the time overlaps as much of the VM startup as
possible to reduce impact on the VM bootup time. The GPU
readiness state is thus checked on the first fault/huge_fault
request or read/write access which amortizes the GPU readiness
time.

The first fault and read/write access check the GPU state when
the reset_done flag, which denotes that the GPU has just been
reset, is set. The memory_lock is taken across map/access to
avoid races with GPU reset.

Also check whether the memory is enabled before waiting for the
GPU to be ready. Otherwise the readiness check would block for 30s.
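
The flow described above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions, not the driver
code: struct gpu_model, gpu_access(), the poll counter, MAX_POLLS and
the chosen error codes are all hypothetical. Only the names reset_done
and memory_lock come from the text above, and a pthread mutex stands in
for the kernel lock.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not the actual nvgrace-gpu structures. */
struct gpu_model {
	pthread_mutex_t memory_lock; /* stands in for the memory_lock above */
	bool reset_done;             /* true when the GPU has just been reset */
	bool memory_enabled;         /* models the "memory is enabled" check */
	int polls_until_ready;       /* simulated readiness: polls still needed */
	int polls_issued;            /* number of readiness polls performed */
};

#define MAX_POLLS 30 /* arbitrary budget standing in for the real timeout */

/* Poll the simulated readiness state until ready or the budget runs out. */
static int wait_until_ready(struct gpu_model *gpu)
{
	for (int i = 0; i < MAX_POLLS; i++) {
		gpu->polls_issued++;
		if (gpu->polls_until_ready == 0)
			return 0;
		gpu->polls_until_ready--;
	}
	return -ETIMEDOUT;
}

/* Caller holds memory_lock. Only the first access after a reset polls. */
static int check_ready_locked(struct gpu_model *gpu)
{
	int ret;

	if (!gpu->reset_done)
		return 0;            /* no reset pending: nothing to wait for */
	if (!gpu->memory_enabled)
		return -EIO;         /* fail fast instead of blocking on the poll */

	ret = wait_until_ready(gpu);
	if (ret)
		return ret;

	gpu->reset_done = false;     /* later accesses skip the check */
	return 0;
}

/* Models a fault or read/write: check readiness under the lock, then access. */
static int gpu_access(struct gpu_model *gpu)
{
	int ret;

	assert(gpu != NULL);
	pthread_mutex_lock(&gpu->memory_lock);
	ret = check_ready_locked(gpu);
	/* The mapping or access itself would happen here, still under the lock. */
	pthread_mutex_unlock(&gpu->memory_lock);
	return ret;
}
```

In this model, a caller initializes memory_lock with
PTHREAD_MUTEX_INITIALIZER and sets reset_done after a simulated reset.
The next gpu_access() call then pays the polling cost once, and later
calls return immediately.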

Lastly, add PM handling around read/write access.

Cc: Shameer Kolothum <skolothumtho@nvidia.com>
Cc: Alex Williamson <alex@shazbot.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Vikram Sethi <vsethi@nvidia.com>
Reviewed-by: Shameer Kolothum <skolothumtho@nvidia.com>
Suggested-by: Alex Williamson <alex@shazbot.org>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20251127170632.3477-7-ankita@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-28 10:09:26 -07:00
..
accel accel/qaic: Synchronize access to DBC request queue head & tail pointer 2025-10-14 08:56:31 -06:00
accessibility
acpi Merge branches 'acpi-button', 'acpi-video' and 'acpi-fan' 2025-10-30 20:40:49 +01:00
amba
android binder: remove "invalid inc weak" check 2025-10-22 08:04:15 +02:00
ata ata: libata-core: relax checks in ata_read_log_directory() 2025-10-13 09:12:36 +02:00
atm
auxdisplay
base regmap: Fixes for v6.18 2025-11-01 10:45:39 -07:00
bcma bcma: don't register devices disabled in OF 2025-10-20 13:54:15 +02:00
block block-6.18-20251031 2025-10-31 12:57:19 -07:00
bluetooth Bluetooth: fix corruption in h4_recv_buf() after cleanup 2025-10-24 10:31:24 -04:00
bus Char/Misc/IIO/Binder changes for 6.18-rc1 2025-10-04 16:26:32 -07:00
cache
cdrom
cdx Char/Misc/IIO/Binder changes for 6.18-rc1 2025-10-04 16:26:32 -07:00
char tpm_crb: Add idle support for the Arm FF-A start method 2025-10-18 14:33:22 +03:00
clk There's a bunch of patches here across drivers/clk/ to migrate drivers to use 2025-10-07 09:28:37 -07:00
clocksource hyperv-next for v6.18 2025-10-07 08:40:15 -07:00
comedi comedi: fix divide-by-zero in comedi_buf_munge() 2025-10-22 08:03:52 +02:00
connector
counter
cpufreq cpufreq/amd-pstate: Fix a regression leading to EPP 0 after hibernate 2025-10-15 08:21:16 -05:00
cpuidle cpuidle: governors: menu: Select polling state in some more cases 2025-10-27 14:41:27 +01:00
crypto crypto: hisilicon - qm updates BAR configuration 2025-11-05 14:56:16 -07:00
cxl cxl/trace: Subtract to find an hpa_alias0 in cxl_poison events 2025-10-14 14:48:14 -07:00
dax
dca
devfreq PM / devfreq: rockchip-dfi: switch to FIELD_PREP_WM16 macro 2025-10-15 10:39:54 -04:00
dibs dibs: Check correct variable in dibs_init() 2025-09-26 15:10:59 -07:00
dio
dma dmaengine updates for v6.18 2025-10-06 10:37:06 -07:00
dma-buf dma-buf: fix integer overflow in fill_sg_entry() for buffers >= 8GiB 2025-11-28 10:06:25 -07:00
dpll dpll: zl3073x: Fix output pin registration 2025-10-28 18:54:48 -07:00
edac - Add support for new AMD family 0x1a models to amd64_edac 2025-09-30 11:41:03 -07:00
eisa
extcon
firewire firewire: init_ohci1394_dma: add missing function parameter documentation 2025-10-25 08:29:56 +09:00
firmware Arm SCMI fixes for v6.18 2025-10-23 22:30:01 +02:00
fpga
fsi
fwctl pds_fwctl: Replace kzalloc + copy_from_user with memdup_user in pdsfc_fw_rpc 2025-09-22 10:33:10 -03:00
gnss
gpio gpio: ljca: Fix duplicated IRQ mapping 2025-10-23 14:30:11 +02:00
gpu vfio/gvt: Convert to get_region_info_caps 2025-11-12 15:05:03 -07:00
greybus
hid hid-for-linus-2025101701 2025-10-18 08:18:18 -10:00
hsi
hte
hv Drivers: hv: Make CONFIG_HYPERV bool 2025-10-01 00:00:45 +00:00
hwmon hwmon: (sht3x) Fix error handling 2025-10-19 18:56:14 -07:00
hwspinlock
hwtracing Char/Misc/IIO/Binder changes for 6.18-rc1 2025-10-04 16:26:32 -07:00
i2c i2c: usbio: Add ACPI device-id for MTL-CVF devices 2025-10-14 13:54:43 +02:00
i3c i3c: fix big-endian FIFO transfers 2025-09-29 00:17:22 +02:00
idle
iio IIO: New device support, features and cleanup for 6.18 2025-09-23 14:15:25 +02:00
infiniband RDMA v6.18 merge window pull request 2025-10-03 18:35:22 -07:00
input Input updates for v6.18-rc0 2025-10-08 09:44:38 -07:00
interconnect
iommu PCI/P2PDMA: Simplify bus address mapping API 2025-11-20 12:01:41 -07:00
ipack
irqchip irqchip/sifive-plic: Avoid interrupt ID 0 handling during suspend/resume 2025-10-07 10:23:22 +02:00
isdn
leds
macintosh
mailbox qcom: add Glymur CPUCP mailbox binding 2025-10-08 11:44:21 -07:00
mcb
md dm docs: fix typos 2025-10-03 18:48:02 -07:00
media USB/Thunderbolt changes for 6.18-rc1 2025-10-04 16:07:08 -07:00
memory
memstick Summary of significant series in this pull request: 2025-10-02 18:18:33 -07:00
message
mfd mfd: ls2kbmc: check for devm_mfd_add_devices() failure 2025-10-03 10:38:23 -05:00
misc Char/Misc driver fixes for 6.18-rc3 2025-10-26 10:33:46 -07:00
mmc rpmb: move rpmb_frame struct and constants to common header 2025-10-13 13:18:03 +02:00
most most: usb: hdm_probe: Fix calling put_device() before device initialization 2025-10-22 08:04:43 +02:00
mtd MTD core: 2025-10-04 15:50:37 -07:00
mux
net net: stmmac: est: Fix GCL bounds checks 2025-10-29 18:49:24 -07:00
nfc
ntb NTB: epf: Add Renesas rcar support 2025-09-22 09:35:21 -04:00
nubus
nvdimm libnvdimm for 6.18 2025-10-06 11:17:18 -07:00
nvme nvme-pci: use blk_map_iter for p2p metadata 2025-10-22 19:46:25 -07:00
nvmem nvmem: rcar-efuse: add missing MODULE_DEVICE_TABLE 2025-10-22 08:02:38 +02:00
of of/irq: Export of_msi_xlate() for module usage 2025-10-24 07:44:09 -05:00
opp
parisc
parport
pci PCI/P2PDMA: Provide an access to pci_p2pdma_map_type() function 2025-11-20 12:02:00 -07:00
pcmcia
peci
perf arm64 fixes for -rc1 2025-10-07 08:59:25 -07:00
phy phy-for-6.18 2025-10-06 10:34:22 -07:00
pinctrl pci-v6.18-changes 2025-10-06 10:41:03 -07:00
platform platform/x86: alienware-wmi-wmax: Add AWCC support to Dell G15 5530 2025-10-15 11:22:35 +03:00
pmdomain soc: driver updates for 6.18 2025-10-01 17:32:51 -07:00
pnp
power power supply and reset changes for the 6.18 series 2025-10-01 13:02:59 -07:00
powercap
pps
ps3
ptp ptp: ocp: Fix typo using index 1 instead of i in SMA initialization loop 2025-10-22 19:18:39 -07:00
pwm gpio updates for v6.18-rc1 2025-10-01 11:34:12 -07:00
rapidio
ras
regulator regulator: bd718x7: Fix voltages scaled by resistor divider 2025-10-30 11:30:23 +00:00
remoteproc remoteproc updates for v6.18 2025-10-04 15:45:17 -07:00
reset soc: driver updates for 6.18 2025-10-01 17:32:51 -07:00
rpmsg rpmsg: qcom_smd: Fix fallback to qcom,ipc parse 2025-09-20 21:29:48 -05:00
rtc RTC for 6.18 2025-10-11 11:56:47 -07:00
s390 vfio/ccw: Convert to get_region_info_caps 2025-11-12 15:05:03 -07:00
sbus
scsi scsi: core: Fix the unit attention counter implementation 2025-10-21 21:09:36 -04:00
sh
siox
slimbus
soc - switch longson32 platform to DT and use MIPS_GENERIC framework 2025-10-05 10:09:55 -07:00
soundwire soundwire updates for 6.18 2025-10-06 10:32:22 -07:00
spi spi: intel: Add support for Oak Stream SPI serial flash 2025-10-29 12:53:45 +00:00
spmi
ssb
staging staging: gpib: Fix device reference leak in fmh_gpib driver 2025-10-13 10:55:03 +02:00
target SCSI misc on 20251011 2025-10-11 11:49:00 -07:00
tc
tee TEE QTEE fixes for v6.18 2025-10-17 15:26:52 +02:00
thermal thermal: renesas: Fix RZ/G3E fall-out 2025-10-02 10:41:58 +02:00
thunderbolt thunderbolt: Fix use-after-free in tb_dp_dprx_work 2025-09-23 17:16:38 +02:00
tty serial: 8250_mtk: Enable baud clock and manage in runtime PM 2025-10-22 12:13:54 +02:00
ufs scsi: ufs: core: Declare tx_lanes witout initialization 2025-10-21 21:02:46 -04:00
uio hyperv-next for v6.18 2025-10-07 08:40:15 -07:00
usb USB serial device ids for 6.18-rc3 2025-10-24 13:52:58 +02:00
vdpa vduse: Use fixed 4KB bounce pages for non-4KB page size 2025-10-01 07:24:55 -04:00
vfio vfio/nvgrace-gpu: wait for the GPU mem to be ready 2025-11-28 10:09:26 -07:00
vhost vdpa: support virtio_map 2025-10-01 07:24:43 -04:00
video fbdev: atyfb: Check if pll_ops->init_pll failed 2025-10-28 22:59:19 +01:00
virt arm64 updates for 6.18 2025-09-29 18:48:39 -07:00
virtio virtio,vhost: fixes, cleanups 2025-10-04 08:48:16 -07:00
w1
watchdog linux-watchdog 6.18-rc1 tag 2025-10-06 11:00:30 -07:00
xen dma-mapping updates for Linux 6.18: 2025-10-03 17:41:12 -07:00
zorro
Kconfig
Makefile hyperv-next for v6.18 2025-10-07 08:40:15 -07:00