mirror-linux/include
Ankit Agrawal a23b10608d vfio/nvgrace-gpu: wait for the GPU mem to be ready
Speculative prefetches from the CPU to GPU memory issued before the
GPU is ready after reset can cause harmless corrected RAS events to
be logged on Grace systems. It is thus preferred that the
mapping not be re-established until the GPU is ready post reset.

GPU readiness can be checked through BAR0 registers, in the same
way it is checked at device probe time.

It can take several seconds for the GPU to be ready. So it is
desirable that the time overlaps as much of the VM startup as
possible to reduce impact on the VM bootup time. The GPU
readiness state is thus checked on the first fault/huge_fault
request or read/write access which amortizes the GPU readiness
time.

The first fault and the first read/write check the GPU state when
the reset_done flag is set; the flag denotes whether the GPU has
just been reset. The memory_lock is taken across map/access to
avoid races with GPU reset.

Also check whether the memory is enabled before waiting for the GPU
to be ready. Otherwise the readiness check would block for 30s.

Lastly, add PM handling around read/write accesses.
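
The flow described above can be illustrated with a small userspace
sketch. This is not the driver code: struct gpu_dev, gpu_status_ready(),
wait_gpu_ready(), check_ready_on_access() and POLL_BUDGET are
hypothetical names, the BAR0 status read is simulated by a counter, and
the caller is assumed to already hold memory_lock. It only shows the
ordering of the checks: skip when reset_done is clear, fail fast when
memory is disabled, otherwise poll with a bounded budget and clear the
flag on success.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-device state; not the driver's struct. */
struct gpu_dev {
	bool reset_done;       /* set by the reset path, cleared after a successful wait */
	bool mem_enabled;      /* models the PCI memory-enable state */
	int polls_until_ready; /* simulation only: failed polls before "ready" */
	int polls_done;        /* simulation only: number of status reads so far */
};

/* Assumption: a bounded number of polls stands in for the ~30s timeout. */
#define POLL_BUDGET 300

/* Simulated BAR0 status read: reports ready after polls_until_ready misses. */
static bool gpu_status_ready(struct gpu_dev *d)
{
	d->polls_done++;
	if (d->polls_until_ready > 0) {
		d->polls_until_ready--;
		return false;
	}
	return true;
}

/* Poll until the GPU reports ready or the budget is exhausted. */
static int wait_gpu_ready(struct gpu_dev *d)
{
	for (int i = 0; i < POLL_BUDGET; i++) {
		if (gpu_status_ready(d))
			return 0;
		/* Real code would sleep between polls. */
	}
	return -ETIMEDOUT;
}

/*
 * Called on the first fault or read/write after a reset.
 * Assumption: the caller holds memory_lock, so reset cannot race with us.
 */
static int check_ready_on_access(struct gpu_dev *d)
{
	int ret;

	if (!d->reset_done)
		return 0;       /* no reset since the last check: nothing to wait for */

	if (!d->mem_enabled)
		return -EIO;    /* avoid blocking for the full timeout */

	ret = wait_gpu_ready(d);
	if (ret)
		return ret;     /* keep reset_done set so the next access retries */

	d->reset_done = false;
	return 0;
}
```

Because the flag is cleared only after a successful wait, later faults
and accesses take the early-return path and pay no polling cost, while a
timed-out wait is retried on the next access.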

Cc: Shameer Kolothum <skolothumtho@nvidia.com>
Cc: Alex Williamson <alex@shazbot.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Vikram Sethi <vsethi@nvidia.com>
Reviewed-by: Shameer Kolothum <skolothumtho@nvidia.com>
Suggested-by: Alex Williamson <alex@shazbot.org>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20251127170632.3477-7-ankita@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-28 10:09:26 -07:00
..
acpi More power management updates for 6.18-rc1 2025-10-07 09:39:51 -07:00
asm-generic kbuild: align modinfo section for Secureboot Authenticode EDK2 compat 2025-10-27 16:21:24 -07:00
clocksource
crypto This update includes the following changes: 2025-10-04 14:59:29 -07:00
cxl
drm drm/gpuvm: Fix kernel-doc warning for drm_gpuvm_map_req.map 2025-10-15 18:37:05 +02:00
dt-bindings There's a bunch of patches here across drivers/clk/ to migrate drivers to use 2025-10-07 09:28:37 -07:00
hyperv hyperv: Remove the spurious null directive line 2025-10-02 21:21:24 +00:00
keys
kunit linux_kselftest-kunit-6.18-rc1 2025-10-01 19:15:11 -07:00
kvm KVM: arm64: Kill leftovers of ad-hoc timer userspace access 2025-10-13 14:42:41 +01:00
linux vfio/nvgrace-gpu: wait for the GPU mem to be ready 2025-11-28 10:09:26 -07:00
math-emu
media
memory
misc
net net: tls: Cancel RX async resync request on rcd_delta overflow 2025-10-29 18:32:18 -07:00
pcmcia
ras
rdma
rv
scsi scsi: core: Fix the unit attention counter implementation 2025-10-21 21:09:36 -04:00
soc There's a bunch of patches here across drivers/clk/ to migrate drivers to use 2025-10-07 09:28:37 -07:00
sound ASoC: tas2781: Support more newly-released amplifiers tas58xx in the driver 2025-10-13 11:08:09 +01:00
target
trace trace: tcp: add three metrics to trace_tcp_rcvbuf_grow() 2025-10-29 17:30:18 -07:00
uapi vfio/pci: Add dma-buf export support for MMIO regions 2025-11-20 21:12:19 -07:00
ufs
vdso Updates for the VDSO subsystem: 2025-09-30 16:58:21 -07:00
video
xen
Kbuild