mirror-linux/include
Maíra Canal 0b1217bfdf
drm/sched: Allow drivers to skip the reset and keep on running
When the DRM scheduler times out, it's possible that the GPU isn't hung;
instead, a job just took unusually long (longer than the timeout) but is
still running, and there is, thus, no reason to reset the hardware. This
can occur in two scenarios:

  1. The job is taking longer than the timeout, but the driver determined
     through a GPU-specific mechanism that the hardware is still making
     progress. Hence, the driver would like the scheduler to skip the
     timeout and treat the job as still pending from then onward. This
     happens in v3d, Etnaviv, and Xe.
  2. The timeout fired before the free-job worker ran. Consequently, the
     scheduler calls `sched->ops->timedout_job()` for a job that has not
     actually timed out.

These two scenarios are problematic because the job was removed from the
`sched->pending_list` before calling `sched->ops->timedout_job()`, which
means that when the job finishes, it won't be freed by the scheduler
through `sched->ops->free_job()`, leading to a memory leak.

To solve these problems, create a new `drm_gpu_sched_stat`, called
`DRM_GPU_SCHED_STAT_NO_HANG`, which allows a driver to skip the reset. The
new status will indicate that the job must be reinserted into
`sched->pending_list`, and the hardware / driver will still complete that
job.
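
The flow described above can be sketched as a small userspace model. This is
only an illustration, not the kernel implementation: `struct job`,
`struct sched`, `demo_timedout_job()`, `handle_timeout()`, and the
`SCHED_STAT_*` values are hypothetical stand-ins for the scheduler's real
types, and putting the job back at the head of the list is an assumption of
this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for enum drm_gpu_sched_stat values. */
enum sched_stat {
	SCHED_STAT_RESET,   /* the driver reset the hardware */
	SCHED_STAT_NO_HANG, /* models DRM_GPU_SCHED_STAT_NO_HANG */
};

struct job {
	struct job *next;
	bool making_progress; /* assumed result of a GPU-specific check */
};

struct sched {
	struct job *pending; /* models sched->pending_list */
	/* models sched->ops->timedout_job() */
	enum sched_stat (*timedout_job)(struct job *job);
};

static void pending_push_front(struct sched *s, struct job *j)
{
	j->next = s->pending;
	s->pending = j;
}

static struct job *pending_pop_front(struct sched *s)
{
	struct job *j = s->pending;

	if (j) {
		s->pending = j->next;
		j->next = NULL;
	}
	return j;
}

static size_t pending_count(const struct sched *s)
{
	size_t n = 0;

	for (const struct job *j = s->pending; j; j = j->next)
		n++;
	return n;
}

/* Hypothetical driver callback: skip the reset while the job progresses. */
static enum sched_stat demo_timedout_job(struct job *job)
{
	return job->making_progress ? SCHED_STAT_NO_HANG : SCHED_STAT_RESET;
}

/*
 * Models the timeout path: the job leaves the pending list before the
 * callback runs, so it has to be put back when the driver reports that
 * there was no hang. Returns true if the job is tracked again.
 */
static bool handle_timeout(struct sched *s)
{
	struct job *job = pending_pop_front(s);

	assert(s->timedout_job);
	if (!job)
		return false;

	if (s->timedout_job(job) == SCHED_STAT_NO_HANG) {
		pending_push_front(s, job);
		return true;
	}
	return false;
}
```

In this model, a job whose callback returns `SCHED_STAT_NO_HANG` is back on
the pending list after `handle_timeout()`, so a later free-job pass can still
find it. Without the reinsertion step, the job would be tracked nowhere once
the callback returned.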

Reviewed-by: Philipp Stanner <phasta@kernel.org>
Link: https://lore.kernel.org/r/20250714-sched-skip-reset-v6-2-5c5ba4f55039@igalia.com
Signed-off-by: Maíra Canal <mcanal@igalia.com>
2025-07-15 08:27:07 -03:00
acpi
asm-generic hyperv-next for v6.16 2025-06-03 08:39:20 -07:00
clocksource
crypto Networking changes for 6.16. 2025-05-28 15:24:36 -07:00
cxl
drm drm/sched: Allow drivers to skip the reset and keep on running 2025-07-15 08:27:07 -03:00
dt-bindings dt-bindings: power: qcom,rpmpd: add Turbo L5 corner 2025-07-04 11:09:43 -07:00
hyperv hyperv-next for v6.16 2025-06-03 08:39:20 -07:00
keys
kunit I've recently moved computers (among other things) so I'm sending this from a 2025-05-30 09:15:40 -07:00
kvm KVM: arm64: Resolve vLPI by host IRQ in vgic_v4_unset_forwarding() 2025-05-30 09:11:29 +01:00
linux PM: hibernate: Add stub for pm_hibernate_is_recovering() 2025-07-13 06:55:31 -05:00
math-emu
media
memory
misc
net bluetooth pull request for net: 2025-06-12 08:13:48 -07:00
pcmcia
ras
rdma
rv
scsi SCSI misc on 20250529 2025-05-29 22:17:52 -07:00
soc - The 3 patch series "hung_task: extend blocking task stacktrace dump to 2025-05-31 19:12:53 -07:00
sound USB/Thunderbolt changes for 6.16-rc1 2025-06-06 12:45:35 -07:00
target
trace dma-fence: Add safe access helpers and document the rules 2025-06-13 08:26:49 +01:00
uapi Merge tag 'drm-msm-next-2025-07-05' of https://gitlab.freedesktop.org/drm/msm into drm-next 2025-07-08 14:31:19 +02:00
ufs
vdso
video video: Make global edid_info depend on CONFIG_FIRMWARE_EDID 2025-06-16 11:00:29 +02:00
xen
Kbuild