Commit Graph

16421 Commits (09cfd3c52ea76f43b3cb15e570aeddf633d65e80)

Author SHA1 Message Date
Linus Torvalds 58809f614e drm next for 6.18-rc1
cross-subsystem:
 - i2c-hid: Make elan touch controllers power on after panel is enabled
 - dt bindings for STM32MP25 SoC
 - pci vgaarb: use screen_info helpers
 - rust pin-init updates
 - add MEI driver for late binding firmware update/load
 
 uapi:
 - add ioctl for reassigning GEM handles
 - provide boot_display attribute on boot-up devices
 
 core:
 - document DRM_MODE_PAGE_FLIP_EVENT
 - add vendor specific recovery method to drm device wedged uevent
 
 gem:
 - Simplify gpuvm locking
 
 ttm:
 - add interface to populate buffers
 
 sched:
 - Fix race condition in trace code
 
 atomic:
 - Reallow no-op async page flips
 
 display:
 - dp: Fix command length
 
 video:
 - Improve pixel-format handling for struct screen_info
 
 rust:
 - drop Opaque<> from ioctl args
 - Alloc:
 - BorrowedPage type and AsPageIter traits
 - Implement Vmalloc::to_page() and VmallocPageIter
 - DMA/Scatterlist:
 - Add dma::DataDirection and type alias for dma_addr_t
 - Abstraction for struct scatterlist and sg_table
 - DRM:
 - simplify use of generics
 - add DriverFile type alias
 - drop Object::SIZE
 - Rust:
 - pin-init tree merge
 - Various methods for AsBytes and FromBytes traits
 
 gpuvm:
 - Support madvice in Xe driver
 
 gpusvm:
 - fix hmm_pfn_to_map_order usage in gpusvm
 
 bridge:
 - Improve and fix ref counting on bridge management
 - cdns-dsi: Various improvements to mode setting
 - Support Solomon SSD2825 plus DT bindings
 - Support Waveshare DSI2DPI plus DT bindings
 - Support Content Protection property
 - display-connector: Improve DP display detection
 - Add support for Radxa Ra620 plus DT bindings
 - adv7511: Provide SPD and HDMI infoframes
 - it6505: Replace crypto_shash with sha()
 - synopsys: Add support for DW DPTX Controller plus DT bindings
 - adv7511: Write full Audio infoframe
 - ite6263: Support vendor-specific infoframes
 - simple: Add support for Realtek RTD2171 DP-to-HDMI plus DT bindings
 
 panel:
 - panel-edp: Support mt8189 Chromebooks; Support BOE NV140WUM-N64;
   Support SHP LQ134Z1; Fixes
 - panel-simple: Support Olimex LCD-OLinuXino-5CTS plus DT bindings
 - Support Samsung AMS561RA01
 - Support Hydis HV101HD1 plus DT bindings
 - ilitek-ili9881c: Refactor mode setting; Add support for Bestar
   BSD1218-A101KL68 LCD plus DT bindings
 - lvds: Add support for Ampire AMP19201200B5TZQW-T03 to DT bindings
 - edp: Add support for additonal mt8189 Chromebook panels
 - lvds: Add DT bindings for EDT ETML0700Z8DHA
 
 amdgpu:
 - add CRIU support for gem objects
 - RAS updates
 - VCN SRAM load fixes
 - EDID read fixes
 - eDP ALPM support
 - Documentation updates
 - Rework PTE flag generation
 - DCE6 fixes
 - VCN devcoredump cleanup
 - MMHUB client id fixes
 - VCN 5.0.1 RAS support
 - SMU 13.0.x updates
 - Expanded PCIe DPC support
 - Expanded VCN reset support
 - VPE per queue reset support
 - give kernel jobs unique id for tracing
 - pre-populate exported buffers
 - cyan skillfish updates
 - make vbios build number available in sysfs
 - userq updates
 - HDCP updates
 - support MMIO remap page as ttm pool
 - JPEG parser updates
 - DCE6 DC updates
 - use devm for i2c buses
 - GPUVM locking updates
 - Drop non-DC DCE11 code
 - improve fallback handling for pixel encoding
 
 amdkfd:
 - SVM/page migration fixes
 - debugfs fixes
 - add CRIO support for gem objects
 - SVM updates
 
 radeon:
 - use dev_warn_once in CS parsers
 
 xe:
 - add madvise interface
 - add DRM_IOCTL_XE_VM_QUERY_MEMORY_RANGE_ATTRS to query VMA count
   and memory attributes
 - drop L# bank mask reporting from media GT3 on Xe3+.
 - add SLPC power_profile sysfs interface
 - add configs attribs to add post/mid context-switch commands
 - handle firmware reported hardware errors notifying userspace with
   device wedged uevent
 - use same dir structure across sysfs/debugfs
 - cleanup and future proof vram region init
 - add G-states and PCI link states to debugfs
 - Add SRIOV support for CCS surfaces on Xe2+
 - Enable SRIOV PF mode by default on supported platforms
 - move flush to common code
 - extended core workarounds for Xe2/3
 - use DRM scheduler for delayed GT TLB invalidations
 - configs improvements and allow VF device enablement
 - prep work to expose mmio regions to userspace
 - VF migration support added
 - prepare GPU SVM for THP migration
 - start fixing XE_PAGE_SIZE vs PAGE_SIZE
 - add PSMI support for hw validation
 - resize VF bars to max possible size according to number of VFs
 - Ensure GT is in C0 during resume
 - pre-populate exported buffers
 - replace xe_hmm with gpusvm
 - add more SVM GT stats to debugfs
 - improve fake pci and WA kunnit handle for new platform testing
 - Test GuC to GuC comms to add debugging
 - use attribute groups to simplify sysfs registration
 - add Late Binding firmware code to interact with MEI
 
 i915:
 - apply multiple JSL/EHL/Gen7/Gen6 workarounds properly
 - protect against overflow in active_engine()
 - Use try_cmpxchg64() in __active_lookup()
 - include GuC registers in error state
 - get rid of dev->struct_mutex
 - iopoll: generalize read_poll_timout
 - lots more display refactoring
 - Reject HBR3 in any eDP Panel
 - Prune modes for YUV420
 - Display Wa fix, additions, and updates
 - DP: Fix 2.7 Gbps link training on g4x
 - DP: Adjust the idle pattern handling
 - DP: Shuffle the link training code a bit
 - Don't set/read the DSI C clock divider on GLK
 - Enable_psr kernel parameter changes
 - Type-C enabled/disconnected dp-alt sink
 - Wildcat Lake enabling
 - DP HDR updates
 - DRAM detection
 - wait PSR idle on dsb commit
 - Remove FBC modulo 4 restriction for ADL-P+
 - panic: refactor framebuffer allocation
 
 habanalabs:
 - debug/visibility improvements
 - vmalloc-backed coherent mmap support
 - HLDIO infrastructure
 
 nova-core:
 - various register!() macro improvements
 - minor vbios/firmware fixes/refactoring
 - advance firmware boot stages; process Booter and patch signatures
 - process GSP and GSP bootloader
 - Add r570.144 firmware bindings and update to it
 - Move GSP boot code to own module
 - Use new pin-init features to store driver's private data in a single
  allocation
 - Update ARef import from sync::aref
 
 nova-drm:
 - Update ARef import from sync::aref
 
 tyr:
 - initial driver skeleton for a rust driver for ARM Mali GPUs
 - capable of powering up, query metadata and provide it to userspace.
 
 msm:
 - GPU and Core:
 - in DT bindings describe clocks per GPU type
 - GMU bandwidth voting for x1-85
 - a623/a663 speedbins
 - cleanup some remaining no-iommu leftovers after VM_BIND conversion
 - fix GEM obj 32b size truncation
 - add missing VM_BIND param validation
 - IFPC for x1-85 and a750
 - register xml and gen_header.py sync from mesa
 - Display:
 - add missing bindings for display on SC8180X
 - added DisplayPort MST bindings
 - conversion from round_rate() to determine_rate()
 
 amdxdna:
 - add IOCTL_AMDXDNA_GET_ARRAY
 - support user space allocated buffers
 - streamline PM interfaces
 - Refactoring wrt. hardware contexts
 - improve error reporting
 
 nouveau:
 - use GSP firmware by default
 - improve error reporting
 - Pre-populate exported buffers
 
 ast:
 - Clean up detection of DRAM config
 
 exynos:
 - add DSIM bridge driver support for Exynos7870
 - Document Exynos7870 DSIM compatible in dt-binding
 
 panthor:
 - Print task/pid on errors
 - Add support for Mali G710, G510, G310, Gx15, Gx20, Gx25
 - Improve cache flushing
 - Fail VM bind if BO has offset
 
 renesas:
 - convert to RUNTIME_PM_OPS
 
 rcar-du:
 - Make number of lanes configurable
 - Use RUNTIME_PM_OPS
 - Add support for DSI commands
 
 rocket:
 - Add driver for Rockchip NPU plus DT bindings
 - Use kfree() and sizeof() correctly
 - Test DMA status
 
 rockchip:
 - dsi2: Add support for RK3576 plus DT bindings
 - Add support for RK3588 DPTX output
 
 tidss:
 - Use crtc_ fields for programming display mode
 - Remove other drivers from aperture
 
 pixpaper:
 - Add support for Mayqueen Pixpaper plus DT bindings
 
 v3d:
 - Support querying nubmer of GPU resets for KHR_robustness
 
 stm:
 - Clean up logging
 - ltdc: Add support support for STM32MP257F-EV1 plus DT bindings
 
 sitronix:
 - st7571-i2c: Add support for inverted displays and 2-bit grayscale
 
 tidss:
 - Convert to kernel's FIELD_ macros
 
 vesadrm:
 - Support 8-bit palette mode
 
 imagination:
 - Improve power management
 - Add support for TH1520 GPU
 - Support Risc-V architectures
 
 v3d:
 - Improve job management and locking
 
 vkms:
 - Support variants of ARGB8888, ARGB16161616, RGB565, RGB888 and P01x
 - Spport YUV with 16-bit components
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmjcpjkACgkQDHTzWXnE
 hr7Q7g/5AcxXqLUx7wvmDga9TpzIjDD+C+MOt568RpFQ9cYprI+/86ma7ELCpuNe
 dVgeobxQb/jyhf4acdBU+t5aZz+j8VPhPtIPrPY2kOVDuL1NfeQNS8VmGNpFhR+0
 6hqVrtfvbYdLBrAHrU/V/RwZlBJvI/D/I2QGuvZZwWzCBgYd4u4bGuRyBCvGDxOD
 CTPaEqYyzjvpVuzu7AGQk655WkZQnyPmiezIl2lit1meEMMMv80HePkyWHclZo7Q
 hMqsEasSp5w5Q5EpYqVr1z5IdBAV1O53oor9W573J3kEoB4o1zEsTPfLO4N1dgXo
 bfvc24uW3zyChWY2hWyRKvOzvAoClnjfY6whv9NRP0Qi4UjzhLlNOpmhm9cst/J+
 uj2Nn8UJtyvFJbTmDvoocpgdhq2mkGKdIVhVQ6tG7PjihFmyQRF7PJZjb+0Vee7L
 53F0c4d6HiBI4DHa+lH6fgQUBspIvSfmcnR0ACg29NByib+JEoPSPb4ET+uZ8lLd
 IbQvNiCdnUduYDCKfo5ea/FesP8AXy1KfSa+z7oEEFYHbbkc7PSztUagEyZdS/yS
 FnnYqmo/DidmyM4nxDQUII+UDqjng7fo+l4BzIhL12pR693KzCf0mexMr6SA24ny
 gasN97923OTle1J9xrPrKavkx6WjswZCvOaG7ZbnJB47ydJVu5w=
 =ZVKY
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2025-10-01' of https://gitlab.freedesktop.org/drm/kernel

Pull drm updates from Dave Airlie:
 "cross-subsystem:
   - i2c-hid: Make elan touch controllers power on after panel is
     enabled
   - dt bindings for STM32MP25 SoC
   - pci vgaarb: use screen_info helpers
   - rust pin-init updates
   - add MEI driver for late binding firmware update/load

  uapi:
   - add ioctl for reassigning GEM handles
   - provide boot_display attribute on boot-up devices

  core:
   - document DRM_MODE_PAGE_FLIP_EVENT
   - add vendor specific recovery method to drm device wedged uevent

  gem:
   - Simplify gpuvm locking

  ttm:
   - add interface to populate buffers

  sched:
   - Fix race condition in trace code

  atomic:
   - Reallow no-op async page flips

  display:
   - dp: Fix command length

  video:
   - Improve pixel-format handling for struct screen_info

  rust:
   - drop Opaque<> from ioctl args
   - Alloc:
       - BorrowedPage type and AsPageIter traits
       - Implement Vmalloc::to_page() and VmallocPageIter
   - DMA/Scatterlist:
       - Add dma::DataDirection and type alias for dma_addr_t
       - Abstraction for struct scatterlist and sg_table
   - DRM:
       - simplify use of generics
       - add DriverFile type alias
       - drop Object::SIZE
   - Rust:
       - pin-init tree merge
       - Various methods for AsBytes and FromBytes traits

  gpuvm:
   - Support madvice in Xe driver

  gpusvm:
   - fix hmm_pfn_to_map_order usage in gpusvm

  bridge:
   - Improve and fix ref counting on bridge management
   - cdns-dsi: Various improvements to mode setting
   - Support Solomon SSD2825 plus DT bindings
   - Support Waveshare DSI2DPI plus DT bindings
   - Support Content Protection property
   - display-connector: Improve DP display detection
   - Add support for Radxa Ra620 plus DT bindings
   - adv7511: Provide SPD and HDMI infoframes
   - it6505: Replace crypto_shash with sha()
   - synopsys: Add support for DW DPTX Controller plus DT bindings
   - adv7511: Write full Audio infoframe
   - ite6263: Support vendor-specific infoframes
   - simple: Add support for Realtek RTD2171 DP-to-HDMI plus DT bindings

  panel:
   - panel-edp: Support mt8189 Chromebooks; Support BOE NV140WUM-N64;
     Support SHP LQ134Z1; Fixes
   - panel-simple: Support Olimex LCD-OLinuXino-5CTS plus DT bindings
   - Support Samsung AMS561RA01
   - Support Hydis HV101HD1 plus DT bindings
   - ilitek-ili9881c: Refactor mode setting; Add support for Bestar
     BSD1218-A101KL68 LCD plus DT bindings
   - lvds: Add support for Ampire AMP19201200B5TZQW-T03 to DT bindings
   - edp: Add support for additonal mt8189 Chromebook panels
   - lvds: Add DT bindings for EDT ETML0700Z8DHA

  amdgpu:
   - add CRIU support for gem objects
   - RAS updates
   - VCN SRAM load fixes
   - EDID read fixes
   - eDP ALPM support
   - Documentation updates
   - Rework PTE flag generation
   - DCE6 fixes
   - VCN devcoredump cleanup
   - MMHUB client id fixes
   - VCN 5.0.1 RAS support
   - SMU 13.0.x updates
   - Expanded PCIe DPC support
   - Expanded VCN reset support
   - VPE per queue reset support
   - give kernel jobs unique id for tracing
   - pre-populate exported buffers
   - cyan skillfish updates
   - make vbios build number available in sysfs
   - userq updates
   - HDCP updates
   - support MMIO remap page as ttm pool
   - JPEG parser updates
   - DCE6 DC updates
   - use devm for i2c buses
   - GPUVM locking updates
   - Drop non-DC DCE11 code
   - improve fallback handling for pixel encoding

  amdkfd:
   - SVM/page migration fixes
   - debugfs fixes
   - add CRIO support for gem objects
   - SVM updates

  radeon:
   - use dev_warn_once in CS parsers

  xe:
   - add madvise interface
   - add DRM_IOCTL_XE_VM_QUERY_MEMORY_RANGE_ATTRS to query VMA count
     and memory attributes
   - drop L# bank mask reporting from media GT3 on Xe3+.
   - add SLPC power_profile sysfs interface
   - add configs attribs to add post/mid context-switch commands
   - handle firmware reported hardware errors notifying userspace with
     device wedged uevent
   - use same dir structure across sysfs/debugfs
   - cleanup and future proof vram region init
   - add G-states and PCI link states to debugfs
   - Add SRIOV support for CCS surfaces on Xe2+
   - Enable SRIOV PF mode by default on supported platforms
   - move flush to common code
   - extended core workarounds for Xe2/3
   - use DRM scheduler for delayed GT TLB invalidations
   - configs improvements and allow VF device enablement
   - prep work to expose mmio regions to userspace
   - VF migration support added
   - prepare GPU SVM for THP migration
   - start fixing XE_PAGE_SIZE vs PAGE_SIZE
   - add PSMI support for hw validation
   - resize VF bars to max possible size according to number of VFs
   - Ensure GT is in C0 during resume
   - pre-populate exported buffers
   - replace xe_hmm with gpusvm
   - add more SVM GT stats to debugfs
   - improve fake pci and WA kunnit handle for new platform testing
   - Test GuC to GuC comms to add debugging
   - use attribute groups to simplify sysfs registration
   - add Late Binding firmware code to interact with MEI

  i915:
   - apply multiple JSL/EHL/Gen7/Gen6 workarounds properly
   - protect against overflow in active_engine()
   - Use try_cmpxchg64() in __active_lookup()
   - include GuC registers in error state
   - get rid of dev->struct_mutex
   - iopoll: generalize read_poll_timout
   - lots more display refactoring
   - Reject HBR3 in any eDP Panel
   - Prune modes for YUV420
   - Display Wa fix, additions, and updates
   - DP: Fix 2.7 Gbps link training on g4x
   - DP: Adjust the idle pattern handling
   - DP: Shuffle the link training code a bit
   - Don't set/read the DSI C clock divider on GLK
   - Enable_psr kernel parameter changes
   - Type-C enabled/disconnected dp-alt sink
   - Wildcat Lake enabling
   - DP HDR updates
   - DRAM detection
   - wait PSR idle on dsb commit
   - Remove FBC modulo 4 restriction for ADL-P+
   - panic: refactor framebuffer allocation

  habanalabs:
   - debug/visibility improvements
   - vmalloc-backed coherent mmap support
   - HLDIO infrastructure

  nova-core:
   - various register!() macro improvements
   - minor vbios/firmware fixes/refactoring
   - advance firmware boot stages; process Booter and patch signatures
   - process GSP and GSP bootloader
   - Add r570.144 firmware bindings and update to it
   - Move GSP boot code to own module
   - Use new pin-init features to store driver's private data in a
     single allocation
   - Update ARef import from sync::aref

  nova-drm:
   - Update ARef import from sync::aref

  tyr:
   - initial driver skeleton for a rust driver for ARM Mali GPUs
   - capable of powering up, query metadata and provide it to userspace.

  msm:
   - GPU and Core:
      - in DT bindings describe clocks per GPU type
      - GMU bandwidth voting for x1-85
      - a623/a663 speedbins
      - cleanup some remaining no-iommu leftovers after VM_BIND conversion
      - fix GEM obj 32b size truncation
      - add missing VM_BIND param validation
      - IFPC for x1-85 and a750
      - register xml and gen_header.py sync from mesa
   - Display:
      - add missing bindings for display on SC8180X
      - added DisplayPort MST bindings
      - conversion from round_rate() to determine_rate()

  amdxdna:
   - add IOCTL_AMDXDNA_GET_ARRAY
   - support user space allocated buffers
   - streamline PM interfaces
   - Refactoring wrt. hardware contexts
   - improve error reporting

  nouveau:
   - use GSP firmware by default
   - improve error reporting
   - Pre-populate exported buffers

  ast:
   - Clean up detection of DRAM config

  exynos:
   - add DSIM bridge driver support for Exynos7870
   - Document Exynos7870 DSIM compatible in dt-binding

  panthor:
   - Print task/pid on errors
   - Add support for Mali G710, G510, G310, Gx15, Gx20, Gx25
   - Improve cache flushing
   - Fail VM bind if BO has offset

  renesas:
   - convert to RUNTIME_PM_OPS

  rcar-du:
   - Make number of lanes configurable
   - Use RUNTIME_PM_OPS
   - Add support for DSI commands

  rocket:
   - Add driver for Rockchip NPU plus DT bindings
   - Use kfree() and sizeof() correctly
   - Test DMA status

  rockchip:
   - dsi2: Add support for RK3576 plus DT bindings
   - Add support for RK3588 DPTX output

  tidss:
   - Use crtc_ fields for programming display mode
   - Remove other drivers from aperture

  pixpaper:
   - Add support for Mayqueen Pixpaper plus DT bindings

  v3d:
   - Support querying nubmer of GPU resets for KHR_robustness

  stm:
   - Clean up logging
   - ltdc: Add support support for STM32MP257F-EV1 plus DT bindings

  sitronix:
   - st7571-i2c: Add support for inverted displays and 2-bit grayscale

  tidss:
   - Convert to kernel's FIELD_ macros

  vesadrm:
   - Support 8-bit palette mode

  imagination:
   - Improve power management
   - Add support for TH1520 GPU
   - Support Risc-V architectures

  v3d:
   - Improve job management and locking

  vkms:
   - Support variants of ARGB8888, ARGB16161616, RGB565, RGB888 and P01x
   - Spport YUV with 16-bit components"

* tag 'drm-next-2025-10-01' of https://gitlab.freedesktop.org/drm/kernel: (1455 commits)
  drm/amd: Add name to modes from amdgpu_connector_add_common_modes()
  drm/amd: Drop some common modes from amdgpu_connector_add_common_modes()
  drm/amdgpu: update MODULE_PARM_DESC for freesync_video
  drm/amd: Use dynamic array size declaration for amdgpu_connector_add_common_modes()
  drm/amd/display: Share dce100_validate_global with DCE6-8
  drm/amd/display: Share dce100_validate_bandwidth with DCE6-8
  drm/amdgpu: Fix fence signaling race condition in userqueue
  amd/amdkfd: enhance kfd process check in switch partition
  amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw
  drm/amd/display: Reject modes with too high pixel clock on DCE6-10
  drm/amd: Drop unnecessary check in amdgpu_connector_add_common_modes()
  drm/amd/display: Only enable common modes for eDP and LVDS
  drm/amdgpu: remove the redeclaration of variable i
  drm/amdgpu/userq: assign an error code for invalid userq va
  drm/amdgpu: revert "rework reserved VMID handling" v2
  drm/amdgpu: remove leftover from enforcing isolation by VMID
  drm/amdgpu: Add fallback to pipe reset if KCQ ring reset fails
  accel/habanalabs: add Infineon version check
  accel/habanalabs/gaudi2: read preboot status after recovering from dirty state
  accel/habanalabs: add HL_GET_P_STATE passthrough type
  ...
2025-10-02 12:47:25 -07:00
Rafael J. Wysocki f58f86df6a Merge branches 'pm-core', 'pm-runtime' and 'pm-sleep'
Merge changes related to system sleep and runtime PM framework for
6.18-rc1:

 - Annotate loops walking device links in the power management core
   code as _srcu and add macros for walking device links to reduce the
   likelihood of coding mistakes related to them (Rafael Wysocki)

 - Document time units for *_time functions in the runtime PM API (Brian
   Norris)

 - Clear power.must_resume in noirq suspend error path to avoid resuming
   a dependant device under a suspended parent or supplier (Rafael
   Wysocki)

 - Fix GFP mask handling during hybrid suspend and make the amdgpu
   driver handle hybrid suspend correctly (Mario Limonciello, Rafael
   Wysocki)

 - Fix GFP mask handling after aborted hibernation in platform mode and
   combine exit paths in power_down() to avoid code duplication (Rafael
   Wysocki)

 - Use vmalloc_array() and vcalloc() in the hibernation core to avoid
   open-coded size computations (Qianfeng Rong)

 - Fix typo in hibernation core code comment (Li Jun)

 - Call pm_wakeup_clear() in the same place where other functions that do
   bookkeeping prior to suspend_prepare() are called (Samuel Wu)

* pm-core:
  PM: core: Add two macros for walking device links
  PM: core: Annotate loops walking device links as _srcu

* pm-runtime:
  PM: runtime: Documentation: ABI: Document time units for *_time

* pm-sleep:
  PM: hibernate: Combine return paths in power_down()
  PM: hibernate: Restrict GFP mask in power_down()
  PM: hibernate: Fix pm_hibernation_mode_is_suspend() build breakage
  drm/amd: Fix hybrid sleep
  PM: hibernate: Add pm_hibernation_mode_is_suspend()
  PM: hibernate: Fix hybrid-sleep
  PM: sleep: core: Clear power.must_resume in noirq suspend error path
  PM: sleep: Make pm_wakeup_clear() call more clear
  PM: hibernate: Fix typo in memory bitmaps description comment
  PM: hibernate: Use vmalloc_array() and vcalloc() to improve code
2025-09-29 12:54:01 +02:00
Mario Limonciello df2ba57094 drm/amd: Add name to modes from amdgpu_connector_add_common_modes()
[Why]
When DC adds common modes it adds modes with a string to match what
they are. Non-DC doesn't. This can be inconsistent when turning on/off
DC support.

[How]
Add a name member to common_modes[] and copy it into the drm display
mode.

Cc: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Link: https://lore.kernel.org/r/20250924161624.1975819-6-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-25 15:54:22 -04:00
Mario Limonciello 6d622755bc drm/amd: Drop some common modes from amdgpu_connector_add_common_modes()
[Why]
DC and non-DC codepaths have different sets of common modes that are
added for eDP and LVDS cases. This can cause different behaviors for
turning on DC on hardware that can support both.

[How]
Drop extra modes from amdgpu_connector_add_common_modes() not present
in amdgpu_dm_connector_add_common_modes().

Cc: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250924161624.1975819-5-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-25 15:54:12 -04:00
Alex Deucher dbf2341569 drm/amdgpu: update MODULE_PARM_DESC for freesync_video
To better describe what it does.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3756
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-25 15:54:07 -04:00
Mario Limonciello 123a1750c5 drm/amd: Use dynamic array size declaration for amdgpu_connector_add_common_modes()
[Why]
Adding or removing a mode from common_modes[] can be fragile if a user
forgot to update the for loop boundaries.

[How]
Use ARRAY_SIZE() to detect size of the array and use that instead.

Cc: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Link: https://lore.kernel.org/r/20250924161624.1975819-4-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-25 15:53:55 -04:00
Jesse.Zhang b8ae2640f9 drm/amdgpu: Fix fence signaling race condition in userqueue
This commit fixes a potential race condition in the userqueue fence
signaling mechanism by replacing dma_fence_is_signaled_locked() with
dma_fence_is_signaled().

The issue occurred because:
1. dma_fence_is_signaled_locked() should only be used when holding
   the fence's individual lock, not just the fence list lock
2. Using the locked variant without the proper fence lock could lead
   to double-signaling scenarios:
   - Hardware completion signals the fence
   - Software path also tries to signal the same fence

By using dma_fence_is_signaled() instead, we properly handle the
locking hierarchy and avoid the race condition while still maintaining
the necessary synchronization through the fence_list_lock.

v2: drop the comment (Christian)

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-25 15:53:23 -04:00
Mario Limonciello 210844d2c0 drm/amd: Drop unnecessary check in amdgpu_connector_add_common_modes()
[Why]
amdgpu_connector_add_common_modes() has a check for the width and height
of common modes being too small, but the array of common_modes[] has fixed
values.  The check is dead code.

[How]
Drop unnecessary check.

Cc: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250924161624.1975819-3-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-25 15:50:58 -04:00
Sunil Khatri 4e3b45d7b6 drm/amdgpu: remove the redeclaration of variable i
Variable "i" has been redeclared as integer later in the function
which is wrong and not serving any purpose.

Fixes: 899fbde146 ("drm/amdgpu: replace get_user_pages with HMM mirror helpers")
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-25 15:46:35 -04:00
Prike Liang 883bd89d00 drm/amdgpu/userq: assign an error code for invalid userq va
It should return an error code if userq VA validation fails.

Fixes: 9e46b8bb05 ("drm/amdgpu: validate userq buffer virtual address and size")
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-25 15:40:18 -04:00
Christian König 90e09ea4cf drm/amdgpu: revert "rework reserved VMID handling" v2
This reverts commit e44a0fe630.

Initially we used VMID reservation to enforce isolation between
processes. That has now been replaced by proper fence handling.

Both OpenGL, RADV and ROCm developers requested a way to reserve a VMID
for SPM, so restore that approach by reverting back to only allowing a
single process to use the reserved VMID.

Only compile tested for now.

v2: use -ENOENT instead of -EINVAL if VMID is not available

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-25 15:39:00 -04:00
Christian König 66f3883dbc drm/amdgpu: remove leftover from enforcing isolation by VMID
Initially we enforced isolation by reserving a VMID, but that practice
was now removed.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-25 15:38:54 -04:00
Jesse.Zhang 7469567d88 drm/amdgpu: Add fallback to pipe reset if KCQ ring reset fails
Add a fallback mechanism to attempt pipe reset when KCQ reset
fails to recover the ring. After performing the KCQ reset and
queue remapping, test the ring functionality. If the ring test
fails, initiate a pipe reset as an additional recovery step.

v2: fix the typo (Lijo)
v3: try pipeline reset when kiq mapping fails (Lijo)

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-25 15:38:48 -04:00
Mario Limonciello (AMD) 0a6e9e098f drm/amd: Fix hybrid sleep
[Why]
commit 530694f54d ("drm/amdgpu: do not resume device in thaw for
normal hibernation") optimized the flow for systems that are going
into S4 where the power would be turned off.  Basically the thaw()
callback wouldn't resume the device if the hibernation image was
successfully created since the system would be powered off.

This however isn't the correct flow for a system entering into
s0i3 after the hibernation image is created.  Some of the amdgpu
callbacks have different behavior depending upon the intended
state of the suspend.

[How]
Use pm_hibernation_mode_is_suspend() as an input to decide whether
to run resume during thaw() callback.

Reported-by: Ionut Nechita <ionut_n2001@yahoo.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4573
Tested-by: Ionut Nechita <ionut_n2001@yahoo.com>
Fixes: 530694f54d ("drm/amdgpu: do not resume device in thaw for normal hibernation")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Kenneth Crudup <kenny@panix.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Cc: 6.17+ <stable@vger.kernel.org> # 6.17+: 495c8d35035e: PM: hibernate: Add pm_hibernation_mode_is_suspend()
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-09-25 21:37:38 +02:00
Jesse.Zhang 5886090032 drm/amdgpu: Move VCN reset mask setup to late_init for VCN 5.0.1
This patch moves the initialization of the VCN supported_reset mask from
sw_init to a new late_init function for VCN 5.0.1. The change ensures
that all necessary hardware and firmware initialization is complete
before determining the supported reset types.

Key changes:
- Added vcn_v5_0_1_late_init() function to handle late initialization
- Moved supported_reset mask setup from sw_init to late_init
- Added check for per-queue reset support via amdgpu_dpm_reset_vcn_is_supported()
- Updated ip_funcs to use the new late_init function

This change helps ensure proper reset behavior by waiting until all
dependencies are initialized before determining available reset types.

Reviewed-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Ruili Ji <ruiliji2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-23 10:41:37 -04:00
Jesse.Zhang dc704458dd drm/amdgpu: Add ring reset support for VCN v5.0.1
Implement the ring reset callback for VCN v5.0.1 to properly handle
hardware recovery when encountering GPU hangs. The new functionality:

1. Adds vcn_v5_0_1_ring_reset() function that:
   - Prepares for reset using amdgpu_ring_reset_helper_begin()
   - Performs VCN instance reset via amdgpu_dpm_reset_vcn()
   - Re-initializes hardware through vcn_v5_0_1_hw_init_inst()
   - Restarts DPG mode with vcn_v5_0_1_start_dpg_mode()
   - Completes reset with amdgpu_ring_reset_helper_end()

2. Hooks the reset function into the unified ring functions via:
   - Adding .reset = vcn_v5_0_1_ring_reset to vcn_v5_0_1_unified_ring_vm_funcs

3. Maintains existing behavior for SR-IOV VF cases by checking RRMT status

This provides proper hardware recovery capabilities for VCN 5.0.1 IP block
during fault conditions, matching functionality available in other VCN versions.

v2: Remove the RRMT_ENABLED cap setting in the reset function
    and replace adev->vcn.inst[ring->me].indirect_sram with vinst->indirect_sram (Lijo)

Reviewed-by: Sonny Jiang <sonny.jiang@amd.com>
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Ruili Ji <ruiliji2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-23 10:41:27 -04:00
Jesse.Zhang eb6910cdaa drm/amdgpu: Refactor VCN v5.0.1 HW init into separate instance function
Split the per-instance initialization code from vcn_v5_0_1_hw_init()
into a new vcn_v5_0_1_hw_init_inst() function. This improves code
organization by:

1. Separating the instance-specific initialization logic
2. Making the main init function more readable
3. Following the pattern used in queue reset

The SR-IOV specific initialization remains in the main function since
it has different requirements.

Reviewed-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Ruili Ji <ruiliji2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-23 10:41:11 -04:00
Rahul Kumar 86a54e45fd drm/amdgpu: Use kmalloc_array() instead of kmalloc()
Documentation/process/deprecated.rst recommends against the use of
kmalloc with dynamic size calculations due to the risk of overflow and
smaller allocation being made than the caller was expecting.

Replace kmalloc() with kmalloc_array() in amdgpu_amdkfd_gfx_v10.c,
amdgpu_amdkfd_gfx_v10_3.c, amdgpu_amdkfd_gfx_v11.c and
amdgpu_amdkfd_gfx_v12.c to make the intended allocation size clearer
and avoid potential overflow issues.

Suggested-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Rahul Kumar <rk0006818@gmail.com>
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-23 10:35:54 -04:00
Sonny Jiang 854b9ab637 drm/amdgpu: Update amdgpu_vcn5_fw_shared for vcn_5_0_1
Align vcn5_fw_shared structure with FW

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-23 10:22:51 -04:00
Mario Limonciello 1fb710793c drm/amdgpu: Enable MES lr_compute_wa by default
The MES set resources packet has an optional bit 'lr_compute_wa'
which can be used for preventing MES hangs on long compute jobs.

Set this bit by default.

Co-developed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-23 10:22:38 -04:00
Sunil Khatri c5b3cc417b drm/amdgpu: use hmm_pfns instead of array of pages
we dont need to allocate local array of pages to hold
the pages returned by the hmm, instead we could use
the hmm_range structure itself to get to hmm_pfn
and get the required pages directly.

This avoids call to alloc/free quite a lot.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-23 10:22:31 -04:00
Lijo Lazar b29c22b8da drm/amdgpu: Fix vbios build number parsing logic
It's not necessary that the build string and atom header section has a
difference of 32 bytes. Use the remaining bytes in the section as copy
limit.

Fixes: d6fa802661 ("drm/amdgpu: Add vbios build number interface")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-23 10:21:09 -04:00
Dave Airlie 342f141ba9 amd-drm-next-6.18-2025-09-19:
amdgpu:
 - Fence drv clean up fix
 - DPC fixes
 - Misc display fixes
 - Support the MMIO remap page as a ttm pool
 - JPEG parser updates
 - UserQ updates
 - VCN ctx handling fixes
 - Documentation updates
 - Misc cleanups
 - SMU 13.0.x updates
 - SI DPM updates
 - GC 11.x cleaner shader updates
 - DMCUB updates
 - DML fixes
 - Improve fallback handling for pixel encoding
 - VCN reset improvements
 - DCE6 DC updates
 - DSC fixes
 - Use devm for i2c buses
 - GPUVM locking updates
 - GPUVM documentation improvements
 - Drop non-DC DCE11 code
 - S0ix fixes
 - Backlight fix
 - SR-IOV fixes
 
 amdkfd:
 - SVM updates
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCaM2u5AAKCRC93/aFa7yZ
 2PcOAQC0US+uIJxkp1gmJxlpWsceD8GhonYp456gSx743XiMfAD/W8d4VKKicWMe
 DDzSumgBgjEG7YpuzZJoaExNeS/LWAg=
 =oKqL
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.18-2025-09-19' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.18-2025-09-19:

amdgpu:
- Fence drv clean up fix
- DPC fixes
- Misc display fixes
- Support the MMIO remap page as a ttm pool
- JPEG parser updates
- UserQ updates
- VCN ctx handling fixes
- Documentation updates
- Misc cleanups
- SMU 13.0.x updates
- SI DPM updates
- GC 11.x cleaner shader updates
- DMCUB updates
- DML fixes
- Improve fallback handling for pixel encoding
- VCN reset improvements
- DCE6 DC updates
- DSC fixes
- Use devm for i2c buses
- GPUVM locking updates
- GPUVM documentation improvements
- Drop non-DC DCE11 code
- S0ix fixes
- Backlight fix
- SR-IOV fixes

amdkfd:
- SVM updates

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250919193354.2989255-1-alexander.deucher@amd.com
2025-09-22 08:45:51 +10:00
Guangshuo Li cc9a8e238e drm/amdgpu/atom: Check kcalloc() for WS buffer in amdgpu_atom_execute_table_locked()
kcalloc() may fail. When WS is non-zero and allocation fails, ectx.ws
remains NULL while ectx.ws_size is set, leading to a potential NULL
pointer dereference in atom_get_src_int() when accessing WS entries.

Return -ENOMEM on allocation failure to avoid the NULL dereference.

Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-18 16:59:27 -04:00
Christian König 59e4405e9e drm/amdgpu: revert to old status lock handling v3
It turned out that protecting the status of each bo_va with a
spinlock was just hiding problems instead of solving them.

Revert the whole approach, add a separate stats_lock and lockdep
assertions that the correct reservation lock is held all over the place.

This not only allows for better checks if a state transition is properly
protected by a lock, but also switching back to using list macros to
iterate over the state of lists protected by the dma_resv lock of the
root PD.

v2: re-add missing check
v3: split into two patches

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-18 16:59:14 -04:00
Alex Deucher 9272bb34b0 drm/amdgpu: suspend KFD and KGD user queues for S0ix
We need to make sure the user queues are preempted so
GFX can enter gfxoff.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Tested-by: David Perry <david.perry@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f8b367e6fa)
Cc: stable@vger.kernel.org
2025-09-18 14:59:41 -04:00
Alex Deucher 2ade36eaa9 drm/amdkfd: add proper handling for S0ix
When in S0i3, the GFX state is retained, so all we need to do
is stop the runlist so GFX can enter gfxoff.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Tested-by: David Perry <david.perry@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 4bfa860993)
Cc: stable@vger.kernel.org
2025-09-18 14:59:24 -04:00
Sunil Khatri 0aa09d8a6c drm/amdgpu: add missing comment for the new argument
In function 'amdgpu_vm_lock_done_list' update the comment
for the new argument 'vm'.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202509180211.UAqME0zj-lkp@intel.com/
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-18 09:43:38 -04:00
Alex Deucher f8b367e6fa drm/amdgpu: suspend KFD and KGD user queues for S0ix
We need to make sure the user queues are preempted so
GFX can enter gfxoff.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Tested-by: David Perry <david.perry@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-18 09:43:30 -04:00
Alex Deucher 846de1384a drm/amdgpu/userq: Optimize S0ix handling
In S0i3, GFX state is retained, so it's preferrable to
preempt queues rather than unmapping them as the overhead
is lower.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Tested-by: David Perry <david.perry@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-18 09:43:23 -04:00
Joe.Wang f05c03ffc7 drm/amdgpu: Fix PRT flag for gfx12
AMDGPU_PTE_PRT_GFX12 flag is missed during pageTable rework, add it back.

Fixes: 6716a823d1 ("drm/amdgpu: rework how PTE flags are generated v3")
Signed-off-by: Joe Wang <joe.wang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-18 09:43:10 -04:00
Xiang Liu 1ed511fb76 drm/amdgpu: Check VF critical region before RAS poison injection
Check VF critical region before RAS poison injection to ensure that the
poison injection will not hit the VF critical region.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Shravan Kumar Gande <Shravankumar.Gande@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-18 09:43:10 -04:00
Alex Deucher 4bfa860993 drm/amdkfd: add proper handling for S0ix
When in S0i3, the GFX state is retained, so all we need to do
is stop the runlist so GFX can enter gfxoff.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Tested-by: David Perry <david.perry@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-18 09:43:02 -04:00
Xiang Liu f1fdeb3d07 drm/amdgpu: Introduce VF critical region check for RAS poison injection
The SRIOV guest send requet to host to check whether the poison
injection address is in VF critical region or not via mabox.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Shravan Kumar Gande <Shravankumar.Gande@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-18 09:43:02 -04:00
Alex Deucher 18f769ff36 drm/amdgpu: remove non-DC DCE 11 code
DC has been the default for ~8 years now and supports
many things that the non-DC code does not (audio, DP MST, etc.).
No DCE 11.x IPs ever supported analog encoders so that is not
an issue.  Finally drop this code.

Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-18 09:43:02 -04:00
Christian König df99f6d112 drm/amdgpu: re-order and document VM code
Re-order fields in the VM structure and try to improve the
documentation a bit.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-16 17:51:48 -04:00
Christian König 930595df25 drm/amdgpu: remove check for BO reservation add assert instead
We should leave such checks to lockdep and not implement something
manually.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-16 17:51:35 -04:00
Rodrigo Siqueira 63137c7c8c drm/amdgpu: Use devm_i2c_add_adapter() in SMU V11
Instead of using i2c_add_adapter() and i2c_del_adapter() in the SMU V11,
use devm_i2c_add_adapter() to simplify the code path.

Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-16 17:47:24 -04:00
Rodrigo Siqueira 0f36a3c6af drm/amdgpu/amdgpu_i2c: Use devm_i2c_add_adapter instead of i2c_add_adapter
This commit replaces i2c_add_adapter() with devm_i2c_add_adapter() and
removes part of the cleanup logic since the new function handles the i2c
removal.

Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-16 17:47:22 -04:00
Christian König 39203f5e6d drm/amdgpu: fix userq VM validation v4
That was actually complete nonsense and not validating the BOs
at all. The code just cleared all VM areas were it couldn't grab the
lock for a BO.

Try to fix this. Only compile tested at the moment.

v2: fix fence slot reservation as well as pointed out by Sunil.
    also validate PDs, PTs, per VM BOs and update PDEs
v3: grab the status_lock while working with the done list.
v4: rename functions, add some comments, fix waiting for updates to
    complete.
v4: rename amdgpu_vm_lock_done_list(), add some more comments

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-16 17:47:06 -04:00
Christian König d7ddcf921e drm/amdgpu: reject gang submissions under SRIOV
Gang submission means that the kernel driver guarantees that multiple
submissions are executed on the HW at the same time on different engines.

Background is that those submissions then depend on each other and each
can't finish stand alone.

SRIOV now uses world switch to preempt submissions on the engines to allow
sharing the HW resources between multiple VFs.

The problem is now that the SRIOV world switch can't know about such inter
dependencies and will cause a timeout if it waits for a partially running
gang submission.

To conclude SRIOV and gang submissions are fundamentally incompatible at
the moment. For now just disable them.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-16 17:47:00 -04:00
Srinivasan Shanmugam c1b6b8c770 drm/amdgpu/gfx11: Add Cleaner Shader Support for GFX11.0.1/11.0.4 GPUs
Enable the cleaner shader for additional GFX11.0.1/11.0.4 series GPUs to
ensure data isolation among GPU tasks. The cleaner shader is tasked with
clearing the Local Data Store (LDS), Vector General Purpose Registers
(VGPRs), and Scalar General Purpose Registers (SGPRs), which helps avoid
data leakage and guarantees the accuracy of computational results.

This update extends cleaner shader support to GFX11.0.1/11.0.4 GPUs,
previously available for GFX11.0.3. It enhances security by clearing GPU
memory between processes and maintains a consistent GPU state across KGD
and KFD workloads.

Cc: Wasee Alam <wasee.alam@amd.com>
Cc: Mario Sopena-Novales <mario.novales@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0a71ceb27f)
2025-09-15 17:23:42 -04:00
Christian König 2740509623 drm/amdgpu: revert "Implement new dummy vram manager"
This is should be unnecessary since a VRAM manager isn't mandatory in
the first place.

It could be that we have some missing checks inside AMDGPU or TTM but
those should then be fixed instead of worked around like that.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-15 17:04:49 -04:00
Christian König a9273da04f drm/amdgpu: add AMDGPU_IDS_FLAGS_GANG_SUBMIT
Add a UAPI flag indicating if gang submit is supported or not.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-15 17:04:42 -04:00
Shaoyun Liu 85442bac84 drm/amd/amdgpu: Fix the mes version that support inv_tlbs
MES pipe0 will do VM invalidation with engine set 5 when assign VMID to a process,
driver will submit inv_tlb package to mes pipe1. It might run into race condition
if both pipes use the same invalidate engine set. From MES version 0x83 it will use
invalidate engine set 6 for pipe1 to fix the issue

Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-15 17:02:44 -04:00
Mario Limonciello (AMD) 531df041f2 drm/amd: Avoid evicting resources at S5
Normally resources are evicted on dGPUs at suspend or hibernate and
on APUs at hibernate.  These steps are unnecessary when using the S4
callbacks to put the system into S5.

Cc: AceLan Kao <acelan.kao@canonical.com>
Cc: Kai-Heng Feng <kaihengf@nvidia.com>
Cc: Mark Pearson <mpearson-lenovo@squebb.ca>
Cc: Denis Benato <benato.denis96@gmail.com>
Cc: Merthan Karakaş <m3rthn.k@gmail.com>
Tested-by: Eric Naim <dnaim@cachyos.org>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-15 17:02:39 -04:00
Jesse.Zhang bb1d7f157e drm/amdgpu: Switch user queues to use preempt/restore for eviction
This patch modifies the user queue management to use preempt/restore
operations instead of full map/unmap for queue eviction scenarios where
applicable. The changes include:

1. Introduces new helper functions:
   - amdgpu_userqueue_preempt_helper()
   - amdgpu_userqueue_restore_helper()

2. Updates queue state management to track PREEMPTED state

3. Modifies eviction handling to use preempt instead of unmap:
   - amdgpu_userq_evict_all() now uses preempt_helper
   - amdgpu_userq_restore_all() now uses restore_helper

The preempt/restore approach provides better performance during queue
eviction by avoiding the overhead of full queue teardown and setup.
Full map/unmap operations are still used for initial setup/teardown
and system suspend scenarios.

v2: rename amdgpu_userqueue_restore_helper/amdgpu_userqueue_preempt_helper to
amdgpu_userq_restore_helper/amdgpu_userq_preempt_helper for consistency. (Alex)

v3: amdgpu_userq_stop_sched_for_enforce_isolation() and
amdgpu_userq_start_sched_for_enforce_isolation() should use preempt and restore (Alex)

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-15 17:02:33 -04:00
Jesse.Zhang 5cefcbb306 drm/amdgpu: adjust MES API used for suspend and resume
Use the suspend and resume API rather than remove queue
and add queue API.  The former just preempts the queue
while the latter remove it from the scheduler completely.
There is no need to do that, we only need preemption
in this case.

V2: replace queue_active with queue state
v3: set the suspend_fence_addr
v4: allocate another per queue buffer for the suspend fence, and  set the sequence number.
    also wait for the suspend fence. (Alex)
v5: use a wb slot (Alex)
v6: Change the timeout period. For MES, the default timeout  is  2100000; /* 2100 ms */ (Alex)

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-15 17:02:28 -04:00
Hawking Zhang 46fbe1e349 Revert "drm/amdgpu: Allocate psp fw private buffer in vram"
This reverts commit 22dcb283d6.
Need to certain APU platforms and will proceed to rework
the patch accordingly

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-15 16:56:15 -04:00
Srinivasan Shanmugam 0a71ceb27f drm/amdgpu/gfx11: Add Cleaner Shader Support for GFX11.0.1/11.0.4 GPUs
Enable the cleaner shader for additional GFX11.0.1/11.0.4 series GPUs to
ensure data isolation among GPU tasks. The cleaner shader is tasked with
clearing the Local Data Store (LDS), Vector General Purpose Registers
(VGPRs), and Scalar General Purpose Registers (SGPRs), which helps avoid
data leakage and guarantees the accuracy of computational results.

This update extends cleaner shader support to GFX11.0.1/11.0.4 GPUs,
previously available for GFX11.0.3. It enhances security by clearing GPU
memory between processes and maintains a consistent GPU state across KGD
and KFD workloads.

Cc: Wasee Alam <wasee.alam@amd.com>
Cc: Mario Sopena-Novales <mario.novales@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-15 16:56:07 -04:00