renderer/vulkan: compositor-modifier intersection unlocks NVIDIA direct mode

Target.init now picks a modifier by intersecting two channels:
- the GPU's supported modifiers for `format` with COLOR_ATTACHMENT |
  TRANSFER_SRC | SAMPLED feature bits (filtered to single plane);
- the compositor's accepted modifiers, fetched via a new
  ghostty_platform_vulkan_s.get_supported_modifiers callback.
First non-LINEAR hit wins (vendor-tiled is the perf path on most
modern GPUs); LINEAR is the fallback; legacy_copy stays the floor.
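
A rough sketch of that selection rule in standalone C++, for
illustration only. `GpuModifier` and this free-standing `pickModifier`
are hypothetical stand-ins (the real code is Zig and reads
VkDrmFormatModifierPropertiesEXT); the only outside fact assumed is
that DRM_FORMAT_MOD_LINEAR is 0.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for the per-modifier data the GPU reports.
struct GpuModifier {
    std::uint64_t modifier;
    std::uint32_t plane_count;
    std::uint32_t features;  // format-feature bits for this modifier
};

constexpr std::uint64_t kLinear = 0;  // DRM_FORMAT_MOD_LINEAR

// Returns the first single-plane, feature-complete, compositor-accepted
// non-LINEAR modifier (in GPU enumeration order); else LINEAR if it
// qualifies; else nullopt (caller falls back to legacy_copy).
std::optional<std::uint64_t> pickModifier(
    const std::vector<GpuModifier>& gpu,
    const std::vector<std::uint64_t>& compositor,
    std::uint32_t required) {
    bool has_linear = false;
    for (const GpuModifier& gm : gpu) {
        if (gm.plane_count != 1) continue;
        if ((gm.features & required) != required) continue;
        bool accepted = false;
        for (std::uint64_t hm : compositor) {
            if (hm == gm.modifier) {
                accepted = true;
                break;
            }
        }
        if (!accepted) continue;
        if (gm.modifier != kLinear) return gm.modifier;
        has_linear = true;
    }
    if (has_linear) return kLinear;
    return std::nullopt;
}
```

Both lists are expected to be small, so the nested scan is kept
instead of building a set.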

NVIDIA RTX 2080 + Vulkan 1.4.329 verified: Target now picks
DRM_FORMAT_MOD_NVIDIA_* (0x300000000606015), Target.tiling=.direct,
image_backed=1, dmabuf flows through wl_subsurface without
protocol errors. Where Phase 1 left NVIDIA at legacy_copy + QImage,
this lands the full zero-copy path.

The new callback's data source is the zwp_linux_dmabuf_v1 format/
modifier events. SubsurfacePresenter.cpp's globals discovery now
listens for those events during its private-queue roundtrip (two
roundtrips: bind, then collect events) and caches them in a
process-wide (format → modifiers) table. Host::instance() eagerly
primes this on the GUI thread so the renderer-thread callback is a
lock-free read of an immutable map.
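
For illustration, the cache and its two-pass read can be sketched as
below. `ModifierTable`, `recordModifier` and `readModifiers` are
hypothetical names for this sketch, not the functions in the diff; the
sketch assumes the table is fully populated before any reader runs,
which is what makes the unlocked read safe.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>
#include <vector>

using ModifierTable =
    std::unordered_map<std::uint32_t, std::vector<std::uint64_t>>;

// The modifier event delivers each 64-bit modifier as two 32-bit halves.
void recordModifier(ModifierTable& table, std::uint32_t format,
                    std::uint32_t hi, std::uint32_t lo) {
    table[format].push_back((static_cast<std::uint64_t>(hi) << 32) | lo);
}

// Count query when out == nullptr or capacity == 0; otherwise copies up
// to `capacity` entries and returns how many were written.
std::size_t readModifiers(const ModifierTable& table, std::uint32_t format,
                          std::uint64_t* out, std::size_t capacity) {
    auto it = table.find(format);
    if (it == table.end()) return 0;
    const std::size_t available = it->second.size();
    if (out == nullptr || capacity == 0) return available;
    const std::size_t copied = std::min(available, capacity);
    std::memcpy(out, it->second.data(), copied * sizeof(std::uint64_t));
    return copied;
}
```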

Renderer changes:
- Target.pickModifier replaces the LINEAR-only probe; intersects
  host ∩ GPU, preferring non-LINEAR single-plane modifiers.
- Target.initDirect now switches create-info variants by chosen
  modifier: EXPLICIT for LINEAR (we know rowPitch), LIST for
  vendor-tiled (driver picks opaque layout, we query back via
  vkGetImageDrmFormatModifierPropertiesEXT and vkGetImageSubresourceLayout).
- Direct-mode memory switches to DEVICE_LOCAL — image_backed=true
  means the host won't mmap, so we no longer need HOST_VISIBLE
  (and many drivers won't expose HOST_VISIBLE bits for tiled
  exportable images anyway).
- Device.zig adds vkGetImageDrmFormatModifierPropertiesEXT and
  vkGetPhysicalDeviceFormatProperties2 to the dispatch table.
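
The create-info switch in initDirect boils down to the decision
sketched here. `ImagePlan` and `planImage` are hypothetical names used
only for this sketch; the real code builds the Vulkan structs directly.
A row pitch of 0 here just means "not known up front, query the driver
after creation".

```cpp
#include <cstdint>

constexpr std::uint64_t kDrmFormatModLinear = 0;  // DRM_FORMAT_MOD_LINEAR

enum class CreateInfoVariant { Explicit, List };

struct ImagePlan {
    CreateInfoVariant variant;
    std::uint64_t row_pitch;  // requested pitch for Explicit; 0 for List
};

// LINEAR: we can state the layout ourselves (width * bytes per pixel).
// Anything else: hand the driver a one-entry list and read the layout
// back after the image exists.
ImagePlan planImage(std::uint64_t modifier, std::uint32_t width,
                    std::uint32_t bytes_per_pixel) {
    if (modifier == kDrmFormatModLinear) {
        return {CreateInfoVariant::Explicit,
                static_cast<std::uint64_t>(width) * bytes_per_pixel};
    }
    return {CreateInfoVariant::List, 0};
}
```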

Host changes:
- qt/src/vulkan/Host.cpp adds VK_EXT_image_drm_format_modifier to
  kRequiredDeviceExtensions so the device-level proc-addr lookup
  for vkGetImageDrmFormatModifierPropertiesEXT actually resolves.
- wl_compositor bound at version min(advertised, 6) so the child
  wl_surface supports set_buffer_scale (added in v3). Guarded the
  set_buffer_scale call by wl_proxy_get_version for older
  compositors.
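
The version handling amounts to a clamp plus a guard, sketched below
with hypothetical helper names. It relies on the Wayland rule that a
wl_surface created from a wl_compositor inherits the compositor
proxy's bound version.

```cpp
#include <algorithm>
#include <cstdint>

constexpr std::uint32_t kMaxTestedCompositorVersion = 6;
constexpr std::uint32_t kBufferScaleSinceVersion = 3;

// Bind at whatever the compositor advertises, capped at what we tested.
std::uint32_t bindVersion(std::uint32_t advertised) {
    return std::min(advertised, kMaxTestedCompositorVersion);
}

// set_buffer_scale only exists on surfaces bound at v3 or newer.
bool canSetBufferScale(std::uint32_t surface_version) {
    return surface_version >= kBufferScaleSinceVersion;
}
```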

Co-Authored-By: claude-flow <ruv@ruv.net>
pull/12846/head
Nathan 2026-05-24 23:39:43 -05:00
parent 9a7a31ac37
commit 33560fe83e
7 changed files with 380 additions and 79 deletions


@ -513,6 +513,26 @@ typedef struct {
void* (*queue)(void* userdata); // VkQueue
uint32_t (*queue_family_index)(void* userdata);
// Compositor-supported DRM modifiers for a given DRM_FORMAT_*
// fourcc, as advertised by linux-dmabuf-v1's `modifier` events.
// libghostty intersects this with what its physical device
// supports for COLOR_ATTACHMENT to pick a tiling that the
// compositor will actually accept on attach. Without this
// intersection, drivers that don't expose COLOR_ATTACHMENT for
// the LINEAR modifier (NVIDIA) can't use the direct-export path
// and fall back to a CPU-readback path.
//
// Two-pass usage: call with `out=NULL, capacity=0` to query the
// total count; allocate; call again to fill. The fill call
// returns the number of modifiers actually written (capped at
// `capacity`). Either call may return 0 if the format isn't
// compositor-supported or the host doesn't speak linux-dmabuf-v1.
size_t (*get_supported_modifiers)(
void* userdata,
uint32_t drm_format,
uint64_t* out,
size_t capacity);
// Hand off a rendered frame to the host as a dmabuf fd. The host
// imports it (e.g. into Qt's RHI as a QRhiTexture, or attaches to
// a wl_subsurface via linux-dmabuf-v1) and composites.


@ -9,6 +9,8 @@
#include <optional>
#include <vector>
#include "../wayland/SubsurfacePresenter.h"
namespace vulkan {
// Forward declaration of the entry point in `GhosttySurface.cpp` that
@ -30,6 +32,13 @@ namespace {
constexpr const char *kRequiredDeviceExtensions[] = {
"VK_KHR_external_memory_fd",
"VK_EXT_external_memory_dma_buf",
// Needed so libghostty can allocate render images with a chosen
// DRM modifier (vendor-tiled where supported) and query the
// driver-chosen layout back via
// `vkGetImageDrmFormatModifierPropertiesEXT`. Without it on the
// host's VkDevice, the device-level proc-addr lookup for that
// function returns null and Target.init fails.
"VK_EXT_image_drm_format_modifier",
};
bool hasRequiredExtensions(VkPhysicalDevice pd) {
@ -108,6 +117,15 @@ uint32_t cbQueueFamilyIndex(void *ud) {
return host != nullptr ? host->vkQueueFamilyIndex() : 0;
}
size_t cbGetSupportedModifiers(void *ud, uint32_t drm_format,
uint64_t *out, size_t capacity) {
(void)ud;
// Always-safe read: the registry was primed eagerly on the GUI
// thread when Host::instance() first ran, so any renderer-thread
// call sees a fully-populated immutable table.
return ::wayland::supportedDmabufModifiers(drm_format, out, capacity);
}
void cbPresent(
void *ud,
int dmabuf_fd,
@ -229,6 +247,7 @@ ghostty_platform_vulkan_s Host::asPlatform(void *surface_userdata) const {
p.device = cbDevice;
p.queue = cbQueue;
p.queue_family_index = cbQueueFamilyIndex;
p.get_supported_modifiers = cbGetSupportedModifiers;
p.present = cbPresent;
return p;
}
@ -243,6 +262,14 @@ Host *Host::instance() {
}
// candidate's destructor runs on init failure and cleans up
// any partial state.
// Eagerly prime the dmabuf modifier registry while we're
// guaranteed to be on the GUI thread (Host::instance is called
// from GhosttySurface's ctor before the renderer thread spawns).
// From here on, `wayland::supportedDmabufModifiers` is a
// lock-free read of an immutable table, safe to call from the
// renderer thread via `cbGetSupportedModifiers`.
::wayland::primeDmabufModifierRegistry();
});
return host.get();
}


@ -1,7 +1,10 @@
#include "SubsurfacePresenter.h"
#include <algorithm>
#include <cstdio>
#include <cstring>
#include <unordered_map>
#include <vector>
#include <QGuiApplication>
#include <QLatin1String>
@ -16,23 +19,61 @@ namespace wayland {
namespace {
// Process-wide bindings for the Wayland globals the presenter needs.
// Lazily discovered on first `tryCreate`, mirrors the `blurManager`
// pattern in `qt/src/WindowBlur.cpp` — registry roundtrip happens on
// a private event queue so we never dispatch Qt's own Wayland events.
// Process-wide bindings for the Wayland globals the presenter needs,
// plus the (format → modifiers) table the compositor advertises via
// zwp_linux_dmabuf_v1's format/modifier events. Populated once by
// `discoverGlobals` on the GUI thread; subsequent reads from the
// renderer thread are safe because the table is never mutated after
// the initial discovery completes.
struct PresenterGlobals {
wl_compositor *compositor = nullptr;
wl_subcompositor *subcompositor = nullptr;
zwp_linux_dmabuf_v1 *dmabuf = nullptr;
std::unordered_map<uint32_t, std::vector<uint64_t>> modifiers;
bool searched = false;
};
PresenterGlobals &globalState() {
static PresenterGlobals g;
return g;
}
// Pre-v4 dmabuf format event. We ignore it: v3 also fires `modifier`
// events for every (format, modifier) tuple including LINEAR — the
// `format` event is legacy from v1/v2 when modifiers didn't exist.
void dmabufFormat(void *, zwp_linux_dmabuf_v1 *, uint32_t /*format*/) {}
// `modifier` event: compositor advertises one (format, modifier) it
// can scan out. Fires once per pair during the bind roundtrip; we
// stash them all in the per-format vector. Duplicate-keyed inserts
// are theoretically possible across compositor restarts but won't
// happen within a single bind round, so we don't dedupe.
void dmabufModifier(void *data, zwp_linux_dmabuf_v1 *, uint32_t format,
uint32_t modifier_hi, uint32_t modifier_lo) {
auto *g = static_cast<PresenterGlobals *>(data);
const uint64_t modifier =
(static_cast<uint64_t>(modifier_hi) << 32) | modifier_lo;
g->modifiers[format].push_back(modifier);
}
const zwp_linux_dmabuf_v1_listener kDmabufListener = {
dmabufFormat,
dmabufModifier,
};
void registryGlobal(void *data, wl_registry *registry, uint32_t name,
const char *interface, uint32_t /*version*/) {
const char *interface, uint32_t version) {
auto *g = static_cast<PresenterGlobals *>(data);
if (std::strcmp(interface, wl_compositor_interface.name) == 0) {
// Bind wl_compositor at version 3+ so child wl_surfaces we
// create support `set_buffer_scale` (added in v3, used by the
// presenter on HiDPI displays). Cap at v6 (the highest we've
// tested against); if the compositor advertises less, take
// what we get and `presentDmabuf` will skip the buffer_scale
// call on those compositors.
const uint32_t v = std::min<uint32_t>(version, 6u);
g->compositor = static_cast<wl_compositor *>(
wl_registry_bind(registry, name, &wl_compositor_interface, 1));
wl_registry_bind(registry, name, &wl_compositor_interface, v));
} else if (std::strcmp(interface, wl_subcompositor_interface.name) == 0) {
g->subcompositor = static_cast<wl_subcompositor *>(
wl_registry_bind(registry, name, &wl_subcompositor_interface, 1));
@ -44,6 +85,9 @@ void registryGlobal(void *data, wl_registry *registry, uint32_t name,
// dynamic format/modifier feedback dance; we don't need it yet.
g->dmabuf = static_cast<zwp_linux_dmabuf_v1 *>(wl_registry_bind(
registry, name, &zwp_linux_dmabuf_v1_interface, 3));
// Add the listener immediately so the modifier events queued by
// the bind get delivered when the dispatch loop continues.
zwp_linux_dmabuf_v1_add_listener(g->dmabuf, &kDmabufListener, g);
}
}
void registryGlobalRemove(void *, wl_registry *, uint32_t) {}
@ -54,7 +98,7 @@ const wl_registry_listener kRegistryListener = {
};
PresenterGlobals *discoverGlobals(wl_display *display) {
static PresenterGlobals globals;
PresenterGlobals &globals = globalState();
if (globals.searched) return &globals;
globals.searched = true;
@ -62,8 +106,24 @@ PresenterGlobals *discoverGlobals(wl_display *display) {
wl_registry *registry = wl_display_get_registry(display);
wl_proxy_set_queue(reinterpret_cast<wl_proxy *>(registry), queue);
wl_registry_add_listener(registry, &kRegistryListener, &globals);
// Roundtrip 1: bind compositor/subcompositor/dmabuf. Inside the
// registry callback we attach the dmabuf listener immediately, so
// any format/modifier events that arrive in the same dispatch
// pass fire on it.
wl_display_roundtrip_queue(display, queue);
wl_registry_destroy(registry);
// Roundtrip 2: belt-and-suspenders for any compositor that defers
// the modifier events past the bind reply (most don't, but some
// batch them). After this returns the modifier table is fully
// populated and frozen for the process lifetime.
if (globals.dmabuf) wl_display_roundtrip_queue(display, queue);
std::size_t total_mods = 0;
for (const auto &kv : globals.modifiers) total_mods += kv.second.size();
std::fprintf(stderr,
"[ghastty] wayland: discovered %zu dmabuf (format,modifier) "
"pairs across %zu formats\n",
total_mods, globals.modifiers.size());
// Move the bound proxies back to the default queue so Qt's main
// dispatch drives subsequent events on them, then drop the private
@ -81,6 +141,15 @@ PresenterGlobals *discoverGlobals(wl_display *display) {
return &globals;
}
wl_display *acquireWaylandDisplay() {
if (!QGuiApplication::platformName().startsWith(QLatin1String("wayland")))
return nullptr;
QPlatformNativeInterface *native = QGuiApplication::platformNativeInterface();
if (!native) return nullptr;
return static_cast<wl_display *>(
native->nativeResourceForIntegration("wl_display"));
}
// wl_buffer::release listener: the compositor is done sampling the
// buffer for any committed surface state, so we can destroy our
// client-side handle. The underlying dmabuf memory is owned by
@ -96,6 +165,26 @@ const wl_buffer_listener kBufferListener = {
} // namespace
void primeDmabufModifierRegistry() {
if (wl_display *display = acquireWaylandDisplay()) {
(void)discoverGlobals(display);
}
}
std::size_t supportedDmabufModifiers(std::uint32_t drm_format,
std::uint64_t *out,
std::size_t capacity) {
const PresenterGlobals &g = globalState();
if (!g.searched) return 0;
auto it = g.modifiers.find(drm_format);
if (it == g.modifiers.end()) return 0;
const std::size_t available = it->second.size();
if (out == nullptr || capacity == 0) return available;
const std::size_t copied = std::min(available, capacity);
std::memcpy(out, it->second.data(), copied * sizeof(std::uint64_t));
return copied;
}
std::unique_ptr<SubsurfacePresenter>
SubsurfacePresenter::tryCreate(QWindow *parent) {
if (!parent) return nullptr;
@ -223,7 +312,11 @@ void SubsurfacePresenter::presentDmabuf(int fd, uint32_t drm_format,
// is harmless but the compositor's bookkeeping is cheaper if we
// skip the redundant request.
if (buffer_scale != m_lastBufferScale) {
wl_surface_set_buffer_scale(m_childSurface, buffer_scale);
// set_buffer_scale was added in wl_surface v3; guard against
// older compositors that bind us at v1/v2 (rare but possible).
if (wl_proxy_get_version(reinterpret_cast<wl_proxy *>(m_childSurface)) >= 3) {
wl_surface_set_buffer_scale(m_childSurface, buffer_scale);
}
m_lastBufferScale = buffer_scale;
}


@ -6,6 +6,15 @@
// subsurface. The compositor scans the buffers out directly — no
// mmap, no memcpy, no QImage, no QPainter blit on the present path.
//
// Also exposes the process-wide compositor modifier registry
// (`primeDmabufModifierRegistry` / `supportedDmabufModifiers`)
// learned from zwp_linux_dmabuf_v1's format/modifier events.
// libghostty's Vulkan renderer queries this via the
// `get_supported_modifiers` platform callback to pick a modifier
// the compositor will actually accept — without that intersection,
// drivers that don't expose COLOR_ATTACHMENT for LINEAR (NVIDIA)
// can't get into Target's direct-export mode at all.
//
// Wayland-only by project decision (the Qt frontend is Wayland-only;
// see `feedback-qt-no-x11` memory). If the host isn't on a Wayland
// QPA platform or the compositor lacks the required globals,
@ -14,6 +23,7 @@
#pragma once
#include <cstddef>
#include <cstdint>
#include <memory>
@ -25,6 +35,28 @@ class QWindow;
namespace wayland {
// Eagerly discover the compositor's globals (incl. the
// zwp_linux_dmabuf_v1 format/modifier list) on the calling thread.
// MUST be called from the GUI thread before any
// `supportedDmabufModifiers` reader runs (the renderer thread). Safe
// to call multiple times — discovery happens exactly once.
//
// Idempotent no-op if the QPA isn't Wayland or the
// QPlatformNativeInterface lookup fails.
void primeDmabufModifierRegistry();
// Read the cached compositor-supported DRM modifiers for the given
// DRM_FORMAT_* fourcc. Returns the number of modifiers actually
// written to `out` (capped at `capacity`). Pass `out=nullptr,
// capacity=0` to query the total count.
//
// Thread-safe for readers once `primeDmabufModifierRegistry` has
// returned. Returns 0 if the registry hasn't been primed yet or the
// format isn't advertised.
std::size_t supportedDmabufModifiers(std::uint32_t drm_format,
std::uint64_t *out,
std::size_t capacity);
class SubsurfacePresenter {
public:
// Build a subsurface parented to `parent`'s native `wl_surface`,


@ -424,6 +424,20 @@ pub const Platform = union(PlatformTag) {
queue: *const fn (?*anyopaque) callconv(.c) ?*anyopaque,
queue_family_index: *const fn (?*anyopaque) callconv(.c) u32,
/// Query the compositor-supported DRM modifiers for a given
/// DRM_FORMAT_* fourcc. Two-pass usage: call with
/// `out=null, capacity=0` for the count, then again with a
/// buffer of that size. Returns the number of modifiers
/// actually written. The renderer intersects this with the
/// GPU's per-modifier feature set to pick a tiling the
/// compositor will accept on attach.
get_supported_modifiers: *const fn (
?*anyopaque,
u32, // DRM_FORMAT_*
?[*]u64, // out
usize, // capacity
) callconv(.c) usize,
/// Hand off a rendered frame to the host as a dmabuf fd. The
/// host imports it for composition; libghostty retains
/// ownership of the underlying VkDeviceMemory and the fd is
@ -479,6 +493,12 @@ pub const Platform = union(PlatformTag) {
device: ?*const fn (?*anyopaque) callconv(.c) ?*anyopaque,
queue: ?*const fn (?*anyopaque) callconv(.c) ?*anyopaque,
queue_family_index: ?*const fn (?*anyopaque) callconv(.c) u32,
get_supported_modifiers: ?*const fn (
?*anyopaque,
u32,
?[*]u64,
usize,
) callconv(.c) usize,
present: ?*const fn (
?*anyopaque,
i32,
@ -541,6 +561,8 @@ pub const Platform = union(PlatformTag) {
break :vulkan error.QueueMustBeSet,
.queue_family_index = config.queue_family_index orelse
break :vulkan error.QueueFamilyIndexMustBeSet,
.get_supported_modifiers = config.get_supported_modifiers orelse
break :vulkan error.GetSupportedModifiersMustBeSet,
.present = config.present orelse
break :vulkan error.PresentMustBeSet,
} };


@ -163,6 +163,11 @@ pub const Dispatch = struct {
// device-level resolution like any other device function.
getMemoryFdKHR: std.meta.Child(vk.PFN_vkGetMemoryFdKHR),
getImageSubresourceLayout: std.meta.Child(vk.PFN_vkGetImageSubresourceLayout),
/// From `VK_EXT_image_drm_format_modifier`. Used by
/// `vulkan/Target.zig` after creating an image with the LIST
/// variant of the modifier create-info to discover which
/// modifier the driver actually chose.
getImageDrmFormatModifierPropertiesEXT: std.meta.Child(vk.PFN_vkGetImageDrmFormatModifierPropertiesEXT),
// Per-frame sync (fence + command-buffer reset) used by
// `vulkan/Frame.zig`.
@ -466,6 +471,8 @@ pub fn init(
try dl.load(vk.PFN_vkGetMemoryFdKHR, "vkGetMemoryFdKHR");
const get_image_subresource_layout =
try dl.load(vk.PFN_vkGetImageSubresourceLayout, "vkGetImageSubresourceLayout");
const get_image_drm_format_modifier_properties_ext =
try dl.load(vk.PFN_vkGetImageDrmFormatModifierPropertiesEXT, "vkGetImageDrmFormatModifierPropertiesEXT");
const create_fence =
try dl.load(vk.PFN_vkCreateFence, "vkCreateFence");
const destroy_fence =
@ -557,6 +564,7 @@ pub fn init(
.destroyPipeline = destroy_pipeline,
.getMemoryFdKHR = get_memory_fd_khr,
.getImageSubresourceLayout = get_image_subresource_layout,
.getImageDrmFormatModifierPropertiesEXT = get_image_drm_format_modifier_properties_ext,
.createFence = create_fence,
.destroyFence = destroy_fence,
.waitForFences = wait_for_fences,


@ -148,34 +148,73 @@ pub fn init(opts: Options) Error!Self {
vk.VK_FORMAT_FEATURE_TRANSFER_SRC_BIT |
vk.VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT;
if (try probeLinearModifierSupported(dev, opts.format, required_features)) {
const picked = try pickModifier(dev, opts.format, drm_format, required_features);
if (picked) |m| {
const tag: []const u8 = if (m == DRM_FORMAT_MOD_LINEAR)
"LINEAR"
else
"vendor-tiled";
log.info(
"Target: direct dmabuf export (LINEAR modifier) {}x{}",
.{ opts.width, opts.height },
"Target: direct dmabuf export ({s} modifier 0x{x}) {}x{}",
.{ tag, m, opts.width, opts.height },
);
return try initDirect(opts, drm_format);
} else {
log.warn(
"Target: LINEAR modifier lacks COLOR_ATTACHMENT support; " ++
"falling back to OPTIMAL render + LINEAR-buffer copy",
.{},
);
return try initLegacyCopy(opts, drm_format);
return try initDirect(opts, drm_format, m);
}
log.warn(
"Target: no usable single-plane modifier with COLOR_ATTACHMENT " ++
"in compositor ∩ GPU intersection; falling back to " ++
"OPTIMAL render + LINEAR-buffer copy",
.{},
);
return try initLegacyCopy(opts, drm_format);
}
/// Ask the driver, via `VK_EXT_image_drm_format_modifier`'s
/// per-modifier feature list, whether `DRM_FORMAT_MOD_LINEAR`
/// supports the format-feature flags we need to use the image as a
/// color attachment + transfer source + sampled.
fn probeLinearModifierSupported(
/// Intersect the compositor's accepted modifier list (from the host
/// callback) with the GPU's supported modifiers for `format` (queried
/// via `VK_EXT_image_drm_format_modifier`), filtered by single-plane
/// + the required format-feature flags. Prefer the first non-LINEAR
/// hit (vendor-tiled: NVIDIA block-linear, AMD DCC variants, Intel
/// Y-tiled; these are where the perf win lives on most hardware).
/// Fall back to LINEAR if it's in the intersection. Return null when
/// no modifier qualifies; the caller then drops to `.legacy_copy`.
///
/// Why both intersections matter:
/// - GPU-only: passes on AMD/Intel for LINEAR but NVIDIA never
/// exposes COLOR_ATTACHMENT for LINEAR, so direct mode would
/// create the image OK but rasterize nothing.
/// - Compositor-only: GPU may not be able to render into the
/// compositor's preferred tilings (drivers don't always expose
/// COLOR_ATTACHMENT for every modifier).
fn pickModifier(
dev: *const Device,
format: vk.VkFormat,
drm_format: u32,
required_features: vk.VkFormatFeatureFlags,
) Error!bool {
var mods: [MAX_MODIFIERS]vk.VkDrmFormatModifierPropertiesEXT = undefined;
) Error!?u64 {
// Compositor side: ask the host what it will accept on attach.
// Single call into a fixed MAX_MODIFIERS stack buffer (the
// callback also supports a NULL out + capacity 0 count query,
// which we don't use here). An empty result means the compositor
// doesn't speak linux-dmabuf-v1 or doesn't advertise this format.
// Direct mode would still likely work for AMD/Intel LINEAR, but
// the compositor attach would fail, so treat it as "no intersection."
var host_mods: [MAX_MODIFIERS]u64 = undefined;
const host_count = dev.platform.get_supported_modifiers(
dev.platform.userdata,
drm_format,
&host_mods,
MAX_MODIFIERS,
);
if (host_count == 0) {
log.warn(
"host advertises no dmabuf modifiers for format 0x{x}; " ++
"cannot use direct mode",
.{drm_format},
);
return null;
}
// First pass: get count.
// GPU side: enumerate modifiers + their per-modifier feature bits.
var gpu_mods: [MAX_MODIFIERS]vk.VkDrmFormatModifierPropertiesEXT = undefined;
var mod_list: vk.VkDrmFormatModifierPropertiesListEXT = .{
.sType = vk.VK_STRUCTURE_TYPE_DRM_FORMAT_MODIFIER_PROPERTIES_LIST_EXT,
.pNext = null,
@ -192,43 +231,64 @@ fn probeLinearModifierSupported(
format,
&props2,
);
if (mod_list.drmFormatModifierCount == 0) return false;
if (mod_list.drmFormatModifierCount == 0) return null;
if (mod_list.drmFormatModifierCount > MAX_MODIFIERS) {
// Cap to our stack buffer; we only look for LINEAR (which
// tends to be first or close to it), so a truncation here is
// very unlikely to hide it. Log if we ever hit this.
log.warn(
"modifier list truncated: driver reports {}, MAX_MODIFIERS={}",
"GPU modifier list truncated: driver reports {}, MAX_MODIFIERS={}",
.{ mod_list.drmFormatModifierCount, MAX_MODIFIERS },
);
mod_list.drmFormatModifierCount = MAX_MODIFIERS;
}
// Second pass: fill list.
mod_list.pDrmFormatModifierProperties = &mods[0];
mod_list.pDrmFormatModifierProperties = &gpu_mods[0];
dev.dispatch.getPhysicalDeviceFormatProperties2(
dev.physical_device,
format,
&props2,
);
for (mods[0..mod_list.drmFormatModifierCount]) |m| {
if (m.drmFormatModifier != DRM_FORMAT_MOD_LINEAR) continue;
// Single-plane only: multi-plane modifiers need a wider
// present-callback ABI (one fd/offset/stride per plane).
if (m.drmFormatModifierPlaneCount != 1) continue;
if ((m.drmFormatModifierTilingFeatures & required_features) == required_features) {
return true;
var has_linear: bool = false;
var best_tiled: ?u64 = null;
for (gpu_mods[0..mod_list.drmFormatModifierCount]) |gm| {
// Single-plane only: present callback ABI passes one fd /
// offset / stride. Multi-plane modifiers (e.g. compressed
// layouts with auxiliary metadata planes) need a wider ABI.
if (gm.drmFormatModifierPlaneCount != 1) continue;
if ((gm.drmFormatModifierTilingFeatures & required_features) != required_features) continue;
// Intersect with what the compositor accepts.
var compositor_ok = false;
for (host_mods[0..host_count]) |hm| {
if (hm == gm.drmFormatModifier) {
compositor_ok = true;
break;
}
}
if (!compositor_ok) continue;
if (gm.drmFormatModifier == DRM_FORMAT_MOD_LINEAR) {
has_linear = true;
} else if (best_tiled == null) {
best_tiled = gm.drmFormatModifier;
}
}
return false;
if (best_tiled) |m| return m;
if (has_linear) return DRM_FORMAT_MOD_LINEAR;
return null;
}
/// `.direct` mode: allocate the render image with
/// `VkImageDrmFormatModifierExplicitCreateInfoEXT` and export its own
/// memory as the dmabuf.
fn initDirect(opts: Options, drm_format: u32) Error!Self {
/// `VK_EXT_image_drm_format_modifier` so its own memory can be
/// exported as the dmabuf. Two create-info variants depending on
/// the chosen modifier:
/// - LINEAR: EXPLICIT layout (we know rowPitch = width*bpp).
/// Lets us populate `stride` deterministically without a
/// post-create driver query.
/// - non-LINEAR (vendor-tiled): LIST with a single-modifier list.
/// The driver picks the only option and computes its own
/// internal layout; we recover the chosen modifier via
/// `vkGetImageDrmFormatModifierPropertiesEXT` (sanity check:
/// it should equal `chosen_mod`) and the per-plane layout via
/// `vkGetImageSubresourceLayout` for the right `stride` value.
fn initDirect(opts: Options, drm_format: u32, chosen_mod: u64) Error!Self {
const dev = opts.device;
const image_usage = @as(vk.VkImageUsageFlags, vk.VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT) |
@ -236,11 +296,10 @@ fn initDirect(opts: Options, drm_format: u32) Error!Self {
vk.VK_IMAGE_USAGE_TRANSFER_SRC_BIT |
opts.extra_usage;
// BGRA8, single-plane LINEAR: rowPitch is just width * bpp.
const bytes_per_pixel: u32 = 4;
const row_pitch: vk.VkDeviceSize = @as(vk.VkDeviceSize, opts.width) * bytes_per_pixel;
// ---- 1. Image: LINEAR-modifier, externally-shareable -----------
// ---- 1. Image: modifier-aware, externally-shareable -----------
const plane_layout: vk.VkSubresourceLayout = .{
.offset = 0,
.size = 0, // ignored for EXPLICIT create-info
@ -248,16 +307,30 @@ fn initDirect(opts: Options, drm_format: u32) Error!Self {
.arrayPitch = 0,
.depthPitch = 0,
};
const mod_create: vk.VkImageDrmFormatModifierExplicitCreateInfoEXT = .{
const explicit_create: vk.VkImageDrmFormatModifierExplicitCreateInfoEXT = .{
.sType = vk.VK_STRUCTURE_TYPE_IMAGE_DRM_FORMAT_MODIFIER_EXPLICIT_CREATE_INFO_EXT,
.pNext = null,
.drmFormatModifier = DRM_FORMAT_MOD_LINEAR,
.drmFormatModifierPlaneCount = 1,
.pPlaneLayouts = &plane_layout,
};
// Single-modifier list: the driver "picks" the only option, but
// crucially computes its own opaque internal layout for the
// tiling, which we don't have to know.
const list_mod = chosen_mod;
const list_create: vk.VkImageDrmFormatModifierListCreateInfoEXT = .{
.sType = vk.VK_STRUCTURE_TYPE_IMAGE_DRM_FORMAT_MODIFIER_LIST_CREATE_INFO_EXT,
.pNext = null,
.drmFormatModifierCount = 1,
.pDrmFormatModifiers = &list_mod,
};
const mod_pnext: ?*const anyopaque = if (chosen_mod == DRM_FORMAT_MOD_LINEAR)
@ptrCast(&explicit_create)
else
@ptrCast(&list_create);
const ext_image_info: vk.VkExternalMemoryImageCreateInfo = .{
.sType = vk.VK_STRUCTURE_TYPE_EXTERNAL_MEMORY_IMAGE_CREATE_INFO,
.pNext = &mod_create,
.pNext = mod_pnext,
.handleTypes = vk.VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT,
};
const image_info: vk.VkImageCreateInfo = .{
@ -279,37 +352,33 @@ fn initDirect(opts: Options, drm_format: u32) Error!Self {
};
var image: vk.VkImage = undefined;
if (dev.dispatch.createImage(dev.device, &image_info, null, &image) != vk.VK_SUCCESS) {
log.err("vkCreateImage (Target direct) failed", .{});
log.err("vkCreateImage (Target direct, mod=0x{x}) failed", .{chosen_mod});
return error.VulkanFailed;
}
errdefer dev.dispatch.destroyImage(dev.device, image, null);
// ---- 2. Image memory: exportable, host-cacheable for Qt mmap ---
// ---- 2. Image memory: exportable ---------------------------------
var image_reqs: vk.VkMemoryRequirements = undefined;
dev.dispatch.getImageMemoryRequirements(dev.device, image, &image_reqs);
// HOST_CACHED matters: Qt's `presentVulkanDmabuf` mmaps and reads
// every pixel into a QImage. Without HOST_CACHED, NVIDIA hands
// back write-combining memory and that read crawls (see legacy
// path note for the ~260 ms regression we hit). HOST_COHERENT
// avoids explicit flushes. Fall back to uncached if cached isn't
// available for the memory type bits the image requires.
const host_flags_cached =
@as(vk.VkMemoryPropertyFlags, vk.VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT) |
vk.VK_MEMORY_PROPERTY_HOST_COHERENT_BIT |
vk.VK_MEMORY_PROPERTY_HOST_CACHED_BIT;
const host_flags_uncached =
@as(vk.VkMemoryPropertyFlags, vk.VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT) |
vk.VK_MEMORY_PROPERTY_HOST_COHERENT_BIT;
const image_mem_idx = dev.findMemoryType(image_reqs.memoryTypeBits, host_flags_cached) orelse
dev.findMemoryType(image_reqs.memoryTypeBits, host_flags_uncached) orelse
{
log.err(
"no HOST_VISIBLE memory type for direct dmabuf image (typeBits=0x{x})",
.{image_reqs.memoryTypeBits},
);
return error.NoSuitableMemoryType;
};
// In direct mode the host doesn't mmap the dmabuf; it imports it
// as a 2D image into the compositor (`image_backed=true` per
// `Target.present`). So DEVICE_LOCAL is the right choice: GPU-
// local memory is faster for the COLOR_ATTACHMENT_OUTPUT writes,
// and vendor-tiled modifiers often require it on drivers like
// NVIDIA (which won't expose HOST_VISIBLE memory types for the
// bits a tiled exportable image requires anyway).
const image_mem_idx = dev.findMemoryType(
image_reqs.memoryTypeBits,
vk.VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT,
) orelse {
log.err(
"no DEVICE_LOCAL memory type for direct dmabuf image " ++
"(mod=0x{x} typeBits=0x{x})",
.{ chosen_mod, image_reqs.memoryTypeBits },
);
return error.NoSuitableMemoryType;
};
const export_info: vk.VkExportMemoryAllocateInfo = .{
.sType = vk.VK_STRUCTURE_TYPE_EXPORT_MEMORY_ALLOCATE_INFO,
.pNext = null,
@ -340,9 +409,39 @@ fn initDirect(opts: Options, drm_format: u32) Error!Self {
const fd = try exportDmabufFd(dev, image_memory);
errdefer std.posix.close(fd);
// ---- 5. Query the actual plane stride --------------------------
// We requested rowPitch = width * 4 via EXPLICIT create-info, but
// the driver can technically round up; ask for what we actually got.
// ---- 5. Confirm the actual modifier + plane layout -------------
// For non-LINEAR we used LIST create-info (one entry), so the
// driver "picked" the only option. We query back via
// `vkGetImageDrmFormatModifierPropertiesEXT` as a sanity check
// and log a warning if the driver returned a different modifier,
// which would indicate a driver bug or our list being ignored.
var actual_mod = chosen_mod;
if (chosen_mod != DRM_FORMAT_MOD_LINEAR) {
var mod_props: vk.VkImageDrmFormatModifierPropertiesEXT = .{
.sType = vk.VK_STRUCTURE_TYPE_IMAGE_DRM_FORMAT_MODIFIER_PROPERTIES_EXT,
.pNext = null,
.drmFormatModifier = 0,
};
if (dev.dispatch.getImageDrmFormatModifierPropertiesEXT(
dev.device,
image,
&mod_props,
) == vk.VK_SUCCESS) {
actual_mod = mod_props.drmFormatModifier;
if (actual_mod != chosen_mod) {
log.warn(
"driver chose modifier 0x{x}, we asked for 0x{x}",
.{ actual_mod, chosen_mod },
);
}
}
}
// Plane 0 layout: rowPitch is what we report as `stride` to the
// compositor. For LINEAR this is width*bpp (possibly padded).
// For vendor-tiled formats the value is implementation-specific;
// the compositor's GPU knows how to interpret it given the
// modifier we report alongside.
var subres: vk.VkImageSubresource = .{
.aspectMask = vk.VK_IMAGE_ASPECT_MEMORY_PLANE_0_BIT_EXT,
.mipLevel = 0,
@ -365,7 +464,7 @@ fn initDirect(opts: Options, drm_format: u32) Error!Self {
.height = opts.height,
.fd = fd,
.drm_format = drm_format,
.drm_modifier = DRM_FORMAT_MOD_LINEAR,
.drm_modifier = actual_mod,
.stride = @intCast(layout.rowPitch),
};
}