renderer/vulkan: compositor-modifier intersection unlocks NVIDIA direct mode
Target.init now picks a modifier by intersecting two sources:
- the GPU's supported modifiers for `format` that carry the COLOR_ATTACHMENT | TRANSFER_SRC | SAMPLED feature bits (filtered to single-plane);
- the compositor's accepted modifiers, fetched via a new ghostty_platform_vulkan_s.get_supported_modifiers callback.

The first non-LINEAR hit wins (vendor-tiled is the fast path on most modern GPUs); LINEAR is the fallback; legacy_copy remains the floor.

Verified on NVIDIA RTX 2080 + Vulkan 1.4.329: Target now picks DRM_FORMAT_MOD_NVIDIA_* (0x300000000606015), Target.tiling=.direct, image_backed=1, and the dmabuf flows through wl_subsurface without protocol errors. Where Phase 1 left NVIDIA on legacy_copy + QImage, this lands the full zero-copy path.

The new callback's data source is the zwp_linux_dmabuf_v1 format/modifier events. SubsurfacePresenter.cpp's globals discovery now listens for those events during its private-queue roundtrip (two roundtrips: bind, then collect events) and caches them in a process-wide (format → modifiers) table. Host::instance() eagerly primes this table on the GUI thread, so the renderer-thread callback is a lock-free read of an immutable map.

Renderer changes:
- Target.pickModifier replaces the LINEAR-only probe. It intersects host ∩ GPU, preferring non-LINEAR single-plane modifiers.
- Target.initDirect now chooses the create-info variant from the chosen modifier: EXPLICIT for LINEAR (we know rowPitch), LIST for vendor-tiled (the driver picks an opaque layout, which we query back via vkGetImageDrmFormatModifierPropertiesEXT and vkGetImageSubresourceLayout).
- Direct-mode memory switches to DEVICE_LOCAL. image_backed=true means the host won't mmap, so HOST_VISIBLE is no longer needed (and many drivers won't expose HOST_VISIBLE memory types for tiled exportable images anyway).
- Device.zig adds vkGetImageDrmFormatModifierPropertiesEXT to the dispatch table. vkGetPhysicalDeviceFormatProperties2, which pickModifier also calls, was already loaded for the old LINEAR probe.
Host changes:
- qt/src/vulkan/Host.cpp adds VK_EXT_image_drm_format_modifier to kRequiredDeviceExtensions so that the device-level proc-addr lookup for vkGetImageDrmFormatModifierPropertiesEXT actually resolves.
- wl_compositor is now bound at version min(advertised, 6) so the child wl_surface supports set_buffer_scale (added in v3). The set_buffer_scale call is guarded by wl_proxy_get_version for older compositors.

Co-Authored-By: claude-flow <ruv@ruv.net>
pull/12846/head
parent 9a7a31ac37
commit 33560fe83e
@@ -513,6 +513,26 @@ typedef struct {
void* (*queue)(void* userdata); // VkQueue
uint32_t (*queue_family_index)(void* userdata);

// Compositor-supported DRM modifiers for a given DRM_FORMAT_*
// fourcc, as advertised by linux-dmabuf-v1's `modifier` events.
// libghostty intersects this with what its physical device
// supports for COLOR_ATTACHMENT to pick a tiling that the
// compositor will actually accept on attach. Without this
// intersection, drivers that don't expose COLOR_ATTACHMENT for
// the LINEAR modifier (NVIDIA) can't use the direct-export path
// and fall back to a CPU-readback path.
//
// Two-pass usage: call with `out=NULL, capacity=0` to query the
// total count; allocate; call again to fill. Returns the number
// of modifiers actually written (capped at `capacity`). May
// return 0 if the format isn't compositor-supported or the host
// doesn't speak linux-dmabuf-v1.
size_t (*get_supported_modifiers)(
void* userdata,
uint32_t drm_format,
uint64_t* out,
size_t capacity);

// Hand off a rendered frame to the host as a dmabuf fd. The host
// imports it (e.g. into Qt's RHI as a QRhiTexture, or attaches to
// a wl_subsurface via linux-dmabuf-v1) and composites.
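The two-pass contract documented on `get_supported_modifiers` can be exercised without a compositor. This C++ sketch pairs a hypothetical in-memory provider (`fakeProvider`, standing in for the host's `cbGetSupportedModifiers`) with a caller that queries the count, allocates, and fills. The format code and modifier values are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Same signature as ghostty_platform_vulkan_s::get_supported_modifiers.
using GetSupportedModifiersFn = std::size_t (*)(void *userdata,
                                                std::uint32_t drm_format,
                                                std::uint64_t *out,
                                                std::size_t capacity);

// Hypothetical provider: userdata points at one format's modifier list.
// It follows the documented contract: out=nullptr or capacity=0 returns the
// total count; otherwise it copies at most `capacity` entries.
std::size_t fakeProvider(void *userdata, std::uint32_t drm_format,
                         std::uint64_t *out, std::size_t capacity) {
  const auto *table = static_cast<const std::vector<std::uint64_t> *>(userdata);
  constexpr std::uint32_t kArgb8888 = 0x34325241u;  // fourcc 'AR24'
  if (drm_format != kArgb8888) return 0;            // format not advertised
  if (out == nullptr || capacity == 0) return table->size();
  const std::size_t n = std::min(table->size(), capacity);
  std::copy_n(table->begin(), n, out);
  return n;
}

// Caller side: query the count, allocate, then fill.
std::vector<std::uint64_t> queryModifiers(GetSupportedModifiersFn fn,
                                          void *userdata,
                                          std::uint32_t drm_format) {
  const std::size_t count = fn(userdata, drm_format, nullptr, 0);
  std::vector<std::uint64_t> mods(count);
  if (count == 0) return mods;  // no linux-dmabuf-v1, or format unsupported
  const std::size_t written = fn(userdata, drm_format, mods.data(), mods.size());
  assert(written <= count);
  mods.resize(written);  // trust the second call's return value
  return mods;
}
```

A caller with a fixed-size buffer can also skip the count query and rely on the capped return value, which is what the `pickModifier` hunk further down appears to do with `MAX_MODIFIERS`.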
@@ -9,6 +9,8 @@
#include <optional>
#include <vector>

#include "../wayland/SubsurfacePresenter.h"

namespace vulkan {

// Forward declaration of the entry point in `GhosttySurface.cpp` that
@@ -30,6 +32,13 @@ namespace {
constexpr const char *kRequiredDeviceExtensions[] = {
"VK_KHR_external_memory_fd",
"VK_EXT_external_memory_dma_buf",
// Needed so libghostty can allocate render images with a chosen
// DRM modifier (vendor-tiled where supported) and query the
// driver-chosen layout back via
// `vkGetImageDrmFormatModifierPropertiesEXT`. Without it on the
// host's VkDevice, the device-level proc-addr lookup for that
// function returns null and Target.init fails.
"VK_EXT_image_drm_format_modifier",
};

bool hasRequiredExtensions(VkPhysicalDevice pd) {
@@ -108,6 +117,15 @@ uint32_t cbQueueFamilyIndex(void *ud) {
return host != nullptr ? host->vkQueueFamilyIndex() : 0;
}

size_t cbGetSupportedModifiers(void *ud, uint32_t drm_format,
uint64_t *out, size_t capacity) {
(void)ud;
// Always-safe read: the registry was primed eagerly on the GUI
// thread when Host::instance() first ran, so any renderer-thread
// call sees a fully-populated immutable table.
return ::wayland::supportedDmabufModifiers(drm_format, out, capacity);
}

void cbPresent(
void *ud,
int dmabuf_fd,
@@ -229,6 +247,7 @@ ghostty_platform_vulkan_s Host::asPlatform(void *surface_userdata) const {
p.device = cbDevice;
p.queue = cbQueue;
p.queue_family_index = cbQueueFamilyIndex;
p.get_supported_modifiers = cbGetSupportedModifiers;
p.present = cbPresent;
return p;
}
@@ -243,6 +262,14 @@ Host *Host::instance() {
}
// candidate's destructor runs on init failure and cleans up
// any partial state.

// Eagerly prime the dmabuf modifier registry while we're
// guaranteed to be on the GUI thread (Host::instance is called
// from GhosttySurface's ctor before the renderer thread spawns).
// From here on, `wayland::supportedDmabufModifiers` is a
// lock-free read of an immutable table, safe to call from the
// renderer thread via `cbGetSupportedModifiers`.
::wayland::primeDmabufModifierRegistry();
});
return host.get();
}
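The thread-safety argument in `Host::instance()` rests on ordering: every write to the table finishes on the GUI thread before the renderer thread is created, so later unsynchronized reads are race-free. A minimal C++ sketch of that pattern, assuming `std::call_once` as the once-guard (the hunk shows only the closing `});`, so the actual guard is not visible here) and a hypothetical `ModifierRegistry` in place of `PresenterGlobals`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for PresenterGlobals' modifier table.
struct ModifierRegistry {
  std::unordered_map<std::uint32_t, std::vector<std::uint64_t>> modifiers;
  bool searched = false;
};

ModifierRegistry &registry() {
  static ModifierRegistry r;  // function-local static: initialization is thread-safe
  return r;
}

// Writer: call on the GUI thread before the renderer thread starts.
// std::call_once makes repeated calls harmless (the body runs exactly once).
void primeRegistry() {
  static std::once_flag once;
  std::call_once(once, [] {
    ModifierRegistry &r = registry();
    // Illustrative data; the real table is filled from Wayland events.
    r.modifiers[0x34325241u] = {0x0ULL, 0x0300000000606015ULL};
    r.searched = true;
  });
}

// Reader: no lock. This is race-free only because all writes happened before
// the reading thread was created (std::thread's constructor synchronizes with
// the start of the new thread), and nothing writes afterwards.
std::size_t countModifiers(std::uint32_t drm_format) {
  const ModifierRegistry &r = registry();
  if (!r.searched) return 0;
  const auto it = r.modifiers.find(drm_format);
  return it == r.modifiers.end() ? 0 : it->second.size();
}
```

The guarantee depends on the reader thread being created after `primeRegistry()` returns; a reader thread that already exists would need a lock or an atomic flag with acquire/release ordering.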
@@ -1,7 +1,10 @@
#include "SubsurfacePresenter.h"

#include <algorithm>
#include <cstdio>
#include <cstring>
#include <unordered_map>
#include <vector>

#include <QGuiApplication>
#include <QLatin1String>
@@ -16,23 +19,61 @@ namespace wayland {

namespace {

// Process-wide bindings for the Wayland globals the presenter needs.
// Lazily discovered on first `tryCreate`, mirrors the `blurManager`
// pattern in `qt/src/WindowBlur.cpp` — registry roundtrip happens on
// a private event queue so we never dispatch Qt's own Wayland events.
// Process-wide bindings for the Wayland globals the presenter needs,
// plus the (format → modifiers) table the compositor advertises via
// zwp_linux_dmabuf_v1's format/modifier events. Populated once by
// `discoverGlobals` on the GUI thread; subsequent reads from the
// renderer thread are safe because the table is never mutated after
// the initial discovery completes.
struct PresenterGlobals {
wl_compositor *compositor = nullptr;
wl_subcompositor *subcompositor = nullptr;
zwp_linux_dmabuf_v1 *dmabuf = nullptr;
std::unordered_map<uint32_t, std::vector<uint64_t>> modifiers;
bool searched = false;
};

PresenterGlobals &globalState() {
static PresenterGlobals g;
return g;
}

// Pre-v4 dmabuf format event. We ignore it: v3 also fires `modifier`
// events for every (format, modifier) tuple including LINEAR — the
// `format` event is legacy from v1/v2 when modifiers didn't exist.
void dmabufFormat(void *, zwp_linux_dmabuf_v1 *, uint32_t /*format*/) {}

// `modifier` event: compositor advertises one (format, modifier) it
// can scan out. Fires once per pair during the bind roundtrip; we
// stash them all in the per-format vector. Duplicate-keyed inserts
// are theoretically possible across compositor restarts but won't
// happen within a single bind round, so we don't dedupe.
void dmabufModifier(void *data, zwp_linux_dmabuf_v1 *, uint32_t format,
uint32_t modifier_hi, uint32_t modifier_lo) {
auto *g = static_cast<PresenterGlobals *>(data);
const uint64_t modifier =
(static_cast<uint64_t>(modifier_hi) << 32) | modifier_lo;
g->modifiers[format].push_back(modifier);
}

const zwp_linux_dmabuf_v1_listener kDmabufListener = {
dmabufFormat,
dmabufModifier,
};

void registryGlobal(void *data, wl_registry *registry, uint32_t name,
const char *interface, uint32_t /*version*/) {
const char *interface, uint32_t version) {
auto *g = static_cast<PresenterGlobals *>(data);
if (std::strcmp(interface, wl_compositor_interface.name) == 0) {
// Bind wl_compositor at version 3+ so child wl_surfaces we
// create support `set_buffer_scale` (added in v3, used by the
// presenter on HiDPI displays). Cap at v6 (the highest we've
// tested against); if the compositor advertises less, take
// what we get and `presentDmabuf` will skip the buffer_scale
// call on those compositors.
const uint32_t v = std::min<uint32_t>(version, 6u);
g->compositor = static_cast<wl_compositor *>(
wl_registry_bind(registry, name, &wl_compositor_interface, 1));
wl_registry_bind(registry, name, &wl_compositor_interface, v));
} else if (std::strcmp(interface, wl_subcompositor_interface.name) == 0) {
g->subcompositor = static_cast<wl_subcompositor *>(
wl_registry_bind(registry, name, &wl_subcompositor_interface, 1));
@@ -44,6 +85,9 @@ void registryGlobal(void *data, wl_registry *registry, uint32_t name,
// dynamic format/modifier feedback dance; we don't need it yet.
g->dmabuf = static_cast<zwp_linux_dmabuf_v1 *>(wl_registry_bind(
registry, name, &zwp_linux_dmabuf_v1_interface, 3));
// Add the listener immediately so the modifier events queued by
// the bind get delivered when the dispatch loop continues.
zwp_linux_dmabuf_v1_add_listener(g->dmabuf, &kDmabufListener, g);
}
}
void registryGlobalRemove(void *, wl_registry *, uint32_t) {}
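The `modifier` event carries each 64-bit DRM modifier as two `uint32_t` halves, which `dmabufModifier` reassembles before appending to the per-format vector. This standalone C++ sketch repeats that reassembly with hypothetical helper names and checks it against the NVIDIA modifier quoted in the commit message. It also shows that the vendor ID sits in the top byte of a modifier, as laid out by `fourcc_mod_code` in `drm_fourcc.h`.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Rebuild a 64-bit DRM modifier from the event's two 32-bit halves,
// the same expression dmabufModifier uses.
constexpr std::uint64_t combineModifier(std::uint32_t hi, std::uint32_t lo) {
  return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// fourcc_mod_code() places the vendor ID in bits 56..63.
constexpr std::uint8_t modifierVendor(std::uint64_t modifier) {
  return static_cast<std::uint8_t>(modifier >> 56);
}

using ModifierTable =
    std::unordered_map<std::uint32_t, std::vector<std::uint64_t>>;

// Hypothetical free-function version of the listener body: append one
// advertised (format, modifier) pair to that format's list.
void onModifierEvent(ModifierTable &table, std::uint32_t format,
                     std::uint32_t hi, std::uint32_t lo) {
  table[format].push_back(combineModifier(hi, lo));
}

// The NVIDIA modifier from the commit message splits into these halves.
static_assert(combineModifier(0x03000000u, 0x00606015u) == 0x0300000000606015ULL,
              "hi/lo reassembly");
static_assert(modifierVendor(0x0300000000606015ULL) == 0x03,
              "DRM_FORMAT_MOD_VENDOR_NVIDIA is 0x03");
```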
@@ -54,7 +98,7 @@ const wl_registry_listener kRegistryListener = {
};

PresenterGlobals *discoverGlobals(wl_display *display) {
static PresenterGlobals globals;
PresenterGlobals &globals = globalState();
if (globals.searched) return &globals;
globals.searched = true;

@@ -62,8 +106,24 @@ PresenterGlobals *discoverGlobals(wl_display *display) {
wl_registry *registry = wl_display_get_registry(display);
wl_proxy_set_queue(reinterpret_cast<wl_proxy *>(registry), queue);
wl_registry_add_listener(registry, &kRegistryListener, &globals);
// Roundtrip 1: bind compositor/subcompositor/dmabuf. Inside the
// registry callback we attach the dmabuf listener immediately, so
// any format/modifier events that arrive in the same dispatch
// pass fire on it.
wl_display_roundtrip_queue(display, queue);
wl_registry_destroy(registry);
// Roundtrip 2: belt-and-suspenders for any compositor that defers
// the modifier events past the bind reply (most don't, but some
// batch them). After this returns the modifier table is fully
// populated and frozen for the process lifetime.
if (globals.dmabuf) wl_display_roundtrip_queue(display, queue);

std::size_t total_mods = 0;
for (const auto &kv : globals.modifiers) total_mods += kv.second.size();
std::fprintf(stderr,
"[ghastty] wayland: discovered %zu dmabuf (format,modifier) "
"pairs across %zu formats\n",
total_mods, globals.modifiers.size());

// Move the bound proxies back to the default queue so Qt's main
// dispatch drives subsequent events on them, then drop the private
@@ -81,6 +141,15 @@ PresenterGlobals *discoverGlobals(wl_display *display) {
return &globals;
}

wl_display *acquireWaylandDisplay() {
if (!QGuiApplication::platformName().startsWith(QLatin1String("wayland")))
return nullptr;
QPlatformNativeInterface *native = QGuiApplication::platformNativeInterface();
if (!native) return nullptr;
return static_cast<wl_display *>(
native->nativeResourceForIntegration("wl_display"));
}

// wl_buffer::release listener: the compositor is done sampling the
// buffer for any committed surface state, so we can destroy our
// client-side handle. The underlying dmabuf memory is owned by
@@ -96,6 +165,26 @@ const wl_buffer_listener kBufferListener = {

} // namespace

void primeDmabufModifierRegistry() {
if (wl_display *display = acquireWaylandDisplay()) {
(void)discoverGlobals(display);
}
}

std::size_t supportedDmabufModifiers(std::uint32_t drm_format,
std::uint64_t *out,
std::size_t capacity) {
const PresenterGlobals &g = globalState();
if (!g.searched) return 0;
auto it = g.modifiers.find(drm_format);
if (it == g.modifiers.end()) return 0;
const std::size_t available = it->second.size();
if (out == nullptr || capacity == 0) return available;
const std::size_t copied = std::min(available, capacity);
std::memcpy(out, it->second.data(), copied * sizeof(std::uint64_t));
return copied;
}

std::unique_ptr<SubsurfacePresenter>
SubsurfacePresenter::tryCreate(QWindow *parent) {
if (!parent) return nullptr;
@@ -223,7 +312,11 @@ void SubsurfacePresenter::presentDmabuf(int fd, uint32_t drm_format,
// is harmless but the compositor's bookkeeping is cheaper if we
// skip the redundant request.
if (buffer_scale != m_lastBufferScale) {
wl_surface_set_buffer_scale(m_childSurface, buffer_scale);
// set_buffer_scale was added in wl_surface v3; guard against
// older compositors that bind us at v1/v2 (rare but possible).
if (wl_proxy_get_version(reinterpret_cast<wl_proxy *>(m_childSurface)) >= 3) {
wl_surface_set_buffer_scale(m_childSurface, buffer_scale);
}
m_lastBufferScale = buffer_scale;
}

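The version handling added in this file reduces to two small rules: bind `wl_compositor` at `min(advertised, 6)`, and only send `set_buffer_scale` when the child surface's version is at least 3. In libwayland, objects created through a proxy inherit that proxy's version, which is why checking the child surface with `wl_proxy_get_version` reflects the compositor bind. A C++ sketch of the two rules, using hypothetical helper names:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Highest wl_compositor version the commit caps the bind at.
constexpr std::uint32_t kMaxCompositorVersion = 6;
// wl_surface.set_buffer_scale is a "since version 3" request.
constexpr std::uint32_t kSetBufferScaleSince = 3;

// Version to request from wl_registry_bind: never above what the compositor
// advertises, never above what the client has been tested against.
constexpr std::uint32_t compositorBindVersion(std::uint32_t advertised) {
  return std::min(advertised, kMaxCompositorVersion);
}

// Mirrors presentDmabuf's wl_proxy_get_version(...) >= 3 guard.
constexpr bool canSetBufferScale(std::uint32_t surface_version) {
  return surface_version >= kSetBufferScaleSince;
}
```

In real client code the constant 3 is also available as `WL_SURFACE_SET_BUFFER_SCALE_SINCE_VERSION` from the scanner-generated `wayland-client-protocol.h`.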
@@ -6,6 +6,15 @@
// subsurface. The compositor scans the buffers out directly — no
// mmap, no memcpy, no QImage, no QPainter blit on the present path.
//
// Also exposes the process-wide compositor modifier registry
// (`primeDmabufModifierRegistry` / `supportedDmabufModifiers`)
// learned from zwp_linux_dmabuf_v1's format/modifier events.
// libghostty's Vulkan renderer queries this via the
// `get_supported_modifiers` platform callback to pick a modifier
// the compositor will actually accept — without that intersection,
// drivers that don't expose COLOR_ATTACHMENT for LINEAR (NVIDIA)
// can't get into Target's direct-export mode at all.
//
// Wayland-only by project decision (the Qt frontend is Wayland-only;
// see `feedback-qt-no-x11` memory). If the host isn't on a Wayland
// QPA platform or the compositor lacks the required globals,
@@ -14,6 +23,7 @@

#pragma once

#include <cstddef>
#include <cstdint>
#include <memory>

@@ -25,6 +35,28 @@ class QWindow;

namespace wayland {

// Eagerly discover the compositor's globals (incl. the
// zwp_linux_dmabuf_v1 format/modifier list) on the calling thread.
// MUST be called from the GUI thread before any
// `supportedDmabufModifiers` reader runs (the renderer thread). Safe
// to call multiple times — discovery happens exactly once.
//
// Idempotent no-op if the QPA isn't Wayland or the
// QPlatformNativeInterface lookup fails.
void primeDmabufModifierRegistry();

// Read the cached compositor-supported DRM modifiers for the given
// DRM_FORMAT_* fourcc. Returns the number of modifiers actually
// written to `out` (capped at `capacity`). Pass `out=nullptr,
// capacity=0` to query the total count.
//
// Thread-safe for readers once `primeDmabufModifierRegistry` has
// returned. Returns 0 if the registry hasn't been primed yet or the
// format isn't advertised.
std::size_t supportedDmabufModifiers(std::uint32_t drm_format,
std::uint64_t *out,
std::size_t capacity);

class SubsurfacePresenter {
public:
// Build a subsurface parented to `parent`'s native `wl_surface`,
@@ -424,6 +424,20 @@ pub const Platform = union(PlatformTag) {
queue: *const fn (?*anyopaque) callconv(.c) ?*anyopaque,
queue_family_index: *const fn (?*anyopaque) callconv(.c) u32,

/// Query the compositor-supported DRM modifiers for a given
/// DRM_FORMAT_* fourcc. Two-pass usage: call with
/// `out=null, capacity=0` for the count, then again with a
/// buffer of that size. Returns the number of modifiers
/// actually written. The renderer intersects this with the
/// GPU's per-modifier feature set to pick a tiling the
/// compositor will accept on attach.
get_supported_modifiers: *const fn (
?*anyopaque,
u32, // DRM_FORMAT_*
?[*]u64, // out
usize, // capacity
) callconv(.c) usize,

/// Hand off a rendered frame to the host as a dmabuf fd. The
/// host imports it for composition; libghostty retains
/// ownership of the underlying VkDeviceMemory and the fd is
@@ -479,6 +493,12 @@ pub const Platform = union(PlatformTag) {
device: ?*const fn (?*anyopaque) callconv(.c) ?*anyopaque,
queue: ?*const fn (?*anyopaque) callconv(.c) ?*anyopaque,
queue_family_index: ?*const fn (?*anyopaque) callconv(.c) u32,
get_supported_modifiers: ?*const fn (
?*anyopaque,
u32,
?[*]u64,
usize,
) callconv(.c) usize,
present: ?*const fn (
?*anyopaque,
i32,
@@ -541,6 +561,8 @@ pub const Platform = union(PlatformTag) {
break :vulkan error.QueueMustBeSet,
.queue_family_index = config.queue_family_index orelse
break :vulkan error.QueueFamilyIndexMustBeSet,
.get_supported_modifiers = config.get_supported_modifiers orelse
break :vulkan error.GetSupportedModifiersMustBeSet,
.present = config.present orelse
break :vulkan error.PresentMustBeSet,
} };
@@ -163,6 +163,11 @@ pub const Dispatch = struct {
// device-level resolution like any other device function.
getMemoryFdKHR: std.meta.Child(vk.PFN_vkGetMemoryFdKHR),
getImageSubresourceLayout: std.meta.Child(vk.PFN_vkGetImageSubresourceLayout),
/// From `VK_EXT_image_drm_format_modifier`. Used by
/// `vulkan/Target.zig` after creating an image with the LIST
/// variant of the modifier create-info to discover which
/// modifier the driver actually chose.
getImageDrmFormatModifierPropertiesEXT: std.meta.Child(vk.PFN_vkGetImageDrmFormatModifierPropertiesEXT),

// Per-frame sync (fence + command-buffer reset) — used by
// `vulkan/Frame.zig`.
@@ -466,6 +471,8 @@ pub fn init(
try dl.load(vk.PFN_vkGetMemoryFdKHR, "vkGetMemoryFdKHR");
const get_image_subresource_layout =
try dl.load(vk.PFN_vkGetImageSubresourceLayout, "vkGetImageSubresourceLayout");
const get_image_drm_format_modifier_properties_ext =
try dl.load(vk.PFN_vkGetImageDrmFormatModifierPropertiesEXT, "vkGetImageDrmFormatModifierPropertiesEXT");
const create_fence =
try dl.load(vk.PFN_vkCreateFence, "vkCreateFence");
const destroy_fence =
@@ -557,6 +564,7 @@ pub fn init(
.destroyPipeline = destroy_pipeline,
.getMemoryFdKHR = get_memory_fd_khr,
.getImageSubresourceLayout = get_image_subresource_layout,
.getImageDrmFormatModifierPropertiesEXT = get_image_drm_format_modifier_properties_ext,
.createFence = create_fence,
.destroyFence = destroy_fence,
.waitForFences = wait_for_fences,
@@ -148,34 +148,73 @@ pub fn init(opts: Options) Error!Self {
vk.VK_FORMAT_FEATURE_TRANSFER_SRC_BIT |
vk.VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT;

if (try probeLinearModifierSupported(dev, opts.format, required_features)) {
const picked = try pickModifier(dev, opts.format, drm_format, required_features);
if (picked) |m| {
const tag: []const u8 = if (m == DRM_FORMAT_MOD_LINEAR)
"LINEAR"
else
"vendor-tiled";
log.info(
"Target: direct dmabuf export (LINEAR modifier) {}x{}",
.{ opts.width, opts.height },
"Target: direct dmabuf export ({s} modifier 0x{x}) {}x{}",
.{ tag, m, opts.width, opts.height },
);
return try initDirect(opts, drm_format);
} else {
log.warn(
"Target: LINEAR modifier lacks COLOR_ATTACHMENT support; " ++
"falling back to OPTIMAL render + LINEAR-buffer copy",
.{},
);
return try initLegacyCopy(opts, drm_format);
return try initDirect(opts, drm_format, m);
}
log.warn(
"Target: no usable single-plane modifier with COLOR_ATTACHMENT " ++
"in compositor ∩ GPU intersection; falling back to " ++
"OPTIMAL render + LINEAR-buffer copy",
.{},
);
return try initLegacyCopy(opts, drm_format);
}

/// Ask the driver, via `VK_EXT_image_drm_format_modifier`'s
/// per-modifier feature list, whether `DRM_FORMAT_MOD_LINEAR`
/// supports the format-feature flags we need to use the image as a
/// color attachment + transfer source + sampled.
fn probeLinearModifierSupported(
/// Intersect the compositor's accepted modifier list (from the host
/// callback) with the GPU's supported modifiers for `format` (queried
/// via `VK_EXT_image_drm_format_modifier`), filtered by single-plane
/// + the required format-feature flags. Prefer the first non-LINEAR
/// hit (vendor-tiled — NVIDIA block-linear, AMD DCC variants, Intel
/// Y-tiled; these are where the perf win lives on most hardware).
/// Fall back to LINEAR if it's in the intersection. Return null when
/// no modifier qualifies — the caller drops to `.legacy_copy`.
///
/// Why both intersections matter:
/// - GPU-only: passes on AMD/Intel for LINEAR but NVIDIA never
/// exposes COLOR_ATTACHMENT for LINEAR — direct mode would
/// create the image OK but rasterize nothing.
/// - Compositor-only: GPU may not be able to render into the
/// compositor's preferred tilings (drivers don't always expose
/// COLOR_ATTACHMENT for every modifier).
fn pickModifier(
dev: *const Device,
format: vk.VkFormat,
drm_format: u32,
required_features: vk.VkFormatFeatureFlags,
) Error!bool {
var mods: [MAX_MODIFIERS]vk.VkDrmFormatModifierPropertiesEXT = undefined;
) Error!?u64 {
// Compositor side: ask the host what it will accept on attach.
// Two-pass query (NULL out + capacity 0 returns count). Empty
// result means the compositor doesn't speak linux-dmabuf-v1 or
// doesn't advertise this format — direct mode would still likely
// work for AMD/Intel LINEAR but the compositor attach would
// fail, so treat it as "no intersection."
var host_mods: [MAX_MODIFIERS]u64 = undefined;
const host_count = dev.platform.get_supported_modifiers(
dev.platform.userdata,
drm_format,
&host_mods,
MAX_MODIFIERS,
);
if (host_count == 0) {
log.warn(
"host advertises no dmabuf modifiers for format 0x{x}; " ++
"cannot use direct mode",
.{drm_format},
);
return null;
}

// First pass: get count.
// GPU side: enumerate modifiers + their per-modifier feature bits.
var gpu_mods: [MAX_MODIFIERS]vk.VkDrmFormatModifierPropertiesEXT = undefined;
var mod_list: vk.VkDrmFormatModifierPropertiesListEXT = .{
.sType = vk.VK_STRUCTURE_TYPE_DRM_FORMAT_MODIFIER_PROPERTIES_LIST_EXT,
.pNext = null,
@@ -192,43 +231,64 @@ fn probeLinearModifierSupported(
format,
&props2,
);

if (mod_list.drmFormatModifierCount == 0) return false;
if (mod_list.drmFormatModifierCount == 0) return null;
if (mod_list.drmFormatModifierCount > MAX_MODIFIERS) {
// Cap to our stack buffer; we only look for LINEAR (which
// tends to be first or close to it), so a truncation here is
// very unlikely to hide it. Log if we ever hit this.
log.warn(
"modifier list truncated: driver reports {}, MAX_MODIFIERS={}",
"GPU modifier list truncated: driver reports {}, MAX_MODIFIERS={}",
.{ mod_list.drmFormatModifierCount, MAX_MODIFIERS },
);
mod_list.drmFormatModifierCount = MAX_MODIFIERS;
}

// Second pass: fill list.
mod_list.pDrmFormatModifierProperties = &mods[0];
mod_list.pDrmFormatModifierProperties = &gpu_mods[0];
dev.dispatch.getPhysicalDeviceFormatProperties2(
dev.physical_device,
format,
&props2,
);

for (mods[0..mod_list.drmFormatModifierCount]) |m| {
if (m.drmFormatModifier != DRM_FORMAT_MOD_LINEAR) continue;
// Single-plane only — multi-plane modifiers need a wider
// present-callback ABI (one fd/offset/stride per plane).
if (m.drmFormatModifierPlaneCount != 1) continue;
if ((m.drmFormatModifierTilingFeatures & required_features) == required_features) {
return true;
var has_linear: bool = false;
var best_tiled: ?u64 = null;
for (gpu_mods[0..mod_list.drmFormatModifierCount]) |gm| {
// Single-plane only: present callback ABI passes one fd /
// offset / stride. Multi-plane (AMD AFBC, some video
// formats) needs a wider ABI.
if (gm.drmFormatModifierPlaneCount != 1) continue;
if ((gm.drmFormatModifierTilingFeatures & required_features) != required_features) continue;
// Intersect with what the compositor accepts.
var compositor_ok = false;
for (host_mods[0..host_count]) |hm| {
if (hm == gm.drmFormatModifier) {
compositor_ok = true;
break;
}
}
if (!compositor_ok) continue;
if (gm.drmFormatModifier == DRM_FORMAT_MOD_LINEAR) {
has_linear = true;
} else if (best_tiled == null) {
best_tiled = gm.drmFormatModifier;
}
}
return false;

if (best_tiled) |m| return m;
if (has_linear) return DRM_FORMAT_MOD_LINEAR;
return null;
}

/// `.direct` mode: allocate the render image with
/// `VkImageDrmFormatModifierExplicitCreateInfoEXT` and export its own
/// memory as the dmabuf.
fn initDirect(opts: Options, drm_format: u32) Error!Self {
/// `VK_EXT_image_drm_format_modifier` so its own memory can be
/// exported as the dmabuf. Two create-info variants depending on
/// the chosen modifier:
/// - LINEAR: EXPLICIT layout (we know rowPitch = width*bpp).
/// Lets us populate `stride` deterministically without a
/// post-create driver query.
/// - non-LINEAR (vendor-tiled): LIST with a single-modifier list.
/// The driver picks the only option and computes its own
/// internal layout; we recover the chosen modifier via
/// `vkGetImageDrmFormatModifierPropertiesEXT` (sanity check —
/// it should equal `chosen_mod`) and the per-plane layout via
/// `vkGetImageSubresourceLayout` for the right `stride` value.
fn initDirect(opts: Options, drm_format: u32, chosen_mod: u64) Error!Self {
const dev = opts.device;

const image_usage = @as(vk.VkImageUsageFlags, vk.VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT) |
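The selection policy in `pickModifier` can be restated without Zig or Vulkan. This C++ sketch uses a hypothetical `GpuModifier` struct in place of `VkDrmFormatModifierPropertiesEXT` and plain bitmasks in place of `VkFormatFeatureFlags`. It keeps the same order of checks (single-plane, required features, compositor membership) and the same preference: the first qualifying vendor-tiled modifier, else LINEAR, else no result, which the caller turns into the legacy copy path.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

constexpr std::uint64_t kModLinear = 0;  // DRM_FORMAT_MOD_LINEAR

// Hypothetical stand-in for VkDrmFormatModifierPropertiesEXT.
struct GpuModifier {
  std::uint64_t modifier;
  std::uint32_t plane_count;
  std::uint32_t tiling_features;  // format-feature bitmask
};

// First vendor-tiled modifier that is single-plane, has every required
// feature bit, and is accepted by the compositor; else LINEAR under the same
// conditions; else nullopt (caller falls back to the legacy copy path).
std::optional<std::uint64_t> pickModifier(const std::vector<GpuModifier> &gpu,
                                          const std::vector<std::uint64_t> &host,
                                          std::uint32_t required_features) {
  bool has_linear = false;
  for (const GpuModifier &gm : gpu) {
    if (gm.plane_count != 1) continue;
    if ((gm.tiling_features & required_features) != required_features) continue;
    if (std::find(host.begin(), host.end(), gm.modifier) == host.end()) continue;
    if (gm.modifier == kModLinear) {
      has_linear = true;
    } else {
      return gm.modifier;  // first non-LINEAR hit wins
    }
  }
  if (has_linear) return kModLinear;
  return std::nullopt;
}
```

Returning on the first tiled hit is equivalent to the Zig loop, which keeps scanning but only records `best_tiled` once and checks it before `has_linear`.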
@@ -236,11 +296,10 @@ fn initDirect(opts: Options, drm_format: u32) Error!Self {
vk.VK_IMAGE_USAGE_TRANSFER_SRC_BIT |
opts.extra_usage;

// BGRA8, single-plane LINEAR — rowPitch is just width * bpp.
const bytes_per_pixel: u32 = 4;
const row_pitch: vk.VkDeviceSize = @as(vk.VkDeviceSize, opts.width) * bytes_per_pixel;

// ---- 1. Image: LINEAR-modifier, externally-shareable -----------
// ---- 1. Image: modifier-aware, externally-shareable -----------
const plane_layout: vk.VkSubresourceLayout = .{
.offset = 0,
.size = 0, // ignored for EXPLICIT create-info
@@ -248,16 +307,30 @@ fn initDirect(opts: Options, drm_format: u32) Error!Self {
.arrayPitch = 0,
.depthPitch = 0,
};
const mod_create: vk.VkImageDrmFormatModifierExplicitCreateInfoEXT = .{
const explicit_create: vk.VkImageDrmFormatModifierExplicitCreateInfoEXT = .{
.sType = vk.VK_STRUCTURE_TYPE_IMAGE_DRM_FORMAT_MODIFIER_EXPLICIT_CREATE_INFO_EXT,
.pNext = null,
.drmFormatModifier = DRM_FORMAT_MOD_LINEAR,
.drmFormatModifierPlaneCount = 1,
.pPlaneLayouts = &plane_layout,
};
// Single-modifier list — the driver "picks" the only option, but
// crucially computes its own opaque internal layout for the
// tiling, which we don't have to know.
const list_mod = chosen_mod;
const list_create: vk.VkImageDrmFormatModifierListCreateInfoEXT = .{
.sType = vk.VK_STRUCTURE_TYPE_IMAGE_DRM_FORMAT_MODIFIER_LIST_CREATE_INFO_EXT,
.pNext = null,
.drmFormatModifierCount = 1,
.pDrmFormatModifiers = &list_mod,
};
const mod_pnext: ?*const anyopaque = if (chosen_mod == DRM_FORMAT_MOD_LINEAR)
@ptrCast(&explicit_create)
else
@ptrCast(&list_create);
const ext_image_info: vk.VkExternalMemoryImageCreateInfo = .{
.sType = vk.VK_STRUCTURE_TYPE_EXTERNAL_MEMORY_IMAGE_CREATE_INFO,
.pNext = &mod_create,
.pNext = mod_pnext,
.handleTypes = vk.VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT,
};
const image_info: vk.VkImageCreateInfo = .{
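The branch `initDirect` takes on `chosen_mod` decides both which create-info struct goes on the `pNext` chain and whether the stride is known before the image exists. A small C++ sketch of that decision, with hypothetical names (`planDirectImage`, `ModifierCreateInfo`) and the 4-bytes-per-pixel BGRA8 assumption stated in the hunk:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

constexpr std::uint64_t kModLinear = 0;  // DRM_FORMAT_MOD_LINEAR

// Which VK_EXT_image_drm_format_modifier struct goes on the pNext chain.
enum class ModifierCreateInfo { Explicit, List };

struct DirectImagePlan {
  ModifierCreateInfo variant;
  // Known before vkCreateImage only for LINEAR; for vendor-tiled images the
  // driver chooses the layout and the stride must be queried afterwards.
  std::optional<std::uint64_t> row_pitch;
};

// Hypothetical helper mirroring initDirect's branch on chosen_mod.
DirectImagePlan planDirectImage(std::uint64_t chosen_mod, std::uint32_t width,
                                std::uint32_t bytes_per_pixel = 4) {
  if (chosen_mod == kModLinear) {
    // EXPLICIT: we supply the single plane's layout ourselves.
    return {ModifierCreateInfo::Explicit,
            static_cast<std::uint64_t>(width) * bytes_per_pixel};
  }
  // LIST with one entry: the driver takes it and computes an opaque layout.
  return {ModifierCreateInfo::List, std::nullopt};
}
```

For the LIST case, images created with `VK_IMAGE_TILING_DRM_FORMAT_MODIFIER_EXT` must be queried with a memory-plane aspect (`VK_IMAGE_ASPECT_MEMORY_PLANE_0_BIT_EXT` for a single-plane modifier) when calling `vkGetImageSubresourceLayout`.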
@@ -279,37 +352,33 @@ fn initDirect(opts: Options, drm_format: u32) Error!Self {
     };
     var image: vk.VkImage = undefined;
     if (dev.dispatch.createImage(dev.device, &image_info, null, &image) != vk.VK_SUCCESS) {
-        log.err("vkCreateImage (Target direct) failed", .{});
+        log.err("vkCreateImage (Target direct, mod=0x{x}) failed", .{chosen_mod});
         return error.VulkanFailed;
     }
     errdefer dev.dispatch.destroyImage(dev.device, image, null);
 
-    // ---- 2. Image memory: exportable, host-cacheable for Qt mmap ---
+    // ---- 2. Image memory: exportable ---------------------------------
     var image_reqs: vk.VkMemoryRequirements = undefined;
     dev.dispatch.getImageMemoryRequirements(dev.device, image, &image_reqs);
 
-    // HOST_CACHED matters: Qt's `presentVulkanDmabuf` mmaps and reads
-    // every pixel into a QImage. Without HOST_CACHED, NVIDIA hands
-    // back write-combining memory and that read crawls (see legacy
-    // path note for the ~260 ms regression we hit). HOST_COHERENT
-    // avoids explicit flushes. Fall back to uncached if cached isn't
-    // available for the memory type bits the image requires.
-    const host_flags_cached =
-        @as(vk.VkMemoryPropertyFlags, vk.VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT) |
-        vk.VK_MEMORY_PROPERTY_HOST_COHERENT_BIT |
-        vk.VK_MEMORY_PROPERTY_HOST_CACHED_BIT;
-    const host_flags_uncached =
-        @as(vk.VkMemoryPropertyFlags, vk.VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT) |
-        vk.VK_MEMORY_PROPERTY_HOST_COHERENT_BIT;
-    const image_mem_idx = dev.findMemoryType(image_reqs.memoryTypeBits, host_flags_cached) orelse
-        dev.findMemoryType(image_reqs.memoryTypeBits, host_flags_uncached) orelse
-        {
-            log.err(
-                "no HOST_VISIBLE memory type for direct dmabuf image (typeBits=0x{x})",
-                .{image_reqs.memoryTypeBits},
-            );
-            return error.NoSuitableMemoryType;
-        };
+    // In direct mode the host doesn't mmap the dmabuf — it imports it
+    // as a 2D image into the compositor (`image_backed=true` per
+    // `Target.present`). So DEVICE_LOCAL is the right choice: GPU-
+    // local memory is faster for the COLOR_ATTACHMENT_OUTPUT writes,
+    // and vendor-tiled modifiers often require it on drivers like
+    // NVIDIA (which won't expose HOST_VISIBLE memory types for the
+    // bits a tiled exportable image requires anyway).
+    const image_mem_idx = dev.findMemoryType(
+        image_reqs.memoryTypeBits,
+        vk.VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT,
+    ) orelse {
+        log.err(
+            "no DEVICE_LOCAL memory type for direct dmabuf image " ++
+                "(mod=0x{x} typeBits=0x{x})",
+            .{ chosen_mod, image_reqs.memoryTypeBits },
+        );
+        return error.NoSuitableMemoryType;
+    };
     const export_info: vk.VkExportMemoryAllocateInfo = .{
         .sType = vk.VK_STRUCTURE_TYPE_EXPORT_MEMORY_ALLOCATE_INFO,
         .pNext = null,
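
The switch from HOST_VISIBLE to DEVICE_LOCAL is easiest to see with the conventional memory-type selection loop. This sketch assumes `dev.findMemoryType` follows the common pattern (first type permitted by `memoryTypeBits` that carries every requested property flag); its real body is not part of this diff, and the three-entry memory-type table in the test is invented for illustration. The flag constants use the `VkMemoryPropertyFlagBits` values.

```zig
const std = @import("std");

// Bit values match VkMemoryPropertyFlagBits.
const DEVICE_LOCAL: u32 = 0x1;
const HOST_VISIBLE: u32 = 0x2;
const HOST_COHERENT: u32 = 0x4;
const HOST_CACHED: u32 = 0x8;

/// Assumed shape of `dev.findMemoryType`: index of the first memory type
/// that the image permits (bit set in `type_bits`) and that carries every
/// flag in `wanted`. Vulkan caps memory types at 32, hence the clamp.
fn findMemoryType(type_flags: []const u32, type_bits: u32, wanted: u32) ?u32 {
    const n = @min(type_flags.len, 32);
    for (type_flags[0..n], 0..) |flags, i| {
        const allowed = ((type_bits >> @intCast(i)) & 1) == 1;
        if (allowed and (flags & wanted) == wanted) return @as(u32, @intCast(i));
    }
    return null;
}
```

When a tiled exportable image only permits device-local types, any HOST_VISIBLE request returns `null` no matter which fallback flags are tried, which is the failure the old cached/uncached chain could not recover from.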
@@ -340,9 +409,39 @@ fn initDirect(opts: Options, drm_format: u32) Error!Self {
     const fd = try exportDmabufFd(dev, image_memory);
     errdefer std.posix.close(fd);
 
-    // ---- 5. Query the actual plane stride --------------------------
-    // We requested rowPitch = width * 4 via EXPLICIT create-info, but
-    // the driver can technically round up; ask for what we actually got.
+    // ---- 5. Confirm the actual modifier + plane layout -------------
+    // For non-LINEAR we used LIST create-info (one entry), so the
+    // driver "picked" the only option. We query back via
+    // `vkGetImageDrmFormatModifierPropertiesEXT` as a sanity check
+    // and log a warning if the driver returned a different modifier
+    // — that would indicate a driver bug or our list being ignored.
+    var actual_mod = chosen_mod;
+    if (chosen_mod != DRM_FORMAT_MOD_LINEAR) {
+        var mod_props: vk.VkImageDrmFormatModifierPropertiesEXT = .{
+            .sType = vk.VK_STRUCTURE_TYPE_IMAGE_DRM_FORMAT_MODIFIER_PROPERTIES_EXT,
+            .pNext = null,
+            .drmFormatModifier = 0,
+        };
+        if (dev.dispatch.getImageDrmFormatModifierPropertiesEXT(
+            dev.device,
+            image,
+            &mod_props,
+        ) == vk.VK_SUCCESS) {
+            actual_mod = mod_props.drmFormatModifier;
+            if (actual_mod != chosen_mod) {
+                log.warn(
+                    "driver chose modifier 0x{x}, we asked for 0x{x}",
+                    .{ actual_mod, chosen_mod },
+                );
+            }
+        }
+    }
+
+    // Plane 0 layout: rowPitch is what we report as `stride` to the
+    // compositor. For LINEAR this is width*bpp (possibly padded).
+    // For vendor-tiled formats the value is implementation-specific —
+    // the compositor's GPU knows how to interpret it given the
+    // modifier we report alongside.
     var subres: vk.VkImageSubresource = .{
         .aspectMask = vk.VK_IMAGE_ASPECT_MEMORY_PLANE_0_BIT_EXT,
         .mipLevel = 0,
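
Step 5's fallback rules, restated as a pure function: LINEAR skips the query, a failed query keeps the requested modifier, and a different answer is adopted but flagged for the warning. `Resolved`, `resolveModifier` and `modifierVendor` are hypothetical names. The vendor decode follows `drm_fourcc.h`, where the top 8 bits of a modifier identify the vendor (`0x03` is NVIDIA), which helps when reading the mismatch warning's hex values.

```zig
const std = @import("std");

const DRM_FORMAT_MOD_LINEAR: u64 = 0;
// From drm_fourcc.h: the top 8 bits of a modifier name the vendor.
const DRM_FORMAT_MOD_VENDOR_NVIDIA: u8 = 0x03;

fn modifierVendor(modifier: u64) u8 {
    return @intCast(modifier >> 56);
}

const Resolved = struct { modifier: u64, mismatch: bool };

/// Same decision as step 5: LINEAR skips the query; a failed query
/// (`queried == null`) keeps the requested modifier; a differing answer
/// is adopted but flagged so the caller can log a warning.
fn resolveModifier(chosen: u64, queried: ?u64) Resolved {
    if (chosen == DRM_FORMAT_MOD_LINEAR) return .{ .modifier = chosen, .mismatch = false };
    const actual = queried orelse chosen;
    return .{ .modifier = actual, .mismatch = actual != chosen };
}
```

Note that a failed query is silent in this logic (and in the hunk above); only a successful query that disagrees produces a warning.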
@@ -365,7 +464,7 @@ fn initDirect(opts: Options, drm_format: u32) Error!Self {
         .height = opts.height,
         .fd = fd,
         .drm_format = drm_format,
-        .drm_modifier = DRM_FORMAT_MOD_LINEAR,
+        .drm_modifier = actual_mod,
         .stride = @intCast(layout.rowPitch),
     };
 }
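
On the Wayland side, the 64-bit `drm_modifier` returned here travels as two 32-bit halves: the linux-dmabuf protocol's `zwp_linux_dmabuf_v1.modifier` event and `zwp_linux_buffer_params_v1.add` request both carry `modifier_hi`/`modifier_lo`. A minimal sketch of the split and its inverse, using the NVIDIA modifier from the commit message as the test value; `splitModifier` and `joinModifier` are hypothetical helper names, not functions from this codebase.

```zig
const std = @import("std");

/// linux-dmabuf carries 64-bit modifiers as two u32 halves
/// (`modifier_hi`, `modifier_lo`).
fn splitModifier(modifier: u64) struct { hi: u32, lo: u32 } {
    return .{ .hi = @truncate(modifier >> 32), .lo = @truncate(modifier) };
}

/// Inverse of `splitModifier`, as a host would rebuild modifiers from
/// the compositor's `modifier` events.
fn joinModifier(hi: u32, lo: u32) u64 {
    return (@as(u64, hi) << 32) | lo;
}
```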