fix(audit): pass 1 — correctness, lifetimes, ABI surface, build

Findings + fixes from /audit-code on PR #16. Both variants build
clean against Fedora 42 + Zig 0.15.2.

Critical:
- buffer_pool moved from threadlocal to process-wide mutex-
  protected. Cross-thread releases (atlas-upload thread vs
  renderer thread) used to leak every staging buffer; now the
  staging path uses Buffer.destroyImmediate which bypasses the
  pool, and the renderer path goes through release/cycle which
  is mutex-correct across splits/tabs sharing a VkDevice.
- buffer.zig grow() now routes the old VkBuffer through release
  instead of destroying it directly (the prior path raced the
  in-flight command buffer same as deinit used to).
- buffer_pool OOM fallback now waits the device idle before
  destroying pending entries — pre-fix it would tear down GPU
  resources still referenced by the in-flight command buffer.
- RenderPass.step now allocates fresh per-call descriptor sets
  from a per-frame step_pool whenever a pipeline is bound more
  than once in a single pass. vkCmdDraw reads descriptors at
  submit time, so re-updating the pipeline's static sets in
  place would silently corrupt every prior draw on the same
  pipeline (the kitty image path issues N draws on the same
  `image` pipeline with per-call vertex buffers + textures).

High:
- Pipeline.init: missing errdefer destroyPipeline after
  createGraphicsPipelines; descriptor-set allocation failure
  leaked the VkPipeline.
- shaders.zig post-shader errdefer used a single `built` counter
  that was bumped only after Module + Pipeline both succeeded.
  Split into modules_built / pipelines_built so a Pipeline.init
  failure doesn't leak the just-built VkShaderModule.
- Texture.zig: barrier/copy aspectMask was hardcoded to COLOR_BIT
  ignoring Options.aspect; depth textures would have produced
  silently invalid layout transitions.
- DescriptorPool.init: rejected at the boundary when max_sets > 0
  but every per-type cap is zero (spec violation that some
  drivers accept and others fail at allocation).
- Device.init: result-checked both vkEnumerateDeviceExtensionProperties
  calls; VK_INCOMPLETE on the fill pass now bails instead of
  passing a truncated extension list to the required-extension
  scan.
- SubsurfacePresenter: dmabuf proxy is now intentionally kept on
  the discovery queue (never re-dispatched) so post-discovery
  modifier events from compositor restarts can't race the
  renderer thread reading the modifier map.
- GhosttySurface m_eglTarget.reset(): make the owning context
  current first; the destructor's currentContext() check used to
  see the wrong context and silently leak gl framebuffer/texture
  on QPlatformSurface destruction (every QSplitter reparent /
  fullscreen toggle / monitor change).
- m_fbDpr is now std::atomic<double>; the renderer thread used
  to read a non-atomic double the GUI thread wrote.
- presentVulkanDmabuf: bound width / height / stride against
  MAX_DIM (65536) and pre-checked stride*height against SIZE_MAX
  before mmap; pre-fix `width * 4` and `stride * height` could
  wrap on hostile inputs.
- qt/CMakeLists.txt: libvulkan + vulkan/Host.cpp are now Vulkan-
  variant only, matching the documented side-by-side variant
  story; OpenGL-only systems no longer need the Vulkan loader.
- qt/CMakeLists.txt: validate GHASTTY_VARIANT cache value with
  FATAL_ERROR on unknown values.

Medium:
- Vulkan.zig deinit: per-surface tear-down now waits on this
  surface's frame fence instead of `vkDeviceWaitIdle` on the
  shared device. The final-refcount path still does the device-
  wide wait. Closing one of N tabs no longer stalls every other
  tab's GPU work.
- Frame.zig complete: queueSubmit / endCommandBuffer / fence-wait
  failures used to early-return without `frameCompleted` or
  `buffer_pool.cycle`, which hung the next drawFrame and grew
  pending buffers unboundedly. Errors now drive `health =
  .unhealthy` while the frame teardown still runs.
- shaders.zig vulkanizeGlsl now skips line/block comments and
  string literals so future shaders that mention `gl_VertexID`
  or `texture(atlas_*, ...)` in a comment don't get silently
  rewritten.
- shaders.zig processIncludes accepts whitespace between
  `#include` and `"` instead of asserting a literal prefix.
- Module.init propagates Allocator.Error from vulkanizeGlsl via
  `try` instead of conflating OOM with `error.GlslangFailed`.
- Shaders carries a `device: *const Device` field so deinit no
  longer fishes the device pointer out of an arbitrary
  modules.* sub-field.
- shadertoy.zig prefix-injection no longer crashes on a
  newline-less prefix (defensive fallback writes defines first).
- SubsurfacePresenter.presentDmabuf validates the (format,
  modifier) pair against the registry before handing it to
  create_immed; an unrecognized modifier would otherwise trigger
  a fatal wl_display protocol error and kill every window in the
  process.
- apprt/embedded.zig: collapsed 8 per-callback "MustBeSet" Vulkan
  errors into one `error.MissingVulkanCallback`, with per-callback
  diagnostic logging.
- Vulkan.zig: 2 ms QTimer polling safety-net removed (was 500
  wakeups/sec/surface for a phantom-bug); QMetaObject::invokeMethod
  is reliable and the dropped-frame counter surfaces any actual
  loss.
- GhosttySurface: Vulkan host nullptr now `std::abort()`s on the
  Vulkan variant instead of silently falling through to a GL ctor
  branch the .so doesn't support.

Low:
- Vulkan.zig device_refcount: assert > 0 before decrement.
- Target.zig host_count: `@min` clamp against MAX_MODIFIERS so a
  misbehaving host doesn't OOB-read the stack buffer.
- Target.zig stride: bounds-check rowPitch fits in u32 instead of
  panicking on `@intCast` at exotic resolutions.
- Device.findMemoryType caches MemoryProperties at init; was
  re-querying on every allocation.
- Device.waitIdle logs the VkResult on failure.
- shaders.zig: dropped dead `nextNonSpaceIsOpenParen`.
- shadertoy.zig: dropped unused `test_focus` embed.
- Module deinit alloc parameter: now used via the slice's owner
  instead of stashed `post_alloc` field; field removed.

Co-Authored-By: claude-flow <ruv@ruv.net>
pull/12846/head
ntomsic 2026-05-25 15:58:03 -05:00
parent 44d508fb9b
commit 0f80583825
18 changed files with 863 additions and 233 deletions

View File

@ -143,6 +143,15 @@ set(GHOSTTY_SO "${GHOSTTY_LIB_DIR}/ghostty-internal.so")
set(GHASTTY_VARIANT "opengl" CACHE STRING
"Renderer variant: opengl (default) or vulkan")
set_property(CACHE GHASTTY_VARIANT PROPERTY STRINGS opengl vulkan)
# Validate the cache value: STRINGS only constrains the cmake-gui
# dropdown, not the command-line. `-DGHASTTY_VARIANT=foo` would
# otherwise silently fall into the OpenGL branch below.
if(NOT GHASTTY_VARIANT STREQUAL "opengl" AND
NOT GHASTTY_VARIANT STREQUAL "vulkan")
message(FATAL_ERROR
"GHASTTY_VARIANT='${GHASTTY_VARIANT}' is invalid; "
"must be 'opengl' or 'vulkan'.")
endif()
if(GHASTTY_VARIANT STREQUAL "vulkan")
set(GHASTTY_EXE_NAME "ghastty-vulkan")
set(GHASTTY_LIB_SUBDIR "ghastty-vulkan")
@ -189,7 +198,6 @@ add_custom_target(ghostty_link DEPENDS "${GHOSTTY_LINK_SO}")
add_executable(ghastty
src/main.cpp
src/vulkan/Host.cpp
src/actions/ActionDispatcher.cpp
src/actions/ChromeActions.cpp
src/actions/InputActions.cpp
@ -226,6 +234,15 @@ add_executable(ghastty
"${FRACSCALE_HEADER}"
)
# Vulkan host glue is variant-only. Adding it to the OpenGL build
# would force an unconditional libvulkan link on a binary that
# never calls into Vulkan, contradicting the side-by-side
# `~/.local/lib/libghostty.so` story that the variant block above
# documents.
if(GHASTTY_VARIANT STREQUAL "vulkan")
target_sources(ghastty PRIVATE src/vulkan/Host.cpp)
endif()
# Embed the app icon so it is available even running from the build tree.
qt_add_resources(ghastty "appicon"
PREFIX "/"
@ -250,10 +267,18 @@ target_link_libraries(ghastty PRIVATE
PkgConfig::XKBCOMMON
PkgConfig::EGL
LayerShellQt::Interface
vulkan
"${GHOSTTY_LINK_SO}"
)
# libvulkan is Vulkan-variant only. The OpenGL variant compiles
# nothing that references Vulkan symbols (vulkan/Host.cpp is gated
# above), so not linking libvulkan keeps OpenGL-only systems from
# needing the loader installed at runtime matching the
# documented side-by-side variant story above.
if(GHASTTY_VARIANT STREQUAL "vulkan")
target_link_libraries(ghastty PRIVATE vulkan)
endif()
# Hook up the private QPA headers (see find_package above).
#
# Qt6::WaylandClientPrivate gives us QtWaylandClient::QWaylandWindow,

View File

@ -8,7 +8,9 @@
#include "SearchBar.h"
#include "TabWidget.h"
#include "Util.h"
#ifdef GHASTTY_USE_VULKAN
#include "vulkan/Host.h"
#endif
#include "wayland/EglDmabufTarget.h"
#include "wayland/SubsurfacePresenter.h"
@ -112,12 +114,43 @@ GhosttySurface::GhosttySurface(ghostty_app_t app, MainWindow *owner,
// produce a mismatch crash. Mixing GL+VK on the same process
// (e.g. NVIDIA's coexistence on one Wayland surface) is also
// reportedly fragile.
vulkan::Host *vk_host = nullptr;
#ifdef GHASTTY_USE_VULKAN
vk_host = vulkan::Host::instance();
#endif
// The "use Vulkan" decision is purely compile-time on this fork:
// each binary is linked against exactly one libghostty.so variant
// (opengl or vulkan). A runtime fallback would just mis-initialize
// the surface against the wrong renderer.
ghostty_surface_config_s sc =
m_parentSurface
? ghostty_surface_inherited_config(m_parentSurface,
GHOSTTY_SURFACE_CONTEXT_TAB)
: ghostty_surface_config_new();
if (vk_host == nullptr) {
#ifdef GHASTTY_USE_VULKAN
{
vulkan::Host *vk_host = vulkan::Host::instance();
if (vk_host == nullptr) {
// libghostty was compiled with -Drenderer=vulkan and there's
// no GL fallback available: libghostty's GL surface init
// would crash on the first call. Fail loudly here.
std::fprintf(stderr,
"[ghastty] Vulkan host bring-up failed (no Vulkan 1.3 "
"GPU with VK_KHR_external_memory_fd + "
"VK_EXT_external_memory_dma_buf). The Vulkan variant "
"of libghostty has no OpenGL fallback — exiting.\n");
std::abort();
}
m_useVulkan = true;
sc.platform_tag = GHOSTTY_PLATFORM_VULKAN;
sc.platform.vulkan = vk_host->asPlatform(this);
// GUI-thread frame delivery is driven by
// `QMetaObject::invokeMethod` (Qt::QueuedConnection) from
// `presentVulkanDmabuf`. The earlier 2 ms safety-net polling
// timer was removed once delivery was shown to be reliable;
// any genuine loss is visible via the dropped-frame counter
// logged from `presentVulkanDmabuf`.
}
#else
{
// OpenGL path: stand up the private context + offscreen FBO
// libghostty's GL renderer draws into.
m_context = new QOpenGLContext(this);
@ -140,33 +173,7 @@ GhosttySurface::GhosttySurface(ghostty_app_t app, MainWindow *owner,
fmt.setInternalTextureFormat(GL_RGBA8);
m_fbw = m_fbh = 16;
m_fbo = new QOpenGLFramebufferObject(QSize(m_fbw, m_fbh), fmt);
}
ghostty_surface_config_s sc =
m_parentSurface
? ghostty_surface_inherited_config(m_parentSurface,
GHOSTTY_SURFACE_CONTEXT_TAB)
: ghostty_surface_config_new();
if (vk_host != nullptr) {
m_useVulkan = true;
sc.platform_tag = GHOSTTY_PLATFORM_VULKAN;
sc.platform.vulkan = vk_host->asPlatform(this);
// GUI-thread frame drain. The renderer thread wakes us per frame
// via QMetaObject::invokeMethod (Qt::QueuedConnection) on each
// present — see `presentVulkanDmabuf`. The 2 ms timer is a
// safety net: if `invokeMethod` ever fails to deliver (the
// earlier QImage-handoff diagnostics suggested this could
// happen), the next tick drains the parked frame within at most
// 2 ms. Idle case has negligible CPU cost because `drainVulkan`
// returns immediately when nothing is pending.
m_vulkanPollTimer = new QTimer(this);
m_vulkanPollTimer->setInterval(2);
connect(m_vulkanPollTimer, &QTimer::timeout, this,
[this]() { drainVulkan(); });
m_vulkanPollTimer->start();
} else {
sc.platform_tag = GHOSTTY_PLATFORM_OPENGL;
sc.platform.opengl.userdata = this;
sc.platform.opengl.get_proc_address = glGetProcAddress;
@ -174,6 +181,7 @@ GhosttySurface::GhosttySurface(ghostty_app_t app, MainWindow *owner,
sc.platform.opengl.release_current = glReleaseCurrent;
sc.platform.opengl.present = glPresent;
}
#endif
sc.userdata = this;
sc.scale_factor = devicePixelRatioF();
@ -234,10 +242,12 @@ void GhosttySurface::syncSurfaceSize() {
// shave a pixel off the framebuffer relative to the QImage blit.
const int w = std::max(1, static_cast<int>(std::lround(width() * dpr)));
const int h = std::max(1, static_cast<int>(std::lround(height() * dpr)));
if (w == m_fbw && h == m_fbh && dpr == m_fbDpr) return;
if (w == m_fbw && h == m_fbh &&
dpr == m_fbDpr.load(std::memory_order_relaxed))
return;
m_fbw = w;
m_fbh = h;
m_fbDpr = dpr;
m_fbDpr.store(dpr, std::memory_order_release);
// Vulkan path: libghostty manages the target image itself (it
// allocates the dmabuf-exportable VkImage). Tell it the new
@ -405,7 +415,19 @@ bool GhosttySurface::event(QEvent *e) {
static_cast<QPlatformSurfaceEvent *>(e)->surfaceEventType();
if (type == QPlatformSurfaceEvent::SurfaceAboutToBeDestroyed) {
m_useSubsurface.store(false, std::memory_order_release);
m_eglTarget.reset();
// EglDmabufTarget's destructor deletes a GL framebuffer +
// texture allocated against `m_context`; without that
// context current its `QOpenGLContext::currentContext()`
// check sees the wrong (or no) context and silently skips
// the gl* calls, leaking the resources every time Qt
// re-creates the QPA window (QSplitter reparent, fullscreen
// toggle, screen change). Make the owning context current
// before tearing down. Vulkan-variant builds have no
// `m_context` and skip the makeCurrent.
if (m_eglTarget) {
if (m_context) makeCurrent();
m_eglTarget.reset();
}
m_subsurfacePresenter.reset();
}
// SurfaceCreated is handled implicitly: the next QEvent::Show
@ -623,7 +645,7 @@ void GhosttySurface::renderTerminal() {
// (Scaling it to the widget instead made the whole frame — images
// included — rubber-band while a resize was in flight.)
m_image = m_fbo->toImage();
m_image.setDevicePixelRatio(m_fbDpr);
m_image.setDevicePixelRatio(m_fbDpr.load(std::memory_order_acquire));
m_fbo->release();
update();
@ -1537,10 +1559,11 @@ QVariant GhosttySurface::inputMethodQuery(Qt::InputMethodQuery query) const {
ghostty_surface_cursor_position(m_surface);
// m_fbDpr defaults to 1.0 and only ever takes positive values
// from syncSurfaceSize, so dividing is always safe.
return QRect(static_cast<int>(c.x / m_fbDpr),
static_cast<int>(c.y / m_fbDpr),
std::max(1, static_cast<int>(c.width / m_fbDpr)),
std::max(1, static_cast<int>(c.height / m_fbDpr)));
const double dpr = m_fbDpr.load(std::memory_order_acquire);
return QRect(static_cast<int>(c.x / dpr),
static_cast<int>(c.y / dpr),
std::max(1, static_cast<int>(c.width / dpr)),
std::max(1, static_cast<int>(c.height / dpr)));
}
default:
return QWidget::inputMethodQuery(query);
@ -1621,8 +1644,25 @@ void GhosttySurface::presentVulkanDmabuf(
image_backed ? 1 : 0, useSubsurface ? "subsurface" : "qimage");
}
if (dmabuf_fd < 0 || width == 0 || height == 0 || stride < width * 4)
return;
// Validate the renderer-supplied dimensions. width / height /
// stride are all u32 and the multiplications below would wrap if
// they're hostile/buggy:
// - `width * 4` (the minimum acceptable stride) wraps for
// width >= 0x40000000, accepting any stride.
// - `stride * height` (the legacy mmap path's byte count) wraps
// to a small size_t when promoted on platforms where size_t
// is 32-bit, causing an under-mapped buffer that we then
// read past.
// Cap on a sane upper bound — 65536×65536 dwarfs any plausible
// terminal — and check that stride*height doesn't exceed
// SIZE_MAX before promoting.
constexpr quint32 MAX_DIM = 65536;
if (dmabuf_fd < 0 || width == 0 || height == 0) return;
if (width > MAX_DIM || height > MAX_DIM) return;
if (stride < static_cast<quint64>(width) * 4) return;
// stride*height as 64-bit and check the size_t fit explicitly.
const quint64 bytes64 = static_cast<quint64>(stride) * height;
if (bytes64 > std::numeric_limits<std::size_t>::max()) return;
// Don't park / dispatch frames while we're hidden — racing the
// renderer's final post-Hide frame past presenter.hide() is what
@ -1670,8 +1710,9 @@ void GhosttySurface::presentVulkanDmabuf(
return;
}
// Fallback: mmap + memcpy into a QImage.
const size_t bytes = static_cast<size_t>(stride) * height;
// Fallback: mmap + memcpy into a QImage. `bytes64` was computed
// and bounds-checked above.
const size_t bytes = static_cast<size_t>(bytes64);
void *mapped = ::mmap(nullptr, bytes, PROT_READ, MAP_SHARED, dmabuf_fd, 0);
if (mapped == MAP_FAILED) {
std::fprintf(stderr, "[ghastty] mmap of dmabuf fd=%d failed: %s\n",
@ -1694,7 +1735,8 @@ void GhosttySurface::presentVulkanDmabuf(
QImage owned = stamped.copy();
::munmap(mapped, bytes);
if (m_fbDpr > 0) owned.setDevicePixelRatio(m_fbDpr);
const double dpr_now = m_fbDpr.load(std::memory_order_acquire);
if (dpr_now > 0) owned.setDevicePixelRatio(dpr_now);
{
QMutexLocker lock(&m_pendingMutex);
m_pending = std::move(owned);

View File

@ -288,13 +288,14 @@ private:
bool m_useVulkan = false;
// Cross-thread frame handoff for the Vulkan path. The renderer
// thread calls `presentVulkanDmabuf` with a borrowed dmabuf fd; a
// 16 ms `QTimer` on the GUI thread drains the pending frame and
// routes it through the wl_subsurface (zero-copy) when the
// SubsurfacePresenter is available, or falls back to the
// mmap+memcpy+QImage path otherwise. The polling timer was kept
// (rather than QMetaObject::invokeMethod) because queued lambdas
// from the renderer thread were unreliable in earlier diagnostics.
// thread calls `presentVulkanDmabuf` with a borrowed dmabuf fd
// and posts a queued `drainVulkan` invocation; the GUI thread
// runs `drainVulkan` and routes the parked descriptor through
// either the wl_subsurface presenter (zero-copy) or the
// mmap+memcpy+QImage fallback. The dropped-frame counter
// (`m_droppedFrames`) surfaces any queue-loss that ever happens
// in practice — the earlier safety-net polling timer was
// removed once delivery was shown to be reliable.
//
// `m_useSubsurface` is set once on the GUI thread when the
// presenter comes up; the renderer thread reads it acquire-style
@ -318,7 +319,6 @@ private:
// null and paintEvent skips its blit.
QImage m_pending;
QMutex m_pendingMutex;
QTimer *m_vulkanPollTimer = nullptr;
// GL objects for the alpha-premultiply pass.
QOpenGLShaderProgram *m_premultProg = nullptr;
@ -326,7 +326,13 @@ private:
int m_fbw = 0; // framebuffer size, device pixels
int m_fbh = 0;
double m_fbDpr = 1.0; // DPR the framebuffer was sized at
// DPR the framebuffer was sized at. Atomic because the renderer
// thread reads it from `presentVulkanDmabuf` to tag the legacy
// QImage path while the GUI thread writes it from
// `syncSurfaceSize`. `double` writes aren't guaranteed atomic
// across threads on every architecture; std::atomic<double> uses
// CAS-loop fallbacks where needed.
std::atomic<double> m_fbDpr{1.0}; // DPR the framebuffer was sized at
QLabel *m_exitOverlay = nullptr; // "process exited" banner; lazily made
QLabel *m_keySeqOverlay = nullptr; // pending keybind chord; lazily made

View File

@ -11,7 +11,6 @@
#include "GlobalShortcuts.h"
#include "MainWindow.h"
#include "ghostty.h"
#include "vulkan/Host.h"
// True when any argv entry starts with `+` — i.e. the user invoked a
// libghostty CLI action (`+show-config`, `+list-fonts`, `+version`, …).

View File

@ -49,9 +49,13 @@ void dmabufFormat(void *, zwp_linux_dmabuf_v1 *, uint32_t /*format*/) {}
// `modifier` event: compositor advertises one (format, modifier) it
// can scan out. Fires once per pair during the bind roundtrip; we
// stash them all in the per-format vector. Duplicate-keyed inserts
// are theoretically possible across compositor restarts but won't
// happen within a single bind round, so we don't dedupe.
// stash them all in the per-format vector. Only fires from inside
// `discoverGlobals` because we keep the dmabuf proxy on a private
// queue that's never dispatched after discovery — see the queue-
// retention comment in `discoverGlobals`. That guarantee is what
// lets the renderer thread read `globals.modifiers` without a
// lock, and is also why we don't bother deduping (one bind round
// only fires each pair once).
void dmabufModifier(void *data, zwp_linux_dmabuf_v1 *, uint32_t format,
uint32_t modifier_hi, uint32_t modifier_lo) {
auto *g = static_cast<PresenterGlobals *>(data);
@ -140,21 +144,32 @@ PresenterGlobals *discoverGlobals(wl_display *display) {
// Move the bound proxies back to the default queue so Qt's main
// dispatch drives subsequent events on them, then drop the private
// queue. (Same lifecycle dance as `blurManager`.)
//
// EXCEPT the dmabuf proxy: its listener mutates `globals.modifiers`
// on every `modifier` event, and the renderer thread reads that
// map from `supportedDmabufModifiers` without locking. If we
// moved the proxy back to the default queue, a compositor
// restart / hot-plug fires more `modifier` events that would
// race the reader. Keep the proxy on `queue` and intentionally
// never dispatch that queue again — the events queue up
// harmlessly and are reaped at proxy destruction. The map is
// genuinely frozen post-discovery now.
if (globals.compositor)
wl_proxy_set_queue(reinterpret_cast<wl_proxy *>(globals.compositor),
nullptr);
if (globals.subcompositor)
wl_proxy_set_queue(reinterpret_cast<wl_proxy *>(globals.subcompositor),
nullptr);
if (globals.dmabuf)
wl_proxy_set_queue(reinterpret_cast<wl_proxy *>(globals.dmabuf), nullptr);
if (globals.viewporter)
wl_proxy_set_queue(reinterpret_cast<wl_proxy *>(globals.viewporter),
nullptr);
if (globals.fractionalScale)
wl_proxy_set_queue(reinterpret_cast<wl_proxy *>(globals.fractionalScale),
nullptr);
wl_event_queue_destroy(queue);
// We deliberately leak `queue` (and leave globals.dmabuf attached
// to it) for the process lifetime — it has no resources beyond a
// small kernel-side buffer and going away would put dmabuf events
// back on the default queue.
return &globals;
}
@ -403,6 +418,33 @@ void SubsurfacePresenter::presentDmabuf(int fd, uint32_t drm_format,
if (dest_width <= 0) dest_width = 1;
if (dest_height <= 0) dest_height = 1;
// Validate the (format, modifier) pair against the compositor's
// advertised list before handing it to `create_immed`. If the
// pair isn't on the list, the compositor will reject the
// subsequent `create_immed` with `invalid_format` — a FATAL
// protocol error that kills the entire wl_display, taking down
// every window in the process. Better to drop this single frame
// than to take down the app.
{
const PresenterGlobals &g = globalState();
const auto it = g.modifiers.find(drm_format);
bool ok = false;
if (it != g.modifiers.end()) {
for (const uint64_t m : it->second) {
if (m == drm_modifier) { ok = true; break; }
}
}
if (!ok) {
std::fprintf(stderr,
"[ghastty] SubsurfacePresenter: refusing dmabuf "
"(fourcc=0x%08x mod=0x%llx) — compositor doesn't "
"advertise this (format, modifier) pair\n",
drm_format,
static_cast<unsigned long long>(drm_modifier));
return;
}
}
// Wrap libghostty's borrowed fd in a wl_buffer.
zwp_linux_buffer_params_v1 *params =
zwp_linux_dmabuf_v1_create_params(m_dmabuf);

View File

@ -109,8 +109,13 @@ public:
// units of 1/120 (e.g. 144 = 1.2, 180 = 1.5, 240 = 2.0). Returns
// 120 (= 1.0) until the compositor sends its first
// wp_fractional_scale_v1.preferred_scale event for our surface.
// Renderer / GhosttySurface size their buffers at
// `logical * preferredScale120() / 120` device pixels.
//
// Currently INFORMATIONAL only: GhosttySurface uses Qt's
// devicePixelRatioF() for buffer sizing (which Qt derives from
// the same protocol on Wayland), so the two values agree at
// steady state. Exposed for diagnostics + a future direct-
// protocol path that bypasses Qt's DPR cache lag during a
// screen-change race.
uint32_t preferredScale120() const { return m_preferredScale120; }
// Stretch the existing subsurface buffer to a new destination

View File

@ -547,24 +547,42 @@ pub const Platform = union(PlatformTag) {
.vulkan => vulkan: {
const config = c_platform.vulkan;
// Collapse the eight per-callback "MustBeSet"
// variants into a single `error.MissingVulkanCallback`.
// Pre-this, every caller of `Platform.init` had to
// handle 8 separate error tags (or `try` swallow
// them) eight names that all mean "the host
// didn't fill out one of these fields." Log which
// one was null for diagnostics; the error tag
// itself stays narrow.
const which: ?[]const u8 = blk: {
if (config.get_instance_proc_addr == null) break :blk "get_instance_proc_addr";
if (config.instance == null) break :blk "instance";
if (config.physical_device == null) break :blk "physical_device";
if (config.device == null) break :blk "device";
if (config.queue == null) break :blk "queue";
if (config.queue_family_index == null) break :blk "queue_family_index";
if (config.get_supported_modifiers == null) break :blk "get_supported_modifiers";
if (config.present == null) break :blk "present";
break :blk null;
};
if (which) |name| {
std.log.scoped(.embedded).err(
"ghostty_platform_vulkan_s.{s} is null",
.{name},
);
break :vulkan error.MissingVulkanCallback;
}
break :vulkan .{ .vulkan = .{
.userdata = config.userdata,
.get_instance_proc_addr = config.get_instance_proc_addr orelse
break :vulkan error.GetInstanceProcAddrMustBeSet,
.instance = config.instance orelse
break :vulkan error.InstanceMustBeSet,
.physical_device = config.physical_device orelse
break :vulkan error.PhysicalDeviceMustBeSet,
.device = config.device orelse
break :vulkan error.DeviceMustBeSet,
.queue = config.queue orelse
break :vulkan error.QueueMustBeSet,
.queue_family_index = config.queue_family_index orelse
break :vulkan error.QueueFamilyIndexMustBeSet,
.get_supported_modifiers = config.get_supported_modifiers orelse
break :vulkan error.GetSupportedModifiersMustBeSet,
.present = config.present orelse
break :vulkan error.PresentMustBeSet,
.get_instance_proc_addr = config.get_instance_proc_addr.?,
.instance = config.instance.?,
.physical_device = config.physical_device.?,
.device = config.device.?,
.queue = config.queue.?,
.queue_family_index = config.queue_family_index.?,
.get_supported_modifiers = config.get_supported_modifiers.?,
.present = config.present.?,
} };
},
};

View File

@ -115,8 +115,9 @@ var device: ?Device = null;
var device_refcount: usize = 0;
var device_mutex: std.Thread.Mutex = .{};
/// Per-thread pool of `(VkBuffer, VkDeviceMemory)` pairs that get
/// recycled across frames. Solves two problems together:
/// Process-wide pool of `(VkBuffer, VkDeviceMemory)` pairs recycled
/// across frames on the renderer thread. Solves two problems
/// together:
///
/// 1. Lifetime: `vulkan/buffer.zig`'s `Buffer.deinit` is called
/// mid-frame (by `renderer/image.zig:draw`'s `defer buf.deinit()`)
@ -130,8 +131,22 @@ var device_mutex: std.Thread.Mutex = .{};
/// Lifecycle: `Buffer.deinit` pushes to `pending`. `Frame.complete`
/// after `vkWaitForFences` moves `pending` `ready`. `Buffer.create`
/// scans `ready` for an entry of matching usage + size and pops it
/// before allocating new. The pool only grows; entries get destroyed
/// when the device tears down (`Vulkan.deinit`).
/// before allocating new.
///
/// Process-wide (not threadlocal) and mutex-protected: splits/tabs
/// run independent renderer threads against the SAME shared
/// VkDevice, and a per-thread pool would mean each thread leaks
/// every staging buffer the other threads release. The mutex is
/// uncontended in the steady state entries are short-lived and
/// the pool only grows.
///
/// Caller responsibilities:
/// - Only call `release` from a code path whose VkBuffer reference
/// is bounded by a fence the renderer thread will eventually
/// wait on (i.e. the per-frame command buffer).
/// - For one-shot uploads (e.g. atlas staging) the caller already
/// does `vkQueueWaitIdle` post-submit; that path uses
/// `Buffer.destroyImmediate` which bypasses this pool.
pub const buffer_pool = struct {
const Entry = struct {
buffer: vk.VkBuffer,
@ -140,8 +155,9 @@ pub const buffer_pool = struct {
capacity: u64,
};
threadlocal var pending: std.ArrayList(Entry) = .{};
threadlocal var ready: std.ArrayList(Entry) = .{};
var mutex: std.Thread.Mutex = .{};
var pending: std.ArrayList(Entry) = .{};
var ready: std.ArrayList(Entry) = .{};
/// Queue a buffer for recycling. The buffer cannot be reused
/// until the next fence-wait (handled by `cycle`); it sits in
@ -154,6 +170,8 @@ pub const buffer_pool = struct {
capacity: u64,
) !void {
_ = dev;
mutex.lock();
defer mutex.unlock();
try pending.append(std.heap.smp_allocator, .{
.buffer = buffer,
.memory = memory,
@ -170,6 +188,8 @@ pub const buffer_pool = struct {
usage: vk.VkBufferUsageFlags,
min_capacity: u64,
) ?Entry {
mutex.lock();
defer mutex.unlock();
var i: usize = 0;
while (i < ready.items.len) : (i += 1) {
const e = ready.items[i];
@ -186,18 +206,19 @@ pub const buffer_pool = struct {
/// `Frame.complete` after `vkWaitForFences`.
///
/// `dev` is needed only on the OOM fallback path: if `ready`
/// can't grow to absorb `pending`, we destroy the pending
/// VkBuffers / VkDeviceMemory directly instead of leaking them
/// (the alternative would be to leave them in `pending` forever,
/// where each successive frame's `cycle` would try the same
/// failing append on an ever-growing list guaranteed VkDevice
/// memory exhaustion).
/// can't grow to absorb `pending`, we wait the device idle and
/// then destroy the pending entries directly so the next frame
/// doesn't double up on a pending list that can never drain.
pub fn cycle(dev: *const Device) void {
mutex.lock();
defer mutex.unlock();
ready.appendSlice(std.heap.smp_allocator, pending.items) catch {
// Couldn't grow `ready` destroy the GPU resources now
// (the GPU is provably done with them, the fence wait
// already returned) so the next frame doesn't double up
// on a pending list that can never drain.
// Couldn't grow `ready` destroy the pending GPU
// resources directly. Other renderer threads may still
// be submitting against the shared queue, so wait the
// device idle to make sure no command buffer in flight
// anywhere references these handles before we destroy.
_ = dev.dispatch.deviceWaitIdle(dev.device);
for (pending.items) |e| {
dev.dispatch.destroyBuffer(dev.device, e.buffer, null);
dev.dispatch.freeMemory(dev.device, e.memory, null);
@ -207,8 +228,10 @@ pub const buffer_pool = struct {
}
/// Tear down both lists. Call only when the device is idle
/// (`vkDeviceWaitIdle` or surface destroy).
/// (`vkDeviceWaitIdle` or final surface destroy).
pub fn drainAll(dev: *const Device) void {
mutex.lock();
defer mutex.unlock();
for (pending.items) |e| {
dev.dispatch.destroyBuffer(dev.device, e.buffer, null);
dev.dispatch.freeMemory(dev.device, e.memory, null);
@ -248,6 +271,28 @@ threadlocal var frame_cb: vk.VkCommandBuffer = null;
/// in `Frame.complete` before handing the target dmabuf to the host.
threadlocal var frame_fence: vk.VkFence = null;
/// Per-thread descriptor pool used by `RenderPass.step` to allocate
/// fresh descriptor sets when the same pipeline is bound more than
/// once in a single pass (vkCmdDraw reads descriptors at submit
/// time, so re-using the pipeline's static set would silently
/// corrupt prior draws). Reset at the start of every `beginFrame`
/// so this frame's allocations don't pile on the previous frame's;
/// the per-pass usage is bounded by a small constant see the
/// `step_pool_*` caps below.
threadlocal var step_pool: ?DescriptorPool = null;
/// Caps for the per-frame `step_pool`. Sized for the worst pass
/// shape (kitty image with N placements + the post pipelines): one
/// set per (image_step × MAX_DESCRIPTOR_SETS) plus a handful of
/// the renderer's other pipelines stepped once each. 256 is generous
/// actual frames stabilize well under that. If a frame ever
/// exhausts the pool, `RenderPass.step` falls back to the pipeline's
/// static set with a warning logged.
const STEP_POOL_MAX_SETS: u32 = 256;
const STEP_POOL_UNIFORM_BUFFERS: u32 = 256;
const STEP_POOL_COMBINED_IMAGE_SAMPLERS: u32 = 256;
const STEP_POOL_STORAGE_BUFFERS: u32 = 256;
// ---- lifecycle ----------------------------------------------------------
pub fn init(alloc: Allocator, opts: rendererpkg.Options) !Vulkan {
@ -301,8 +346,27 @@ pub fn deinit(self: *Vulkan) void {
// per surface), so it's always safe to clean them up regardless
// of other surfaces' state.
if (device) |*d| {
d.waitIdle();
// Per-surface teardown only needs THIS surface's submissions
// to be done block on this thread's frame fence (if it
// exists) instead of `vkDeviceWaitIdle` on the shared device,
// which would stall every other tab/split's in-flight GPU
// work just to close one. The final-refcount path below does
// the device-wide waitIdle.
if (frame_fence != null) {
const wait_r = d.dispatch.waitForFences(
d.device,
1,
&frame_fence,
vk.VK_TRUE,
std.math.maxInt(u64),
);
if (wait_r != vk.VK_SUCCESS) {
log.warn(
"Vulkan.deinit: vkWaitForFences returned {}, falling back to device-wide wait",
.{wait_r},
);
d.waitIdle();
}
d.dispatch.destroyFence(d.device, frame_fence, null);
frame_fence = null;
}
@ -314,13 +378,14 @@ pub fn deinit(self: *Vulkan) void {
p.deinit();
frame_pool = null;
}
if (step_pool) |*p| {
p.deinit();
step_pool = null;
}
// `last_target` is a borrow into this thread's FrameState
// target slot. The SwapChain teardown destroys the target;
// we just drop our reference.
last_target = null;
// Recycle this thread's pooled buffers the waitIdle above
// proves no GPU work references them anymore.
buffer_pool.drainAll(d);
}
// Decrement the shared-device refcount; only the last surface
@ -330,9 +395,17 @@ pub fn deinit(self: *Vulkan) void {
// renderer thread.
device_mutex.lock();
defer device_mutex.unlock();
std.debug.assert(device_refcount > 0);
device_refcount -= 1;
if (device_refcount == 0) {
if (device) |*d| d.deinit();
// Last surface: NOW we can safely drain the global buffer
// pool and tear the device down. The waitIdle is needed
// because non-final deinits skipped it.
if (device) |*d| {
d.waitIdle();
buffer_pool.drainAll(d);
d.deinit();
}
device = null;
}
self.* = undefined;
@ -499,16 +572,37 @@ pub fn beginFrame(
if (dev.dispatch.createFence(dev.device, &fence_info, null, &frame_fence) != vk.VK_SUCCESS)
return error.VulkanFailed;
}
if (step_pool == null) {
step_pool = try DescriptorPool.init(.{
.device = dev,
.max_sets = STEP_POOL_MAX_SETS,
.uniform_buffers = STEP_POOL_UNIFORM_BUFFERS,
.combined_image_samplers = STEP_POOL_COMBINED_IMAGE_SAMPLERS,
.storage_buffers = STEP_POOL_STORAGE_BUFFERS,
});
}
_ = self;
// Reset the command buffer + fence so this frame starts clean.
// Reset the command buffer + fence + step descriptor pool so
// this frame starts clean. `vkResetDescriptorPool` returns every
// set the previous frame allocated to the pool much cheaper
// than freeing them individually, and removes any chance of
// last-frame's set being bound by accident.
if (dev.dispatch.resetCommandBuffer(frame_cb, 0) != vk.VK_SUCCESS)
return error.VulkanFailed;
if (dev.dispatch.resetFences(dev.device, 1, &frame_fence) != vk.VK_SUCCESS)
return error.VulkanFailed;
if (step_pool) |*p| {
if (dev.dispatch.resetDescriptorPool(dev.device, p.pool, 0) != vk.VK_SUCCESS)
return error.VulkanFailed;
}
return try Frame.begin(
.{ .cb = frame_cb, .fence = frame_fence },
.{
.cb = frame_cb,
.fence = frame_fence,
.step_pool = if (step_pool) |*p| p else null,
},
dev,
renderer,
target,

View File

@ -217,17 +217,28 @@ pub fn glslFromShader(
try writer.writeAll(prefix);
} else {
// Find the first newline after `#version ...` and inject the
// defines on the following line. We assume the prefix begins
// with a `#version` directive on its own line (true today;
// the comptime split below would crash loudly otherwise).
const first_nl = std.mem.indexOfScalar(u8, prefix, '\n').?;
try writer.writeAll(prefix[0 .. first_nl + 1]);
for (defines) |def| {
try writer.writeAll("#define ");
try writer.writeAll(def);
try writer.writeAll("\n");
// defines on the following line. The prefix is expected to
// start with `#version` followed by a newline; if a future
// edit ever drops that newline (e.g. a single-line prefix)
// we inject the defines BEFORE the prefix so glslang sees
// the directives on their own lines and reports a clear
// error instead of us crashing on a `null.?` unwrap.
if (std.mem.indexOfScalar(u8, prefix, '\n')) |first_nl| {
try writer.writeAll(prefix[0 .. first_nl + 1]);
for (defines) |def| {
try writer.writeAll("#define ");
try writer.writeAll(def);
try writer.writeAll("\n");
}
try writer.writeAll(prefix[first_nl + 1 ..]);
} else {
for (defines) |def| {
try writer.writeAll("#define ");
try writer.writeAll(def);
try writer.writeAll("\n");
}
try writer.writeAll(prefix);
}
try writer.writeAll(prefix[first_nl + 1 ..]);
}
try writer.writeAll("\n\n");
try writer.writeAll(src);
@ -506,4 +517,3 @@ test "shadertoy to glsl" {
const test_crt = @embedFile("shaders/test_shadertoy_crt.glsl");
const test_invalid = @embedFile("shaders/test_shadertoy_invalid.glsl");
const test_focus = @embedFile("shaders/test_shadertoy_focus.glsl");

View File

@ -47,6 +47,28 @@ device: *const Device,
pool: vk.VkDescriptorPool,
pub fn init(opts: Options) Error!Self {
// Vulkan spec requires `maxSets > 0` and `poolSizeCount > 0`
// a pool that vends N sets but doesn't admit any descriptor
// type would be useless and is rejected by some drivers
// (loose drivers accept it and fail at allocation time). Catch
// both shapes here so the caller gets a clear error instead of
// a downstream allocation failure.
if (opts.max_sets == 0) {
log.err("DescriptorPool.init: max_sets must be > 0", .{});
return error.VulkanFailed;
}
if (opts.uniform_buffers == 0 and
opts.combined_image_samplers == 0 and
opts.storage_buffers == 0)
{
log.err(
"DescriptorPool.init: at least one per-type cap must be > 0 " ++
"(uniform_buffers, combined_image_samplers, storage_buffers)",
.{},
);
return error.VulkanFailed;
}
// Build a small VkDescriptorPoolSize array from whichever caps
// are non-zero. Vulkan accepts an array; we cap at 3 entries
// matching the three types `Options` exposes.
@ -78,11 +100,12 @@ pub fn init(opts: Options) Error!Self {
.sType = vk.VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO,
.pNext = null,
// No FREE_DESCRIPTOR_SET_BIT we tear down by destroying
// the pool, which matches the per-frame reset pattern.
// the pool (or `vkResetDescriptorPool` for the per-frame
// step pool).
.flags = 0,
.maxSets = opts.max_sets,
.poolSizeCount = n,
.pPoolSizes = if (n > 0) &sizes else null,
.pPoolSizes = &sizes,
};
var pool: vk.VkDescriptorPool = undefined;
const r = opts.device.dispatch.createDescriptorPool(

View File

@ -195,6 +195,7 @@ pub const Dispatch = struct {
// the actual renderer integration lands.
createDescriptorPool: std.meta.Child(vk.PFN_vkCreateDescriptorPool),
destroyDescriptorPool: std.meta.Child(vk.PFN_vkDestroyDescriptorPool),
resetDescriptorPool: std.meta.Child(vk.PFN_vkResetDescriptorPool),
allocateDescriptorSets: std.meta.Child(vk.PFN_vkAllocateDescriptorSets),
updateDescriptorSets: std.meta.Child(vk.PFN_vkUpdateDescriptorSets),
cmdBindDescriptorSets: std.meta.Child(vk.PFN_vkCmdBindDescriptorSets),
@ -218,6 +219,12 @@ queue_family_index: u32,
/// `error.UnsupportedVulkanVersion`).
api_version: u32,
/// Cached `VkPhysicalDeviceMemoryProperties`. The properties are
/// immutable for the physical device's lifetime, so we query once
/// at `init` time instead of on every `findMemoryType` call (which
/// happens for every Buffer/Texture/Target allocation).
memory_properties: vk.VkPhysicalDeviceMemoryProperties,
dispatch: Dispatch,
/// Process-wide mutex protecting access to `queue`. Vulkan requires
@ -351,10 +358,41 @@ pub fn init(
// ---- 4. extension check --------------------------------------
var ext_count: u32 = 0;
_ = enumerate_device_extension_properties(physical_device, null, &ext_count, null);
{
const r = enumerate_device_extension_properties(physical_device, null, &ext_count, null);
// SUCCESS or INCOMPLETE both populate `ext_count`. INCOMPLETE
// shouldn't happen on the count-only call (no buffer to
// truncate) but we accept it defensively.
if (r != vk.VK_SUCCESS and r != vk.VK_INCOMPLETE) {
log.err("vkEnumerateDeviceExtensionProperties (count) failed: result={}", .{r});
return error.HostHandleMissing;
}
}
const exts = try alloc.alloc(vk.VkExtensionProperties, ext_count);
defer alloc.free(exts);
_ = enumerate_device_extension_properties(physical_device, null, &ext_count, exts.ptr);
{
const r = enumerate_device_extension_properties(physical_device, null, &ext_count, exts.ptr);
if (r != vk.VK_SUCCESS and r != vk.VK_INCOMPLETE) {
log.err("vkEnumerateDeviceExtensionProperties (fill) failed: result={}", .{r});
return error.HostHandleMissing;
}
// VK_INCOMPLETE here means the extension list grew between
// the count and fill calls (race with a driver hot-reload
// very unlikely in practice but spec-permitted). The
// partially-filled buffer is still authoritative for the
// entries it does contain, but a required extension not yet
// populated would be missed. Treat as a hard fail since the
// extension presence check below would silently pass on a
// truncated list.
if (r == vk.VK_INCOMPLETE) {
log.err(
"vkEnumerateDeviceExtensionProperties returned INCOMPLETE; " ++
"device extension list changed between count and fill",
.{},
);
return error.HostHandleMissing;
}
}
inline for (REQUIRED_DEVICE_EXTENSIONS) |required| {
var found = false;
@ -501,6 +539,8 @@ pub fn init(
try dl.load(vk.PFN_vkCreateDescriptorPool, "vkCreateDescriptorPool");
const destroy_descriptor_pool =
try dl.load(vk.PFN_vkDestroyDescriptorPool, "vkDestroyDescriptorPool");
const reset_descriptor_pool =
try dl.load(vk.PFN_vkResetDescriptorPool, "vkResetDescriptorPool");
const allocate_descriptor_sets =
try dl.load(vk.PFN_vkAllocateDescriptorSets, "vkAllocateDescriptorSets");
const update_descriptor_sets =
@ -508,6 +548,12 @@ pub fn init(
const cmd_bind_descriptor_sets =
try dl.load(vk.PFN_vkCmdBindDescriptorSets, "vkCmdBindDescriptorSets");
// Snapshot the memory properties once. They never change for
// the device's lifetime, so per-allocation re-queries (which
// findMemoryType used to do) were pure waste.
var memory_properties: vk.VkPhysicalDeviceMemoryProperties = undefined;
get_physical_device_memory_properties(physical_device, &memory_properties);
return .{
.platform = platform,
.instance = instance,
@ -516,6 +562,7 @@ pub fn init(
.queue = queue,
.queue_family_index = queue_family_index,
.api_version = props.apiVersion,
.memory_properties = memory_properties,
.dispatch = .{
.getPhysicalDeviceProperties = get_physical_device_properties,
.getPhysicalDeviceMemoryProperties = get_physical_device_memory_properties,
@ -579,6 +626,7 @@ pub fn init(
.cmdCopyImageToBuffer = cmd_copy_image_to_buffer,
.createDescriptorPool = create_descriptor_pool,
.destroyDescriptorPool = destroy_descriptor_pool,
.resetDescriptorPool = reset_descriptor_pool,
.allocateDescriptorSets = allocate_descriptor_sets,
.updateDescriptorSets = update_descriptor_sets,
.cmdBindDescriptorSets = cmd_bind_descriptor_sets,
@ -593,9 +641,16 @@ pub fn deinit(self: *Device) void {
}
/// Block until the device is idle. Useful before tearing down
/// renderer resources to make sure no command buffers are in flight.
/// renderer resources to make sure no command buffers are in
/// flight. On `VK_ERROR_DEVICE_LOST` (or any other failure) we
/// log the result so callers proceeding to destroy resources on
/// a dead device leave a diagnostic crumb instead of silently
/// crashing on the subsequent vkDestroy*.
pub fn waitIdle(self: *const Device) void {
_ = self.dispatch.deviceWaitIdle(self.device);
const r = self.dispatch.deviceWaitIdle(self.device);
if (r != vk.VK_SUCCESS) {
log.warn("vkDeviceWaitIdle returned {}; teardown proceeding anyway", .{r});
}
}
/// Find a `VkMemoryType` index satisfying the requirements from a
@ -609,8 +664,7 @@ pub fn findMemoryType(
type_bits: u32,
required_props: vk.VkMemoryPropertyFlags,
) ?u32 {
var props: vk.VkPhysicalDeviceMemoryProperties = undefined;
self.dispatch.getPhysicalDeviceMemoryProperties(self.physical_device, &props);
const props = &self.memory_properties;
var i: u32 = 0;
while (i < props.memoryTypeCount) : (i += 1) {
const bit: u32 = @as(u32, 1) << @intCast(i);

View File

@ -37,6 +37,7 @@ const vk = @import("vulkan").c;
const Device = @import("Device.zig");
const Target = @import("Target.zig");
const DescriptorPool = @import("DescriptorPool.zig");
const RenderPass = @import("RenderPass.zig");
const Vulkan = @import("../Vulkan.zig");
@ -53,6 +54,15 @@ pub const Options = struct {
/// Fence that gets signaled when the submit completes. Caller
/// resets it to unsignaled before `begin` is called.
fence: vk.VkFence,
/// Per-frame descriptor pool. `RenderPass.step` borrows it for
/// the per-call descriptor sets it allocates whenever a
/// pipeline is re-used within a single pass. The pool is
/// caller-owned (top-level `Vulkan.zig` keeps it threadlocal)
/// and must be reset (`vkResetDescriptorPool`) by the caller
/// before each Frame.begin so this frame's allocations don't
/// pile on the previous frame's.
step_pool: ?*DescriptorPool = null,
};
pub const Error = error{
@ -67,6 +77,7 @@ renderer: *Renderer,
target: *Target,
cb: vk.VkCommandBuffer,
fence: vk.VkFence,
step_pool: ?*DescriptorPool = null,
/// Begin recording a frame. The command buffer is reset and started
/// with `ONE_TIME_SUBMIT` since we always submit before the next
@ -95,6 +106,7 @@ pub fn begin(
.target = target,
.cb = opts.cb,
.fence = opts.fence,
.step_pool = opts.step_pool,
};
}
@ -112,75 +124,102 @@ pub fn complete(self: *const Self, sync: bool) void {
_ = sync;
const dev = self.device;
// `health` becomes `.unhealthy` on any GPU-side error below. We
// ALWAYS run `buffer_pool.cycle` and `frameCompleted` on the
// way out skipping them on error left every retired buffer
// stuck in `pending` (unbounded growth) and held the renderer's
// swap-chain semaphore forever, so the NEXT `drawFrame` would
// hang with no diagnostic.
var health: Health = .healthy;
var submitted = false;
// Make the rendered pixels visible to the host's mmap read. In
// `.direct` mode this is just a memory barrier; in `.legacy_copy`
// mode it also runs `vkCmdCopyImageToBuffer`. See `Target.zig`.
self.target.recordPresentBarrier(self.cb);
{
end_cb: {
const r = dev.dispatch.endCommandBuffer(self.cb);
if (r != vk.VK_SUCCESS) {
log.err("vkEndCommandBuffer (frame) failed: result={}", .{r});
return;
health = .unhealthy;
break :end_cb;
}
}
const submit_info: vk.VkSubmitInfo = .{
.sType = vk.VK_STRUCTURE_TYPE_SUBMIT_INFO,
.pNext = null,
.waitSemaphoreCount = 0,
.pWaitSemaphores = null,
.pWaitDstStageMask = null,
.commandBufferCount = 1,
.pCommandBuffers = &self.cb,
.signalSemaphoreCount = 0,
.pSignalSemaphores = null,
};
{
const submit_info: vk.VkSubmitInfo = .{
.sType = vk.VK_STRUCTURE_TYPE_SUBMIT_INFO,
.pNext = null,
.waitSemaphoreCount = 0,
.pWaitSemaphores = null,
.pWaitDstStageMask = null,
.commandBufferCount = 1,
.pCommandBuffers = &self.cb,
.signalSemaphoreCount = 0,
.pSignalSemaphores = null,
};
// Externally-synchronized via `Device.queueSubmit` splits
// and tabs share the host's VkQueue and Vulkan rejects
// concurrent unsynchronized access.
const r = dev.queueSubmit(1, &submit_info, self.fence);
if (r != vk.VK_SUCCESS) {
log.err("vkQueueSubmit (frame) failed: result={}", .{r});
return;
const sr = dev.queueSubmit(1, &submit_info, self.fence);
if (sr != vk.VK_SUCCESS) {
log.err("vkQueueSubmit (frame) failed: result={}", .{sr});
health = .unhealthy;
break :end_cb;
}
}
submitted = true;
// Wait for the GPU to finish writing the target before letting
// the host import the dmabuf. UINT64_MAX = "wait indefinitely".
{
const r = dev.dispatch.waitForFences(
// Wait for the GPU to finish writing the target before letting
// the host import the dmabuf. UINT64_MAX = "wait indefinitely".
const wr = dev.dispatch.waitForFences(
dev.device,
1,
&self.fence,
vk.VK_TRUE,
std.math.maxInt(u64),
);
if (r != vk.VK_SUCCESS) {
log.err("vkWaitForFences (frame) failed: result={}", .{r});
if (wr != vk.VK_SUCCESS) {
log.err("vkWaitForFences (frame) failed: result={}", .{wr});
health = .unhealthy;
}
}
// Recycle the per-frame Buffer pool now that the fence has
// signaled every VkBuffer queued during this frame's
// recording is provably no longer in use by the GPU and is
// safe to hand to the next `Buffer.create` call. See
// `Vulkan.buffer_pool` for the lifecycle.
Vulkan.buffer_pool.cycle(dev);
// Recycle the per-frame Buffer pool. Even on the error path we
// still want to cycle: buffers that the failed submit referenced
// are now stuck (we can't prove the GPU is done with them), so
// we conservatively wait the device idle on the unhealthy path
// before draining. Without this, every failed submit leaks
// every buffer the renderer queued for that frame.
if (health == .unhealthy and !submitted) {
// Submit never happened nothing in flight references
// recorded buffers, safe to cycle directly.
Vulkan.buffer_pool.cycle(dev);
} else if (health == .unhealthy) {
// Submit happened but fence wait failed (DEVICE_LOST etc.).
// Drain the device before recycling to avoid use-after-free
// on whatever queue is still ticking.
_ = dev.dispatch.deviceWaitIdle(dev.device);
Vulkan.buffer_pool.cycle(dev);
} else {
Vulkan.buffer_pool.cycle(dev);
}
// Hand the rendered target off to the host via `Vulkan.present`,
// which both calls the platform's present callback AND records
// the target pointer for `presentLastTarget` no-op republishes.
self.renderer.api.present(self.target) catch |err| {
log.err("present failed: {}", .{err});
};
// Hand the rendered target off to the host. On the unhealthy
// path we skip present the dmabuf may be partially written
// and the host should see the previous frame instead (the
// generic renderer's no-op-frame logic re-presents
// `last_target`).
if (health == .healthy) {
self.renderer.api.present(self.target) catch |err| {
log.err("present failed: {}", .{err});
health = .unhealthy;
};
}
// Tell the generic renderer the frame is done so it releases the
// swap-chain semaphore. Without this, `SwapChain.nextFrame()`
// blocks the second call to `drawFrame` forever (one buffer in
// the chain, never freed).
self.renderer.frameCompleted(.healthy);
// the chain, never freed). MUST run regardless of `health`.
self.renderer.frameCompleted(health);
}
/// Begin a render pass recording into this frame's command buffer.
@ -193,6 +232,7 @@ pub inline fn renderPass(
return RenderPass.begin(.{
.device = self.device,
.cb = self.cb,
.step_pool = self.step_pool,
.attachments = attachments,
});
}

View File

@ -136,9 +136,27 @@ layout: vk.VkPipelineLayout,
/// `RenderPass.step` skips updating/binding it). `set_count` is one
/// past the last non-null index, matching what
/// `vkCmdBindDescriptorSets` needs as `setCount`.
///
/// HOT-PATH NOTE: these sets are SHARED across all `step()` calls
/// that bind this pipeline within a single command buffer, but
/// `vkCmdDraw` reads descriptors at submit time, so re-using the
/// same pipeline twice with different per-call resources would
/// cause both draws to see the LAST update's bindings.
/// `RenderPass.step` defends against this by allocating a fresh
/// per-call set from the pass's `step_pool` whenever the per-step
/// resources differ; these `descriptor_sets[i]` slots act as
/// pre-warmed defaults (used only when the call site is
/// single-step-per-pipeline like bg_color / cell_bg).
descriptor_sets: [MAX_DESCRIPTOR_SETS]vk.VkDescriptorSet = .{ null, null, null },
set_count: u32 = 0,
/// Descriptor set layouts associated with this pipeline, indexed by
/// set number. `null` matches a `null` slot in `descriptor_sets`.
/// Stored so `RenderPass.step` can allocate per-call sets from the
/// pass's per-frame descriptor pool without round-tripping through
/// the original `Shaders.init` layout-creation code path.
descriptor_set_layouts: [MAX_DESCRIPTOR_SETS]vk.VkDescriptorSetLayout = .{ null, null, null },
/// Binding number that `Step.uniforms` writes to within set 0.
/// Defaults to 1 to match `common.glsl`'s
/// `layout(binding = 1, std140) uniform Globals`. Override per
@ -395,14 +413,20 @@ pub fn init(opts: Options) Error!Self {
return error.VulkanFailed;
}
}
errdefer dev.dispatch.destroyPipeline(dev.device, pipeline, null);
// Allocate one descriptor set per non-null entry in
// `opts.descriptor_set_layouts`. Null entries are placeholder
// `opts.descriptor_set_layouts`. Null entries are placeholders
// (the shader's set=i isn't actually used) nothing to allocate.
// Also remember the layouts on `Self` so `RenderPass.step` can
// allocate fresh per-call sets from a per-frame pool without
// re-creating layouts.
var dsets: [MAX_DESCRIPTOR_SETS]vk.VkDescriptorSet = .{ null, null, null };
var dsls: [MAX_DESCRIPTOR_SETS]vk.VkDescriptorSetLayout = .{ null, null, null };
if (opts.descriptor_pool) |pool_ptr| {
for (opts.descriptor_set_layouts, 0..) |maybe_dsl, i| {
if (maybe_dsl) |dsl| {
dsls[i] = dsl;
dsets[i] = pool_ptr.allocate(dsl) catch |err| {
log.err(
"Pipeline.init: descriptor set {} allocation failed: {}",
@ -412,6 +436,10 @@ pub fn init(opts: Options) Error!Self {
};
}
}
} else {
for (opts.descriptor_set_layouts, 0..) |maybe_dsl, i| {
if (maybe_dsl) |dsl| dsls[i] = dsl;
}
}
return .{
@ -419,6 +447,7 @@ pub fn init(opts: Options) Error!Self {
.pipeline = pipeline,
.layout = layout,
.descriptor_sets = dsets,
.descriptor_set_layouts = dsls,
.set_count = @intCast(opts.descriptor_set_layouts.len),
.sampler = opts.sampler,
.vertex_stride = if (opts.vertex_input) |vi| vi.stride else 0,

View File

@ -18,6 +18,7 @@ const Self = @This();
const std = @import("std");
const vk = @import("vulkan").c;
const DescriptorPool = @import("DescriptorPool.zig");
const Device = @import("Device.zig");
const Pipeline = @import("Pipeline.zig");
const Sampler = @import("Sampler.zig");
@ -57,6 +58,16 @@ pub const Options = struct {
/// by the enclosing `Frame`.
cb: vk.VkCommandBuffer,
/// Per-frame descriptor pool. Used by `step` to allocate fresh
/// descriptor sets on the SECOND and later step() calls that
/// bind the same pipeline within this pass without it,
/// mutating the pipeline's static `descriptor_sets[i]` for the
/// second call would overwrite the first call's bindings before
/// the GPU has read them (vkCmdDraw reads at submit time).
/// Optional: passes that never re-use a pipeline (bg_color,
/// cell_bg, cell_text) work without it.
step_pool: ?*DescriptorPool = null,
/// Color attachments for the pass. With dynamic rendering each
/// attachment is a render target + optional clear color.
attachments: []const Attachment,
@ -114,8 +125,21 @@ pub const Error = error{
attachments: []const Options.Attachment,
cb: vk.VkCommandBuffer,
device: *const Device,
step_pool: ?*DescriptorPool = null,
step_number: usize = 0,
/// VkPipeline handles already used by an earlier `step` in this
/// pass. On second-and-later use of the same pipeline we allocate
/// a fresh per-call descriptor set from `step_pool` instead of
/// mutating `pipeline.descriptor_sets[i]` (vkCmdDraw reads at
/// submit time, so re-updating the same set in place would
/// overwrite the prior call's bindings before the GPU has read
/// them). Capacity covers our worst case: per-pass image draws
/// can fire dozens of pipeline reuses. The slice is empty when no
/// step_pool was provided.
seen_pipelines: [MAX_SEEN_PIPELINES]vk.VkPipeline = .{null} ** MAX_SEEN_PIPELINES,
seen_pipelines_len: usize = 0,
/// Last `Step.uniforms` value seen in this pass. The OpenGL backend
/// keeps the bound UBO across draw calls implicitly (GL state
/// persists), and the renderer's image/overlay draw calls in
@ -126,6 +150,13 @@ step_number: usize = 0,
/// to null at `begin`.
last_uniforms: ?vk.VkBuffer = null,
/// Cap on the number of distinct pipelines we'll track per pass
/// for "first-use vs re-use" detection. The renderer's pass shape
/// is: bg_color (1), cell_bg (1), cell_text (1), bg_image (1),
/// image (varies). 8 is generous; we degrade gracefully to "always
/// allocate fresh" past this cap.
const MAX_SEEN_PIPELINES: usize = 8;
/// Begin a render pass. Transitions the first attachment to
/// `COLOR_ATTACHMENT_OPTIMAL` and opens a `vkCmdBeginRendering`
/// scope with the caller's clear color (defaults to opaque black).
@ -138,6 +169,7 @@ pub fn begin(opts: Options) Self {
.attachments = opts.attachments,
.cb = opts.cb,
.device = opts.device,
.step_pool = opts.step_pool,
};
if (opts.attachments.len == 0) return self;
@ -340,6 +372,48 @@ pub fn step(self: *Self, s: Step) void {
}
}
// Pick effective descriptor sets for this step.
//
// First time we see a given pipeline within this pass, we use
// its pre-allocated `descriptor_sets[]` slots and update them
// in place cheap and avoids a per-pass-pool allocation in
// the common single-step case (bg_color/cell_bg/cell_text).
//
// SECOND-and-later use of the same pipeline within the same
// pass requires fresh sets: vkCmdDraw reads the descriptor
// contents at SUBMIT time, so re-updating the static sets in
// place would silently make every prior draw bound to this
// pipeline read the LAST update's UBO/sampler/storage. The
// image / kitty path issues N draws on the same `image`
// pipeline with per-call vertex buffers and textures without
// this fix every kitty image rendered with the FINAL image's
// texture and the final draw's vertex buffer.
//
// The fresh sets come from `step_pool`, owned by the enclosing
// Frame and reset at frame start. When `step_pool` is null
// (test harnesses, smoke tests) we fall back to the static
// sets and accept the limitation.
var effective_sets: [Pipeline.MAX_DESCRIPTOR_SETS]vk.VkDescriptorSet =
s.pipeline.descriptor_sets;
const reused = self.markPipelineUsed(s.pipeline.pipeline);
if (reused) if (self.step_pool) |pool| {
for (s.pipeline.descriptor_set_layouts, 0..) |maybe_dsl, i| {
if (i >= s.pipeline.set_count) break;
const dsl = maybe_dsl orelse continue;
if (pool.allocate(dsl)) |fresh| {
effective_sets[i] = fresh;
} else |err| {
log.err(
"RenderPass.step: per-call descriptor set " ++
"allocation for set {} failed ({}); falling " ++
"back to the pipeline's static set, which " ++
"may corrupt prior draws on this pipeline",
.{ i, err },
);
}
}
};
// ---- update descriptor sets ---------------------------------
//
// We do one vkUpdateDescriptorSets call per descriptor write to
@ -353,7 +427,7 @@ pub fn step(self: *Self, s: Step) void {
// supply one. Track the new one for later steps.
const ubo: ?vk.VkBuffer = s.uniforms orelse self.last_uniforms;
if (s.uniforms) |b| self.last_uniforms = b;
if (s.pipeline.descriptor_sets[0] != null) if (ubo) |ubo_buffer| {
if (effective_sets[0] != null) if (ubo) |ubo_buffer| {
const buffer_info: vk.VkDescriptorBufferInfo = .{
.buffer = ubo_buffer,
.offset = 0,
@ -362,7 +436,7 @@ pub fn step(self: *Self, s: Step) void {
const write: vk.VkWriteDescriptorSet = .{
.sType = vk.VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET,
.pNext = null,
.dstSet = s.pipeline.descriptor_sets[0],
.dstSet = effective_sets[0],
.dstBinding = s.pipeline.uniforms_binding,
.dstArrayElement = 0,
.descriptorCount = 1,
@ -375,7 +449,7 @@ pub fn step(self: *Self, s: Step) void {
};
// Samplers (set 1)
if (s.pipeline.descriptor_sets[1] != null) {
if (effective_sets[1] != null) {
const slot_count = @max(s.textures.len, s.samplers.len);
for (0..slot_count) |slot| {
const tex_opt: ?Texture = if (slot < s.textures.len) s.textures[slot] else null;
@ -396,7 +470,7 @@ pub fn step(self: *Self, s: Step) void {
const write: vk.VkWriteDescriptorSet = .{
.sType = vk.VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET,
.pNext = null,
.dstSet = s.pipeline.descriptor_sets[1],
.dstSet = effective_sets[1],
.dstBinding = @intCast(slot),
.dstArrayElement = 0,
.descriptorCount = 1,
@ -411,7 +485,7 @@ pub fn step(self: *Self, s: Step) void {
// Storage buffers (set 2). `buffers[0]` is reserved for the
// vertex buffer (handled above), so storage starts at slot 1.
if (s.pipeline.descriptor_sets[2] != null and s.buffers.len > 1) {
if (effective_sets[2] != null and s.buffers.len > 1) {
for (s.buffers[1..], 1..) |maybe_buf, slot| {
const buf = maybe_buf orelse continue;
const buffer_info: vk.VkDescriptorBufferInfo = .{
@ -422,7 +496,7 @@ pub fn step(self: *Self, s: Step) void {
const write: vk.VkWriteDescriptorSet = .{
.sType = vk.VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET,
.pNext = null,
.dstSet = s.pipeline.descriptor_sets[2],
.dstSet = effective_sets[2],
.dstBinding = @intCast(slot),
.dstArrayElement = 0,
.descriptorCount = 1,
@ -443,19 +517,19 @@ pub fn step(self: *Self, s: Step) void {
// contiguous run of non-null sets.
var start: usize = 0;
while (start < s.pipeline.set_count) {
if (s.pipeline.descriptor_sets[start] == null) {
if (effective_sets[start] == null) {
start += 1;
continue;
}
var end = start + 1;
while (end < s.pipeline.set_count and s.pipeline.descriptor_sets[end] != null) : (end += 1) {}
while (end < s.pipeline.set_count and effective_sets[end] != null) : (end += 1) {}
dev.dispatch.cmdBindDescriptorSets(
self.cb,
vk.VK_PIPELINE_BIND_POINT_GRAPHICS,
s.pipeline.layout,
@intCast(start),
@intCast(end - start),
&s.pipeline.descriptor_sets[start],
&effective_sets[start],
0,
null,
);
@ -477,6 +551,26 @@ pub fn step(self: *Self, s: Step) void {
self.step_number += 1;
}
/// Mark `pipeline` as used in this pass and report whether it was
/// already seen. Returns `false` on the FIRST call (so `step` can
/// safely update the pipeline's static descriptor sets in place);
/// `true` on every subsequent call (so `step` allocates fresh sets
/// from `step_pool` to avoid clobbering the prior call's bindings).
///
/// Beyond `MAX_SEEN_PIPELINES` we conservatively report `true` so
/// callers always allocate fresh the alternative (silently
/// reverting to in-place updates) is the bug this whole mechanism
/// exists to prevent.
fn markPipelineUsed(self: *Self, pipeline: vk.VkPipeline) bool {
for (self.seen_pipelines[0..self.seen_pipelines_len]) |seen| {
if (seen == pipeline) return true;
}
if (self.seen_pipelines_len >= MAX_SEEN_PIPELINES) return true;
self.seen_pipelines[self.seen_pipelines_len] = pipeline;
self.seen_pipelines_len += 1;
return false;
}
/// Close the rendering scope and leave the attachment in a layout
/// the host can read back via the dmabuf export. `GENERAL` is the
/// safest choice for unknown consumer access patterns; the host

View File

@ -198,12 +198,17 @@ fn pickModifier(
// work for AMD/Intel LINEAR but the compositor attach would
// fail, so treat it as "no intersection."
var host_mods: [MAX_MODIFIERS]u64 = undefined;
const host_count = dev.platform.get_supported_modifiers(
const host_returned = dev.platform.get_supported_modifiers(
dev.platform.userdata,
drm_format,
&host_mods,
MAX_MODIFIERS,
);
// Clamp defensively. The C ABI contract is "host returns ≤ capacity",
// but we don't get to assume the host's implementation is correct
// and in safe builds an OOB read on `host_mods[..host_returned]`
// panics, hiding the real diagnostic.
const host_count: usize = @min(host_returned, MAX_MODIFIERS);
if (host_count == 0) {
log.warn(
"host advertises no dmabuf modifiers for format 0x{x}; " ++
@ -465,7 +470,22 @@ fn initDirect(opts: Options, drm_format: u32, chosen_mod: u64) Error!Self {
.fd = fd,
.drm_format = drm_format,
.drm_modifier = actual_mod,
.stride = @intCast(layout.rowPitch),
.stride = stride: {
// VkSubresourceLayout.rowPitch is u64 but the platform
// present callback accepts u32 stride. For a sanely-
// sized terminal target stride fits comfortably in u32,
// but vendor-tiled drivers at exotic resolutions could
// legitimately exceed it. Fail the init explicitly
// instead of letting `@intCast` panic in safe builds.
if (layout.rowPitch > std.math.maxInt(u32)) {
log.err(
"Target.initDirect: rowPitch {} > u32 max; refusing direct mode",
.{layout.rowPitch},
);
return error.UnsupportedFormat;
}
break :stride @intCast(layout.rowPitch);
},
};
}

View File

@ -74,6 +74,12 @@ image: vk.VkImage,
memory: vk.VkDeviceMemory,
view: vk.VkImageView,
format: vk.VkFormat,
/// Aspect mask the image was created with (e.g. COLOR_BIT for
/// renderable textures, DEPTH_BIT for depth attachments). Stored
/// so per-frame `replaceRegion` barrier/copy use the same aspect
/// the image view was made with hardcoding COLOR_BIT here was a
/// silent validation error for any non-color caller.
aspect: vk.VkImageAspectFlags,
width: usize,
height: usize,
device: *const Device,
@ -207,6 +213,7 @@ pub fn init(
.memory = memory,
.view = view,
.format = opts.format,
.aspect = opts.aspect,
.width = width,
.height = height,
.device = dev,
@ -249,7 +256,15 @@ pub fn replaceRegion(
.device = dev,
.usage = vk.VK_BUFFER_USAGE_TRANSFER_SRC_BIT,
}, data);
defer staging.deinit();
// `destroyImmediate` instead of `deinit`: replaceRegion runs
// synchronously on the calling thread (typically the main /
// app-init thread, NOT the renderer thread), and
// `OneShot.endAndSubmit` below calls `vkQueueWaitIdle` so the
// staging buffer is provably done with the GPU before this
// defer fires. Routing it into `Vulkan.buffer_pool` from a
// non-renderer thread would leak it forever the pool's
// `cycle()` runs only on the renderer thread.
defer staging.destroyImmediate();
// ---- command pool (one-shot) --------------------------------
var pool = try CommandPool.init(dev);
@ -279,7 +294,7 @@ pub fn replaceRegion(
.dstQueueFamilyIndex = vk.VK_QUEUE_FAMILY_IGNORED,
.image = self.image,
.subresourceRange = .{
.aspectMask = vk.VK_IMAGE_ASPECT_COLOR_BIT,
.aspectMask = self.aspect,
.baseMipLevel = 0,
.levelCount = 1,
.baseArrayLayer = 0,
@ -304,7 +319,7 @@ pub fn replaceRegion(
.bufferRowLength = 0, // tightly packed
.bufferImageHeight = 0,
.imageSubresource = .{
.aspectMask = vk.VK_IMAGE_ASPECT_COLOR_BIT,
.aspectMask = self.aspect,
.mipLevel = 0,
.baseArrayLayer = 0,
.layerCount = 1,
@ -343,7 +358,7 @@ pub fn replaceRegion(
.dstQueueFamilyIndex = vk.VK_QUEUE_FAMILY_IGNORED,
.image = self.image,
.subresourceRange = .{
.aspectMask = vk.VK_IMAGE_ASPECT_COLOR_BIT,
.aspectMask = self.aspect,
.baseMipLevel = 0,
.levelCount = 1,
.baseArrayLayer = 0,

View File

@ -83,20 +83,26 @@ pub fn Buffer(comptime T: type) type {
return self;
}
/// Hand the (VkBuffer, VkDeviceMemory) pair back to the
/// process-wide pool. The pool (see `Vulkan.buffer_pool`)
/// holds the entry until the current frame's fence has
/// signaled (the GPU is done with our recorded references)
/// and then makes it available to a future `Buffer.create`
/// call. Returning to the pool solves both:
/// - `renderer/image.zig:draw`'s `defer buf.deinit()` no
/// longer use-after-frees the in-flight buffer.
/// - It avoids the per-frame allocation thrash that
/// drove the driver to SIGSEGV on image-heavy frames.
///
/// MUST be called only from the renderer thread (the path
/// whose fence will eventually retire references to this
/// buffer in `Frame.complete`). One-shot uploads (atlas
/// staging buffers, etc.) that already block on
/// `vkQueueWaitIdle` post-submit must use
/// `destroyImmediate` instead they don't share the
/// renderer thread's fence cycle.
pub fn deinit(self: Self) void {
const dev = self.opts.device;
// Hand the (VkBuffer, VkDeviceMemory) pair back to the
// process-wide pool instead of destroying it. The pool
// (see `Vulkan.buffer_pool`) holds the entry until the
// current frame's fence has signaled (the GPU is done
// with our recorded references) and then makes it
// available to a future `Buffer.create` call. Returning
// to the pool solves BOTH:
// - `renderer/image.zig:draw`'s `defer buf.deinit()`
// no longer use-after-frees the in-flight buffer.
// - It avoids the per-frame allocation thrash that
// drove the driver to SIGSEGV on image-heavy
// frames.
const bp = @import("../Vulkan.zig").buffer_pool;
const capacity_bytes: u64 = @as(u64, self.len) * @sizeOf(T);
bp.release(
@ -106,16 +112,35 @@ pub fn Buffer(comptime T: type) type {
self.opts.usage,
capacity_bytes,
) catch {
// OOM growing the pool fall back to immediate
// destroy. Logging here is awkward (no logger in
// scope) so we accept the loud failure and let
// Vulkan stderr diagnose any use-after-free that
// follows.
// OOM growing the pool. The buffer may still be
// referenced by an in-flight command buffer, so we
// wait the entire device idle before destroying
// expensive but correct. Logging here is awkward (no
// logger in scope) so we accept the loud failure and
// let Vulkan stderr diagnose anything that follows.
_ = dev.dispatch.deviceWaitIdle(dev.device);
dev.dispatch.destroyBuffer(dev.device, self.buffer, null);
dev.dispatch.freeMemory(dev.device, self.memory, null);
};
}
/// Destroy the buffer immediately, bypassing the recycle
/// pool. The caller MUST ensure no in-flight command buffer
/// references this buffer (e.g. by having waited on a fence
/// or `vkQueueWaitIdle` covering its submission).
///
/// Used by short-lived staging buffers like
/// `Texture.replaceRegion` whose lifetime is bounded by a
/// `OneShot.endAndSubmit` that already drains the queue;
/// stuffing those into the pool from a non-renderer thread
/// would leak them (the renderer thread's `cycle` runs the
/// pool, so an upload thread's pushes never get reused).
pub fn destroyImmediate(self: Self) void {
const dev = self.opts.device;
dev.dispatch.destroyBuffer(dev.device, self.buffer, null);
dev.dispatch.freeMemory(dev.device, self.memory, null);
}
/// Replace the buffer's contents. Grows (doubles) if needed
/// matches the OpenGL backend's behavior. Data shorter than
/// the current capacity leaves the trailing slots untouched.
@ -235,14 +260,30 @@ pub fn Buffer(comptime T: type) type {
};
}
/// Grow the buffer to hold at least `new_len` Ts. Destroys
/// and recreates the underlying VkBuffer (Vulkan buffers are
/// immutable in size). Contents are discarded callers
/// Grow the buffer to hold at least `new_len` Ts. Vulkan
/// buffers are immutable in size, so we route the old
/// buffer through the recycle pool (it may still be
/// referenced by the in-flight command buffer destroying
/// it directly would race the GPU same as `deinit` would)
/// and create a fresh one. Contents are discarded; callers
/// always `sync` immediately after `grow` returns.
fn grow(self: *Self, new_len: usize) Error!void {
const dev = self.opts.device;
dev.dispatch.destroyBuffer(dev.device, self.buffer, null);
dev.dispatch.freeMemory(dev.device, self.memory, null);
const bp = @import("../Vulkan.zig").buffer_pool;
const capacity_bytes: u64 = @as(u64, self.len) * @sizeOf(T);
bp.release(
dev,
self.buffer,
self.memory,
self.opts.usage,
capacity_bytes,
) catch {
// OOM appending to the pool wait the device idle
// and destroy directly. Same fallback as `deinit`.
_ = dev.dispatch.deviceWaitIdle(dev.device);
dev.dispatch.destroyBuffer(dev.device, self.buffer, null);
dev.dispatch.freeMemory(dev.device, self.memory, null);
};
const replacement = try create(self.opts, new_len);
self.* = replacement;
}

View File

@ -67,9 +67,20 @@ fn processIncludes(comptime contents: [:0]const u8) [:0]const u8 {
var i: usize = 0;
while (i < contents.len) {
if (std.mem.startsWith(u8, contents[i..], "#include")) {
std.debug.assert(std.mem.startsWith(u8, contents[i..], "#include \""));
const start = i + "#include \"".len;
const end = std.mem.indexOfScalarPos(u8, contents, start, '"').?;
// Skip whitespace (space or tab) between `#include` and
// the opening quote. The previous literal-prefix
// `startsWith("#include \"")` assert tripped on legal
// `#include\t""` and `#include ""` variants. Accept
// any horizontal whitespace and require exactly one
// double-quoted path.
var p = i + "#include".len;
while (p < contents.len and (contents[p] == ' ' or contents[p] == '\t')) : (p += 1) {}
if (p >= contents.len or contents[p] != '"') {
@compileError("processIncludes: malformed #include directive in shader");
}
const start = p + 1;
const end = std.mem.indexOfScalarPos(u8, contents, start, '"') orelse
@compileError("processIncludes: unterminated #include path");
return std.fmt.comptimePrint("{s}{s}{s}", .{
contents[0..i],
@embedFile("../shaders/glsl/" ++ contents[start..end]),
@ -178,6 +189,12 @@ pub fn vulkanizeGlsl(
var i: usize = 0;
while (i < src.len) {
// Skip comments + string literals verbatim anything
// that looks like an identifier inside one of those is
// not a real token, and rewriting it (e.g. a comment
// that mentions `gl_VertexID` or `texture(atlas_*, ...)`)
// would silently corrupt the shader source.
if (try copySkippable(alloc, &out, src, &i)) continue;
const c = src[i];
const is_ident_start = isIdentChar(c);
if (is_ident_start) {
@ -245,6 +262,11 @@ pub fn vulkanizeGlsl(
var i: usize = 0;
while (i < pass1.len) {
// Skip comments + string literals verbatim, same reason as
// pass 1 rewriting `layout(binding=)` text inside a
// comment would inject a `set =` qualifier into a comment
// that's never compiled, harmless today but a footgun.
if (try copySkippable(alloc, &out, pass1, &i)) continue;
if (matchKeyword(pass1, i, "layout")) |layout_end| {
// Skip whitespace between `layout` and `(`.
var p = layout_end;
@ -307,6 +329,52 @@ pub fn vulkanizeGlsl(
return try out.toOwnedSliceSentinel(alloc, 0);
}
/// If position `i` in `src` is the start of a GLSL line comment
/// (`//...\n`), block comment (`/* ... */`), or `"..."` string
/// literal, copy the whole token verbatim into `out`, advance
/// `*i` past it, and return true. Otherwise `*i` is unchanged
/// and we return false.
///
/// Strings are unusual in GLSL but `#extension` directives can
/// quote in some preprocessor flavors, and the safe thing is to
/// leave any quoted run untouched.
fn copySkippable(
alloc: std.mem.Allocator,
out: *std.ArrayList(u8),
src: []const u8,
i: *usize,
) std.mem.Allocator.Error!bool {
const start = i.*;
if (start >= src.len) return false;
if (start + 1 < src.len and src[start] == '/' and src[start + 1] == '/') {
var p = start;
while (p < src.len and src[p] != '\n') : (p += 1) {}
try out.appendSlice(alloc, src[start..p]);
i.* = p;
return true;
}
if (start + 1 < src.len and src[start] == '/' and src[start + 1] == '*') {
var p = start + 2;
while (p + 1 < src.len and !(src[p] == '*' and src[p + 1] == '/')) : (p += 1) {}
// Include the closing `*/` if found; otherwise consume to EOF.
const end = if (p + 1 < src.len) p + 2 else src.len;
try out.appendSlice(alloc, src[start..end]);
i.* = end;
return true;
}
if (src[start] == '"') {
var p = start + 1;
while (p < src.len and src[p] != '"') : (p += 1) {
if (src[p] == '\\' and p + 1 < src.len) p += 1;
}
const end = if (p < src.len) p + 1 else src.len;
try out.appendSlice(alloc, src[start..end]);
i.* = end;
return true;
}
return false;
}
fn isIdentChar(c: u8) bool {
return (c >= 'a' and c <= 'z') or
(c >= 'A' and c <= 'Z') or
@ -314,15 +382,6 @@ fn isIdentChar(c: u8) bool {
c == '_';
}
/// True if the first non-space, non-comment character at or after
/// position `i` in `src` is `(`. Used to recognize a function call
/// when the caller is positioned right after the identifier name.
fn nextNonSpaceIsOpenParen(src: []const u8, i: usize) bool {
var p = i;
while (p < src.len and isAnySpace(src[p])) : (p += 1) {}
return p < src.len and src[p] == '(';
}
/// Names of samplers we create with `unnormalized_coordinates =
/// VK_TRUE`. The shaders here all use only the two atlas samplers
/// for cell_text; if more get added (or renamed) update this list.
@ -457,9 +516,11 @@ pub const Module = struct {
return error.GlslangFailed;
};
const translated = vulkanizeGlsl(alloc, src) catch {
return error.GlslangFailed;
};
// vulkanizeGlsl returns `Allocator.Error` only surface it
// via `try` (Module.Error includes Allocator.Error) so OOM
// doesn't get reported as `error.GlslangFailed`. Conflating
// them masked the actual failure mode in earlier diagnostics.
const translated = try vulkanizeGlsl(alloc, src);
defer alloc.free(translated);
const c = glslang.c;
@ -671,14 +732,19 @@ const empty_pipeline: Pipeline = .{
/// `Shaders.deinit` walks the same set in reverse to destroy
/// pipelines, layouts, samplers, the descriptor pool, and modules.
pub const Shaders = struct {
/// Borrowed pointer to the host-owned VkDevice wrapper. Stored
/// so `deinit` can reach the device dispatch table without
/// reaching into an arbitrary module's `.device` field (which
/// would silently break if `Modules` is restructured). The
/// pointer outlives `Shaders` because the device is process-
/// global in `Vulkan.zig`.
device: *const Device,
pipelines: PipelineCollection,
/// One per user-supplied custom shader. Built by `Shaders.init`
/// from the `post_shaders` arg empty when no custom shaders.
/// Owned by `Shaders` (deinit destroys each).
/// Owned by `Shaders` (deinit destroys each + frees the slice
/// using the allocator passed to `deinit`).
post_pipelines: []Pipeline,
/// Allocator used to allocate `post_pipelines`; held so deinit
/// can free the slice.
post_alloc: ?Allocator = null,
/// Compiled `VkShaderModule`s for each user shader, parallel to
/// `post_pipelines`. Owned by `Shaders` (deinit destroys each).
post_modules: []Module = &.{},
@ -1188,12 +1254,17 @@ pub const Shaders = struct {
post_modules = try alloc.alloc(Module, post_shaders.len);
errdefer alloc.free(post_modules);
// Init counter so partial failures can deinit only what
// was built.
var built: usize = 0;
// Init counters so partial failures deinit exactly what
// was built. We track modules and pipelines separately
// because the inner loop creates a module first, then
// tries to build a pipeline against it if Pipeline.init
// fails after Module.initFromSpirv succeeded, the module
// is populated but the pipeline isn't.
var modules_built: usize = 0;
var pipelines_built: usize = 0;
errdefer {
for (post_pipelines[0..built]) |p| p.deinit();
for (post_modules[0..built]) |m| m.deinit();
for (post_pipelines[0..pipelines_built]) |p| p.deinit();
for (post_modules[0..modules_built]) |m| m.deinit();
}
// Shared descriptor set layouts across post pipelines.
@ -1224,6 +1295,7 @@ pub const Shaders = struct {
}
const spv_words: []const u32 = std.mem.bytesAsSlice(u32, @as([]align(@alignOf(u32)) const u8, @alignCast(spv_bytes)));
post_modules[i] = try Module.initFromSpirv(device, spv_words, .fragment);
modules_built = i + 1;
post_pipelines[i] = try Pipeline.init(.{
.device = device,
.descriptor_pool = &pool,
@ -1236,14 +1308,14 @@ pub const Shaders = struct {
.blending_enabled = false,
.topology = vk.VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST,
});
built = i + 1;
pipelines_built = i + 1;
}
}
return .{
.device = device,
.pipelines = pipelines,
.post_pipelines = post_pipelines,
.post_alloc = if (post_shaders.len > 0) alloc else null,
.post_modules = post_modules,
.modules = modules,
.descriptor_pool = pool,
@ -1292,7 +1364,6 @@ pub const Shaders = struct {
}
pub fn deinit(self: *Shaders, alloc: Allocator) void {
_ = alloc;
if (self.defunct) return;
self.defunct = true;
@ -1314,10 +1385,12 @@ pub const Shaders = struct {
// pipeline first (holds VkPipelineLayout), then shader module.
for (self.post_pipelines) |p| p.deinit();
for (self.post_modules) |m| m.deinit();
if (self.post_alloc) |a| {
a.free(self.post_pipelines);
a.free(self.post_modules);
}
// The slices were allocated from the same allocator the
// caller hands to deinit (the renderer's `self.alloc`).
// Use it directly the previous `post_alloc` field was
// an extra source of truth for the same value.
if (self.post_pipelines.len > 0) alloc.free(self.post_pipelines);
if (self.post_modules.len > 0) alloc.free(self.post_modules);
// Atlas sampler held by `Shaders` for the cell_text pipeline's
// texture bindings.
@ -1331,7 +1404,7 @@ pub const Shaders = struct {
// Destroy every descriptor set layout we created. The empty
// placeholder is one of the entries.
const dev = self.modules.full_screen_vert.device;
const dev = self.device;
for (self.set_layouts[0..self.set_layouts_len]) |dsl| {
if (dsl != null) dev.dispatch.destroyDescriptorSetLayout(
dev.device,