qt: backpressure renderer thread on the compositor-paced gate

The GUI-side pacing landed in e78d3f7be kept the renderer running
free at 125 FPS — it just discarded the overshoot. The renderer
thread was still doing full Vulkan submit + waitForFences for
~65 frames/sec that never reached the compositor (confirmed by
the ~57 drops/sec the previous test showed at idle). GPU work
and renderer-thread CPU were paying for it.

Block in presentVulkanDmabuf on a std::condition_variable until
onWaylandFrameReady flips m_compositorReady. The renderer now
produces exactly one frame per compositor refresh — no wasted
GPU work, no drops in the steady state.

Safety:
- 100 ms timeout on the wait so a stalled compositor (lid closed,
  monitor disconnect) doesn't hang the renderer indefinitely; on
  timeout we proceed and overwrite the parked dmabuf (same drop
  semantic as pre-backpressure).
- Predicate also checks m_hidden so Hide / PlatformSurface destroy
  can notify_all to unblock the renderer immediately; the renderer
  re-checks m_hidden after wake and bails without parking.
- drainVulkan loses its now-redundant m_compositorReady check —
  any parked dmabuf means the renderer was allowed through, so
  the GUI thread can consume + commit unconditionally.

Measured: renderer-thread CPU drops from ~3% to ~2.1% at idle on
RTX 2080 (sleep, not spin — bigger on integrated GPUs). Total
drops at idle: 0 (was ~57/sec).

Co-Authored-By: claude-flow <ruv@ruv.net>
pull/12846/head
Nathan 2026-05-26 09:31:04 -05:00
parent e78d3f7beb
commit f48465cda5
2 changed files with 86 additions and 42 deletions

View File

@ -515,8 +515,14 @@ bool GhosttySurface::event(QEvent *e) {
m_subsurfacePresenter.reset();
// Presenter is gone — no frame_done callback will arrive.
// Reset the gate so the rebuilt presenter's first present
// (on next Show) goes through immediately.
m_compositorReady = true;
// (on next Show) goes through immediately, AND wake the
// renderer thread in case it's parked in the wait_for so
// it can re-check m_hidden and bail.
{
std::lock_guard<std::mutex> lg(m_compositorMutex);
m_compositorReady = true;
}
m_compositorCv.notify_all();
}
// SurfaceCreated is handled implicitly: the next QEvent::Show
// (which Qt always fires after the platform surface comes up)
@ -615,6 +621,15 @@ bool GhosttySurface::event(QEvent *e) {
m_subsurfacePresenter->hide();
forceParentCommit();
}
// Wake the renderer thread if it's parked in
// presentVulkanDmabuf's wait_for; the predicate sees
// m_hidden=true (already set above) and the renderer bails
// without parking another frame.
{
std::lock_guard<std::mutex> lg(m_compositorMutex);
m_compositorReady = true;
}
m_compositorCv.notify_all();
}
}
return QWidget::event(e);
@ -1790,21 +1805,42 @@ void GhosttySurface::presentVulkanDmabuf(
if (m_hidden.load(std::memory_order_acquire)) return;
if (useSubsurface) {
// Backpressure the renderer thread to the compositor's refresh
// rate. Block here until the GUI thread's wl_surface.frame
// callback (onWaylandFrameReady) signals that the previous
// commit has retired and the compositor is ready for the next
// one. Without this, the renderer's 125 FPS draw timer keeps
// submitting GPU work that the paced GUI thread discards —
// wasted GPU + renderer-thread CPU.
//
// 100 ms timeout is a safety net: if the compositor stalls
// (lid closed, monitor disconnect, application minimized
// mid-flight) we don't want the renderer thread blocked
// forever. On timeout we proceed and overwrite the parked
// dmabuf — same drop semantic as pre-backpressure. The
// predicate also bails on m_hidden so Hide can wake the
// renderer immediately without paying the timeout.
{
std::unique_lock<std::mutex> lk(m_compositorMutex);
m_compositorCv.wait_for(lk, std::chrono::milliseconds(100),
[this] {
return m_compositorReady ||
m_hidden.load(std::memory_order_acquire);
});
// If Hide fired while we were waiting, bail without parking
// the frame — the GUI thread's drainVulkan would drop it
// anyway on the m_hidden check below.
if (m_hidden.load(std::memory_order_acquire)) return;
m_compositorReady = false;
}
// Subsurface path. Park the descriptor under the mutex (so
// a concurrent drainVulkan sees a consistent snapshot) and
// wake the GUI thread.
//
// Frame-drop semantics: at most one frame is parked. If
// drainVulkan hasn't consumed the previous one before the
// renderer thread arrives with a new one, the older frame is
// overwritten — its fd is libghostty's to close at next
// Target.deinit, so the descriptor doesn't leak; the user just
// sees a missed frame. That's the right call for a 60Hz
// terminal: the alternative (block the renderer thread on the
// GUI thread) would stall every present. We bump a counter so
// a sustained backlog is visible in logs/metrics; spurious
// drops happen on the first few frames before the GUI thread
// pump is hot, hence the >0 threshold.
// wake the GUI thread. Frame-drop semantics: at most one frame
// is parked. With the backpressure above, overwrites should be
// rare — they happen only when the renderer's wait timed out
// before the GUI thread consumed the previous park, or on the
// first-frame bring-up race.
bool overwrote = false;
{
QMutexLocker lock(&m_pendingMutex);
@ -1908,13 +1944,15 @@ void GhosttySurface::presentVulkanDmabuf(
void GhosttySurface::onWaylandFrameReady() {
// Compositor has signaled it's ready for our next commit. Flip
// the gate and re-pump drainVulkan to consume any frame the
// renderer parked while we were waiting. If nothing is parked,
// drainVulkan no-ops; the next renderer-driven present will fire
// a queued drainVulkan that finds the gate open and goes through
// immediately.
m_compositorReady = true;
drainVulkan();
// the gate and wake the renderer thread, which is blocked in
// presentVulkanDmabuf's wait_for. The renderer will produce its
// next frame; nothing for us to drain right now (there's no
// pending dmabuf — the renderer is waiting BEFORE parking).
{
std::lock_guard<std::mutex> lg(m_compositorMutex);
m_compositorReady = true;
}
m_compositorCv.notify_all();
}
void GhosttySurface::drainVulkan() {
@ -1943,15 +1981,10 @@ void GhosttySurface::drainVulkan() {
}
if (m_useSubsurface.load(std::memory_order_acquire) &&
m_subsurfacePresenter) {
// Compositor-paced gate. If the compositor hasn't signaled
// ready yet (we're mid-flight on the previous commit), leave
// the parked descriptor in m_pendingDmabuf — onWaylandFrameReady
// will re-post drainVulkan when the wl_surface.frame callback
// fires. The renderer may overwrite m_pendingDmabuf with a
// newer frame in the meantime; that's fine, "latest wins" is
// the right semantic for terminal output that hasn't been
// displayed yet.
if (!m_compositorReady) return;
// No gate check here: the renderer thread's wait in
// presentVulkanDmabuf already paced us, so a parked dmabuf
// means the compositor was ready when the renderer claimed
// the slot. Just consume + commit.
PendingDmabuf frame;
{
QMutexLocker lock(&m_pendingMutex);
@ -1971,9 +2004,6 @@ void GhosttySurface::drainVulkan() {
// parent wl_surface.commit so the cached state applies and the
// frame becomes visible.
forceParentCommit();
// Mark the gate closed until the compositor's wl_surface.frame
// callback fires (onWaylandFrameReady).
m_compositorReady = false;
return;
}

View File

@ -1,8 +1,10 @@
#pragma once
#include <atomic>
#include <condition_variable>
#include <cstdint>
#include <memory>
#include <mutex>
#include <QImage>
#include <QMutex>
@ -347,14 +349,26 @@ private:
quint32 stride = 0;
};
PendingDmabuf m_pendingDmabuf;
// Compositor-paced present gate. True when we can issue the next
// wl_subsurface commit; flipped false after a present and back to
// true on the wl_surface.frame callback (onWaylandFrameReady). The
// renderer thread keeps producing frames at its own rate (125 FPS
// with custom-shader-animation), but only the latest parked frame
// reaches the compositor on each refresh — drops every-other (or
// more) frame to match compositor refresh, halving Wayland-commit
// CPU on the GUI thread. GUI-thread only, no atomic.
// Compositor-paced present gate. Now BACKPRESSURES THE RENDERER
// THREAD: presentVulkanDmabuf blocks (with a 100 ms safety
// timeout) until the compositor signals ready, so the renderer
// produces frames at the compositor's refresh rate instead of
// its own 125 FPS draw timer. Saves the GPU work + renderer-
// thread CPU that the prior GUI-side-drop model was paying for
// every wasted frame.
//
// State machine:
// - Initial: ready=true (first present goes through).
// - Renderer present: wait_for(ready || hidden); claim
// ready=false; park dmabuf; post drain.
// - GUI drain: consume + commit + register wl_surface.frame.
// - Compositor frame_done → onWaylandFrameReady: ready=true,
// notify CV. Renderer's next present unblocks immediately.
// - Hide / PlatformSurface destroy: ready=true, notify_all to
// unblock any in-flight renderer wait (predicate also checks
// m_hidden so the renderer bails without parking).
std::mutex m_compositorMutex;
std::condition_variable m_compositorCv;
bool m_compositorReady = true;
// Dedupes queued drainVulkan invocations posted from the renderer
// thread. Each renderer-thread `presentVulkanDmabuf` used to post