2026-02-02 16:05:30 UTC
Andrew Zonenberg on Nostr:

New version: directly map CPU side staging buffer and run the shader against that buffer without moving to local memory first.

This version has 98% SM occupancy and takes just over 1 ms per buffer; GPU memory write bandwidth is 7%.

Overall application performance is higher (my benchmark went from 7.7-9.0 to 8.6-9.2 WFM/s) with less jitter due to contention between waveform download and the filter graph. It also saved around 100 MB of VRAM that had been used for the staging buffers.
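
To put the benchmark ranges in perspective, here is a rough bit of arithmetic in Python. The helper name `range_change` is made up for illustration, and treating the width of each range as a stand-in for jitter is an assumption, not something the benchmark itself reports.

```python
def range_change(before, after):
    """Compare two (low, high) throughput ranges, both in WFM/s."""
    b_lo, b_hi = before
    a_lo, a_hi = after
    return {
        # Relative gain at the slow and fast ends of the range
        "low_gain_pct": 100 * (a_lo - b_lo) / b_lo,
        "high_gain_pct": 100 * (a_hi - b_hi) / b_hi,
        # Width of each range, used here as a crude proxy for jitter
        "spread_before": b_hi - b_lo,
        "spread_after": a_hi - a_lo,
    }


# Figures quoted in the post: 7.7-9.0 WFM/s before, 8.6-9.2 WFM/s after
stats = range_change((7.7, 9.0), (8.6, 9.2))
for name, value in stats.items():
    print(f"{name}: {value:.1f}")
```

On these numbers the slow end improves by about 11.7% and the fast end by about 2.2%, while the range narrows from 1.3 to 0.6 WFM/s, so most of the gain is in the worst case.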

Seems like a pretty clear win all around, and I'll probably want to do similar optimizations elsewhere on other shaders that have read-once / write-once buffers.