New version: directly map CPU side staging buffer and run the shader against that ...

2026-02-02 16:05:30 UTC

New version: directly map CPU side staging buffer and run the shader against that buffer without moving to local memory first.

This version reaches 98% SM occupancy and takes just over 1 ms per buffer; GPU memory write bandwidth is at 7%.

Overall application performance is higher (my benchmark went from 7.7-9.0 to 8.6-9.2 WFM/s), with less of the jitter caused by contention between waveform download and the filter graph. It also saved around 100 MB of VRAM that had been used for the staging buffers.
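
To put rough numbers on that benchmark change, here is a small sketch that uses only the two ranges quoted above. Treating the width of each range as a stand-in for jitter is an assumption made for illustration; the post does not say how jitter was measured, and the helper names are made up.

```python
# Benchmark ranges quoted in the post, in WFM/s, as (low, high).
before = (7.7, 9.0)
after = (8.6, 9.2)


def spread(rng):
    """Width of a (low, high) range; assumed here as a rough proxy for jitter."""
    low, high = rng
    return high - low


def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


low_gain = pct_change(before[0], after[0])    # change at the slow end of the range
high_gain = pct_change(before[1], after[1])   # change at the fast end of the range
spread_before = spread(before)
spread_after = spread(after)

print(f"low end:  {low_gain:+.1f}%")
print(f"high end: {high_gain:+.1f}%")
print(f"spread:   {spread_before:.1f} -> {spread_after:.1f} WFM/s")
```

On these figures the slow end of the range improves by about 11.7% and the fast end by about 2.2%, while the spread narrows from 1.3 to 0.6 WFM/s. That fits the description of the gain showing up mostly as reduced jitter rather than a higher peak rate.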

Seems like a pretty clear win all around, and I'll probably want to do similar optimizations elsewhere on other shaders that have read-once / write-once buffers.

Author Public Key

npub1cddglts94qutscms0qpmk87lel9m8xku7q0wr20u2th5fxvvunqqxz9vpd

Seen on

wss://relay.ditto.pub


Andrew Zonenberg on Nostr