Andrew Zonenberg on Nostr
So, I have an answer to my previous question about GPU transfer efficiency.
Original code: write data to a staging buffer on the CPU, vkCmdCopyBuffer to GPU-local memory, then run an int-to-float32 conversion on the GPU out of that buffer. The copy operation shows 50% SM occupancy by compute warps and 50% unallocated warp slots in active SMs.
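For readers who want the data flow spelled out, here is a minimal CPU-side sketch of the three stages, not the actual Vulkan code. Everything in it is an assumption for illustration: the post does not give the sample type (16-bit signed integers are assumed here), the function names `copy_staging_to_local` and `convert_to_float` are hypothetical, and the `gain`/`offset` parameters are an assumed form of the conversion. `std::memcpy` stands in for the GPU buffer copy, and the loop body stands in for one compute shader invocation.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Stages 1 and 2 (modeled on the CPU): the caller fills `staging`, then the
// data is copied into a separate "device-local" buffer. On the GPU this copy
// would be a Vulkan buffer copy command; std::memcpy is only a stand-in.
std::vector<std::int16_t> copy_staging_to_local(const std::vector<std::int16_t>& staging)
{
    std::vector<std::int16_t> local(staging.size());
    if (!staging.empty())
        std::memcpy(local.data(), staging.data(), staging.size() * sizeof(std::int16_t));
    return local;
}

// Stage 3 (modeled on the CPU): element-wise integer to float32 conversion.
// Each loop iteration corresponds to the work of one shader invocation.
// `gain` and `offset` are hypothetical parameters, not taken from the post.
std::vector<float> convert_to_float(const std::vector<std::int16_t>& local, float gain, float offset)
{
    std::vector<float> out(local.size());
    for (std::size_t i = 0; i < local.size(); i++)
        out[i] = static_cast<float>(local[i]) * gain + offset;
    return out;
}
```

Calling `convert_to_float(copy_staging_to_local(samples), gain, offset)` mirrors the original ordering: the data is written once by the copy and then read again by the conversion pass.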
GPU memory write bandwidth is sitting at around 2%, with about 1.9 ms of copy/shader run time.
Published at
2026-02-02 16:01:42 UTC
Attached image (PNG, 3840x2127): https://files.ioc.exchange/media_attachments/files/116/001/860/778/213/806/original/22e18aeb99fb6085.png
Original post: https://ioc.exchange/users/azonenberg/statuses/116001872441673757