Andrew Zonenberg on Nostr: Another evening, another few filters getting OOM speedups. Tonight it was invert ...
Another evening, another few filters getting order-of-magnitude speedups.
Tonight it was invert (27.3x; a trivial memory-bound shader that just outputs -x[i] for each output sample) and the 8B/10B decode. I didn't even bother to GPU that one: just removing the redundant sampling operation by using the new CDR recovered-data output was enough for a 12.1x speedup, and 6 ms on my 50M-point benchmark is fast enough that I'm in no hurry to GPU it. At faster data rates it may still be worth it, though.
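For anyone wondering what "trivial" means for the invert filter, here is a minimal CPU-side sketch in C++ of the same per-sample operation. It is not the actual shader or filter code; the function name `invert` and the use of `std::vector<float>` are assumptions made purely for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference for an "invert" filter: out[i] = -in[i].
// Each output sample costs one load, one negation, and one store,
// so throughput is limited by memory bandwidth rather than arithmetic.
std::vector<float> invert(const std::vector<float>& in)
{
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = -in[i];
    return out;
}
```

Since every output sample depends only on the input sample at the same index, the loop maps directly onto one GPU thread per sample, which is the memory-bound shape described in the post.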
Published at
2026-02-01 05:45:40 UTC
Event JSON
{
"id": "0f7da92a020e66738707c8c8405e32ca15e00445c953c49c5e9c29c8efabc525",
"pubkey": "c35a8fae05a838b863707803bb1fdfcfcbb39adcf01ee1a9fc52ef44998ce4c0",
"created_at": 1769924740,
"kind": 1,
"tags": [
[
"proxy",
"https://ioc.exchange/users/azonenberg/statuses/115993787794704825",
"activitypub"
],
[
"client",
"Mostr",
"31990:6be38f8c63df7dbf84db7ec4a6e6fbbd8d19dca3b980efad18585c46f04b26f9:mostr",
"wss://relay.ditto.pub"
]
],
"content": "Another evening, another few filters getting OOM speedups.\n\nTonight it was invert (27.3x, trivial memory bound shader that just outputs negative x[i] for each output sample) and the 8B/10B decode (didn't even bother to GPU it, just removing the redundant sampling operation by using the new CDR recovered-data output was enough for a 12.1x speedup and 6ms on my 50M point benchmark is fast enough I'm in no hurry to GPU... but at faster data rates it may still be worth it)",
"sig": "ef34169762e2a6b716415ebeb6934f0c07e09352f77821e54ccbafef74919cc76ee2d149cc18026cfcca2ca5c48d33564c973a6f15167a0e9579aa91328d8053"
}