2026-01-30 08:15:27 UTC

deadmanoz on Nostr:

“The 1.8-bit (UD-TQ1_0) quant will run on a single 24GB GPU if you offload all MoE layers to system RAM (or a fast SSD). With ~256GB RAM, expect ~10 tokens/s. The full Kimi K2.5 model is 630GB and typically requires at least 4× H200 GPUs.

If the model fits, you will get >40 tokens/s when using a B200.”

https://unsloth.ai/docs/models/kimi-k2.5
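The quoted sizing advice comes down to simple memory arithmetic: weight size ≈ parameters × bits per weight ÷ 8, and with MoE offloading the weights only need to fit in GPU VRAM plus system RAM combined. The Python below is a minimal sketch of that check. It assumes roughly 1 trillion total parameters, which the post does not state, so substitute the figure from the model card. It also treats "1.8-bit" as a uniform average, although dynamic quants mix bit widths. The helper names are hypothetical.

```python
# Rough memory-budget check for a quantized MoE model whose expert layers
# are offloaded to system RAM. The parameter count is an ASSUMPTION for
# illustration and is not a figure from the post.

GB = 1e9  # decimal gigabytes, as model file sizes are usually quoted


def weights_size_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone (no KV cache or overhead)."""
    return total_params * bits_per_weight / 8 / GB


def fits_with_offload(model_gb: float, vram_gb: float, ram_gb: float) -> bool:
    """True if the weights fit in GPU VRAM and system RAM combined.

    Ignores KV cache, activations and runtime overhead, so a narrow
    margin should be read as "probably not" in practice.
    """
    return model_gb <= vram_gb + ram_gb


# ASSUMPTION: ~1 trillion total parameters; replace with the real count.
ASSUMED_TOTAL_PARAMS = 1.0e12

quant_gb = weights_size_gb(ASSUMED_TOTAL_PARAMS, bits_per_weight=1.8)
print(f"~{quant_gb:.0f} GB at 1.8 bits/weight")
print(fits_with_offload(quant_gb, vram_gb=24, ram_gb=256))  # 1.8-bit quant
print(fits_with_offload(630, vram_gb=24, ram_gb=256))       # full 630GB model
```

Under that assumption the 1.8-bit quant comes to about 225 GB and fits in 24 GB of VRAM plus 256 GB of RAM, while the full 630GB model does not. The fast-SSD option in the quote covers the case where this check fails, at the cost of speed.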