2026-02-08 23:00:43 UTC

Ryan on Nostr: I fired up Ollama this weekend for the first time. So far, I’ve run two models; ...

I fired up Ollama this weekend for the first time. So far, I’ve run two models: phi4 and qwen3-coder-next on my MacBook Pro M4 Pro with 48 GB RAM. Phi4 was very quick and responsive. Qwen3 consumed all my RAM and pushed the system about 25 GB into swap. Surprisingly, while a bit sluggish, it kept going alright.
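A back-of-the-envelope sketch of why one model fit comfortably while the other spilled into swap. The sizing formula and the parameter counts below are assumptions (phi4 is roughly a 14B-parameter model; the larger coder model's size is a guess for illustration), not measurements from my runs:

```python
# Rough RAM estimate for a quantized local model:
# weights plus ~20% headroom for KV cache and runtime overhead.
# Parameter counts and 4-bit quantization are assumptions for illustration.

def est_ram_gb(params_b: float, bits_per_weight: float = 4.0,
               overhead: float = 1.2) -> float:
    """Approximate resident RAM in GB for `params_b` billion parameters."""
    weight_gb = params_b * bits_per_weight / 8  # bytes per weight * count
    return round(weight_gb * overhead, 1)

# A ~14B model at 4-bit fits easily inside 48 GB of unified memory.
print(est_ram_gb(14))   # ~8.4

# A hypothetical ~80B model at 4-bit lands right at the machine's
# 48 GB ceiling, which is how a box ends up deep into swap.
print(est_ram_gb(80))   # ~48.0
```

The exact numbers depend on quantization, context length, and how much KV cache the runtime allocates, so treat this as a sanity check rather than a sizing tool.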

I also tried both models on a Proxmox cluster I run at work. None of the members have a GPU worth anything, but 3 of the 5 members have 768 GB RAM, so I fired up a Linux container running Ubuntu 24.04 and gave it 384 GB RAM. The member I ran it on has 48 hyperthreaded cores (which show as 96 logical cores). Ollama is smart enough to pay attention to NUMA boundaries, so it ran everything on the same CPU socket: 48 cores, hyperthreads included. It maxed out the cores on that socket while responding to my prompts, but it wasn’t unbearable to use. Phi4 was definitely the faster of the two, almost as fast as the same model running on my MBP, which had the advantage of a GPU. Qwen3 was definitely slower, but not so slow I couldn’t use it.
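For anyone whose runtime doesn't handle NUMA automatically, here's a minimal sketch of doing the same pinning by hand on Linux. The CPU numbering is an assumption (node-contiguous: logical CPUs 0–47 on node 0, 48–95 on node 1); real layouts vary, so check `lscpu -e` or `numactl --hardware` before pinning anything:

```python
import os

# Sketch: restrict a process to one NUMA node's logical CPUs,
# roughly what Ollama did automatically on the 2-socket host.
# ASSUMED layout: 2 sockets x 24 cores x 2 hyperthreads = 96 logical
# CPUs, numbered contiguously per node. Verify on real hardware first.

LOGICAL_CPUS = 96
NODES = 2

def node_cpus(node: int, total: int = LOGICAL_CPUS,
              nodes: int = NODES) -> set[int]:
    """Logical CPU ids belonging to one NUMA node under the assumed layout."""
    per_node = total // nodes
    return set(range(node * per_node, (node + 1) * per_node))

def pin_to_node(node: int) -> None:
    """Pin this process (and its children) to one node's CPUs.
    Linux-only; os.sched_setaffinity does not exist on macOS."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, node_cpus(node))

print(sorted(node_cpus(0))[:4], "...", max(node_cpus(0)))  # [0, 1, 2, 3] ... 47
```

Pinning CPUs alone doesn't pin memory; to keep allocations on the same node you'd also want `numactl --membind` (or `set_mempolicy`) when launching the server.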

It’s by no means a Claude Sonnet replacement, but if I can take something like Phi4 and fine-tune it with more recent info, it might be a decent model to run in-house for coding tasks or other administrative things.
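For light in-house customization, Ollama's Modelfile is the easy first step. A caveat: a Modelfile layers a system prompt and parameters on top of the base model, it does not update the weights; actually teaching it newer info would mean fine-tuning elsewhere and importing the result (e.g. pointing `FROM` at a GGUF file). A hypothetical Modelfile, with the name and prompt made up:

```
# Hypothetical: ollama create phi4-inhouse -f Modelfile
FROM phi4
PARAMETER temperature 0.2
PARAMETER num_ctx 8192
SYSTEM """You are an internal coding assistant. Prefer our house style and cite file paths when suggesting changes."""
```

After `ollama create`, the customized model runs like any other: `ollama run phi4-inhouse`.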

It’s a fascinating time to be alive.