2026-02-08 23:00:43 UTC

Ryan on Nostr: I fired up Ollama this weekend for the first time. So far, I’ve run two models; ...

I fired up Ollama this weekend for the first time. So far, I’ve run two models: phi4 and qwen3-coder-next on my MacBook Pro M4 Pro with 48 GB RAM. Phi4 was very quick and responsive. Qwen3 consumed all my RAM and pushed the system about 25 GB into swap. Surprisingly, while a bit sluggish, it kept going alright.
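A back-of-the-envelope sketch of why one model fit comfortably while the other spilled into swap. The sizing formula and the parameter counts below are assumptions (phi4 is roughly a 14B-parameter model; the larger coder model's size is a guess for illustration), not measurements from my runs:

```python
# Rough RAM estimate for a quantized local model:
# weights plus ~20% headroom for KV cache and runtime overhead.
# Parameter counts and 4-bit quantization are assumptions for illustration.

def est_ram_gb(params_b: float, bits_per_weight: float = 4.0,
               overhead: float = 1.2) -> float:
    """Approximate resident RAM in GB for `params_b` billion parameters."""
    weight_gb = params_b * bits_per_weight / 8  # bytes per weight * count
    return round(weight_gb * overhead, 1)

# A ~14B model at 4-bit fits easily inside 48 GB of unified memory.
print(est_ram_gb(14))   # ~8.4

# A hypothetical ~80B model at 4-bit lands right at the machine's
# 48 GB ceiling, which is how a box ends up deep into swap.
print(est_ram_gb(80))   # ~48.0
```

The exact numbers depend on quantization, context length, and how much KV cache the runtime allocates, so treat this as a sanity check rather than a sizing tool.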

I also tried both models on a Proxmox cluster I run at work. None of the members have a GPU worth anything, but 3 of the 5 members have 768 GB RAM, so I fired up a Linux container running Ubuntu 24.04 and gave it 384 GB RAM. The member I ran it on has 48 hyperthreaded cores (which show as 96 logical cores). Ollama is smart enough to pay attention to NUMA boundaries, so it ran everything on the same CPU socket: 48 cores, hyperthreads included. It maxed out the cores on that socket while responding to my prompts, but it wasn’t unbearable to use. Phi4 was definitely the faster of the two, almost as fast as the same model running on my MBP, which had the advantage of a GPU. Qwen3 was definitely slower, but not so slow I couldn’t use it.
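For anyone whose runtime doesn't handle NUMA automatically, here's a minimal sketch of doing the same pinning by hand on Linux. The CPU numbering is an assumption (node-contiguous: logical CPUs 0–47 on node 0, 48–95 on node 1); real layouts vary, so check `lscpu -e` or `numactl --hardware` before pinning anything:

```python
import os

# Sketch: restrict a process to one NUMA node's logical CPUs,
# roughly what Ollama did automatically on the 2-socket host.
# ASSUMED layout: 2 sockets x 24 cores x 2 hyperthreads = 96 logical
# CPUs, numbered contiguously per node. Verify on real hardware first.

LOGICAL_CPUS = 96
NODES = 2

def node_cpus(node: int, total: int = LOGICAL_CPUS,
              nodes: int = NODES) -> set[int]:
    """Logical CPU ids belonging to one NUMA node under the assumed layout."""
    per_node = total // nodes
    return set(range(node * per_node, (node + 1) * per_node))

def pin_to_node(node: int) -> None:
    """Pin this process (and its children) to one node's CPUs.
    Linux-only; os.sched_setaffinity does not exist on macOS."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, node_cpus(node))

print(sorted(node_cpus(0))[:4], "...", max(node_cpus(0)))  # [0, 1, 2, 3] ... 47
```

Pinning CPUs alone doesn't pin memory; to keep allocations on the same node you'd also want `numactl --membind` (or `set_mempolicy`) when launching the server.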

It’s by no means a Claude Sonnet replacement, but if I can take something like Phi4 and fine-tune it with more recent info, it might be a decent model to run in-house for coding tasks or other administrative things.
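For light in-house customization, Ollama's Modelfile is the easy first step. A caveat: a Modelfile layers a system prompt and parameters on top of the base model, it does not update the weights; actually teaching it newer info would mean fine-tuning elsewhere and importing the result (e.g. pointing `FROM` at a GGUF file). A hypothetical Modelfile, with the name and prompt made up:

```
# Hypothetical: ollama create phi4-inhouse -f Modelfile
FROM phi4
PARAMETER temperature 0.2
PARAMETER num_ctx 8192
SYSTEM """You are an internal coding assistant. Prefer our house style and cite file paths when suggesting changes."""
```

After `ollama create`, the customized model runs like any other: `ollama run phi4-inhouse`.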

It’s a fascinating time to be alive.