
Yes, and plenty of others do too. Quantized. Join us at r/localllama

My largest models

   318G    /llmzoo/models/Qwen3.5-397B
   377G    DeepSeekv3.2-nolight
   380G    /llmzoo/models/DeepSeek-V3.2-UD
   400G    /llmzoo/models/Qwen3.5-397B-Q8
   443G    DeepSeek-Math-v2
   443G    DeepSeek-V3-0324-Q5
   522G    /llmzoo/models/GLM5.1
   545G    /llmzoo/models/kimi2.6
   546G    /llmzoo/models/KimiK2.5


Is your house's heating system based on H100s?


What hardware do you use?


I think the answer to this is: "yes"


Most of those have custom quants for Mac Studio M3 Ultra 512GB. You'll typically see them mention it by name.

All of that list but the last three run at these sizes. For the last three, look for a custom quant, e.g. 9.5 bits and/or the Ultra M3 512GB mention.

I'm not sure in which direction I'm surprised, but a MacBook Pro M5 Max ticks over models at the same speed. With "only" 128 GB, look for models of 116 GB (the absolute max that retains reasonable stability) or less.
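
For a rough sanity check on whether a given quant fits, here is a minimal back-of-the-envelope sketch. It assumes weights-only size is roughly parameter count times bits per weight divided by 8, and it ignores KV cache, context, and runtime overhead. The function names are made up for illustration, and the 397B parameter count is only inferred from the model name in the list above.

```python
def quantized_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weights-only size in GB (1 GB = 10^9 bytes).

    Assumption: every weight is stored at the same average bit width;
    real quant formats mix widths and add metadata.
    """
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float, budget_gb: float) -> bool:
    """True if the estimated weights fit within the memory budget."""
    return quantized_size_gb(params_billion, bits_per_weight) <= budget_gb


def max_bits_per_weight(params_billion: float, budget_gb: float) -> float:
    """Largest average bit width whose estimated size still fits the budget."""
    return budget_gb * 8 / params_billion


if __name__ == "__main__":
    # 397 (billion parameters) is an assumption read off the model name.
    print(quantized_size_gb(397, 8))      # estimated size of an 8-bit quant
    print(fits(397, 8, 116))              # against the 116 GB ceiling quoted above
    print(max_bits_per_weight(397, 116))  # average bits per weight that would fit
```

By this estimate an 8-bit quant of a 397B model lands close to the 400G entry in the list, and squeezing it under 116 GB would need an average of a bit over 2 bits per weight.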


a Beowulf cluster of 256 x Raspberry Pi 3.


I used to maintain a 2000-node Pi 4 cluster, before LLMs were relevant, with around 6 GB of free RAM per node. I wonder what I could have done with something like this.


All of it.


even quantised, those are HUGE



