Hosting a 7B model is completely different from hosting a 150B+ model. I thought this would be obvious, but I should have been explicit.


It's not, really. And 8x7B is not a 7B model; it's a MoE with roughly 47B total parameters that all have to be kept in memory, and it routes each token through 2 experts, so it runs at roughly 13B-model speeds.
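The top-2 routing described above can be sketched in a few lines of numpy. This is a toy illustration, not Mixtral's actual implementation: the sizes are tiny and each "expert" is a single random linear layer standing in for a full FFN block. The key point it demonstrates is that all 8 experts sit in memory, but each token only does the compute of 2.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2  # toy sizes, hypothetical

# Router: a linear layer that scores each expert per token.
router_w = rng.standard_normal((d_model, n_experts))
# Experts: single linear layers standing in for real FFN blocks.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

def moe_layer(x):
    """Top-2 mixture-of-experts: every expert is resident in memory,
    but each token is processed by only top_k of them."""
    logits = x @ router_w                             # (tokens, n_experts)
    top2 = np.argsort(logits, axis=-1)[:, -top_k:]    # 2 highest-scoring experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, top2[t]]
        gates = np.exp(sel - sel.max())
        gates /= gates.sum()                          # softmax over chosen experts
        for gate, e in zip(gates, top2[t]):
            out[t] += gate * (x[t] @ experts[e])      # compute only 2 of 8 experts
    return out

tokens = rng.standard_normal((4, d_model))
y = moe_layer(tokens)
print(y.shape)  # (4, 16)
```

Memory cost scales with all 8 expert weight matrices, while per-token FLOPs scale with only the 2 selected ones, which is why an 8x7B MoE needs far more memory than a dense 7B model yet runs much faster than a dense model of its full size.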

All of the current frameworks support MoE and sharding across GPUs, so I don't see what the issue is.