> The question is whether an LLM will run with usable performance at that scale.

For the self-attention mechanism, compute and memory-bandwidth requirements scale roughly quadratically with sequence length: every token's query has to be scored against every other token's key, so the score matrix alone has n² entries.
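A minimal NumPy sketch of where that quadratic term shows up (illustrative shapes only, not any production kernel; n and d below are made-up values):

    import numpy as np

    def naive_attention(q, k, v):
        # q, k, v: (n, d). The score matrix below is (n, n):
        # this is the quadratic term in both compute and memory.
        scores = (q @ k.T) / np.sqrt(q.shape[-1])        # (n, n)
        scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
        return weights @ v                               # (n, d)

    n, d = 8192, 128                                     # hypothetical sizes
    rng = np.random.default_rng(0)
    q, k, v = (rng.standard_normal((n, d), dtype=np.float32) for _ in range(3))
    out = naive_attention(q, k, v)
    # The (n, n) fp32 score matrix is 8192^2 * 4 bytes ≈ 256 MiB,
    # and it quadruples every time the context length doubles.

Fused kernels such as FlashAttention avoid materializing that matrix to cut bandwidth, though the quadratic compute remains.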

Someone has got to be working on a better method than that. Hundreds of billions are at stake.