Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Do we know if is it slower because of hardware is not as well suited for the task or is it mostly a software issue -- the code hasn't been optimized to run on Apple Silicon?


AFAICT the neural engine has accelerators for CNNs and integer math, but not the exact tensor operations in popular LLM transformer architectures that are well-supported in GPUs.


The neural engine is perfectly capable of accelerating matmults. It's just that autoregressive decoding in single batch LLM inference is memory bandwidth constrained, so there are no performance benefits to using the ANE for LLM inference (although, there's a huge power efficiency benefit). And the only way to use the neural engine is via CoreML. Using the GPU with MLX or MPS is often easier.


I have to assume they’re doing something like that in the lab for 4 years from now.


Memory bandwidth is the issue




Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: