Hacker News

Yeah, agreed. I'll say I do use the M3 Max for Baldur's Gate :).

On LLMs, the issue is largely memory bandwidth: the M2 Ultra is 800 GB/s, the M3 Max is 400 GB/s. Inference on larger models is mostly simple math over everything in memory, so the M2 Ultra is roughly twice as fast. Perf/watt probably suffers a little, but when you're trying to chew through 128GB of RAM and do math on all of it, you're generally maxing your thermal budget.
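To make the "performance is roughly double" point concrete, here's a back-of-envelope sketch (the model size and quantization are illustrative assumptions, not figures from this thread): if each generated token requires streaming all the weights from memory once, memory bandwidth divided by model size gives a throughput ceiling.

```python
def est_tokens_per_sec(model_bytes, bandwidth_gbps):
    """Rough upper bound on generation speed: each token streams all
    weights from memory once, so throughput is capped at
    bandwidth / model size. Real-world numbers are lower."""
    return bandwidth_gbps * 1e9 / model_bytes

# Illustrative: a 70B-parameter model at 4-bit quantization ~ 35 GB of weights
model_bytes = 70e9 * 0.5

print(round(est_tokens_per_sec(model_bytes, 800)))  # M2 Ultra ceiling: ~23 tok/s
print(round(est_tokens_per_sec(model_bytes, 400)))  # M3 Max ceiling: ~11 tok/s
```

Double the bandwidth, double the ceiling, which matches the "roughly double" claim above.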

Also, note that it's absolutely incredible how cheap it is to run a model on an M2 Ultra vs an H100 -- Apple's integrated system memory makes a lot possible at much lower price points.



Ahh right, I'd seen a few comments about the memory bandwidth when it was posted on LinkedIn, specifically that the M2 was much more powerful.

This makes a load of sense, thanks for explaining.


I've been considering buying a Mac specifically for LLMs, and I've come across a lot of info/misinfo on the topic of bandwidth. I see you're talking about M2 bandwidth issues that you read about on LinkedIn, so I wanted to expand on that in case there's any confusion on your part or for someone else following this comment chain.

The M2 Ultra at 800 GB/s is Mac Studio only, so it's not quite apples to apples when comparing against the M3, which is currently only offered in MacBooks.

The M2 Max has a bandwidth of 400 GB/s; this is a better comparison to the current M3 MacBook line. I believe it tops out at 96 GB of memory.

The M3 Max has a bandwidth of either 300 GB/s or 400 GB/s depending on the CPU/GPU you choose. The lower-tier CPU/GPU, with a max memory size of 96 GB, has a bandwidth of 300 GB/s. The top-of-the-line CPU/GPU, with a max memory size of 128 GB, has the same 400 GB/s bandwidth as the previous M2 Max.

The different bandwidths across M3 Max configurations have led to a lot of confusion on this topic, and some criticism of the complexity of the trade-offs in the most recent MacBook generation (the number of efficiency/performance cores being another source of criticism).
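Since the configurations above are easy to mix up, here's the same information as a small lookup sketch, using only the spec-sheet figures quoted in this thread (the helper function and its name are my own, purely for illustration):

```python
# Spec-sheet figures as quoted in this thread (not independently measured).
# The M2 Ultra's max RAM isn't stated here, so it's left as None.
configs = {
    "M2 Ultra (Mac Studio)": {"bw_gbps": 800, "max_ram_gb": None},
    "M2 Max":                {"bw_gbps": 400, "max_ram_gb": 96},
    "M3 Max (lower tier)":   {"bw_gbps": 300, "max_ram_gb": 96},
    "M3 Max (top tier)":     {"bw_gbps": 400, "max_ram_gb": 128},
}

def options_for_model(model_gb):
    """Return the configs whose memory ceiling can hold a model of
    this size, mapped to their advertised bandwidth."""
    return {name: c["bw_gbps"] for name, c in configs.items()
            if c["max_ram_gb"] is None or c["max_ram_gb"] >= model_gb}

# A ~100 GB model rules out everything but the M2 Ultra and top-tier M3 Max
print(options_for_model(100))
```

The point of the exercise: for large models, memory capacity narrows the field first, and then bandwidth decides how fast the survivors are.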

Sorry if this was already clear to you, just thought it might be helpful to you or others reading the thread who have had similar questions :)


Worth noting that in AnandTech's initial M1 Max review, they were never able to fully saturate the advertised 400 GB/s of memory bandwidth; the max they saw when engaging all CPU/GPU cores was 243 GB/s - https://www.anandtech.com/show/17024/apple-m1-max-performanc....

I have not seen the equivalent comparisons with M[2-3] Max.


Interesting! There are anecdotal reports here and there on LocalLLaMA about real-world performance, but yeah, I'm just reporting what Apple advertises on the spec sheets for those devices.


All this sounds right!

If money is no object, and you don't need a laptop, and you want a suggestion, then I'd say the M2 Ultra / Studio is the way to go. If money is still no object and you need a laptop, M3 with maxed RAM.

I have a 300 GB/s M3 and a 400 GB/s M1 with more RAM, and generally the LLM difference is minimal; the extra RAM is helpful, though.

If you want to try some stuff out, and don't anticipate running an LLM more than 10 hours a week, Lambda Labs or together.ai will save you a lot of money. :)
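A quick sanity check on the rent-vs-buy suggestion. The hardware price and hourly cloud rate below are hypothetical placeholders, not quotes from any vendor, but the shape of the calculation holds:

```python
def breakeven_weeks(hardware_cost, cloud_rate_per_hr, hours_per_week):
    """Weeks of cloud rental before the cumulative rental bill matches
    the up-front hardware cost (ignores electricity, resale value,
    and price changes)."""
    return hardware_cost / (cloud_rate_per_hr * hours_per_week)

# Illustrative only: a ~$6000 workstation vs a ~$2/hr GPU rental
print(round(breakeven_weeks(6000, 2.0, 10)))  # ~300 weeks at 10 hrs/week
```

At light usage the breakeven horizon runs to years, which is why occasional experimentation usually favors renting.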


The tech geek in me really wants to get a Studio with an M2 Ultra just for the cool factor, but cost-effectiveness-wise I think it makes more sense to rent something in the cloud for now.

Things are moving so quickly with local LLMs, too, that it's hard to say what the ideal hardware setup will be six months from now, so locking into a platform might not be the best idea.


The H100 is kind of a poor comparison; there are much cheaper ways to get to decent memory without one, such as two A6000s.



