
Be a lot cooler if you said what laptop, and how much quantisation you're assuming :)



They're probably referring to the new MacBook Pros with up to 128GB of unified memory.


Sibling commenter tvararu is correct. A 2023 Apple MacBook Pro with 128GiB of RAM, all available to the GPU. No quantisation required :)

Other sibling commenter refulgentis is correct too. The Apple M{1-3} Max chips have up to 400GB/s of memory bandwidth, which I think is noticeably faster than any other consumer CPU out there, though still slower than a top Nvidia GPU. If the entire 96GB model has to be read by the GPU for each token, that limits unquantised performance to about 4 tokens/s at best. However, since the "Mixtral" model under discussion is a mixture-of-experts, it doesn't have to read the whole model for each token, so it might go faster. Perhaps still single-digit tokens/s, though, for unquantised.
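
For a rough sense of those numbers, here's a back-of-the-envelope sketch in Python. The 16-bit weights and ~13B active parameters per token for Mixtral are assumptions on my part, not measurements:

    # Bandwidth-bound decoding: tokens/s <= memory bandwidth / bytes read per token.
    BANDWIDTH_GB_S = 400    # M-series Max unified memory bandwidth (GB/s)
    TOTAL_MODEL_GB = 96     # unquantised Mixtral weights, as above
    ACTIVE_PARAMS = 13e9    # ~2 of 8 experts active per token (assumption)
    BYTES_PER_PARAM = 2     # 16-bit weights, no quantisation

    dense_ceiling = BANDWIDTH_GB_S / TOTAL_MODEL_GB
    moe_active_gb = ACTIVE_PARAMS * BYTES_PER_PARAM / 1e9
    moe_ceiling = BANDWIDTH_GB_S / moe_active_gb

    print(f"Reading the full model per token: {dense_ceiling:.1f} tokens/s ceiling")
    print(f"Reading only active experts:      {moe_ceiling:.1f} tokens/s ceiling")
    # -> ~4.2 and ~15.4 tokens/s respectively; real-world throughput will be lower.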



