
> […] but how much can your code actually use?

All of it, and it is transparent to the code. The correct question is «how much data does the code transfer?»

Whether you are scanning large string ropes for a lone character or multiplying huge matrices, no manual code optimisation is required.





Have you tested it or is that just what you expect?

Are you familiar enough with the platform to attest that it requires no manual code optimisation for high-performance datapaths? I only know the Apple Silicon-specific code in llama.cpp, and I'm not really familiar with either Accelerate[0] or MLX[1]. Have they really cracked homogeneous computing, so that you can write a single description of a computation and have it emit efficient code for whatever target in the SoC? Or are you merely referring to the full memory capacity and bandwidth being available to the CPU in normal operation?

[0]: https://developer.apple.com/documentation/accelerate

[1]: https://ml-explore.github.io/mlx/build/html/usage/quick_star...



