> […] but how much can your code actually use? All of it, and it is transparent ...

Rohansi · 2025-09-15T20:00:44

Have you tested it or is that just what you expect?

tucnak · 2025-09-15T13:40:04

Are you familiar enough with the platform to attest that it requires no manual code optimisation for high-performance datapaths? I only know the Apple Silicon-specific code in llama.cpp, and I'm not really familiar with either Accelerate[0] or MLX[1]. Have they really cracked heterogeneous computing, so that you can write a single description of a computation and have it emit efficient code for whatever target in the SoC? Or are you merely referring to the full memory capacity/bandwidth being available to the CPU in normal operation?

[0]: https://developer.apple.com/documentation/accelerate

[1]: https://ml-explore.github.io/mlx/build/html/usage/quick_star...