Then it's good that the A13/A14/M1 have a neural inference engine, the latest of which is rated at 11 trillion operations per second and shares memory with the CPU.
We're talking about INT4/INT8 or bfloat16 TOPS, right? And if it behaves like other neural inference engines, VPUs, TPUs, etc., it's probably powered off except for heavy-duty work and slow to power back up, whereas re-powering an in-CPU matmul block might be faster?
Other than that, I don't see the need for an effort such as a dedicated matmul instruction set. What's your guess?
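For what it's worth, a low-latency CPU-side path already exists in practice: on Apple silicon, the Accelerate framework's BLAS routines are reported to be dispatched to the in-CPU matmul blocks rather than to the Neural Engine, so a small GEMM never has to wait for a separate accelerator to wake up. A minimal sketch using the standard CBLAS interface (the routing to the matmul hardware is an internal detail of Accelerate, not something the API exposes, so treat that part as an assumption):

```c
// Small single-precision GEMM via Accelerate's CBLAS interface.
// Build on macOS with: clang sgemm.c -framework Accelerate
#include <Accelerate/Accelerate.h>
#include <stdio.h>

int main(void) {
    enum { M = 4, N = 4, K = 4 };
    float A[M * K], B[K * N], C[M * N];

    // Fill A and B with simple test values.
    for (int i = 0; i < M * K; i++) A[i] = (float)i;
    for (int i = 0; i < K * N; i++) B[i] = 1.0f;

    // C = 1.0 * A * B + 0.0 * C  (row-major, no transposes)
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                M, N, K,
                1.0f, A, K,
                B, N,
                0.0f, C, N);

    printf("C[0][0] = %f\n", C[0]);  // expect 0+1+2+3 = 6
    return 0;
}
```

The point of the sketch is just that a matmul this small is latency-bound, not throughput-bound, which is exactly the regime where an always-on in-CPU block makes more sense than waking a power-gated NPU.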