While it does seem that AVX10 was mainly designed for consumer CPUs so they could use modern vector instructions without 512-bit vectors, the upcoming Arrow Lake will not have it.[1]
I guess we will have to wait for at least one more generation.
Not only does Arrow Lake lack AVX10, but even Panther Lake, the 2025/2026 Intel CPU, does not have it.
Panther Lake will introduce FRED (Flexible Return and Event Delivery), a new way of handling interrupts, exceptions and system calls.
FRED will bring tremendous changes to the operating system kernels, but it will have little influence on user programs, except that the computer will spend less time running OS kernel code than now.
For now it is expected that Intel will introduce AVX10 in its consumer CPUs only in Nova Lake, the Intel 2026/2027 CPU.
Meanwhile, AMD Zen 4 and Zen 5 are already happily supporting AVX10, except for implementing the CPUID AVX10 flags. AVX10.1 differs from AVX-512 only by adding a simpler method for identifying which instructions are supported. AVX10.2 will add only some instructions that are not needed on the CPUs that support the 512-bit AVX-512 instructions, like Zen 4 and Zen 5. AVX10.3 has not been defined yet and it is far in the future.
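To make the "simpler method for identifying which instructions are supported" a bit more concrete, here is a minimal sketch of how the AVX10 enumeration could be decoded. The bit positions are as I recall them from Intel's AVX10 specification (an enable bit in CPUID leaf 07H subleaf 1, plus a version number and vector-length bits in leaf 24H), so treat them as assumptions and check the current revision of the spec. The function name and the returned dictionary are hypothetical, and the raw register values are passed in rather than read from the CPU.

```python
# Assumed bit layout (from memory of Intel's AVX10 spec, not verified here):
#   CPUID.(EAX=07H, ECX=01H):EDX[19]  -> AVX10 is enumerated
#   CPUID.(EAX=24H, ECX=00H):EBX[7:0] -> AVX10 version number
#   CPUID.(EAX=24H, ECX=00H):EBX[16]  -> 128-bit vectors supported
#   CPUID.(EAX=24H, ECX=00H):EBX[17]  -> 256-bit vectors supported
#   CPUID.(EAX=24H, ECX=00H):EBX[18]  -> 512-bit vectors supported
AVX10_ENUM_BIT = 19


def decode_avx10(leaf7_1_edx, leaf24_0_ebx):
    """Decode raw CPUID register values into an AVX10 summary.

    Returns None when the AVX10 enumeration bit is clear, which is what
    a CPU without the AVX10 CPUID flags would look like to software.
    """
    if not (leaf7_1_edx >> AVX10_ENUM_BIT) & 1:
        return None
    return {
        "version": leaf24_0_ebx & 0xFF,
        "v128": bool((leaf24_0_ebx >> 16) & 1),
        "v256": bool((leaf24_0_ebx >> 17) & 1),
        "v512": bool((leaf24_0_ebx >> 18) & 1),
    }


# Made-up register values for illustration only:
print(decode_avx10(1 << 19, 0x00070001))  # version 1, all vector lengths
print(decode_avx10(0, 0x00070001))        # enumeration bit clear
```

The point is that one version number plus a maximum vector length would replace the long list of separate AVX-512 feature flags that software has to check today.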
Intel is making Xeons out of E-cores (up to 288 of them on one chip) so I assume those will also be motivating the rollout of AVX10, not just their consumer parts.
Another concern besides register file size is shuffle instructions, which can move any byte of a 512-bit register to any other position. Another variant (vpermt2b) selects from any byte across two such registers, i.e. it picks from 128 bytes and does 64 such selections in one instruction.
You can't emulate that with just two regular 256-bit uops; you need four (maybe more for blending the results together). And if you don't have a native two-register-table 256-bit variant (e.g. Tiger Lake doesn't, though that's for 512-bit of course; it splits it into three uops), that'd end up at a rather massive 12 uops (4 x 3).
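A toy model of why two half-width uops are not enough, assuming the simplest case of a single 64-byte table and a hypothetical 32-byte-table lookup as the only primitive. This is plain Python lists standing in for registers, not real intrinsics, and the function names are made up for illustration. Each 32-byte half of the output may need bytes from either half of the table, so every output half costs two lookups plus a blend.

```python
def byte_permute(table, idx):
    """Reference model of a full byte permute: out[i] = table[idx[i] mod len(table)]."""
    n = len(table)
    return [table[j % n] for j in idx]


def permute512_from_256(table, idx):
    """Emulate a 64-byte-table permute using only 32-byte-table lookups.

    Returns (result, number_of_half_width_lookups). Blends are not counted.
    """
    assert len(table) == 64 and len(idx) == 64
    lo_tab, hi_tab = table[:32], table[32:]
    out = []
    lookups = 0
    for half in (idx[:32], idx[32:]):
        from_lo = byte_permute(lo_tab, half)  # uses index bits 4:0
        from_hi = byte_permute(hi_tab, half)  # same indices, other table half
        lookups += 2
        # Blend: index bit 5 says which table half the byte really lives in.
        out += [h if (j & 32) else l
                for j, l, h in zip(half, from_lo, from_hi)]
    return out, lookups


table = list(range(100, 164))
idx = list(range(63, -1, -1))  # reverse all 64 bytes
result, lookups = permute512_from_256(table, idx)
assert result == byte_permute(table, idx)
print(lookups)  # 4
```

The two-register variant has the same shape with a 128-byte table: four 256-bit two-table lookups, and if each of those is itself split into three uops you land at the 12 mentioned above.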
I think Intel's E-cores are quite a bit smaller than the Zen 4c/5c cores; maybe at that scale it's prohibitive to even double up the register file? That's required even if the logic is double-pumped. AIUI the small Zen cores are mostly the same design as the big ones, just with less cache, silicon layout retuned for density rather than speed, and the removal of the 3D Cache stacking vias, while Intel's small cores are clean-sheet designs with next to nothing in common with their big cores, so they have the opportunity to shrink them a lot more.
Yes, while the big Intel cores are much bigger than the big AMD cores (e.g. 5 square mm in Meteor Lake vs. 3.8 square mm for Zen 4), the Intel small cores are much smaller than the AMD compact cores (e.g. 1.5 square mm in Meteor Lake vs. 2.5 square mm for Zen 4c).
The smaller size of the Intel E-cores is not only due to their different microarchitecture, but also because only their L1 cache memories are non-shared, while their L2 cache memories are shared within groups of 4 E-cores.
The shared L2 cache may not matter much for many general-purpose programs, but for multi-threaded programs that depend on high total L2 cache throughput, the performance of each group of 4 E-cores becomes similar to that of a single core, instead of being 4 times greater.
The AMD compact cores have the same non-shared cache memories as the big cores. Only the shared L3 cache blocks that service a group of compact cores are smaller than for the same number of big cores.
My non-expert brain immediately jumped to double-pumping + maybe working with their thread director to have tasks using a lot of AVX512 instructions prefer P cores more. It feels like such an obvious solution to a really dumb problem that I assumed there was something simple I was missing.
The register file size makes sense, I didn't think they were that much of the die on those processors but I guess they had to be pretty aggressive to meet power goals?
Well, they don't support it either. According to the document I linked, neither the just-released Sierra Forest nor the planned Clearwater Forest supports AVX10.
AVX10 is still pretty much in the proposal phase, and has been recently updated based on feedback Intel has received. It takes several years to get from that stage to shipping hardware.
Granite Rapids, to be launched in a few months, is said by Intel to support AVX10.1/512 (which is identical to the ISA supported by Zen 5, except for a few additional flags reported by CPUID; Zen 4 lacks only VP2INTERSECT of AVX10.1).
Only the availability of AVX10/256 in Intel's consumer CPUs and in its server CPUs with E-cores is in the proposal phase, mainly because Intel has yet to design and launch an E-core supporting AVX10/256. That core would be the successor of Skymont, which is being launched now, and it is expected only in H2 2026.
[1] - According to Intel® Architecture Instruction Set Extensions Programming Reference: https://cdrdv2-public.intel.com/826290/architecture-instruct...