
> Is it the larger random cache writes that are causing the additional latency?

Think of the MESI model.

If Core#0 controls memory location #500 (Exclusive state), and then Core#32 wants to write to memory location #500 (also requires Exclusive state), how do you coordinate this?

The steps are as follows:

#1: Core#0 flushes its write buffer, L1 cache, and L2 cache so that its L3 cache and memory location #500 are fully updated.

#2: Memory location #500 is pushed out of Core#0's L3 cache and into Core#32's L3 cache. (Core#0 marks Location #500 "Invalid", which allows Core#32 to set Location #500 to Exclusive.)

#3: Core#32's L3 cache then transfers the data down through L2 and L1, and finally Core#32 can read it.
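To get a concrete feel for what that ping-pong costs, here's a minimal sketch (my own, not from the parent; the core numbers 0 and 32, the iteration count, and the Linux-only pthread_setaffinity_np pinning are assumptions, so adjust for your topology). It pins two threads to different cores and has both hammer the same memory location, which forces the cache line to bounce through the coherence steps above:

    /* Build with: gcc -O2 -pthread pingpong.c
     * Sketch of Exclusive-state ping-pong: two pinned threads
     * repeatedly write the same word, so ownership of the line
     * must move between cores on every write. */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdatomic.h>
    #include <stdio.h>
    #include <time.h>

    static _Atomic long shared_word;        /* "memory location #500" */
    #define ITERS 10000000L                 /* assumed iteration count */

    static void pin_to_cpu(int cpu) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

    static void *writer(void *arg) {
        pin_to_cpu(*(int *)arg);
        for (long i = 0; i < ITERS; i++)
            atomic_fetch_add(&shared_word, 1);  /* each write needs Exclusive ownership */
        return NULL;
    }

    int main(void) {
        int cpu_a = 0, cpu_b = 32;          /* assumed: cores on different CCXs/CCDs */
        pthread_t ta, tb;
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        pthread_create(&ta, NULL, writer, &cpu_a);
        pthread_create(&tb, NULL, writer, &cpu_b);
        pthread_join(ta, NULL);
        pthread_join(tb, NULL);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%.1f ns per contended write\n", secs * 1e9 / (2 * ITERS));
        return 0;
    }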

--------

EDIT: Step #1 is skipped when you read from DDR4 RAM, which is why DDR4 RAM reads under the Zen and Zen2 architectures are faster than remote L3 reads. An interesting quirk for sure.

In practice, Zen / Zen2's quirk doesn't seem to be a big deal for a large number of workloads (especially cloud servers / VMs). Databases are the only major workload I'm aware of where this really becomes a huge issue.
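If you want to observe the quirk yourself, the sketch above can be repurposed: pin the second thread first to a core sharing the first core's CCX, then to a core on a different CCX/CCD, and compare the per-write cost against a plain DRAM-latency baseline. (Which logical CPUs share a CCX depends on the specific SKU, so check lscpu or hwloc's lstopo before picking the core numbers.)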


