This memory is now the least recently used in the L1 cache, despite being freed ...

astrange · on June 15, 2024

You can use non-temporal writes to avoid this, and some CPUs have an instruction that zeroes a cache line. It's not expensive to do this.

celrod · on June 15, 2024

Nontemporal writes are substantially slower, e.g. with avx512 you can do 1 64 byte nontemporal write every 5 or so clock cycles. That puts you at >= 640 cycles for 8 KiB. https://uops.info/html-instr/VMOVNTPS_M512_ZMM.html

astrange · on June 16, 2024

Well, the point of a non-temporal write kind of is that you don't care how fast it is. (Since if it was being read again anytime soon, you'd want it in the cache.)

But yes, it can be an over-optimization.

10000truths · on June 15, 2024

The worker is already reading/writing to the buffer memory to service each incoming HTTP request, whether the memory is zeroed or not. The side effects on the CPU cache are insubstantial.