
This memory is now the most recently used in the L1 cache, despite having been freed by the allocator, meaning it probably isn't going to be used again.

If it was freed after already being evicted from the L1 cache, then zeroing it also means evicting other L1 cache contents and waiting for the line to be read back into L1 before you can write to it.

128 cycles is a generous estimate, and ignores the costs to the rest of the program.



You can use non-temporal writes to avoid this, and some CPUs have an instruction that zeroes an entire cache line (e.g. `dcbz` on PowerPC, `clzero` on recent AMD x86). It's not expensive to do this.
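A minimal sketch of the non-temporal approach, using the standard SSE2 streaming-store intrinsics (the helper name `zero_nontemporal` is made up for illustration): the stores bypass the cache hierarchy, so the zeroed lines don't displace live data in L1.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_stream_si128, _mm_sfence */
#include <stddef.h>

/* Zero `len` bytes with non-temporal (streaming) stores so the freshly
 * zeroed lines are not pulled into the cache hierarchy.
 * Assumes `dst` is 16-byte aligned and `len` is a multiple of 16. */
static void zero_nontemporal(void *dst, size_t len) {
    const __m128i zero = _mm_setzero_si128();
    __m128i *p = (__m128i *)dst;
    for (size_t i = 0; i < len / 16; i++)
        _mm_stream_si128(p + i, zero);  /* write-combining store, no cache fill */
    _mm_sfence();  /* order the NT stores before any later loads/stores */
}
```

An allocator would call this on a buffer right before (or instead of) returning it to its free list, trading store throughput for leaving the L1 contents undisturbed.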


Non-temporal writes are substantially slower, though: e.g. with AVX-512 you can do one 64-byte non-temporal write roughly every 5 clock cycles. That puts you at >= 640 cycles for 8 KiB (128 stores at ~5 cycles each). https://uops.info/html-instr/VMOVNTPS_M512_ZMM.html


Well, the point of a non-temporal write is precisely that you don't care how fast it is. (If the data were going to be read again anytime soon, you'd want it in the cache.)

But yes, it can be an over-optimization.


The worker is already reading from and writing to the buffer memory to service each incoming HTTP request, whether the memory is zeroed or not. The side effects on the CPU cache are negligible.



