Yes, the RAM is mostly used by content sitting in the VM page cache.
Yes, you could go NVMe->NIC with P2P DMA. The problem is that NICs want to read data one TCP MSS (~1448 bytes) at a time, while NVMe really wants to speak in 4K-sized chunks. So there need to be buffers somewhere. Eventually that might be CXL-based memory, but for now it is host memory.
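To make the size mismatch concrete, here is a rough sketch (my own illustration, not anything from a real driver) that cuts a byte stream into MSS-sized segments and lists which 4K blocks each segment touches. The 1448 and 4096 figures come from the comment above; `segment_map` is a hypothetical helper name.

```python
MSS = 1448    # approximate TCP payload per segment, as quoted above
BLOCK = 4096  # the chunk size NVMe prefers

def segment_map(total_bytes, mss=MSS, block=BLOCK):
    """Return (start_offset, length, [block indices touched]) for each segment."""
    out = []
    for start in range(0, total_bytes, mss):
        length = min(mss, total_bytes - start)   # the final segment may be short
        first = start // block
        last = (start + length - 1) // block
        out.append((start, length, list(range(first, last + 1))))
    return out

if __name__ == "__main__":
    # Three 4K blocks of payload: some segments straddle a block boundary,
    # so the NIC cannot simply fetch "one block per packet".
    for start, length, blocks in segment_map(3 * BLOCK):
        note = "  <- straddles two blocks" if len(blocks) > 1 else ""
        print(f"offset {start:5d} len {length:4d} blocks {blocks}{note}")
```

Segments that span two blocks are the reason something has to hold the data between the 4K reads and the ~1448-byte sends.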
EDIT: missed the last question. No, with NIC kTLS, the host RAM usage is about the same as it would be without TLS at all. E.g., connection data sitting in the socket buffers refers to pages in the host VM page cache, which can be shared among multiple connections. With software kTLS, data in the socket buffers must refer to private, per-connection encrypted data, which increases RAM requirements.
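A back-of-the-envelope way to think about that difference, as a toy model only: the function names and every number below are hypothetical, and real socket buffer accounting is more involved.

```python
def ram_nic_ktls(cached_content_bytes):
    # Toy model: socket buffers point at shared page-cache pages, so the
    # cost is roughly just the cached content, same as with no TLS.
    return cached_content_bytes

def ram_software_ktls(cached_content_bytes, connections, in_flight_per_conn):
    # Toy model: each connection also holds its own private encrypted copy
    # of whatever it currently has queued in its socket buffer.
    return cached_content_bytes + connections * in_flight_per_conn

if __name__ == "__main__":
    GiB = 1 << 30
    MiB = 1 << 20
    cache = 100 * GiB        # hypothetical amount of hot content
    conns = 100_000          # hypothetical connection count
    queued = 1 * MiB         # hypothetical bytes queued per connection
    print(ram_nic_ktls(cache) / GiB)
    print(ram_software_ktls(cache, conns, queued) / GiB)
```

The point of the sketch is only that the software kTLS term grows with the number of connections, while the NIC kTLS case does not.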
Thank you, I understand that efficient offload may eventually be possible.
Back when I was at NetApp, folks had researched splitting 4K chunks into 3 Ethernet packets (NetCache) so that they'd happily fit, issuing 3 I/Os on non-4K-aligned boundaries. There was also a similar issue with reassembling smaller I/Os into a bigger packet, because some disks had 512-byte blocks back then. The idea was to hand over multiple gather/scatter entries and let the engine take care of reassembly.
Really looking forward to what interesting things happen in this space :)
Is the RAM mostly used by page content read by the NICs due to kTLS?
If there were better DMA/offload, could this be done with a fraction of the RAM? (NVMe->NIC)
If there were no need for TLS, would the RAM usage drop dramatically?