
Yeah, my mention of gift was a red herring: I had assumed gift was being used, but the same general problem (the "page garbage collection" issue) crops up regardless.

If you don't use gift, you never know when the pages are free to use again, so in principle you need to keep writing to new buffers indefinitely. One "solution" to this problem is to gift the pages, in which case the kernel does the GC for you, but then you need to churn through new pages constantly because you've given the old ones away. Gift is especially useful when the gifted pages can be used directly in the page cache (i.e., when writing to a file, not a pipe).
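
For concreteness, here's roughly what I mean by the gift-and-churn pattern, as a hedged sketch (write_batch and the memset stand-in are mine, not from the article):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/uio.h>
    #include <unistd.h>

    /* Gift-and-churn writer: every batch lives in freshly mapped,
     * page-aligned memory that is gifted to the kernel and then never
     * touched again. */
    static int write_batch(int pipe_fd, size_t len /* page multiple */)
    {
        char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED)
            return -1;
        memset(buf, 'x', len);   /* stand-in for generating the batch */

        struct iovec iov = { .iov_base = buf, .iov_len = len };
        /* Assumes len fits in the pipe so one call consumes the whole
         * iovec; handling short counts is trickier with SPLICE_F_GIFT
         * because gifted iovecs must stay page-aligned. */
        if (vmsplice(pipe_fd, &iov, 1, SPLICE_F_GIFT) != (ssize_t)len)
            return -1;

        /* Drop our mapping; the pipe holds its own page references and
         * we must never modify the pages again. */
        return munmap(buf, len);
    }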

Without gift, some consumption patterns may be safe, but I think they are exactly those which involve a copy (not using gift means a copy will occur in additional read-side scenarios). Ultimately the problem is this: if some downstream process can get a zero-copy view of a page from an upstream writer, how can that page be safe against concurrent modification? The pipe-size trick is one way it could work, but it doesn't pan out because the pages may live beyond the immediate pipe (this is actually alluded to in the FizzBuzz article, where they mention that things blew up if more than one pipe was involved).
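
To spell out the pipe-size trick (my reconstruction of the idea, not the FizzBuzz code): alternate between two pipe-sized buffers, on the theory that by the time you rewrite buffer A, its pages must have left the pipe to make room for buffer B. As above, that theory fails once the reader re-splices the pages elsewhere:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/uio.h>
    #include <unistd.h>

    int main(void)
    {
        long pipe_sz = fcntl(STDOUT_FILENO, F_GETPIPE_SZ);
        long page = sysconf(_SC_PAGESIZE);
        if (pipe_sz < 0 || page < 0)
            return 1;                /* stdout must be a pipe */

        char *bufs[2] = { aligned_alloc(page, pipe_sz),
                          aligned_alloc(page, pipe_sz) };
        if (!bufs[0] || !bufs[1])
            return 1;

        for (unsigned iter = 0; ; iter++) {
            char *buf = bufs[iter % 2];
            /* The racy step: if a downstream reader moved the previous
             * batch's pages into another pipe, they can still be live
             * here even though "pipe size" bytes have gone by. */
            memset(buf, '0' + iter % 10, pipe_sz);

            for (long off = 0; off < pipe_sz; ) {
                struct iovec iov = { .iov_base = buf + off,
                                     .iov_len  = pipe_sz - off };
                ssize_t n = vmsplice(STDOUT_FILENO, &iov, 1, 0);
                if (n < 0)
                    return 1;
                off += n;
            }
        }
    }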



Yes, this all makes sense, although, like everything splicing-related, it is very subtle. Maybe I should have mentioned the subtlety and danger of splicing at the beginning, rather than at the end.

I still think the man page of vmsplice is quite misleading! Specifically:

       SPLICE_F_GIFT
              The user pages are a gift to the kernel. The application
              may not modify this memory ever, otherwise the page cache
              and on-disk data may differ. Gifting pages to the kernel
              means that a subsequent splice(2) SPLICE_F_MOVE can
              successfully move the pages; if this flag is not specified,
              then a subsequent splice(2) SPLICE_F_MOVE must copy the
              pages. Data must also be properly page aligned, both in
              memory and length.
To me, this indicates that if we're _not_ using SPLICE_F_GIFT, downstream splices will be automatically taken care of, safety-wise.
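
In other words, I'd read it as saying that a downstream consumer like this (hedged sketch; assumes stdin is the pipe we vmsplice'd into, and out.dat is just an example target) gets a private copy of non-gifted pages even when it asks for SPLICE_F_MOVE:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        int out = open("out.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (out < 0)
            return 1;
        for (;;) {
            /* Per the man page, MOVE can only truly move gifted pages;
             * otherwise the kernel is supposed to copy. */
            ssize_t n = splice(STDIN_FILENO, NULL, out, NULL,
                               1 << 16, SPLICE_F_MOVE);
            if (n <= 0)
                break;               /* EOF or error */
        }
        return 0;
    }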


Hmm, reading this side-by-side with a paragraph from BeeOnRope's comment:

> This post (and the earlier FizzBuzz variant) try to get around this by assuming the pages are available again after "pipe size" bytes have been written after the gift, _but this is not true in general_. For example, the read side may also use splice-like calls to move the pages to another pipe or IO queue in zero-copy way so the lifetime of the page can extend beyond the original pipe.

The paragraph you quoted says that the "splice-like calls to move the pages" actually copy when SPLICE_F_GIFT is not specified. So perhaps the combination of not using SPLICE_F_GIFT and waiting until "pipe size" bytes have been written is safe.


Yes, it's not clear to me when the copy actually happens, but I had assumed that the >30 GB/s result, after the read side was changed to use splice, must imply zero copy.


It could be that when splicing to /dev/null (which is what I'm doing), the kernel knows that the content is never witnessed, and therefore no copy is required. But I haven't verified that.
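
For reference, the consumer side is essentially just this (a minimal sketch of my setup, assuming stdin is the producer's pipe):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        int devnull = open("/dev/null", O_WRONLY);
        if (devnull < 0)
            return 1;
        for (;;) {
            /* If the kernel can tell the data is never looked at, it
             * might skip the copy here; that's exactly the part I
             * haven't verified. */
            ssize_t n = splice(STDIN_FILENO, NULL, devnull, NULL,
                               1 << 20, SPLICE_F_MOVE);
            if (n <= 0)
                break;
        }
        return 0;
    }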


Makes sense. If so, some of the nice benchmark numbers for vmsplice would go away in a real scenario, so that'd be good to know.


Splicing seems to work well for the "middle" part of a chain of piped processes, e.g., how pv works: it can splice pages from one pipe to another without needing to worry about reusing the pages, since someone upstream already wrote them.

Similarly for splicing from a pipe to a file or something like that. It's really the ends of the chain, which want to (a) generate the data in memory or (b) read the data in memory, that seem to create the problem.
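
Something like this is all the middle of the chain has to do (rough sketch; assumes both stdin and stdout are pipes, as they would be for pv in the middle of a pipeline):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        for (;;) {
            /* Move page references from the input pipe to the output
             * pipe; this process never owns the pages, so it has no
             * reuse problem to solve. */
            ssize_t n = splice(STDIN_FILENO, NULL, STDOUT_FILENO, NULL,
                               1 << 16, SPLICE_F_MOVE);
            if (n <= 0)
                break;
        }
        return 0;
    }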



