
Yeah, my mention of gift was a red herring: I had assumed gift was being used, but the same general problem (the "page garbage collection" issue) crops up regardless.

If you don't use gift, you never know when the pages are free to use again, so in principle you need to keep writing to new buffers indefinitely. One "solution" to this problem is to gift the pages, in which case the kernel does the GC for you, but then you need to churn through new pages constantly because you've given the old ones away. Gift is especially useful when the gifted pages can be used directly in the page cache (i.e., when writing to a file, not a pipe).
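
For concreteness, here's roughly what I mean by the gift-and-churn pattern, as a hedged sketch (write_batch and the memset stand-in are mine, not from the article):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/uio.h>
    #include <unistd.h>

    /* Gift-and-churn writer: every batch lives in freshly mapped,
     * page-aligned memory that is gifted to the kernel and then never
     * touched again. */
    static int write_batch(int pipe_fd, size_t len /* page multiple */)
    {
        char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED)
            return -1;
        memset(buf, 'x', len);   /* stand-in for generating the batch */

        struct iovec iov = { .iov_base = buf, .iov_len = len };
        /* Assumes len fits in the pipe so one call consumes the whole
         * iovec; handling short counts is trickier with SPLICE_F_GIFT
         * because gifted iovecs must stay page-aligned. */
        if (vmsplice(pipe_fd, &iov, 1, SPLICE_F_GIFT) != (ssize_t)len)
            return -1;

        /* Drop our mapping; the pipe holds its own page references and
         * we must never modify the pages again. */
        return munmap(buf, len);
    }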

Without gift, some consumption patterns may be safe, but I think they are exactly those which involve a copy (not using gift means a copy will occur in additional read-side scenarios). Ultimately the problem is this: if some downstream process can get a zero-copy view of a page from an upstream writer, how can that page be safe against concurrent modification? The pipe-size trick is one way it could work, but it doesn't pan out because the pages may live beyond the immediate pipe (this is actually alluded to in the FizzBuzz article, where they mention that things blew up if more than one pipe was involved).
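
To spell out the pipe-size trick (my reconstruction of the idea, not the FizzBuzz code): alternate between two pipe-sized buffers, on the theory that by the time you rewrite buffer A, its pages must have left the pipe to make room for buffer B. As above, that theory fails once the reader re-splices the pages elsewhere:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/uio.h>
    #include <unistd.h>

    int main(void)
    {
        long pipe_sz = fcntl(STDOUT_FILENO, F_GETPIPE_SZ);
        long page = sysconf(_SC_PAGESIZE);
        if (pipe_sz < 0 || page < 0)
            return 1;                /* stdout must be a pipe */

        char *bufs[2] = { aligned_alloc(page, pipe_sz),
                          aligned_alloc(page, pipe_sz) };
        if (!bufs[0] || !bufs[1])
            return 1;

        for (unsigned iter = 0; ; iter++) {
            char *buf = bufs[iter % 2];
            /* The racy step: if a downstream reader moved the previous
             * batch's pages into another pipe, they can still be live
             * here even though "pipe size" bytes have gone by. */
            memset(buf, '0' + iter % 10, pipe_sz);

            for (long off = 0; off < pipe_sz; ) {
                struct iovec iov = { .iov_base = buf + off,
                                     .iov_len  = pipe_sz - off };
                ssize_t n = vmsplice(STDOUT_FILENO, &iov, 1, 0);
                if (n < 0)
                    return 1;
                off += n;
            }
        }
    }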



Yes, this all makes sense, although, like everything splicing-related, it is very subtle. Maybe I should have mentioned the subtlety and danger of splicing at the beginning, rather than at the end.

I still think the man page of vmsplice is quite misleading! Specifically:

       SPLICE_F_GIFT
              The user pages are a gift to the kernel. The application
              may not modify this memory ever, otherwise the page cache
              and on-disk data may differ. Gifting pages to the kernel
              means that a subsequent splice(2) SPLICE_F_MOVE can
              successfully move the pages; if this flag is not specified,
              then a subsequent splice(2) SPLICE_F_MOVE must copy the
              pages. Data must also be properly page aligned, both in
              memory and length.
To me, this indicates that if we're _not_ using SPLICE_F_GIFT, downstream splices will be automatically taken care of, safety-wise.
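
In other words, I'd read it as saying that a downstream consumer like this (hedged sketch; assumes stdin is the pipe we vmsplice'd into, and out.dat is just an example target) gets a private copy of non-gifted pages even when it asks for SPLICE_F_MOVE:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        int out = open("out.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (out < 0)
            return 1;
        for (;;) {
            /* Per the man page, MOVE can only truly move gifted pages;
             * otherwise the kernel is supposed to copy. */
            ssize_t n = splice(STDIN_FILENO, NULL, out, NULL,
                               1 << 16, SPLICE_F_MOVE);
            if (n <= 0)
                break;               /* EOF or error */
        }
        return 0;
    }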


Hmm, reading this side-by-side with a paragraph from BeeOnRope's comment:

> This post (and the earlier FizzBuzz variant) try to get around this by assuming the pages are available again after "pipe size" bytes have been written after the gift, _but this is not true in general_. For example, the read side may also use splice-like calls to move the pages to another pipe or IO queue in zero-copy way so the lifetime of the page can extend beyond the original pipe.

The paragraph you quoted says that the "splice-like calls to move the pages" actually copy when SPLICE_F_GIFT is not specified. So perhaps the combination of not using SPLICE_F_GIFT and waiting until "pipe size" bytes have been written is safe.


Yes, it's not clear to me when the copy actually happens, but I had assumed that the >30 GB/s result, after the read side was changed to use splice, must imply zero copy.


It could be that when splicing to /dev/null (which is what I'm doing), the kernel knows that the content is never witnessed, and therefore no copy is required. But I haven't verified that.
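
For reference, the consumer side is essentially just this (a minimal sketch of my setup, assuming stdin is the producer's pipe):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        int devnull = open("/dev/null", O_WRONLY);
        if (devnull < 0)
            return 1;
        for (;;) {
            /* If the kernel can tell the data is never looked at, it
             * might skip the copy here; that's exactly the part I
             * haven't verified. */
            ssize_t n = splice(STDIN_FILENO, NULL, devnull, NULL,
                               1 << 20, SPLICE_F_MOVE);
            if (n <= 0)
                break;
        }
        return 0;
    }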


Makes sense. If so, some of the nice benchmark numbers for vmsplice would go away in a real scenario, so that'd be good to know.


Splicing seems to work well for the "middle" part of a chain of piped processes, e.g., how pv works: it can splice pages from one pipe to another without needing to worry about reusing the pages, since someone upstream already wrote them.

Similarly for splicing from a pipe to a file or something like that. It's really the ends of the chain, which want to (a) generate the data in memory or (b) read the data in memory, that seem to create the problem.
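
Something like this is all the middle of the chain has to do (rough sketch; assumes both stdin and stdout are pipes, as they would be for pv in the middle of a pipeline):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        for (;;) {
            /* Move page references from the input pipe to the output
             * pipe; this process never owns the pages, so it has no
             * reuse problem to solve. */
            ssize_t n = splice(STDIN_FILENO, NULL, STDOUT_FILENO, NULL,
                               1 << 16, SPLICE_F_MOVE);
            if (n <= 0)
                break;
        }
        return 0;
    }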



