I've been thinking about something similar. I don't see how timed expiration would conflict with the two most important features - the filling mechanism and the replication of hot items. Am I missing something that would make timed expiration impossible?
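To make it concrete, here's a rough sketch of the kind of thing I mean (assuming the original groupcache Go API, where the context is a plain interface{}; the ttlKey helper is my own invention, not anything groupcache provides):

    package main

    import (
        "fmt"
        "time"

        "github.com/golang/groupcache"
    )

    // ttlKey is a hypothetical helper: the same logical key maps to a new
    // cache key every ttl period, so stale buckets simply stop being
    // requested and age out of the LRU on their own.
    func ttlKey(key string, ttl time.Duration) string {
        bucket := time.Now().Unix() / int64(ttl.Seconds())
        return fmt.Sprintf("%s@%d", key, bucket)
    }

    // The filling mechanism is untouched: one fill per key per peer group.
    var users = groupcache.NewGroup("users", 64<<20, groupcache.GetterFunc(
        func(ctx groupcache.Context, key string, dest groupcache.Sink) error {
            // fetch from the slow backing store here
            return dest.SetString("value-for-" + key)
        }))

    func main() {
        var v string
        // Every caller derives the key the same way, so a fill happens at
        // most once per 5-minute bucket; replication of hot items works as usual.
        if err := users.Get(nil, ttlKey("user:42", 5*time.Minute), groupcache.StringSink(&v)); err != nil {
            panic(err)
        }
        fmt.Println(v)
    }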
Yeah, on a first pass at the problem you seem right.
CAS needs an authoritative node (my mind wanders off thinking about replication and failover), but surely the key it protects - with the version baked in - can be replicated?
CAS is incompatible with the distribution architecture, which uses a best-effort distributed lock in lieu of, say, a strongly consistent distributed state machine. Supporting it would require a lot more work.
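To spell out what CAS would have to guarantee - the version check and the write must be atomic against one authoritative copy, which a best-effort lock can't promise - here's a toy single-node sketch of memcached-style gets/cas semantics (my own illustration, not groupcache code):

    package main

    import "sync"

    type entry struct {
        value   string
        version uint64
    }

    type casStore struct {
        mu sync.Mutex
        m  map[string]entry
    }

    // Gets returns the value plus the version token the caller must
    // present to Cas later.
    func (s *casStore) Gets(key string) (string, uint64, bool) {
        s.mu.Lock()
        defer s.mu.Unlock()
        e, ok := s.m[key]
        return e.value, e.version, ok
    }

    // Cas writes only if nobody else has written since `version` was read.
    func (s *casStore) Cas(key, value string, version uint64) bool {
        s.mu.Lock()
        defer s.mu.Unlock()
        e, ok := s.m[key]
        if !ok || e.version != version {
            return false // lost the race; caller must re-read and retry
        }
        s.m[key] = entry{value: value, version: version + 1}
        return true
    }

    func main() {
        s := &casStore{m: map[string]entry{"k": {value: "v0", version: 1}}}
        _, ver, _ := s.Gets("k")
        _ = s.Cas("k", "v1", ver) // succeeds
        _ = s.Cas("k", "v2", ver) // fails: version is now stale
    }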
If a bug in another application on the server running the cache causes it to grow its memory use, your cache would suddenly disappear or underperform, and the failure could cascade on to the system the cache sits in front of. If instead you let the offending program crash because the cache is using regular memory, that wouldn't happen. Just a thought.
If you hit swap, again only the offending application or instance is punished, not everyone else (for instance by pummeling a backend database server that other services are using as well).
If program A hits swap, it means that cold pages are written to swap so that A can get the memory; this initial writing is done by program A, it's true. But A may not be the cause of the problem; A is just the straw that breaks the camel's back.
And those pages that got written to swap likely belong to others, and they pay the cost when they need those pages back...
In my practical experience, when one of my apps hits swap, the whole system becomes distressed. It is not isolated to the 'offender'.
You can of course avoid swap, but with your OS doing overcommit on memory allocations, you are just inviting a completely different way of failing, and that too is hard to manage. You end up having to know a lot about your deployment environment, ring-fence memory between components, and manage their budgets. If you want both app code and cache on the same node - and that's a central tenet of groupcache - then you have to make sure everything is under-dimensioned, because the needs of one cannot be allowed to steal from the other; your cache isn't adaptive.
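Concretely, if I'm remembering the groupcache API right, the budget is a fixed byte count you pick up front, and nothing flexes it in response to memory pressure elsewhere on the box:

    package main

    import "github.com/golang/groupcache"

    // The 256 MB here is hand-picked at startup and never adapts; if the
    // app beside it needs more memory, the cache won't give any back.
    var thumbs = groupcache.NewGroup(
        "thumbnails",
        256<<20,
        groupcache.GetterFunc(func(ctx groupcache.Context, key string, dest groupcache.Sink) error {
            // expensive fill from the origin goes here
            return dest.SetBytes([]byte("rendered thumbnail bytes"))
        }),
    )

    func main() {
        var b []byte
        _ = thumbs.Get(nil, "img/123", groupcache.AllocatingByteSliceSink(&b))
    }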
That's why I built a system to do caching centrally at the OS level.
I hope someone like Brad is browsing here and can make some kind of piercing observation I've missed.
That's rather common. If swapping can harm your application, then don't swap. On a machine where a slowdown is tolerable (temporarily, on a desktop), swap is fine. On a machine whose entire purpose is to serve as a fast cache in front of slow storage, swapoff and fall back to shedding or queuing requests at the frontend.
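A counting semaphore in front of the handler is about all the shedding takes (a minimal Go sketch; the names and the in-flight cap are made up):

    package main

    import "net/http"

    // Cap in-flight requests with a buffered channel and return 503
    // instead of letting the box dig itself into swap.
    var inflight = make(chan struct{}, 512) // capacity sized for the box

    func shed(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            select {
            case inflight <- struct{}{}:
                defer func() { <-inflight }()
                next.ServeHTTP(w, r)
            default:
                // overloaded: fail fast rather than queue into the ground
                http.Error(w, "overloaded, retry later", http.StatusServiceUnavailable)
            }
        })
    }

    func main() {
        http.Handle("/", shed(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            w.Write([]byte("cached response"))
        })))
        http.ListenAndServe(":8080", nil)
    }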
That is my experience as well. In my thought experiment the 'offender' would be a server instance, not a process running among other applications on a single machine. Applications that hit swap often have memory leaks, and hitting swap is then just a matter of time. Creating a cascading failure may be preventable, however.
It's an alternative to memcache but not a direct replacement. I hope he adds CAS etc.
I hope they start using the kernel's buffer cache as the backing store, or explain why it's not a good idea: http://williamedwardscoder.tumblr.com/post/13363076806/buffc...
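The appeal, as I see it, is that if the backing store is just files, plain reads already give you an adaptive in-RAM cache for free: hot pages stay resident, the kernel reclaims them under pressure, and several processes can share them. A minimal sketch (the directory is hypothetical):

    package main

    import "net/http"

    func main() {
        // Repeated reads of hot files are served out of the page cache;
        // cold files cost a disk read, exactly like a cache miss. No
        // userspace memory budget to tune.
        http.Handle("/", http.FileServer(http.Dir("/var/cache/blobs")))
        http.ListenAndServe(":8080", nil)
    }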