I heard that even the TP crisis made sense, people stopped using TP outside of home (work/school/etc.) and that put stress on the home TP supply chain.
It's kind of crazy to me that we all lived through that and most people still don't understand why it happened. Like you said, people were suddenly home all day, so demand for residential TP went up. Shelves start emptying, the news starts talking about a TP shortage, so next time you're at the store and see TP in stock you buy a case even if you don't need it yet. Who knows when it'll be in stock again and at what price, so you might as well buy another case too. The shortage got worse and worse, and to absolve themselves of any personal responsibility people imagined that evil preppers were going around buying up all the TP and hand sanitizer and PPE. The problem is that in reality the system was irrational (people buying TP they don't really need) while individuals were behaving rationally (stock up when you can).
That definitely caused the TP "crisis" but it was sustained by lag on the supply side. Paper manufacturing was tooled up to make "commercial" toilet paper, which saw an immediate drop in demand when we suddenly stopped going to the office. Manufacturers had to re-tool for residential toilet paper, which saw an immediate and sustained rise in demand.
In retrospect, I'd say that the first wave of people overstocking TP weren't totally irrational.
There was plenty of TP, it was just packaged incorrectly for individual consumption. The supply chains couldn't quickly change the packaging. There were even stories of TP available from Mexico that couldn't legally be sold because the labeling was in Spanish and not up to American regulations.
Same thing happened for other items like flour. Suddenly everyone was baking at home and restaurants weren't buying 50lb sacks of flour anymore. Plenty of flour, improperly packaged.
TLDR; "Ingestion of a high number of column files under memory pressure led to the kernel starting readahead disk read operations, which you wouldn't expect from a write-only load. The rest was as simple as using madvise in our code to disable the readahead in table writers."
The article kind of dances around it, but AIUI the reason their "write-only load" caused reads (and thus readahead) was that they were writing to mapped pages that had already been evicted - the kernel has to read/fault those pages back in because it can only write in block/page-sized chunks.
In some sense maybe this could be thought of as readahead in preparation for writing to those pages, which is undesirable in this case.
However, what confused me about this article was: if the data files are append-only, how is there a "next" block to read ahead to? I guess maybe the files are pre-allocated, or the kernel is reading previous pages.
Reading between the lines, it sounds as if they're using mmap. There is no "append" operation on a memory mapping, so the file would need to be preallocated before mapping it.
If the preallocation is done using fallocate or just writing zeros, then by default it's backed by blocks on disk, and readahead must hit the disk since there is data there. On the other hand, preallocating with fallocate using FALLOC_FL_ZERO_RANGE or (often) with ftruncate() will just update the logical file length, and even if readahead is triggered it won't actually hit the disk.
For the file being entirely pre-allocated case I understand, but for the file hole case I'm not sure I understand why you'd get such high disk activity.
If the index block also got evicted from the page cache, then could reading into a file hole still trigger a fault? Or is the "holiness" of a page for a mapping stored in the page table?
I suspect page size/aligned file holes could be backed by a read-only zero page via PTE as an optimization, but they might not be (I'm not as familiar with Linux mmap/filesystems as with FreeBSD).
It is quite possible the filesystem caches things like the file extent tree (including "holiness") separately from the backing inode/on-disk sectors for the tree.
The readahead is a bit of a readaround, when I last checked - as in, it'll pull in some stuff before the faulting page as well.
There used to be a sys-wide tunable in /sys to control how large an area readahead would extend to, but I'm not seeing it anymore on this 6.1 laptop. I think there's been some work changing stuff to be more clever in this area in recent years. It used to be interesting to make that value small vs. large and see how things like uncached journalctl (heavy mmap user) were affected in terms of performance vs. IO generated.
The article distinguishes "readaround" from a linear predicted "readahead", but then says the output of blktrace indicates a "potential readahead", which is where I got confused.
Does MADV_RANDOM disable both "readahead" and "readaround"?
Very cool development. There is too much busy work going from development to test to production. This will help to unify everything. OpenAI Triton https://github.com/openai/triton/ is going for a similar goal. But this is a more fundamental approach.
For about $70 per month, you could get managed Kubernetes from AWS, GCP or Azure. Why would you bother with Kops on these platforms? Also DO, OVH and Vultr provide managed K8s for free.
Depends on how well managed it is (hint: not always good).
For a high performance service like ClickHouse the nodes may need to be optimized and that's not often done in the fully managed solutions.
For EKS they often take forever to get to the current version of k8s and that might be a problem.
Whilst performance of the master nodes has increased recently, they might not be up to scratch for what's required if you're doing a lot of operations.
All in all managed does not mean fit for all requirements. It certainly is great for at least 80% of cases but not all.
EKS and GKE are good products. We run a competing platform for ClickHouse on AWS and made the same decision as ClickHouse Inc 3 years ago and for very similar reasons. Kubernetes requires investment to run well, especially if you aspire to be multi-platform.
The question I would ask is: would your team have the bandwidth and experience to do it better than GKE, EKS, AKS, etc.? My employer deploys to AKS and Anthos; they both have limitations, but my advice to teams is to work within those limitations, as we are in the business of building business applications and not systems management s/w.
> Why would you bother with Kops on these platforms?
Given sytse is the CEO and founder of GitLab, I am sure he’s not interested in getting vendor locked. Community solutions like kops help keep things open and accessible to everyone.
With remote working you tend to have team-members more spread out. Bringing them together requires more travel. If this is by airplane it quickly cancels out the reduction in commute emissions.
Regarding wait_for_less_busy_worker: on the surface it seems suboptimal to add a wait time before responding. Can someone explain why this is the best solution?
I may have this wrong, but here's my best understanding of it:
Ruby supports multi-threading, but unless you're using the new (and experimental) Ractor feature, you're subject to the global interpreter lock in most cases (with a few important and useful exceptions, like some kinds of I/O). That means that Ruby servers will typically employ multi-processing in addition to (or in place of) multi-threading as a way to increase performance and use multiple CPUs - otherwise, multiple threads just end up competing for the global interpreter lock and the additional threads don't increase performance as much as you would hope, especially if serving those requests requires any actual work to be done in Ruby code.
Puma supports a multi-processing mode, where a main Puma process forks multiple workers (each running multiple threads), and each worker listens on the same socket. The linux kernel distributes the load between the workers, and then the workers distribute the load internally between their threads. Since the global interpreter lock is a per-process thing, this is a pretty effective way to get more throughput for a Ruby server.
The problem is that you can't directly control how the kernel is going to balance incoming requests across the multiple workers listening on a socket. Because Ruby does support some instances where threads can run concurrently - like network I/O - it's possible that the kernel may end up handing off multiple requests to one worker process when there were others that were idle and could have handled the request. That doesn't sound like a big deal - but because most threaded Ruby operations do not run concurrently, the actual Ruby code that needs to process those requests will be competing for the global interpreter lock.
So basically this allows a worker process that is already handling requests to insert a tiny delay before accepting another one - which gives an idle worker process a chance to accept it instead. On balance, this means that you'll get higher utilization of the CPU resources available to you and will often result in a lower average latency for all requests.
There are different types of features you can monetize with open core. The article talks about monetizing features that allow you to run it as a SaaS and the problems with that. At GitLab we opted to make those open: "The open source codebase will have all the features that are essential to running a large 'forge' with public and private repositories" https://about.gitlab.com/company/stewardship/#promises Instead we monetize features that managers and executives care more about https://about.gitlab.com/company/pricing/#buyer-based-open-c... This prevents the perverse incentives mentioned in the article.
There are lots of features in GitLab that are closed source that I, as a small, one-man shop, would like to have. I do think GitLab does a much better job of balancing proprietary versus open source features than most open core products, but the incentive still seems to be there to hold useful features back.
One example is global cross-repo search. When looking for substrings on KDE's or GNOME's GitLab, I often want to use the global search bar, but there is no free full-text search across all repositories hosted there, and I have to rely on code search tools external to the code hosting site's UI, like KDE's https://lxr.kde.org/.
Why wouldn't they? I guess I don't see why the question needs to be asked. There are more interesting questions to be asked about how GitLab makes such decisions.
If there's a feature that is mostly only valuable to pointy-haired bosses, then of course there will be people who want that feature to be available for free. But such a feature should be behind the paywall. It will end up funding features outside of the paywall, so the free users have a reason to be happy about it being behind the paywall.
Of course there will be lots of things in the middle, and people will disagree on every aspect of the evaluation. And people will argue that something should be free because it's good for the community when in fact they want it to be free because they don't want to pay for it. Such is the nature of the balancing act that GitLab signed up for, and I respect them for it.
It's a relatively new spin on the ancient balancing act of value creation vs value capture. TANSTAAFL
I think we're coming at it the same way, but I'm being more terse in my question line.
I am interested in how they arrive in those decisions and do revenue generation features / fixes get weighted differently than non, or are those types of tasks separate from their revenue potential or retention?
I think I'm just leaning on the OP for this context, but I should have added it.
The problem is, some think that these (and other) approaches are killing open source.
> If we normalize projects baiting developers with an open source license to gain traction and switching to a non-open source license to monopolize the returns on that traction, then the logical next step for investors will be skipping that first step entirely.
> And that, for the industry, is nothing but a dead end.
You can absolutely use GitLab to the highest levels of productivity and effectiveness without paying a cent. It just tends to require more input and effort and results in tool sprawl if you want to recreate the GitLab EE experience for free. It's possible for many of the important features, and where it's not possible, it's usually not critical.
GitLab is, imo, one of the best examples of how to do paid open source. Especially because they've gone about it without the relicensing switch that many companies have attempted in order to "protect their business/product."
> If we normalize projects baiting developers with an open source license to gain traction […], then the logical next step for investors will be skipping that [bait] step entirely.
But where would traction come from, then? Switching to a proprietary license works well because they're baiting devs, if investors skip that step there's not as much to monopolise. Or am I misreading that paragraph?
Are you then incentivized to work on managerial features rather than core product features? The disincentive comes from the fact that profits aren’t tied to the quality of the product.