As someone who contributed to this benchmark, I can say it really isn't a great one. First off, the JSON parsing is done before the timer starts. If you look at the raw results, you can see that the full runtime of the Julia solution for 60k posts is actually 9.6 seconds (compared to 3.7, 3.8, 4.4, and 7.9 seconds for Rust, Go, Zig, and Nim respectively). The only thing it really measures is hashing and hash table performance (with a sprinkling of GC and memory management). And, as with any benchmark like this, how much dev time each respective language community put into its implementation.
They're also benchmarking on GitHub Actions runners, with swings of up to ~50% from run to run, which is more than enough to shuffle the results into more or less random positions. I contributed to it last week, but without any will to solve that kind of fundamental problem I don't see it becoming particularly good.
There's also no control on quality of contributions to the language-specific benchmarks.
If I understood the code and the GitHub Actions workflows correctly, it also appears that they run each benchmark only once. If, as you said, GitHub Actions runners show that much variability between runs, one should at least run the action multiple times and report an aggregate running time along with other statistics (e.g. the standard deviation)...
According to another replier it's been moved to dedicated VMs in Azure, so it's not as bad, but still subject to noisy neighbors. I agree with your assessment - if I were fixing it I would do something similar.
I would think it's one of the most basic rules of benchmarking (or so I was taught during my earlier days as a student) that one should repeat the benchmark several times to smooth over the "randomness" inherent in the system.
That instead seems to be accounted for in this benchmark by just parsing more entries. The longer the benchmark runs (assuming the task is homogeneous), the less relevant the noise should be.
Yeah, of course. But that would also affect it if the benchmark were shorter and re-run a hundred times.
Though, granted, in the case of re-running it you can do things like take the minimum or median time (see the sketch below), which are much better benchmark metrics than the mean, which is thrown off more by outliers and system noise.
Definitely not trying to defend this as a good benchmarking scheme.
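For what it's worth, here's a minimal sketch of what that would look like: time the workload several times and report the minimum and median alongside the mean. The names, run count, and stand-in workload are made up for illustration; this is not the harness the benchmark actually uses.

    #include <algorithm>
    #include <chrono>
    #include <cstdio>
    #include <vector>

    // Hypothetical harness: time `work` several times and report
    // min/median/mean, so one noisy run can't decide the ranking.
    template <typename F>
    void report_times(F&& work, int runs = 10) {
        std::vector<double> samples;
        samples.reserve(runs);
        for (int i = 0; i < runs; ++i) {
            auto start = std::chrono::steady_clock::now();
            work();
            auto stop = std::chrono::steady_clock::now();
            samples.push_back(std::chrono::duration<double>(stop - start).count());
        }
        std::sort(samples.begin(), samples.end());
        double mean = 0.0;
        for (double s : samples) mean += s;
        mean /= samples.size();
        std::printf("min %.3fs  median %.3fs  mean %.3fs\n",
                    samples.front(), samples[samples.size() / 2], mean);
    }

    int main() {
        // Stand-in workload; the real thing would run one language's solution.
        report_times([] {
            double x = 0;
            for (int i = 0; i < 50'000'000; ++i) x += i * 0.5;
            if (x < 0) std::puts("unreachable"); // keep the loop from being optimized away
        });
    }

Reporting the spread rather than a single number also makes it obvious when the runner itself was having a bad day.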
When I see that Zig is slower than Go, I know for a fact that something's off. This benchmark really looks biased.
Rant: As with many benchmarks, this suffers from the fact that there are multiple ways to do the same thing in multiple languages, and the most common one isn't necessarily the best for a particular use case.
In an ideal world, you would have the benchmark in each language written by people who work with that language on a daily basis and have all the necessary knowledge to produce a fair benchmark, which is something a naive implementation often fails to do.
How much of this is benchmarking JSON parsing vs other processing? It'd be nice to see timing broken down based on each step in the requirements section.
Not that JSON parsing isn't valid to measure, but it's not the most interesting thing to me given the large number of JSON parsers that exist in each language.
The JSON parsing is actually done before the timer starts. If you look at the raw results, Julia comes out very poorly once the JSON part is taken into account.
They say it because it sounds clever and it's sort of true on a very superficial level if you don't think about it too much. Not because popularity gives a license to be awful.
Another post here indicated that these times don't include startup costs. From my experience with Julia (and I love Julia and have used it extensively at school!), the "time to first plot" thing is still a big problem. You would never want to write shell scripts in Julia, mostly because it takes several seconds to get the interpreter running.
> From my experience with Julia (and I love Julia and have used it extensively at school!), the "time to first plot" thing is still a big problem.
This part is true - it has improved by an order of magnitude in recent versions and continues to improve, but time-to-first-X is still a tangible issue you have to deal with.
> You would never want to write shell scripts in Julia, mostly because it takes several seconds to get the interpreter running.
This part overstates the case - it takes less than half a second for the Julia runtime to start, even on my mediocre laptop. Which isn't nothing, to be sure, and maybe a non-starter for some use cases.
But generally, for shell-script-like programs, the time taken by the runtime itself (the "interpreter", though it's not really one) isn't much of an issue. The delays come in when your code needs to load big packages to do its thing: their loading times and other time-to-first costs. And you can mitigate that too, by putting the main part of your code in a precompiled package and just calling out to that from your script.
All that is to say only that, in the past few years, Julia has gone from "you would never want to write shell scripts in Julia" to "it's mildly annoying that you have to consciously arrange your code in a certain way to avoid latencies, but it's doable and not too hard".
Not just that: systems programming languages like C++ can just mmap the file and then absolutely murder the simple parsing (e.g. no floating point), using a single memory allocation, a tiny hash map, and a radix sort for the final list. Simple stuff like this should be limited by memory bandwidth (gigabytes per second).
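To make the mmap-and-scan part concrete, here's a minimal sketch (not the benchmark task itself, and the "tags" key is just an assumed field name): map the file read-only and make a single pass over the raw bytes, with no per-record allocations and no general-purpose JSON parser.

    #include <cstdio>
    #include <cstring>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    // Minimal mmap-and-scan sketch (POSIX): map the input read-only and do a
    // single pass over the raw bytes, here just counting occurrences of an
    // assumed "tags" key. No per-record allocations, no JSON parser.
    int main(int argc, char** argv) {
        if (argc < 2) { std::fprintf(stderr, "usage: %s posts.json\n", argv[0]); return 1; }

        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

        void* mapped = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (mapped == MAP_FAILED) { perror("mmap"); return 1; }
        const char* data = static_cast<const char*>(mapped);

        const char key[] = "\"tags\"";   // assumed field name, purely for illustration
        const size_t keylen = sizeof(key) - 1;
        size_t count = 0;
        for (size_t i = 0; i + keylen <= static_cast<size_t>(st.st_size); ++i) {
            if (std::memcmp(data + i, key, keylen) == 0) ++count;
        }
        std::printf("saw %zu \"tags\" keys in %lld bytes\n", count, (long long)st.st_size);

        munmap(mapped, st.st_size);
        close(fd);
        return 0;
    }

A real solution still has to do the actual computation on top of this, but the point stands: the scanning and tokenizing side can be made nearly free.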