
> Rust developers might consider switching to jemallocator for improved performance

I am curious if this is something that everyone can do to get free performance or if there are caveats. Can C codebases benefit from this too? Is this performance that is simply left on the table currently?




Be aware `jemalloc` will make you suffer the observability issues of `MADV_FREE`. `htop` will no longer show the truth about how much memory is in use.

* https://github.com/jemalloc/jemalloc/issues/387#issuecomment...

* https://gitlab.haskell.org/ghc/ghc/-/issues/17411

Apparently now `jemalloc` will call `MADV_DONTNEED` 10 seconds after `MADV_FREE`: https://github.com/JuliaLang/julia/issues/51086#issuecomment...

So while this "fixes" the issue, it'll introduce a confusing time delay between you freeing the memory and you observing that in `htop`.

But according to https://jemalloc.net/jemalloc.3.html you can set `opt.muzzy_decay_ms = 0` to remove the delay.
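
For Rust specifically, these knobs can be set through jemalloc's MALLOC_CONF mechanism. A minimal sketch, assuming the tikv-jemallocator crate (note: depending on build flags the crate prefixes jemalloc's symbols, in which case the environment variable is _RJEM_MALLOC_CONF rather than MALLOC_CONF):

    use tikv_jemallocator::Jemalloc;

    // Route all Rust allocations through jemalloc.
    #[global_allocator]
    static GLOBAL: Jemalloc = Jemalloc;

    fn main() {
        // Run with e.g.
        //   MALLOC_CONF="dirty_decay_ms:0,muzzy_decay_ms:0" ./myapp
        // to return freed pages to the OS eagerly, trading some
        // throughput for accurate-looking RES in htop.
    }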

Still, the musl author has some reservations against making `jemalloc` the default:

https://www.openwall.com/lists/musl/2018/04/23/2

> It's got serious bloat problems, problems with undermining ASLR, and is optimized pretty much only for being as fast as possible without caring how much memory you use.

The above-mentioned tunables should mitigate this to some extent, but the general "theme" (prioritising performance over memory usage) likely still means either "it's a tradeoff" or "it's no tradeoff, but only if you set the tunables to what you need".


Note that glibc has a similar problem in multithreaded contexts. It strands unused memory in thread-local pools, which grows your memory usage over time like a memory leak. We got lower memory usage that didn't grow over time by switching to jemalloc.

Example of this: https://github.com/prestodb/presto/issues/8993
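
For completeness: the usual mitigation short of switching allocators is to cap the number of glibc arenas, via the MALLOC_ARENA_MAX env var or mallopt. A minimal sketch using the libc crate — glibc/Linux only, and the cap of 2 is illustrative, not a recommendation:

    fn main() {
        // Equivalent of MALLOC_ARENA_MAX=2; must run before worker
        // threads are spawned to take full effect (glibc-specific).
        unsafe {
            libc::mallopt(libc::M_ARENA_MAX, 2);
        }
        // ... spawn threads, do work ...
    }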


The musl remark is funny, because jemalloc's use of pretty fine-grained arenas sometimes leads to better memory utilisation through reduced fragmentation. For instance Aerospike couldn't fit in available memory under (admittedly old) glibc, and jemalloc fixed the issue: http://highscalability.com/blog/2015/3/17/in-memory-computin...

And this is not a one-off: https://hackernoon.com/reducing-rails-memory-use-on-amazon-l... https://engineering.linkedin.com/blog/2021/taming-memory-fra...

jemalloc also has extensive observability / debugging capabilities, which can provide a useful global view of the system; it's been used to debug memory leaks in JNI-bridge code: https://www.evanjones.ca/java-native-leak-bug.html https://technology.blog.gov.uk/2015/12/11/using-jemalloc-to-...


Yes, almost everybody who looks at memory usage in production will eventually discover glibc's memory fragmentation issues. This is how I learned about this topic.

Setting the env var MALLOC_MMAP_THRESHOLD_=65536 usually solves these problems instantaneously.
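
The same knob can also be set programmatically at startup via mallopt. A minimal sketch using the libc crate (glibc-specific; the value mirrors the env var above):

    fn main() {
        // Equivalent of MALLOC_MMAP_THRESHOLD_=65536: allocations of
        // 64 KiB and up come from mmap and go back to the OS on free,
        // instead of fragmenting the main heap. Note that setting the
        // threshold explicitly also disables glibc's dynamic
        // adjustment of it.
        unsafe {
            libc::mallopt(libc::M_MMAP_THRESHOLD, 65536);
        }
        // ... rest of the program ...
    }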

Most programmers seem not to bother understanding what is going on (which is how you arrive at the above solution) and instead follow "we switched to jemalloc and it fixed the issue".

(I have no opinion yet on whether jemalloc is better or worse than glibc malloc. Both have tunables, and both will create problematic corner cases if the tunables are not set accordingly. The fact that jemalloc has /more/ tunables, and more observability / debugging features, seems like a point in its favour for those who read the documentation. For users that "just want low memory usage", both libraries' defaults look bad, and the musl attitude seems like the best default, since an OOM will cause a crash, versus the program merely being some percent slower.)


Aiming to please people who panic about their RSS numbers seems... misguided? It seems like worrying about RAM being "used" as file cache[0].

If you want to gauge whether your system is memory-limited look at the PSI metrics instead.

[0] https://www.linuxatemyram.com/
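
For reference, PSI is exposed under /proc/pressure on Linux (kernel 4.20+, built with CONFIG_PSI). A minimal sketch that just dumps the memory pressure file:

    use std::fs;

    fn main() -> std::io::Result<()> {
        // "some" = share of time at least one task stalled on memory;
        // "full" = share of time all non-idle tasks stalled at once.
        let psi = fs::read_to_string("/proc/pressure/memory")?;
        print!("{psi}");
        Ok(())
    }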


Those are not the same.

You can see cache usage in htop; it has a different colour.

With MADV_FREE, it looks like the process is still using the memory.

That sucks: if you have some server that's slow, you want to SSH in and see how much memory each process takes. That's a basic, and good, observability workflow. Memory leaks exist, and tools should show them easily.

The point of RES is to show resident memory, not something else.

If you change htop to show the correct memory, that'd fix the issue of course.


Well, RES is resident in physical memory. It's just marked so that the kernel can reclaim it when it needs to. But until then it's resident. If you want to track leaks you need a resident-and-in-use metric, which may be more difficult to come by (probably requires scanning smaps?).

It's a case of people using the subtly wrong metrics and then trying to optimize tools chasing that metric rather than improving their metrics. That's what I'm calling misguided.
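
For what it's worth, a resident-and-in-use approximation is obtainable without walking the full smaps: /proc/<pid>/smaps_rollup (kernel 4.14+) sums the per-mapping fields, including LazyFree, i.e. pages freed with MADV_FREE but still counted as resident. A rough sketch, based on my reading of the proc documentation:

    use std::fs;

    // Rss minus LazyFree ~= resident memory the process still uses.
    fn resident_in_use_kb(pid: u32) -> Option<i64> {
        let text = fs::read_to_string(format!("/proc/{pid}/smaps_rollup")).ok()?;
        let field = |name: &str| -> Option<i64> {
            text.lines()
                .find(|l| l.starts_with(name))?
                .split_whitespace()
                .nth(1)? // value in kB, e.g. "Rss:  12345 kB"
                .parse()
                .ok()
        };
        Some(field("Rss:")? - field("LazyFree:").unwrap_or(0))
    }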


Not that I would recommend using jemalloc by default but it’s definitely going to be better than musl’s allocator ;)


Thank you! That was very thorough! I will be reading the links. :)



I think it's pretty much free performance that's being left on the table. There's a slight cost to binary size. And it may not perform better in absolutely all circumstances (but it will in almost all).

Rust used to use jemalloc by default but switched away, as people found it a surprising default.


Switching to a non-default allocator does not always bring a performance boost. It really depends on your workload, which requires profiling and benchmarking. But C/C++/Rust and other lower-level languages should all at least be able to choose from these allocators. One caveat is binary size: a custom allocator does add more bytes to the executable.


I don’t know why people still look to jemalloc. Mimalloc outperforms the standard allocator on nearly every single benchmark. Glibc’s allocator & jemalloc both are long in the tooth & don’t actually perform as well as state of the art allocators. I wish Rust would switch to mimalloc or the latest tcmalloc (not the one in gperftools).


> I wish Rust would switch to mimalloc or the latest tcmalloc (not the one in gperftools).

That's nonsensical. Rust uses the system allocators for reliability and compatibility, and to avoid binary bloat and maintenance burden (among other reasons), not because they're good (they were not when Rust switched away from jemalloc, and they aren't now).

If you want to use mimalloc in your Rust programs, you can just set it as the global allocator, same as jemalloc; that takes all of three lines: https://github.com/purpleprotocol/mimalloc_rust#usage
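
Per the linked README, that looks like:

    use mimalloc::MiMalloc;

    #[global_allocator]
    static GLOBAL: MiMalloc = MiMalloc;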

If you want the Rust compiler to link against mimalloc rather than jemalloc, feel free to test it out and open an issue, but maybe take a gander at the previous attempt: https://github.com/rust-lang/rust/pull/103944 which died for the exact same reason as the one before it (https://github.com/rust-lang/rust/pull/92249): an unacceptable regression of max-rss.


I know it’s easy to change but the arguments for using glibc’s allocator are less clear to me:

1. Reliability - how is an alternate allocator less reliable? Seems like a FUD-based argument. Unless by reliability you mean performance in which case yes - jemalloc isn’t reliably faster than standard allocators, but mimalloc is.

2. Compatibility - again sounds like a FUD argument. How is compatibility reduced by swapping out the allocator? You don’t even have to do it on all systems if you want. Glibc is just unequivocally bad.

3. Binary bloat - This one is maybe an OK argument although I don’t know what size difference we’re talking about for mimalloc. Also, most people aren’t writing hello world applications so the default should probably be for a good allocator. I’d also note that having a dependency of the std runtime on glibc in the first place likely bloats your binary more than the specific allocator selected.

4. Maintenance burden - I don’t really buy this argument. In both cases you’re relying on a 3rd party to maintain the code.


> I know it’s easy to change but the arguments for using glibc’s allocator are less clear to me:

You can find them in the original motivation for removing jemalloc, 7 years ago: https://github.com/rust-lang/rust/issues/36963

Also it's not "glibc's allocator", it's the system allocator. If you're unhappy with glibc's, get that replaced.

> 1. Reliability - how is an alternate allocator less reliable?

Jemalloc had to be disabled on various platforms and architectures; there is no reason to think mimalloc or tcmalloc are any different.

The system allocator, while shit, is always there and functional; the project does not have to curate its availability across platforms.

> 2. Compatibility - again sounds like a FUD argument. How is compatibility reduced by swapping out the allocator?

It makes interactions with anything which does use the system allocator worse, and almost certainly fails to interact correctly with some of the more specialised system facilities (e.g. malloc.conf) or tooling (in rust, jemalloc as shipped did not work with valgrind).

> Also, most people aren’t writing hello world applications

Most people aren't writing applications bound on allocation throughput either

> so the default should probably be for a good allocator.

Probably not, no.

> I’d also note that having a dependency of the std runtime on glibc in the first place likely bloats your binary more than the specific allocator selected.

That makes no sense whatsoever. The libc is the system's and dynamically linked. And changing allocator does not magically unlink it.

> 4. Maintenance burden - I don’t really buy this argument.

It doesn't matter that you don't buy it. Having to ship, resync, debug, and curate (cf (1)) an allocator is a maintenance burden. With a system allocator, all the project does is ensure it calls the system allocators correctly, the rest is out of its purview.


The reason the reliability & compatibility arguments don’t make sense to me is that jemalloc is still in use for rustc (again - not sure why they haven’t switched to mimalloc) which has all the same platform requirements as the standard library. There’s also no reason an alternate allocator can’t be used on Linux specifically because glibc’s allocator is just bad full stop.

> It makes interactions with anything which does use the system allocator worse

That’s a really niche argument. Most people are not doing any of that and malloc.conf is only for people who are tuning the glibc allocator which is a silly thing to do when mimalloc will outperform whatever tuning you do (yes - glibc really is that bad).

> or tooling (in rust, jemalloc as shipped did not work with valgrind)

That’s a fair argument, but it’s not an unsolvable one.

> Most people aren’t writing applications bound on allocation throughput either

You’d be surprised at how big an impact the allocator can make even when you don’t think you’re bound on allocations. There’s also all sorts of other things beyond allocation throughput & glibc sucks at all of them (e.g. freeing memory, behavior in multithreaded programs, fragmentation etc etc).

> The libc is the system’s and dynamically linked. And changing allocator does not magically unlink it

I meant that the dependency on libc at all in the standard library bloats the size of a statically linked executable.


> jemalloc is still in use for rustc (again - not sure why they haven’t switched to mimalloc)

Performance of rustc matters a lot! If the rust compiler runs faster when using mimalloc, please benchmark & submit a patch to the compiler.


I literally linked two attempts to use mimalloc in rustc just a few comments upthread.


Ah - my mistake; I somehow misread your comment. Pity about the RSS regression.

Personally I have plenty of RAM and I'd happily use more in exchange for a faster compile. It's much cheaper to buy more RAM than a faster CPU, but I certainly understand the choice.

With compilers I sometimes wonder if it wouldn't be better to just switch to an arena allocator for the whole compilation job. But it wouldn't surprise me if LLVM allocates way more memory than you'd expect.


Any links to instructions on how to run said benchmarks?


Not to mention that by using the system allocator you get all sorts of things "for free" that the system developers provide for you, wrt observability and standard tooling. This is especially true if the OS and the allocator are shipped by one group rather than being developed independently.


I've never not gotten increased performance by swapping out the allocator.


Rust used to use jemalloc as the default, but went back to using the system malloc back in 2018-ish[0]. Since Rust now has the GlobalAlloc trait (and the #[global_allocator] attribute), apps can use jemalloc as their allocator if they want. Not sure if there's a way for users to override via LD_PRELOAD or something, though.
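
To illustrate the mechanism: any type implementing GlobalAlloc can be plugged in. A minimal sketch — the counting wrapper is made up for the example — that delegates to the system allocator while tracking live bytes:

    use std::alloc::{GlobalAlloc, Layout, System};
    use std::sync::atomic::{AtomicUsize, Ordering};

    static LIVE_BYTES: AtomicUsize = AtomicUsize::new(0);

    struct CountingAlloc;

    unsafe impl GlobalAlloc for CountingAlloc {
        unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
            let p = System.alloc(layout);
            if !p.is_null() {
                LIVE_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
            }
            p
        }
        unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
            System.dealloc(ptr, layout);
            LIVE_BYTES.fetch_sub(layout.size(), Ordering::Relaxed);
        }
    }

    #[global_allocator]
    static GLOBAL: CountingAlloc = CountingAlloc;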

It turns out jemalloc isn't always best for every workload and use case. While the system allocator is often far from perfect, it at least has been widely tested as a general-purpose allocator.

[0] https://github.com/rust-lang/rust/issues/36963


Performance is not a one-dimensional scale where programs go from “slow” to “fast”, because there are always other factors at play. jemalloc can be the right fit for some applications but for others another choice might be faster, but it also might be that the choice is slower but better matches their goals (less dirty memory, better observability, certain security guarantees, …)


Basically that's why Jason Evans wrote it in the first place, but other allocators have caught up since then to some extent. So jemalloc might make your C code either slower or faster; you'll have to test to know. It's pretty reliable at being close to the best choice.

It does tend to use more RAM, though.


jemalloc and mimalloc are very popular in C and C++ software, yes. There are few drawbacks, and it's really easy to benchmark different allocators against each other in your particular use case.


You can override the allocator for any app via LD_PRELOAD
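
e.g. something like this (the .so path varies by distro and is shown purely for illustration):

    LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 ./myapp

Note this only redirects calls that go through libc's malloc symbols; a binary that statically embeds its own allocator (e.g. a Rust program with a baked-in #[global_allocator]) won't be affected.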



