Hacker News new | past | comments | ask | show | jobs | submit login

What? It all depends on the sizes of the strings. Hardware benchmarks, for each significant hardware µarchitectures, must be done on combinations of string sizes in order to know "when" the SIMD setup code is worth it.

Often the setup code is not worth it, even the SIMD complexity is not bringing enough performance in many use cases (basically, kept for niche applications, little lib statically linked for those).






Did you look at the algorithm 1? The setup code is a pair of splats. There's no consideration for alignment, it works with unaligned data just fine. So what do you mean? What threshold would you expect for it to be worth it (assuming it's useful at all)?

Dude, I cannot be more explicit than that.

That said, when you work on such code, use assembly, C code is for reference (have a look at dav1d av1 decoder).


I can't either.

Having implemented this, I'll claim that this is already competitive for needles as small as 2 bytes, and if the needle doesn't show up in the first couple “vector sizes” of the haystack.

Also, look at burntsushi's comment. They use this algorithm for similarly small sizes. They do use Rabin-Karp for “supremely short haystacks” (just because that lib is awesome and uses every trick in the book), but that wording should give you an idea of how small the threshold will be.

And really, how common are “supremely small” haystacks?


What? You don't have to do that. Hell, the SIMD code that ripgrep uses is written in Rust.

Yes, I said C, since there are many real-life alternative compilers.



Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: