I implemented many different algorithms for searching and splitting strings usin...

ncruces · 2025-06-14T08:48:17

Could you summarize your algorithm? Like a high-level description?

For example, algorithm 1 (from the article) checks N candidate positions at a time by matching the first and last characters; algorithm 2 instead tests the first 4 characters.
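Here is a minimal C sketch of the first/last-character filter behind algorithm 1, assuming 16 lanes (one SSE2 register of bytes). To keep it portable, the vector compares are emulated with a scalar loop, and a comment notes which SSE2 intrinsics would replace that loop. The function name `find_first_last` is hypothetical. Candidates are verified with a full `memcmp` for simplicity.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANES 16 /* one 128-bit SSE2 register holds 16 byte lanes */

/* Index of the first occurrence of needle (length k) in hay (length n),
   or -1 if there is none. */
long find_first_last(const char *hay, size_t n, const char *needle, size_t k)
{
    if (k == 0) return 0;
    if (k > n) return -1;
    const char first = needle[0];
    const char last = needle[k - 1];
    size_t positions = n - k + 1; /* number of candidate start offsets */

    for (size_t i = 0; i < positions; i += LANES) {
        /* Bit j of mask: hay[i+j] == first && hay[i+j+k-1] == last.
           With SSE2 this is two _mm_cmpeq_epi8, one _mm_and_si128 and one
           _mm_movemask_epi8; here each lane is emulated in a scalar loop. */
        uint32_t mask = 0;
        size_t block = positions - i < LANES ? positions - i : LANES;
        for (size_t j = 0; j < block; j++) {
            if (hay[i + j] == first && hay[i + j + k - 1] == last)
                mask |= (uint32_t)1 << j;
        }
        /* Verify candidates from the lowest offset upwards. */
        while (mask != 0) {
            unsigned j = 0;
            while (!(mask & ((uint32_t)1 << j))) j++; /* portable count of trailing zeros */
            if (memcmp(hay + i + j, needle, k) == 0)
                return (long)(i + j);
            mask &= mask - 1; /* clear the lowest set bit */
        }
    }
    return -1;
}
```

For example, `find_first_last("the quick brown fox", 19, "brown", 5)` returns 10. The filter is cheap because most positions fail either the first or the last character test, so `memcmp` only runs on a few candidates.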

clauderoux · 2025-06-15T07:43:41

What I do is take a substring from the start of the search string whose length is a power of 2. If the string I'm searching for is 9 characters long, I extract an 8-character substring and pack it into a single integer, which is then broadcast across a SIMD register:

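Here is the example for 4 characters:

```
// We look for the presence of four characters in a row.
// Pack search[0..3] into one 32-bit value, search[0] in the lowest byte.
uint32_t cc = (uint8_t)search[3];
for (int shift = 2; shift >= 0; shift--) {
    cc <<= 8;
    cc |= (uint8_t)search[shift];
}
// Broadcast the 4-byte pattern to all eight 32-bit lanes
// (_mm256_set1_epi16 would keep only the low 16 bits of cc).
__m256i firstchar = _mm256_set1_epi32((int)cc);
```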

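In this case, I look for that 4-byte integer across my sequence:

```
// Compare each 32-bit lane of current_bytes with the packed pattern
current_bytes = _mm256_cmpeq_epi32(firstchar, current_bytes);
// Merge the per-byte match mask into the running bitmask q
q <<= 1;
q |= _mm256_movemask_epi8(current_bytes);
```

I'm looking for blocks of 4 characters at a time in my string.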
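The snippets above leave the scanning loop implicit. As a hedged illustration of the overall idea (pack a power-of-two prefix of the needle into an integer, flag every matching offset in a bitmask like `q`, then verify the remaining characters), here is a portable scalar sketch in C. It is not the author's AVX2 code: `find_prefix_int`, `prefix_len`, `load_prefix` and the 32-position block size are hypothetical, and each offset is tested with an unaligned `memcpy` load instead of vector compares. One detail to keep in mind with a vector version: comparing the 32-bit lanes of a single load only tests offsets that are multiples of 4, so covering every byte offset needs additional shifted loads.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load p (1..8) bytes starting at s into a uint64_t. Byte order does not
   matter because the pattern and the haystack are loaded the same way. */
static uint64_t load_prefix(const char *s, size_t p)
{
    uint64_t v = 0;
    memcpy(&v, s, p);
    return v;
}

/* Largest power of two <= k, capped at 8 (the widest integer packed here). */
static size_t prefix_len(size_t k)
{
    size_t p = 1;
    while (p * 2 <= k && p * 2 <= 8) p *= 2;
    return p;
}

long find_prefix_int(const char *hay, size_t n, const char *needle, size_t k)
{
    if (k == 0) return 0;
    if (k > n) return -1;
    size_t p = prefix_len(k);                  /* e.g. k = 9 -> p = 8 */
    uint64_t pattern = load_prefix(needle, p); /* plays the role of cc */
    size_t positions = n - k + 1;

    for (size_t i = 0; i < positions; i += 32) {
        /* q gets bit j when the p bytes at offset i + j equal the pattern. */
        uint32_t q = 0;
        size_t block = positions - i < 32 ? positions - i : 32;
        for (size_t j = 0; j < block; j++)
            if (load_prefix(hay + i + j, p) == pattern)
                q |= (uint32_t)1 << j;
        /* Verify the bytes the integer did not cover: needle[p..k-1]. */
        for (size_t j = 0; j < block; j++)
            if (((q >> j) & 1u) && memcmp(hay + i + j + p, needle + p, k - p) == 0)
                return (long)(i + j);
    }
    return -1;
}
```

With a 9-character needle such as `"abcdefghi"`, the 8-byte prefix `"abcdefgh"` is the filter and only the final `'i'` is checked by `memcmp`.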

Rendello · 2025-06-14T15:36:11

I tried to implement a modified version for LZ77 window search with Zig's generic SIMD a few years ago:

https://news.ycombinator.com/item?id=44273983