More

jheriko · 2025-02-19T02:58:00 1739933880

this is unexciting.

jheriko · 2025-02-13T18:54:51 1739472891

The title is moronic, the content reads like it is written by someone completely out of touch with reality.

This is a common sense idea... the fact that this required this level thought is scary.

oldandboring · 2025-02-13T19:10:16 1739473816

Warning: snark incoming.

The article also has a strong whiff of: the author has one young, neurotypical child and is now pretty sure he's got this whole parenting thing figured out, decides to blog about it. Then the overwhelmingly young, male, childless HN brigade descends with virtue-signaling responses about what amazing parents they'd be if, you know, they had kids, and how easy it would be.

jheriko · 2025-02-13T18:51:42 1739472702

I thought this meant the website was a failure... and looking at it, I think its due to become one.

jheriko · 2025-02-11T05:33:16 1739251996

i feel this article is missing some detail or incorrect in reporting the actual development here. either that or i am missing something myself...

hash tables are constant time on average for all insertion, lookup and deletion operations, and in some special cases, which i've seen used in practice very, very often, they have very small constant run-time just like a fixed-size array (exactly equivalent in-fact).

this came up in an interview question i had in 2009 where i got judged poorly for deriding the structure as "not something i've often needed", and i've seen it in much older code.

i'm guessing maybe there are constraints at play here, like having to support unbounded growth, and some generic use case that i've not encountered in the wild...?

yxhuvud · 2025-02-11T07:13:13 1739257993

What you are missing is how the hash table behaves when it is almost full. If there is one empty spot left in the whole table, how do you find it when you insert a new entry?

sfn42 · 2025-02-11T09:28:43 1739266123

Same way you find it when doing a lookup later?

I know that's probably a naive answer, I honestly don't even know how a hash table works. I know how a hash map works, at least some implementations use a linked list as a bucket. So the hash gives you the bucket, then you linear search the bucket for the element. Buckets should be small so the time to search them is negligible, giving O(1) lookup and insert performance.

Obviously this is different from what's being discussed here, this data structure doesn't even really get "full" but it's also the only implementation I know is in practical use. Not sure why one might use a hash table instead

yxhuvud · 2025-02-11T10:18:35 1739269115

So using buckets with linked lists like that is neither space efficient or fast. A strategy that nowadays is more common and fast is to store conflicts in the table itself using some strategy to find a place in the table itself to put the new entry. The simplest (but not optimal) way to do this is to just take the next one that isn't used yet.

This means a linear scan that once the table gets close to being full will approach O(n). To avoid this, better strategies for choosing the next place to look is used, as well as automatic resizing of the hash table at some occupancy percentage to keep the lookup chains short. Other strategies in use will also approach O(n) but will require resizing on difference occupancy percentage. What is new in this approach is that they manage to go faster than O(n) even at almost full occupancy.

xxs · 2025-02-11T16:01:12 1739289672

>The simplest (but not optimal) way to do this is to just take the next one that isn't used yet.

The linear probe is by far the most efficient way to build a hashtable on any modern hardware, nothing is near close. Everything else leads to cache trashing on misses. For the nearly full table - that's a mistake - table should not go above a specific fill factor, e.g. the notorious 75% for large tables.

yxhuvud · 2025-02-11T18:17:19 1739297839

The problem with the linear probe is that it creates long runs of collisions, thereby forcing you to avoid that by having a lower fill factor.

xxs · 2025-02-11T18:49:47 1739299787

>that it creates long runs of collisions

Yes, of course. In practice it still outperforms pretty much anything else. The lower fill factor is still cheaper (memory footprint) than having buckets and indirection.

jheriko · 2025-02-09T03:13:47 1739070827

missing the cursive theta.

jheriko · 2025-01-24T12:23:04 1737721384

this whole thing is a story about using outdated stuff in a shitty ecosystem.

its not a real problem for most modern developers.

pwrite? wtf?

not one mention of fopen.

granted some of the fine detail discussion is interesting, but it doesn't make practical sense since about 1990.

rep_lodsb · 2025-01-24T14:36:26 1737729386

The article is about the hardware and kernel level APIs used for interacting with storage. Everything else is by necessity built on top of that interface.

"fopen"? That is outdated stuff from a shitty ecosystem, and how do you think it's implemented?

jheriko · 2025-01-21T16:15:24 1737476124

i can believe this. many "experts" are consistent bullshitters, and it helps them to look more like experts.

this is not always intentional either, and there is a lot of social pressure to do it.

have you ever read a popular article about something you have expert knowledge in? the general standard for accuracy and quality in public discourse is mindblowingly low.

declan_roberts · 2025-01-21T16:21:59 1737476519

Bullshitting? yes, absolutely.

But you can't make up scripture without being immediately spotted. Their comments would be flooded with people calling them out.

It's like trying to talk about star trek episodes that don't exist with star trek nerds. There would be a few seconds of confusion before the righteous indignation.

curo · 2025-01-27T13:30:22 1737984622

Not scripture. Passages from the "Great Books" (literature, philosophy, poetry...)

jheriko · 2025-01-14T16:11:25 1736871085

interesting.

on a basic level, with the gates, it seems, if you have at most two input amounts of work, and get at most one out, then storing the lost work for later reuse makes sense

jheriko · 2025-01-11T14:43:25 1736606605

can also be spotted with experience.

these kinds of problems are present in very many standard treatments of algorithms and don't survive their first contact with real world use in some cases.

"needs a carry bit or a wider type" is common for arithmetic operations that actually use the range.

jheriko · 2025-01-11T14:18:03 1736605083

i didn't use the trick, or read what it was until after i tried for myself. it took me about 7 seconds at a conservative overestimate...

then again, i did outperform my entire national cohort at school in almost every subject by a wide margin... an outlier.

the trick however is very clever, but it wont work in more complicated scenarios where attention to detail matters.

EDIT: after doing the first one near the top i tried the rest. with a bit of warm up its very fast. no tricks needed. maybe a relic of playing these games when younger and having a "once in a generation" level of learning power coupled with training it when very young when learning speed is multiplied by a huge scale factor. i had to zoom the last one, but the other two were incredibly fast, close to immediate. sub second.

EDIT2: the warm up was doing the first image once after reading the first few sentences.

EDIT3: this is not a superpower.