Nice article. I discovered this myself recently, but in a slightly different context: Taking the union of two sets represented as arrays. To my surprise, sorting the arrays and iteratively computing their union was faster than a hashset.
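
Roughly what I was comparing (a from-memory C++ sketch, not the exact code I benchmarked; the function names are mine, and it assumes each input array has no duplicates since they represent sets):

  #include <algorithm>
  #include <iterator>
  #include <unordered_set>
  #include <vector>

  // Sort + linear merge: O(n log n) comparisons, but every pass is
  // sequential, so the prefetcher keeps almost all accesses in cache.
  std::vector<int> union_sorted(std::vector<int> a, std::vector<int> b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    std::vector<int> out;
    out.reserve(a.size() + b.size());
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(out));
    return out;
  }

  // Hash set: O(n) expected inserts, but each insert hashes the key and,
  // once the table outgrows cache, touches a more or less random line.
  std::vector<int> union_hashed(const std::vector<int>& a,
                                const std::vector<int>& b) {
    std::unordered_set<int> s(a.begin(), a.end());
    s.insert(b.begin(), b.end());
    return {s.begin(), s.end()};
  }
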
Others in this thread have mentioned that computing the hash takes time. This is true. Let n be the size of the array. When n is small, the constant per-element cost of hashing dwarfs the log n factor you pay for sorting. Now consider what happens as n becomes large: virtually every hash-table access becomes a cache miss, and the constant cost of a cache miss likewise dwarfs the log n comparison cost on data that streams through cache. At the time, I did the math and determined that n would need to be some unreasonably gargantuan quantity (on the order of the number of stars in the visible universe) before log n is big enough to overtake the effect of cache misses.
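To make that concrete with illustrative numbers (my rough assumptions here, not measurements): say a comparison on already-cached, sorted data costs ~1 ns while a hash probe that misses cache costs ~80-100 ns. The sorted approach pays about log2(n) cheap comparisons per element, the hash approach about one expensive miss per element, so the crossover needs log2(n) ≈ 80-100, i.e. n somewhere around 2^80 to 2^100, roughly 10^24 elements or more. No machine will ever hold that.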