It's hard to understand why this technique works so well without digging deep into the math. Roughly speaking, if you throw n balls into n bins at random, the maximum number of balls in any bin grows surprisingly quickly (for birthday-paradox-like reasons). However, if we allow ourselves to choose between two random bins instead of one, and put the ball in the one with fewer balls in it, the maximum number of balls in any bin grows much more slowly (i.e., O(ln ln n)). Hence, having that one extra random choice gets us surprisingly close to the optimal approach of comparing all bins (which would give us O(1)), without doing all that work.
Thanks for the explanation! Much clearer and I get the concept. In the case of load balancing, we'd need a ton of servers (1000s?) for this to pay off vs just comparing all, right? Cache updating aside, most of the overhead would be in reading the load numbers in. Comparing a thousand numbers has to be quick in comparison, no?
The problem with load balancing is herd behavior. Stats for load are usually at least a little stale, because it's a distributed system where you can't afford to wait for consistency. When there are traffic spikes, a whole herd of new connections will go to the least loaded server for the window of time where the cached "load" number is out of date. Picking two at random helps keep a bunch of connections from racing to one server, even when you're only running 3-4 of them.
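The routing decision itself is tiny. A rough sketch of the "pick two, take the less loaded" step (the server list and load numbers here are hypothetical; in practice the loads would be stale cached stats as described above):

```python
import random

def pick_server(loads):
    """Power-of-two-choices routing: sample two distinct servers at
    random and send the connection to the one reporting less load."""
    a, b = random.sample(range(len(loads)), 2)
    return a if loads[a] <= loads[b] else b

# Hypothetical cached load numbers for four servers.
loads = [10, 3, 7, 3]
server = pick_server(loads)
loads[server] += 1  # our own bookkeeping; real stats arrive late
```

Because each client only compares two numbers, a spike of simultaneous clients spreads across many different random pairs instead of all converging on the single reported minimum.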
1) Throw n balls into n bins, the bin for each ball chosen randomly
2) Throw n balls into n bins, two bins for each ball chosen randomly, always picking the bin with fewer balls in it
In both cases you will have n balls distributed over n bins in the end. But the number of balls in the largest bin will be different for the two processes above. In the first case the largest bin has more balls: O(log n / log log n) == O(log n). And the second case has just O(log log n) balls. So just adding an extra choice of bins made the expected largest bin exponentially smaller.
More rough intuition: if x of your bins are occupied, in the first case your next ball has x/n probability of queueing instead of finding an empty bin but in the second it's only (x/n)^2 chance to need to queue.
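The two processes above are easy to simulate and compare directly (a quick sketch; the function and parameter names are mine):

```python
import random

def max_load(n, choices):
    """Throw n balls into n bins; for each ball, sample `choices`
    random bins and drop it in the emptiest of them.
    Returns the size of the fullest bin."""
    bins = [0] * n
    for _ in range(n):
        candidates = random.sample(range(n), choices)
        target = min(candidates, key=lambda i: bins[i])
        bins[target] += 1
    return max(bins)

random.seed(0)
n = 100_000
print("one choice: ", max_load(n, 1))
print("two choices:", max_load(n, 2))
```

With one choice the fullest bin typically holds a handful more balls than with two choices, and the gap widens as n grows, matching the Θ(log n / log log n) vs Θ(log log n) bounds.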
Generally, yes, but I think `O(log n / log log n) == O(log n)` is wrong.
log(n) / log(log(n)) = log_x(n), where x = log(n) (not sure of a better way to write the logarithm base in plain text).
So you get O(logx(n)). In general the logarithm base doesn't matter for Big-O when it's a constant, but I'm not sure you can apply the same thing to a base of log(n).
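A quick numeric check supports this: the ratio between log n and log n / log log n is log log n, which keeps growing, so the two can't be in the same Big-O class.

```python
import math

# Compare log(n) against log(n) / log(log(n)) as n grows.
for n in (10**3, 10**6, 10**9, 10**12):
    log_n = math.log(n)
    ratio = log_n / math.log(log_n)
    print(f"n={n:>15,}  log n = {log_n:6.2f}  log n / log log n = {ratio:5.2f}")
```

The second column keeps pulling away from the third, so the gap is unbounded rather than a constant factor.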
1) Θ(log n / log log n)
2) Θ(log log n)