Nice article, but I suspect the implementation will not work. I did essentially the same thing for an AI-class exercise, and was thrilled to see that you could write a working Bayes classifier in 60 short lines of Python code. But later I landed a freelance job that required writing a classifier that could be applied to real-world data, and I soon realized that repeated multiplication of numbers between 0 and 1 sends you to zero too fast for the implementation to actually work. I might have missed it in the code, but I think he's making the same mistake: you need to normalize or move to logarithms for the probability estimates to work on medium or large datasets.
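To illustrate (toy numbers, not from the article): with a made-up per-feature likelihood of 0.01, the running product hits exactly 0.0 in double precision after a couple hundred features.

    # Toy demo of the underflow: repeatedly multiply by a small
    # (made-up) per-feature likelihood and watch the product die.
    prob = 1.0
    for i in range(200):
        prob *= 0.01
        if prob == 0.0:
            print(f"underflowed to zero after {i + 1} features")
            break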
Yes, you are right: it's better to convert the whole thing to a sum of logs, otherwise you end up with floating-point underflow.
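Roughly like this (a minimal sketch, not the article's code; log_prior and likelihoods are hypothetical names for whatever the classifier already computes):

    import math

    def log_score(log_prior, likelihoods):
        # Sum of logs replaces the product of probabilities; assumes
        # every likelihood is strictly positive (math.log(0) raises
        # a ValueError), so smooth zero counts first.
        return log_prior + sum(math.log(p) for p in likelihoods)

One wrinkle: smoothing becomes mandatory, since a zero likelihood can no longer just quietly zero out the product.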
The article was already getting too long, but I'll add a note about it, because it is an important optimization that affects both speed and correctness (given the underlying limitations of floating point).
Is the sum-of-logs method mathematically equivalent to the multiplication of probabilities (i.e., will it always produce the same ordering of class predictions)?
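In symbols, the question is whether

    \arg\max_c \, P(c) \prod_i P(x_i \mid c) \;=\; \arg\max_c \Big[ \log P(c) + \sum_i \log P(x_i \mid c) \Big]

holds for every input.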
I think this should also have the added effect of being faster overall, as doing lots of additions should be quicker than doing lots of multiplications (the cost of the log itself notwithstanding).
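Easy enough to check with a quick, unscientific micro-benchmark (toy numbers; note that the logs can be precomputed once at training time, so the cost of computing them never shows up at prediction time):

    import math
    import timeit

    probs = [0.5] * 1000
    logs = [math.log(p) for p in probs]  # precomputed once, up front

    mul = timeit.timeit(lambda: math.prod(probs), number=10_000)
    add = timeit.timeit(lambda: sum(logs), number=10_000)
    print(f"product: {mul:.3f}s  sum of logs: {add:.3f}s")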