> Due to cost limitations we had to limit crafty to 2 seconds of analysis time per move
A grandmaster with standard time controls could defeat a 2-second limited Crafty. So how do you know you're finding true blunders, and not simply positions that the engine evaluates incorrectly?
This is definitely the biggest limitation of our approach right now, and there are certainly some moves that we counted as blunders that aren't true blunders. We're working on rectifying this by doing another pass with a better engine and more time to analyze.
That said, we tested this on a smaller set of games by comparing it to results from better engines and found that only a very small number of moves tricked Crafty. It's still generally quite reliable for the majority of moves.
You could just rewrite your article to call these "obvious blunders", i.e. ones that Crafty identifies in 2 seconds or less. Redefine what you're doing so your methodology is correct :) Plus it's still interesting. Probably more interesting than blunders that take longer to identify!
Once you have found the blunders, you can verify them by analyzing those positions more deeply. (Of course, you should also report the number of false positives: ones that appear to be blunders after 2 seconds but turn out not to be on slightly longer analysis.)
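For what it's worth, here is a rough sketch of that second pass in Python. Everything in it is an assumption for illustration, not the article's actual pipeline: the engine is abstracted as a hypothetical `evaluate(position, seconds)` callable returning centipawns from the mover's point of view, the 200-centipawn cutoff is made up, and `verify_blunders` and `false_positive_rate` are names I invented.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical engine interface (an assumption, not a real library API):
# takes a position string and a time budget in seconds, and returns an
# evaluation in centipawns from the point of view of the player who moved.
Evaluator = Callable[[str, float], int]

# A candidate blunder: (position before the move, position after the move).
Candidate = Tuple[str, str]

# Assumed cutoff for calling a move a blunder; the real threshold may differ.
BLUNDER_THRESHOLD_CP = 200


def verify_blunders(
    candidates: Iterable[Candidate],
    evaluate: Evaluator,
    deep_seconds: float = 30.0,
    threshold: int = BLUNDER_THRESHOLD_CP,
) -> Tuple[List[Candidate], List[Candidate]]:
    """Re-check moves flagged by the fast pass using a longer time budget.

    Returns (confirmed, false_positives), where a false positive is a move
    whose evaluation drop falls below the threshold on deeper analysis.
    """
    confirmed: List[Candidate] = []
    false_positives: List[Candidate] = []
    for before, after in candidates:
        # How much the mover's evaluation fell because of the move.
        drop = evaluate(before, deep_seconds) - evaluate(after, deep_seconds)
        if drop >= threshold:
            confirmed.append((before, after))
        else:
            false_positives.append((before, after))
    return confirmed, false_positives


def false_positive_rate(
    confirmed: List[Candidate], false_positives: List[Candidate]
) -> float:
    """Share of fast-pass blunders that did not survive the deeper pass."""
    total = len(confirmed) + len(false_positives)
    return len(false_positives) / total if total else 0.0
```

Only the positions flagged in the 2-second pass get the expensive analysis, so the extra cost scales with the number of candidate blunders rather than the number of moves. Note that this only measures false positives; real blunders that the fast pass missed would need a separate check on a sample of unflagged moves.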