Hacker News

Failing over is correct because there's no way to discern that the hardware is not at fault. They should have designed a better response to the second failure to avoid the knock-on effects.


I don't think anything in this incident pointed to a hardware fault.

The software raised an exception because a "// TODO: this should never happen" case happened.

A hardware fault would look like machines not talking to each other, or a corrupted, unreadable data file.


Inspection after the fact revealed that it wasn't a hardware failure, but the computer didn't know that at the time. Hardware failure can look like anything, so it was correct to exercise its only option.
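A minimal Python sketch of the trade-off argued above, under stated assumptions: a hypothetical list of interchangeable nodes and a `task` callable (`run_with_failover` and `InvariantViolation` are illustrative names, not anything from the incident). Any exception triggers failover, because the caller can't tell hardware from software, but if the identical error recurs on the next node it is treated as a likely deterministic bug, and failover stops instead of cascading.

```python
class InvariantViolation(Exception):
    """Hypothetical error for a 'this should never happen' branch."""


def run_with_failover(nodes, task):
    """Run task on each node in order, failing over on any exception.

    Assumption: the caller can't distinguish a hardware fault from a
    software bug, so the first failure always triggers failover. If the
    next node fails with the same error type and message, the fault is
    probably deterministic, so halt rather than knock over every node.
    """
    previous_error = None
    for node in nodes:
        try:
            return task(node)
        except Exception as err:
            same_as_before = (
                previous_error is not None
                and type(err) is type(previous_error)
                and str(err) == str(previous_error)
            )
            if same_as_before:
                # Repeated identical failure: stop the cascade here.
                raise RuntimeError(f"same error on {node!r}; halting failover") from err
            previous_error = err
    raise RuntimeError("all nodes failed") from previous_error
```

With a one-off fault on the first node, the call succeeds on the standby; with a bug that fires identically everywhere, it halts at the second node instead of taking down the third.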




