One of my servers is an older Supermicro X11, and I've flashed its BIOS so it can boot from an NVMe drive in a PCIe adapter. This normally works fine (it worked flawlessly on another identical board I had), but for some reason this particular board doesn't always see the drive. The server is off most of the time and wakes up daily to ingest ZFS snapshots from the NAS, so it needs to power on and off reliably.
I didn't feel like doing that deep of a dive to determine why it only sometimes sees the NVMe drive, so instead I wrote a script [0] that uses IPMI to power cycle it until it can get an ssh connection. Originally I was sending a magic packet, but I realized that was only a one-shot, so I switched to calling the IPMI executable via subprocess. No idea why I left the other stuff in there.
Anyway, this has reliably worked for well over a year at this point, dutifully restarting the server until it comes up. Some days it takes 1 attempt, some days it takes 5, but it always comes up.
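For anyone who doesn't want to click through, the loop is roughly the shape below. This is a minimal sketch and not the gist itself: the function names, the timeouts, and the choice of `-I lanplus` are all assumptions for illustration, and it only checks that TCP port 22 accepts a connection rather than completing a real ssh login. It also assumes `ipmitool` is installed and the BMC is reachable over the network.

```python
import socket
import subprocess
import time


def ssh_port_open(host, port=22, timeout=5.0):
    """Return True if a TCP connection to the SSH port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def ipmi_power(bmc_host, user, password, action):
    """Run `ipmitool chassis power <action>` against the BMC.

    The lanplus interface is an assumption; adjust for your BMC.
    """
    subprocess.run(
        ["ipmitool", "-I", "lanplus", "-H", bmc_host,
         "-U", user, "-P", password,
         "chassis", "power", action],
        check=True,
    )


def boot_until_reachable(power, is_up, max_attempts=10,
                         boot_wait=120.0, poll=5.0, sleep=time.sleep):
    """Power on, then keep power cycling until is_up() returns True.

    `power` takes an action string ("on" or "cycle"); `is_up` takes no
    arguments. Returns the number of attempts used, or raises
    RuntimeError if the host never becomes reachable.
    """
    for attempt in range(1, max_attempts + 1):
        # First attempt assumes the box is off; later ones force a cycle.
        power("on" if attempt == 1 else "cycle")
        waited = 0.0
        while waited < boot_wait:
            if is_up():
                return attempt
            sleep(poll)
            waited += poll
    raise RuntimeError(f"host not reachable after {max_attempts} attempts")
```

Wiring it up would look something like `boot_until_reachable(lambda a: ipmi_power("bmc.example", "admin", "secret", a), lambda: ssh_port_open("server.example"))`, where the hostnames and credentials are placeholders. Passing the power and reachability checks in as callables keeps the retry loop testable without touching real hardware.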
[0]: https://gist.github.com/stephanGarland/93c02385e344d8b338aab...