I lead an oncall rotation. It's important to stay on top of the pages/alerts, or else the rotation will rot and enter a death spiral where you don't get any sleep.
Every page (particularly the nighttime ones) is root-caused by the team every week. Each page gets one of four outcomes:
1. Fix the code to handle the situation.
2. Tune the alert to increase the signal - resulting in a more actionable page.
3. Re-route to a more appropriate team.
4. Remove the alert if it doesn't help us keep the systems running.
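For anyone who wants to track this, here is a minimal sketch of how the weekly review could be recorded. It is an illustration only, not our actual tooling: the `Page` and `Outcome` names, the alert names, and the 22:00-06:00 night window are all assumptions. It tallies the four outcomes and lists the pages that still have no outcome, nighttime ones first.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    # The four possible results of reviewing a page.
    FIX_CODE = "fix the code"
    TUNE_ALERT = "tune the alert"
    REROUTE = "re-route to another team"
    REMOVE_ALERT = "remove the alert"


@dataclass
class Page:
    alert: str                         # hypothetical alert name
    hour: int                          # local hour the page fired, 0-23
    outcome: Optional[Outcome] = None  # decided at the weekly review


def is_nighttime(page: Page, start: int = 22, end: int = 6) -> bool:
    # Assumed night window; adjust to the rotation's actual hours.
    return page.hour >= start or page.hour < end


def weekly_review(pages: list[Page]) -> tuple[Counter, list[Page]]:
    """Tally outcomes and return pages still lacking one, nighttime first."""
    tally = Counter(p.outcome for p in pages if p.outcome is not None)
    unresolved = [p for p in pages if p.outcome is None]
    # False sorts before True, so nighttime pages come first (stable sort).
    unresolved.sort(key=lambda p: not is_nighttime(p))
    return tally, unresolved


if __name__ == "__main__":
    pages = [
        Page("disk_usage_info", 3, Outcome.REMOVE_ALERT),
        Page("queue_backlog", 14),
        Page("known_retry_bug", 2),
    ]
    tally, unresolved = weekly_review(pages)
    for outcome, count in tally.items():
        print(f"{outcome.value}: {count}")
    for page in unresolved:
        print(f"needs an outcome: {page.alert} (hour {page.hour})")
```

The point of the unresolved list is that no page leaves the review without one of the four outcomes.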
So many pages were "informational": we couldn't act on them, and they didn't indicate a problem that needed to be dealt with. Many others were bugs that people knew about but hadn't worked on because they didn't know they were waking us up! :)
Now, we get our sleep and people are asking to join the rotation!
Paying people to take the pager does not help when the rot sets in, but it does help encourage people to pick up extra shifts.