How does Google manage Oncall?

"Login", the Usenix Newsletter, has an excellent article about how Google manages oncall. Authors Andrea Spadaccini and Kavita Guliani give a clear overview of how Google seeks to balance oncall time with non-oncall time so that engineers have time for actual engineering.

While most of the article deals with how to prevent operations people from getting overloaded, the authors also point out that operations underload is dangerous too: SREs get out of practice if they don't get paged often enough. The article describes games and simulations that SRE teams run to stay in practice.

The article is available for free to Usenix members and newsletter subscribers, or for a nominal charge to everyone else.

Being an On-Call Engineer: A Google SRE Perspective, Andrea Spadaccini and Kavita Guliani

(Side note: the article cites the Oncall chapter of TPOCSA for our analysis of various oncall rotation schemes. Read it for free on SBO.)

Posted by Tom Limoncelli in Usenix
