For the first time ever, Google discusses our "DiRT" (Disaster Recovery Test) procedure. This is the week of hell when systems are taken down with little or no notice to verify that all the failure-protection systems work.
Oh yeah... and the funny sidebar at the end was written by me :-)
P.S. I take credit for cajoling Kripa into writing the article. I think she did a bang-up job! Go Kripa!!