Scientists complain that there are only 2 scientists in congress and how difficult they find it to explain basic science to their peers. What about system administrators? How many people in congress or on the president's cabinet have every had the root or administrator password to systems that other people depend on?
Health and Human Services Secretary Kathleen Sebelius announced her resignation and the media has been a mix of claiming she's leaving in disgrace after the failed ACA website launch countered with she stuck it out until it was a success, which redeems her.
The truth is, folks, how many of you have launched a website and had it work perfectly the first day? Zero. Either you've never been faced with such a task, or you have and it didn't go well. Very few people can say they've launched a big site and had it be perfect the first day.
Let me quote from a draft of the new book I'm working on with Strata and Christine ("The Practice of Cloud Administration", due out this autumn):
[Some companies] declare that all outages are unacceptable and only accept perfection. Any time there is an outage, therefore, it must be someone's fault and that person, being imperfect, is fired. By repeating this process eventually the company will only employ perfect people. While this is laughable, impossible, and unrealistic it is the methodology we have observed in many organizations. Perfect people don't exist, yet organizations often adopt strategies that assume they do.
Firing someone "to prove a point" makes for exciting press coverage but terrible IT. Quoting Allspaw, "an engineer who thinks they're going to be reprimanded are disincentivized to give the details necessary to get an understanding of the mechanism, pathology, and operation of the failure. This lack of understanding of how the accident occurred all but guarantees that it will repeat. If not with the original engineer, another one in the future." (link)
HHS wasn't doing the modern IT practices (DevOps) that Google, Facebook, and other companies use to have successful launches. However most companies today aren't either. The government is slower to adopt new practices and this is one area where that bites us all.
All the problems the site had were classic "old world IT thinking" leading to cascading failures that happen in business all the time. One of the major goals of DevOps is to eliminate this kind of problem.
Could you imagine a CEO today that didn't know what accounting is? No. They might not be experts at it, but at least they know it exists and why it is important. Can you imagine a CEO that doesn't understand what DevOps is and why small batches, blameless postmortems, and continuous delivery are important? Yes.. but not for long.
Obama did the right thing by not accepting her resignation until the system was up and running. It would have been disruptive and delayed the entire process. It would have also disincentivized engineers and managers to do the right thing in the future. [Yesterday I saw a quote from Obama where he basically paraphrased Allspaw's quote but I can't find it again. Links anyone?]
Healthcare is 5% "medical services" and 95% information management. Anyone in the industry can tell you that.
The next HHS Secretary needs to be a sysadmin. A DevOps-trained operations expert.
What government official has learned the most about doing IT right in the last year? Probably Sebelius. It's a shame she's leaving.
You can read about how DevOps techniques and getting rid of a lot of "old world IT thinking" saved the Obamacare website in this article at the Time Magazine website. Login required.)
A good post, but I have one nitpick. It doesn't take a DevOps-trained operations expert to identify successful strategies for implementing a solution. You can apply the scientific method to this, and many other initiatives.
I'm a sysadmin by trade. Big fan of your work. That being said, I'll pick a good scientist over a good sysadmin in politics nearly every time.