December 2012 Archives

I can't take credit for this, as a co-worker recently introduced me to this point.

All outages are, at their core, a failure to plan.

If a component (for example, a hard drive) died, then there was a lack of planning for failed components. Components fail. Hard disks, RAM chips, CPUs, motherboards, power supplies, even Ethernet cables fail. If a component fails and causes a visible outage, then there was a failure to plan for enough redundancy to survive the failure. There are technologies that, with forethought, can be included in a design to make any single component's failure a non-issue.

If a user-visible outage is caused by human error, it is still a failure to plan: someone failed to plan for enough training, failed to plan the right staffing levels and competencies, failed to plan disaster exercises to verify training, or failed to plan to validate the construction and execution of training.

What about the kind of outages that "nobody could have expected"? Also a failure to plan.

What about the kind of outages that are completely unavoidable? That is a failure to plan to have an SLA that permits a reasonable amount of downtime each year. If your plan includes up to 4 hours of downtime each year, those first 239 minutes are not an outage. If someone complains that 4 hours isn't acceptable to them, there was a failure to communicate that everyone should only adopt plans that are OK with 4 hours of downtime each year; or the communication worked but the dependent plan failed to incorporate the agreed-upon SLA. If someone feels they didn't agree to that SLA, there was a failure to plan how to get all stakeholders' buy-in.
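To put numbers on that, here is a rough sketch of the arithmetic in Python. The function names, the 365-day year, and the choice to count the budget as inclusive ("up to 4 hours") are my assumptions for illustration, not part of any real SLA tooling.

```python
# Assumption: a 365-day year; leap years are ignored for simplicity.
MINUTES_PER_YEAR = 365 * 24 * 60

# The 4-hour yearly downtime allowance from the example, in minutes.
BUDGET_MINUTES = 4 * 60


def availability_percent(budget_minutes: float) -> float:
    """Availability implied by a yearly downtime budget, as a percentage."""
    return 100.0 * (1 - budget_minutes / MINUTES_PER_YEAR)


def within_budget(downtime_minutes: float,
                  budget_minutes: float = BUDGET_MINUTES) -> bool:
    """True while the year's accumulated downtime is still covered by the plan.

    Assumption: the budget is inclusive, i.e. "up to" the full allowance.
    """
    return downtime_minutes <= budget_minutes


if __name__ == "__main__":
    print(f"{availability_percent(BUDGET_MINUTES):.3f}% availability")
    print("239 minutes within plan:", within_budget(239))
    print("241 minutes within plan:", within_budget(241))
```

Under those assumptions, a 4-hour yearly budget works out to a little better than 99.95% availability, which is the kind of number worth stating explicitly when you ask stakeholders to buy in.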

If designs that meet the SLA are too expensive, then there was a failure to plan the budget. If the product cannot be made profitable at the expense required to meet the SLA, there was a failure to plan a workable business case.

If the problem is that someone didn't follow the plan, then the plan failed to include enough training, communication, or enforcement.

If there wasn't enough time to plan all of the above, there was a failure to start planning early enough to incorporate a sufficient level of planning.

The next time there is an outage, whether you are on the receiving end of the outage or not, think about what was the failure to plan at the root of this problem. I assure you the cause will have been a failure to plan.

Posted by Tom Limoncelli in Management

Ben Cotton wrote an excellent summary of my half-day tutorial from LISA this year:

https://www.usenix.org/blog/time-management-system-administrators-0

Did you miss the Usenix LISA live stream of Vint Cerf's keynote? Video is online:

http://ow.ly/g38p7

Posted by Tom Limoncelli in Conferences

Every year at Usenix LISA it seems that there is a moment where someone says something that makes me want to jump up and shout, "OMG! Learning that just paid for my entire conference!"

It may be something an instructor says at a tutorial, or something a presenter says at a paper session or Invited Talk. Often it is something you learn from the person you just happened to start chatting with while on line waiting for lunch.

If you have a "LISA Moment", I encourage you to tweet it with the hashtags #lisa12 #moment or post it as a comment to this post.

Posted by Tom Limoncelli in Conferences

In the past I've said good things a few different times about "Taming Information Technology: Lessons from Studies of System Administrators" by Eser Kandogan, Paul Maglio, Eben Haber and John Bailey.

Eben will be at Usenix LISA next week, in San Diego, doing a book signing during the Wednesday afternoon break on the expo floor. He'll have a limited number of copies for sale at a huge discount (I hear it's $40/book while supplies last).

See you there!

Posted by Tom Limoncelli in Book News

As you know, I'll be teaching 3 tutorials at LISA this year (Intro To Time Management, Advanced Time Management, and Ganeti/Build a private cloud). If you can't attend in person you can still watch over the internet. The cost is about the same as being there, and there will be a chatroom so that you can ask questions just like in-person attendees. However, you save money on travel and hotel.

https://www.usenix.org/conference/lisa12/training-program/live-streaming

See you there at the conference or via the interwebz!

Posted by Tom Limoncelli in Conferences