Recently in The Practice of Cloud System Administration Category

Hi Boston-area friends! I'll be giving my "Radical ideas from The Practice of Cloud System Administration" talk at the Back Bay LISA user group meeting on Wednesday, January 14, 2015. Visit for more info.

InfoQ interviewed the authors of The Practice of Cloud System Administration and included it as part of their review of the book.

Read it here!

Win Treese interviewed me and my co-authors about the book.

An Interview with the authors of "The Practice of Cloud System Administration" on DevOps and Data Security

We discussed DevOps in the enterprise, trends in system administration, and at the end I got riled up and ranted about how terrible computer security has become.

has published an excerpt from our book "The Practice of Cloud System Administration: Designing and Operating Large Distributed Systems Vol 2".

The article has a title that implies it is about capacity planning for data centers but it's really about capacity planning for any system or service.

Room to grow: Tips for data center capacity planning

If you like it, there are 547 more pages of good stuff like that in the book.

Tom will be the speaker at the Wed, March 4, 2015 "Crabby Admins" meetup in Fulton, MD. This group is the Baltimore chapter of LOPSA. I'll be talking about our new book, The Practice of Cloud System Administration.

For more info:

I'll be a keynote speaker at Utah State University's "Partners In Business" Information Technology Conference, on Thursday, February 26, 2015. If you work in IT in the Utah area, check out this excellent conference!

For more information, visit:

Tom will be the speaker at the Wed, Feb 11, 2015 meeting of the Bucks County DevOps Meetup, which meets in New Hope PA. I'll be talking about our new book, The Practice of Cloud System Administration.

For more info:

Tom will be the afternoon keynote at NLUUG's Autumn Conference. For more info, visit

When Esther Schindler asked for permission to publish an excerpt from The Practice of Cloud System Administration on the Druva Blog, we thought this would be the perfect piece. We're glad she agreed. Check out this passage from Chapter 7, "Operations in a Distributed World".

If you manage a sysadmin team that manages services, here is some advice on how to organize the team and their work:

Organizing Strategy for Operational Teams

To celebrate Usenix LISA, for 24 hours you can get The Practice of Cloud System Administration at an extra special discount:

I'll be doing a book signing at LISA on Friday at 10:30 in the LISA Lab. If you have the eBook, I have something special for you! See you there!

Tom will be teaching tutorials and giving other presentations at Usenix LISA in Seattle Washington, Nov 9-14, 2014.


  • Work Like a Team: Best Practices for Team Coordination and Collaborations So You Aren't Acting Like a Group of Individuals (S5)
  • Evil Genius 101: Subversive Ways to Promote DevOps and Other Big Changes (M7)
  • How To Not Get Paged: Managing Oncall to Reduce Outages (T8)


Book Signing:

  • TBA (still being worked out)

We're really excited about LISA this year. It is full of all new material and speakers. I'm really psyched and can't wait to attend!

I'm the guest on the new episode of Arrested Devops. I had a lot of fun recording this podcast. I hope you enjoy listening to it!

Check it out!

I'll be doing a book signing at Usenix LISA on Friday at 10:30am in the LISA Lab. The first 10 people to arrive will receive a free (printed) copy of the new book The Practice of Cloud System Administration. (I'll also sign other books you bring.) For info about the new book, please attend my talk "Radical Ideas from the Practice of Cloud Computing" on Wednesday at 11:45am-12:30 pm in Grand Ballroom C. I'll also be teaching tutorials and mini-tutorials.

Register for LISA today!

Three new speaking gigs have been announced. January (BBLISA in Cambridge, MA), February (Bucks County, PA), and March (Baltimore-area). The full list is on or subscribe to the RSS feed to learn about any new speaking engagements.

The next 3 speaking gigs are always listed in the "see us live" box at the top of

Tom will be the speaker at the Tue, Oct 21, 2014 meeting of the Denver DevOps Meetup. I'll be talking about our new book, The Practice of Cloud System Administration.

For more info:

Hey Denver folks! Don't forget that tomorrow evening (Tue, Oct 21) I'll be speaking at the Denver DevOps Meetup. It starts at 6:30pm! Hope to see you there!

On Tuesday, Oct 21st, I'll be speaking at the Denver DevOps Meetup. It is short notice, but if you happen to be in the area, please come! I'll be talking about the new book and how DevOps principles can make the world a better place. I'll have a copy or two to give away, and special discount codes for everyone.

The meeting is at the Craftsy Offices, 999 18th St., Suite 240, Denver, CO. For more information and to RSVP, please go to

Holly from SpiceWorks interviewed me while I was in Austin for the SpiceWorld '14 conference. We talked about DevOps from the SMB "IT guy" perspective, Lord of the Rings, Chef vs. Puppet, and my secret desire to start a podcast that would be "the Stephen Colbert of DevOps."

The interview has been published on their community website:

Demystifying DevOps: Q&A with Tom Limoncelli


Quoting from a community forum post on SpiceWorks:

It doesn't have "DevOps" in the name, but the new The Practice of Cloud System Administration ... covers a lot of the same concepts, more as "here's some things that have emerged as best practices in the modern world of system administration." Textbook-thick but destined to be a classic like his previous The Practice of System and Network Administration.

Thanks to Ernest Mueller for the kind words!

Tom will be the speaker at the October 8th meeting of The NYC DevOps Meetup which meets (I kid you not) at the office of MeetUp, Inc in New York City. I'll be talking about our new book, The Practice of Cloud System Administration.

For more info:

I'm honored to be a keynote at NLUUG's Autumn Conference, 20-Nov-2014, in The Netherlands. I don't get to Europe often, so this may be the last chance to see me there for a while. I'm also trying to arrange a book-signing while I'm there.

For more info, visit

Register now! Registration is limited!

Even though the registration page is in Dutch, the talk will be in English. Google translate is your friend.

I'll be the speaker at the Wed, October 8th meeting of the NYCDEVOPS Meetup which meets (I kid you not) at the office of MeetUp, Inc. in New York City. I'll be talking about our new book, The Practice of Cloud System Administration.

For more info:

I'm excited to announce that I'm interviewed on the new episode of DevOps Cafe. We talk about the history of system administration leading up to DevOps, recent changes, how the Usenix LISA conference has changed this year, and more.

The live stream of Apple's announcement of the Apple Watch was marred by technical problems. Users saw messages about "could not load movie" and "you don't have permission to access".

As we read Dan Rayburn's excellent technical analysis of what went wrong, we couldn't help but think how easily preventable their problems were.

The problem was that Apple introduced a new feature that had unknown resource requirements and (oops!) they didn't have enough resources. For example, suppose a thousand website visitors require a certain number of computers (resources) to serve the website. Some websites are "heavier" and require the same work to be spread over more computers; others require fewer resources per thousand users.

For previous events the live stream page was a static page that embedded the live stream video. This made the page highly cacheable and required very few resources.

However, this event added a new interactive feature: a live scrolling display of tweets which created a live summary of what was being presented. The idea was brilliant and did (when it was working) create a richer user experience.

The problem was that this "tweet feed" went from having 0 to millions of users in a matter of minutes. That's much too fast to add new capacity if the number of resources had been underestimated.

Normally a web service starts small and grows over many months. For example, you might start a new website for trading Beanie Babies. At first you get a few dozen or hundred visitors each day. Soon your site becomes more popular and you have a constant flood of visitors. You add capacity to handle the flood and all is good. Eventually you notice you are growing at, for example, 10% per month. That gives you a good sense of how much capacity you have to add each week to handle the additional users.
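To make the 10% figure concrete, here is a minimal sketch of the arithmetic. The function names and the starting figure of 100 "capacity units" are assumptions made up for illustration; they are not from any real capacity planning tool.

```python
def projected_capacity(current_units, monthly_growth, months):
    """Capacity units needed after `months` of compound growth."""
    return current_units * (1 + monthly_growth) ** months


def weekly_growth_rate(monthly_growth, weeks_per_month=52 / 12):
    """Convert a compound monthly growth rate into the equivalent weekly rate."""
    return (1 + monthly_growth) ** (1 / weeks_per_month) - 1


# Hypothetical starting point: 100 capacity units, growing 10% per month.
one_year_out = projected_capacity(100, 0.10, 12)
weekly = weekly_growth_rate(0.10)

print(f"Capacity needed in 12 months: {one_year_out:.0f} units")
print(f"Equivalent weekly growth: {weekly:.2%}")
```

Because the growth compounds, 10% per month more than triples the requirement in a year, which is still slow enough to plan and purchase for.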

When growth is spread out over a long period of time, it is easy to stay ahead of the curve. However, what if on "day 1" you were going to have a million users? You would have to accurately predict how much capacity would be required with zero prior experience, relying only on engineering estimates.

You can do some tests. You can easily simulate 1,000 users and multiply it up, but that kind of projection is rarely accurate. When dealing with growth in the hundreds of thousands you can't predict what problems will crop up.

For example, Apple may have tested their new interactive tweet feed with thousands of users, but until they had millions of users there was no way to know what their weakest link was. It could have been bandwidth, a software thread locking problem, not enough CPU, or a myriad other issues.

So how do you handle this kind of situation? Companies like Google and Facebook have developed ways to perform tests that more accurately predict the resources needed. They had no choice. Google can't announce a new feature without millions of people trying the new feature the moment it is available.

One technique is to slow down the rate of growth artificially. When Gmail was new, Google required new users to get an invitation to join. They controlled the rate at which invitations were distributed, thus controlling the rate of growth. Did a new shipment of hard disks arrive late? Delay the next batch of invites. Did a code optimization make the system more efficient? Hand out more invitations.

Another way to artificially control growth is to enable the feature without announcing it. This is called a "soft launch." Word of mouth only spreads so fast. This may slow down the growth enough that it can be monitored by engineers who can fix the problems as they are discovered. In the worst case, you can turn off the feature since it hasn't been officially announced yet. Users will be disappointed but the truth is that removing the feature might just create more hype for when it does return.

Apple's situation was a little different: literally zero to millions in just a few minutes. For that, you need to do a dark launch. As described in our new book, The Practice of Cloud System Administration:

The term dark launch was coined in 2008 when Facebook revealed the technique was used to launch Facebook Chat. The launch raised an important issue: How to go from zero to seventy million users overnight without scaling issues. An outage would be highly visible. Long before the feature was visible to users, Facebook pages were programmed to make connections to the chat servers, query for presence information and simulate message sends without a single UI element drawn on the page. This gave Facebook an opportunity to find and fix any issues ahead of time. If you were a Facebook user back then, you had no idea your web browser was sending simulated chat messages but the testing you provided was greatly appreciated.

Apple could have added some code to their homepage that would have queried the tweet stream but not displayed it. This would have enabled them to use the millions of people that visit their homepage to help them test this feature. In fact, they could have started small: enabling this hidden feature for 1% of all visitors and "turning up the dial" slowly over time to see how the tweet stream system reacted. Once it was at 100%, they'd have developed confidence in the system.
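One common way to build that kind of "dial" is to hash each visitor into a stable bucket and compare the bucket against the current percentage. The sketch below is only an illustration under our own assumptions: the function name, the visitor ID format, and the choice of SHA-256 are hypothetical, and we have no knowledge of how Apple's site is actually implemented.

```python
import hashlib


def in_dark_launch(visitor_id: str, percent: float) -> bool:
    """Return True if this visitor should silently exercise the hidden feature.

    Hashing the visitor ID gives each visitor a stable bucket from 0 to 9999,
    so the same visitor stays enabled as the percentage is turned up.
    """
    digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 10_000
    return bucket < percent * 100  # percent=1 enables buckets 0..99


# Turn the dial from 1% to 100% and count how many of 20,000
# hypothetical visitors would be querying the hidden feature.
visitors = [f"visitor-{n}" for n in range(20_000)]
for percent in (1, 10, 50, 100):
    enabled = sum(in_dark_launch(v, percent) for v in visitors)
    print(f"{percent:>3}% dial: {enabled} of {len(visitors)} visitors")
```

Because the bucket depends only on the visitor ID, raising the percentage adds new visitors without dropping the ones already enabled, and setting it to 0 turns the hidden feature off for everyone.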

In the process, they would probably have found what is suspected to be a bug. The code was querying for updates 1000 times per second instead of a more reasonable ten times per second. This alone might have created 100 times the pressure on Apple's resources.
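The arithmetic behind that "100 times" figure is simple enough to sketch. The viewer count below is an assumed round number for illustration only; the two polling rates are the ones mentioned above.

```python
def requests_per_second(viewers, polls_per_second):
    """Total query load if every viewer polls at the given rate."""
    return viewers * polls_per_second


viewers = 1_000_000  # assumed audience size, not a measured figure

buggy = requests_per_second(viewers, 1000)   # suspected behavior
intended = requests_per_second(viewers, 10)  # more reasonable rate

print(f"Suspected load: {buggy:,} requests/sec")
print(f"Intended load:  {intended:,} requests/sec")
print(f"Multiplier:     {buggy // intended}x")
```

The multiplier does not depend on the audience size, so even a small dark launch would have shown the per-visitor request rate to be far higher than expected.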

All of this could have been done weeks before the actual event.

What makes this such a shame is that the media spent time discussing the technical problems of the launch, even though this had no real bearing on the new Apple Watch itself. This subtracted from the amount of positive press that Apple was trying to achieve.

We're not saying that if Apple's engineers had read The Practice of Cloud System Administration the entire fiasco would have been prevented. In fact, we have to assume someone at Apple knows these techniques. The problem is that the right people didn't know. This is why we wrote this book: to spread these kinds of ideas to more people.

It isn't just big companies like Apple that need to understand these techniques. Even small startups have unexpected success.

Big launches are high-stakes tests of a company's IT team. You have to get it right the first time or you'll end up like the successful restaurant Yogi Berra once described: "Nobody goes there anymore. It's too crowded."

For more information about The Practice of Cloud System Administration, please visit

Hi Philly folks!

I will be speaking at the Philadelphia area Linux Users' Group (PLUG) meeting on Wednesday night (Oct 1st). They meet at the University of the Sciences in Philadelphia (USP). My topic will be "Highlights from The Practice of Cloud System Administration" and I'll have a few copies of the book to give away.

For more info, visit their website:

Hope to see you there!

Previously Safari Books Online (the O'Reilly thing... not the Apple thing) had a rough draft of The Practice of Cloud System Administration. Now it has the final version:


...because we're re-branding The Practice of System and Network Administration as "Volume 1".

  • Vol 1 == enterprise IT
  • Vol 2 == server/service administration


Available as a PDF here.

Stack Exchange, Inc. is hosting the launch party for Tom Limoncelli's newest book, "The Practice of Cloud System Administration." The local DevOps/Sysadmin/Linux user community is invited. Food and beverages will be provided.

  • Date: Wed, Sept. 17, 2014
  • Time: 7 p.m. until 9 p.m.
  • Location: Stack Exchange NYC HQ, 110 William Street, 28th Floor, NY, NY 10038
  • Directions:
  • RSVP: click here

Information about the book:

If you are in town for Velocity NYC, please stop by!

The ebook is shipping!

The Practice of Cloud System Administration is shipping on Kindle, and PDF/Mobi versions are shipping on InformIT. The physical book should start shipping today or Monday.

If you get the PDF, I'd love to know the md5 hash of the file. Post in the comments.

I'll be the speaker at the LOPSA-NJ September meeting. I'll be talking about my new book, The Practice of Cloud System Administration.

For more information, visit their web site

I'm excited to announce my "book tour" to promote The Practice of Cloud System Administration, which starts shipping on Friday, September 5!

I'll be speaking and/or doing book signings at the following events. More dates to be announced soon.

[NOTE: The complete list has been moved to]

This book is the culmination of 2 years of research on the best practices for modern IT / DevOps / cloud / distributed computing. It is all new material. We're very excited by the early reviews and hope you find the book educational as well as fun to read.

I'd be glad to autograph your copy if you bring it to any of these events. (I have something special planned for ebook owners.)

Information about the book is available on Read a rough draft on Safari Books Online. For a limited time, save 35% by using discount code TPOSA35 on

I look forward to seeing you at one or more of these events!

I'll be the speaker at the September LOPSA-NJ meeting. My topic will be the more radical ideas in our new book, The Practice of Cloud System Administration. This talk is (hopefully) useful whether you are legacy enterprise, fully cloud, or anywhere in between.

  • Topic: Radical ideas from The Practice of Cloud Computing
  • Date: Thursday, September 4, 2014
  • Time: 7:00pm (social), 7:30pm (discussion)

This is material from our newest book, which starts shipping the next day. Visit for more info.

Tom will present "Highlights from The Practice of Cloud System Administration" at the Philadelphia area Linux Users' Group meeting in October. They meet at the University of the Sciences in Philadelphia (USP).

For more info, visit their website:

Hope to see you there!

I'll be the September speaker at the CloudAustin Meetup. I'll be talking about our new book, The Practice of Cloud System Administration.

For more info:

Thanks to the organizers for moving their meeting date to adjust for my travel schedule. I'd also like to thank the Austin DevOps Meetup for jointly hosting this meeting.

I'll be giving the Tuesday closing talk about what enterprise IT can learn from cloud or "distributed computing."

is up and online! This is our new website dedicated to promoting The Practice of Cloud System Administration. It has a few incomplete pages, but we've decided to start spreading the word now.

Check it out!

I'll be giving a tutorial called "Time Management for Busy DevOps" as part of the tutorial track, Monday at 3:30pm. Details here.

I'll be the inaugural speaker at the new meetup for people interested in DevOps and Automation in New Jersey, Monday, August 18, 2014, in Clifton, NJ

Safari Books Online now has all chapters of The Practice of Cloud System Administration. "Rough Cuts" are pre-editing drafts. You get to see the book with all the typos and misspelled words... but 2-3 months before the real book is available:

If you want to get some fan-only details about the book and other inside information, join our mailing list.


Tom will be teaching 2 tutorials including the all-new Evil Genius 101 half-day class.

  1. Half-day tutorial: Advanced Time Management: Team Efficiency
  2. Half-day tutorial: Evil Genius 101

There will also be a preview of our new book, The Practice of Cloud System Administration.

  • LISA15