October 2014 Archives

Three new speaking gigs have been announced: January (BBLISA in Cambridge, MA), February (Bucks County, PA), and March (Baltimore-area). The full list is on http://the-cloud-book.com/book-tour.html, or subscribe to the RSS feed to learn about any new speaking engagements.

The next 3 speaking gigs are always listed in the "see us live" box at the top of http://EverythingSysadmin.com.

Apple Pay and CurrentC

I predict that one year from today CurrentC won't be up and running and, in fact, history will show it was just another attempt to stall and prevent any kind of mobile payment system in the U.S. from being a success. I'm not saying that there won't be NFC payment systems, just that they'll be marginalized and virtually useless as a result.

Posted by Tom Limoncelli

How many times have you seen this happen?

An email goes out that mentions a date like "Wed, Oct 16". Since Oct 16 is a Thursday, not a Wednesday (this year), there is a flurry of email asking, "Did you mean Wed the 15th or Thu the 16th?" A correction goes out but the damage is done. Someone invariably "misses the update" and shows up a day early or late, or is otherwise inconvenienced. Either way, cognitive processing is wasted for everyone involved.

The obvious solution is "people should proofread better" but it is a mistake that everyone makes. I see the mistake at least once a month, and sometimes I'm the guilty party.

If someone could solve this problem it would be a big win.

Google's Gmail will warn you if you use the word "attachment" and don't attach a file. Text editing boxes in all modern web browsers and operating systems have some kind of live spell-check that puts a red mark under a word that is misspelled. Some do real-time grammar checking too.

How hard would it be to add a check for "Wed, Oct 16" and similar errors? Yes, there are many date formats, and in some cases one would have to guess the year.

It would also be nice if we could write "FILL, Oct 16" and the editor would fill in the day of the week. Or a context-sensitive menu (i.e. the right-click menu) would offer to add the day of the week for you. If the time is included, it should offer to link to timeanddate.com.
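To show that the core check is not hard, here is a minimal sketch in Python. It only handles one format ("Wed, Oct 16"), and the year has to be supplied by the caller, which is exactly the guessing problem mentioned above. The function names (check_dates, fill_days) are made up for this example and are not from any real editor or mail client.

```python
import re
from datetime import date

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # date.weekday() order
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Matches text such as "Wed, Oct 16" or the placeholder "FILL, Oct 16".
DATE_RE = re.compile(
    r"\b(" + "|".join(DAYS + ["FILL"]) + r"), ("
    + "|".join(MONTHS) + r") (\d{1,2})\b"
)


def actual_day(year, month_abbr, day_num):
    """Return the real day of the week, or None for an impossible date."""
    try:
        d = date(year, MONTHS.index(month_abbr) + 1, day_num)
    except ValueError:  # e.g. "Feb 30"
        return None
    return DAYS[d.weekday()]


def check_dates(text, year):
    """Return (text_as_written, correct_day) for every mismatch found."""
    problems = []
    for m in DATE_RE.finditer(text):
        written, month, day_num = m.group(1), m.group(2), int(m.group(3))
        correct = actual_day(year, month, day_num)
        if written != "FILL" and correct is not None and written != correct:
            problems.append((m.group(0), correct))
    return problems


def fill_days(text, year):
    """Replace each "FILL, <month> <day>" with the real day of the week."""
    def replace(m):
        correct = actual_day(year, m.group(2), int(m.group(3)))
        if m.group(1) != "FILL" or correct is None:
            return m.group(0)  # leave anything else untouched
        return f"{correct}, {m.group(2)} {m.group(3)}"
    return DATE_RE.sub(replace, text)


if __name__ == "__main__":
    print(check_dates("The meeting is Wed, Oct 16 at noon.", 2014))
    print(fill_days("The meeting is FILL, Oct 16 at noon.", 2014))
```

A real implementation would need many more date formats and a sensible way to guess the year, but the check itself is a lookup in a calendar.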

Ok Gmail, Chrome, Apple and Microsoft: Who's going to be the first to implement this?

If someone owes you $5.35 and hands you a $20 bill, every reader of this blog can easily make change. You have a calculator, a cash register, or you do it in your head.

However there is a faster way that I learned when I was 12.

Today it is rare to get home delivery of a newspaper, but if you do, you probably pay by credit card directly to the newspaper company. It wasn't always like that. When I was 12 years old I delivered newspapers for The Daily Record. Back then payments were collected by visiting each house every other week. While I did eventually switch to leaving envelopes for people to leave payments for me, there was a year or so where I visited each house and collected payment directly.

Let's suppose someone owed me $5.35 and handed me a $20 bill. Doing math in real time is slow and error prone, especially if you are 12 years old and tired from lugging newspapers around.

Instead of thinking in terms of $20 minus $5.35, think in terms of equilibrium. They are handing you $20 and you need to hand back $20... the $5.35 in newspapers they've received plus the change that will total $20 and reach equilibrium.

So you basically count starting at $5.35. You say out loud, "5.35" then hand them a nickel and say "plus 5 makes 5.40". Next you hand them a dime and say "plus 10 makes 5.50". Now you can hand them 50 cents, and say "plus 50 cents makes 6". Getting from 6 to 20 is a matter of handing them 4 singles and counting out loud "7, 8, 9, and 10" as you hand them each single. Next you hand them 10 and say "and 10 makes 20".

Notice that the complexity of subtraction has been replaced by counting, which is much easier. This technique is less prone to error, and makes it easier for the customer to verify what you are doing in real time because they see what you are doing along the way. It is more transparent.
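For the programmers in the audience, the counting-up method can be sketched in a few lines of Python. This is just an illustration: amounts are in cents to avoid floating-point trouble, the denomination list (including a 50-cent piece) is my assumption, and the rule "hand over the largest piece that lands on a round running total" is one way to reproduce the counting described above, not the only way.

```python
# U.S. denominations in cents, largest first. Including a half-dollar
# is an assumption; drop 50 and the method uses two quarters instead.
DENOMINATIONS = [2000, 1000, 500, 100, 50, 25, 10, 5, 1]


def count_up_change(owed_cents, tendered_cents):
    """Count up from the amount owed to the amount tendered.

    Returns a list of (piece_handed_over, running_total) pairs.
    No subtraction of the two amounts is ever performed.
    """
    if tendered_cents < owed_cents:
        raise ValueError("customer has not paid enough")
    steps = []
    running = owed_cents
    while running < tendered_cents:
        # Largest piece that keeps the running total "round" for its
        # own size and does not overshoot. A penny always qualifies.
        piece = next(
            d for d in DENOMINATIONS
            if running % d == 0 and running + d <= tendered_cents
        )
        running += piece
        steps.append((piece, running))
    return steps


if __name__ == "__main__":
    for piece, running in count_up_change(535, 2000):
        print(f"plus {piece / 100:.2f} makes {running / 100:.2f}")
```

For $5.35 out of a $20 bill this hands back a nickel, a dime, 50 cents, four singles, and a ten, the same sequence as the newspaper example.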

Buy a hotdog from a street vendor and you'll see them do the same thing. It may cost $3, and they'll count starting at 3 as they hand you bills, "3..., 4, 5, and 5 is 10, and 10 is 20."

I'm sure that a lot of people reading this blog are thinking, "But subtraction is so easy!" Well, it is but this is easiER and less error prone. There are plenty of things you could do the hard way and I hope you don't.

It is an important life skill to be able to do math without a calculator and this is one of the most useful tricks I know.

So why is this so important that I'm writing about it on my blog?

There are a number of memes going around right now that claim the Common Core curriculum standards in the U.S. are teaching math "wrong". They generally show a math homework assignment like 20-5.35 as being marked "wrong" because the student wrote 14.65 instead of .05+.10+.50+4+10.

What these memes aren't telling you is that they are based on a misunderstanding of the Common Core requirements. The requirement is that students are to be taught both ways and that the "new way" is such that they can do math without a calculator. It is important that, at a young age, children learn that there are multiple equivalent ways of getting the same answer in math. The multi-connectedness of mathematics is an important concept, much more important than the rote memorization of addition and multiplication tables.

If you've ever mocked the way people are being trained to "stop thinking and just press buttons on a cash register" then you should look at this "new math" as a way to turn that around. If not, what do you propose? Not teaching them to think about math in higher terms?

In the 1960s there was the "new math" movement, which was mocked extensively. However, look at what "new math" was trying to do: it was trying to prepare students for the mathematics required for the space age, where engineering and computer science would be primary occupations. I think readers of this blog should agree that is a good goal.

One of the 1960s "new math" ideas that was mocked was that it tried to teach Base 8 math in addition to normal Base 10. This was called "crazy" at the time. It wasn't crazy at all. It was recognized by educators that computers were going to be a big deal in the future (correct) and to be a software developer you needed to understand binary and octal (mostly correct) or at least have an appreciation for them (absolutely correct). History has proven the naysayers to be wrong.

When I was in 5th grade (1978-9) my teacher taught us base 8, 2 and 12. He told us this was not part of the curriculum but he felt it was important. He was basically teaching us "new math" even though it was no longer part of the curriculum. Later when I was learning about computers the concept of binary and hexadecimal didn't faze me because I had already been exposed to other bases. While other computer science students were struggling, I had an advantage because I had been exposed to these strange base systems.

One of these anti-Common Core memes includes a note from a father who claims he has a Bachelor of Science Degree in Electronics Engineering which included an extensive study of differential equations and even he is unable to explain the Common Core. Well, he must be a terrible engineer since the question was not about doing the math, but to find the off-by-one error in the diagram. To quote someone on G+, "The supposed engineer must suck at his work if he can't follow the process, debug each step, and find the off-by-one error."

Beyond the educational value or non-value of Common Core, what really burns my butt is the fact that all these memes come from one of 3 sources:

  • Organizations that criticize anything related to public education while at the same time they criticize any attempt to improve it. You can't have it both ways.
  • Organizations that just criticize anything Obama is for, to the extent that if Obama changes his mind they flip and reverse their position too.
  • Organizations backed by companies that either benefit from ignorance, or profit from the privatization of education. This is blatant and cynical.

Respected computer scientist, security guru, and social commentator Gene "Spaf" Spafford recently blogged "There is an undeniable, politically-supported growth of denial -- and even hatred -- of learning, facts, and the educated. Greed (and, most likely, fear of minorities) feeds demagoguery. Demagoguery can lead to harmful policies and thereafter to mob actions."

These math memes are part of that problem.

A democracy only works if the populace is educated. Education makes democracy work. Ignorance robs us of freedom because it permits us to be controlled by fear. Education gives us economic opportunities and jobs, which permit us to maintain our freedom to move up in social strata. Ignorance robs people of the freedom to have economic mobility. The best way we can show our love for our fellow citizens, and all people, is to ensure that everyone receives the education they need to do well today and in the future. However it is not just about love. There is nothing more greedy you can do than to make sure everyone is highly educated because it grows the economy and protects your own freedom too.

Sadly, Snopes and skeptics.stackexchange.com can only do so much. Fundamentally we need a much bigger solution.

Posted by Tom Limoncelli in Rants

Katherine Daniels (known as @beerops on Twitter) interviewed me about the presentations I'll be doing at the upcoming Usenix LISA '14 conference. Check it out:


Register soon! Seating in my tutorials is limited!

Posted by Tom Limoncelli in Conferences, LISA

Tom will be the speaker at the Tue, Oct 21, 2014 meeting of the Denver DevOps Meetup. I'll be talking about our new book, The Practice of Cloud System Administration.

For more info: http://www.meetup.com/DenverDevOps/events/213369602/

Hey Denver folks! Don't forget that tomorrow evening (Tue, Oct 21) I'll be speaking at the Denver DevOps Meetup. It starts at 6:30pm! Hope to see you there!


Register by Mon, October 20 and take advantage of the early bird pricing.

I'll be teaching tutorials on managing oncall, team-driven sysadmin tools, upgrading live services and more. Please register soon and save!


Posted by Tom Limoncelli in LISA

If you recall, the fine folks at Puppet Labs gave me a free ticket to PuppetConf 2014 to give away to a reader of this blog. Here's a report from our lucky winner!

Conference Report: PuppetConf 2014

by Anastasiia Zhenevskaia

You never know when you will be lucky enough to win a ticket to PuppetConf, one of the greatest conferences of this year. My "moment" happened just 3 weeks before the conference and let me dive into things I'd never thought about.

Being a person who has worked mostly with front-end development, I was always a little bit scared and puzzled by more complicated things. Fortunately, the conference helped me to understand how important and simple all these processes could be. I was so impressed by the personalities of all the speakers. Their eyes were full of passion, their presentations were clear, informative and breathtaking. Their attitude towards the things they're working on was exceptional. Those are people you might want to work with, share ideas with, and create amazing things with.

I'm so glad that I got this opportunity and wish that everybody could get this chance and taste the atmosphere of Puppet!

Posted by Tom Limoncelli in Conferences, Puppet

I'm teaching a tutorial at Usenix LISA called "Evil Genius 101: Subversive Ways to Promote DevOps and Other Big Changes".

Whether you are trying to bring "devops culture" to your workplace, or just get approval to purchase a new machine, convincing and influencing people is a big part of a system administrator's time.

For the last few years I've been teaching this class called "Evil Genius 101" where I reveal my tricks for understanding people and swaying their opinion. None of these are actually evil, nor do I teach negotiating techniques. I simply list 3-4 techniques I've found successful for each of these situations: talking to executives, talking to managers, talking to coworkers, and talking to users.

Seating is limited. Register now!

Evil Genius 101: Subversive Ways to Promote DevOps and Other Big Changes

Who should attend:

Sysadmins and managers looking to influence the technology and culture of your organization.


Monday, 10-Nov, 1:30pm-5pm at Usenix LISA


You want to innovate: deploy new technologies such as configuration management, kanban, a wiki, or standardized configurations. Your coworkers don't want change: they like the way things are. Therefore, they consider you evil. However you aren't evil, you just want to make things better. Learn how to talk your team, managers and executives into adopting DevOps techniques and culture.

Take back to work:

  • Help your coworkers understand and agree with your awesome ideas
  • Convince your manager about anything. Really.
  • Get others to trust you so they are more easily convinced
  • Decide which projects to do when you have more projects than time
  • Turn the most stubborn user into your biggest fan
  • Make decisions based on data and evidence

Topics include:

  • DevOps "value mapping" exercise: Understand how your work relates to business needs.
  • So much to do! What should you do first?
  • How to sell ideas to executives, management, co-workers, and users.
  • Simple ways to display data to get your point across better.

Register today for Usenix LISA 2014!

Posted by Tom Limoncelli in Usenix

On Tuesday, Oct 21st, I'll be speaking at the Denver DevOps Meetup. It is short notice, but if you happen to be in the area, please come! I'll be talking about the new book and how DevOps principles can make the world a better place. I'll have a copy or two to give away, and special discount codes for everyone.

The meeting is at the Craftsy Offices, 999 18th St., Suite 240, Denver, CO. For more information and to RSVP, please go to http://www.meetup.com/DenverDevOps/events/213369602/

Step 1: turn off your pager. Step 2: disable the monitoring system. Or.... you can run oncall using modern methodologies that constantly improve the reliability of your system.

I'm teaching a tutorial at Usenix LISA called "How To Not Get Paged: Managing Oncall to Reduce Outages".

I'm excited about this class because I'm going to explain a lot of the things I learned at Google about how to turn oncall from a PITA to a productive use of time that improves the reliability of the systems you run. Most of the material is from our new book, The Practice of Cloud System Administration, but the Q&A always leads me to say things I couldn't put in print.

Seating is limited. Register now!

How To Not Get Paged: Managing Oncall to Reduce Outages

Who should attend:

Anyone with an oncall responsibility (or their manager).


Tuesday, 11-Nov, 1:30pm-5pm at Usenix LISA


People think of "oncall" as responding to a pager that beeps because of an outage. In this class you will learn how to use oncall as a vehicle to improve system reliability so that you get paged less often.

Take back to work:

  • How to monitor more accurately so you get paged less
  • How to design an oncall schedule so that it is more fair and less stressful
  • How to assure preventative work and long-term solutions get done between oncall shifts
  • How to conduct "Fire Drills" and "Game Day Exercises" to create antifragile systems
  • How to write a good Post-mortem document that communicates better and prevents future problems

Topics include:

  • Why your monitoring strategy is broken and how to fix it
  • Building a more fair oncall schedule
  • Monitoring to detect outages vs. monitoring to improve reliability
  • Alert review strategies
  • Conducting "Fire Drills" and "Game Day Exercises"
  • "Blameless Post-mortem documents"

Register today for Usenix LISA 2014!

Posted by Tom Limoncelli in Usenix

Holly from SpiceWorks interviewed me while I was in Austin for the SpiceWorld '14 conference. We talked about DevOps from the SMB "IT guy" perspective, Lord of the Rings, Chef vs. Puppet, and my secret desire to start a podcast that would be "the Stephen Colbert of DevOps."

The interview has been published on their community website:

Demystifying DevOps: Q&A with Tom Limoncelli


I'm teaching a tutorial at Usenix LISA called "Live Upgrades on Running Systems: 8 Ways to Upgrade a Running Service With Zero Downtime".

Ever notice that Google, Facebook and other websites aren't down periodically for software upgrades? That's because they're upgrading software on their service while it is live. As a result, they can push new features continuously. In this tutorial I'll describe 8 techniques they use... and so can you. Oh, and here's a secret: I'll have a 9th way to upgrade software... but it requires down-time. That said, it might not require down-time that is visible to users!

I'm excited about this tutorial because it covers a lot of the unique topics we cover in The Practice of Cloud System Administration that I haven't talked about publicly before.

Seating is limited. Register now!

Live Upgrades on Running Systems: 8 Ways to Upgrade a Running Service With Zero Downtime

Who should attend:

Sysadmins that run web-based services, or services that involve many machines.


Friday, 14-Nov, 9am-10:30am at Usenix LISA


How do you upgrade your service while it is running? This class covers nine techniques from the new book by Limoncelli/Chalup/Hogan, "The Practice of Cloud System Administration"... eight of which don't require downtime. Learn best practices from Google, Facebook, and other successful companies and apply them to your environment. Techniques include: The Google "Canary" process, Facebook "Dark Launches", proportional shedding, feature toggles, Erlang live-code upgrades, and live SQL and NoSQL schema changes.

Take back to work:

  • 8 ways to upgrade live systems without downtime
  • Techniques for cautious upgrades you may not have thought of
  • How to change SQL schemas without requiring downtime
  • Continuous Integration as a stepping stone to Continuous Deployment

Topics include:

  • Upgrade while the system is down (not viable for live upgrades)
  • Rolling upgrades
  • Google's "canary" upgrade system
  • Proportional Shedding
  • Feature Toggles
  • Facebook's Dark Launch system
  • Upgrades that involve SQL and NoSQL schema changes.
  • Languages that support live code upgrades
  • Continuous Deployment

Register today for Usenix LISA 2014!

Posted by Tom Limoncelli in Usenix

I'm teaching a tutorial at Usenix LISA called "Work Like a Team: Best Practices for Team Coordination and Collaborations So You Aren't Acting Like a Group of Individuals".

I'm excited about this class because I'm going to demo a lot of the Google Apps tricks I've accumulated over the years, and combine them with stories about successes (and failures) related to bringing teams together to work on projects. I also get to explain a lot of DevOps culture in ways that make sense to non-DevOps shops (mostly stuff I've been advocating for since before "devops" was a thing). A lot of the material will overlap with our new book, The Practice of Cloud System Administration.

Seating is limited. Register now!

Work Like a Team: Best Practices for Team Coordination and Collaborations So You Aren't Acting Like a Group of Individuals

Who should attend:

System administrators and managers that work on a team of 3 or more.


Sunday, 9-Nov, 9am-12:30pm at Usenix LISA


System Administration is a team sport. How can we better collaborate and work as a team? Techniques will include many uses of Google Docs, wikis and other shared document systems, as well as strategies and methods that create a culture of cooperation.

Take back to work:

  • Behavior that builds team cohesion
  • 3 uses of Google docs you had not previously considered
  • How to organize team projects to improve teamwork
  • Track projects using Kanban boards.
  • How to divide big projects among team members
  • Collaborating via the "Tom Sawyer Fence Painting" technique
  • How to criticize the work of teammates constructively
  • How to get agreement on big plans

Topics include:

  • Meetings: How to make them more effective, shorter, and more democratic
  • How to create accountability, stop re-visiting past decisions, improve involvement
  • Strategy for leaving "fire-fighting" mode and becoming more "project focused".
  • Project Work: Using "design docs" to get consensus on big and small designs before they are committed to code.
  • Service Docs: How to document services so any team member can cover for any other.
  • Kanban: How to manage work that needs to be done.
  • Chatroom effectiveness: How to make everyone feel included, not lose important decisions.
  • Playbooks: How to get consistent results across the team, train new-hires, make delegation easier.
  • Send more effective email: How to write email that gets read.

Register today for Usenix LISA 2014!

Posted by Tom Limoncelli in Usenix

Quoting from a community forum post on SpiceWorks:

It doesn't have "DevOps" in the name, but the new The Practice of Cloud System Administration ... covers a lot of the same concepts, more as "here's some things that have emerged as best practices in the modern world of system administration." Textbook-thick but destined to be a classic like his previous The Practice of System and Network Administration.

Thanks to Ernest Mueller for the kind words!

Apply now for a grant to attend LISA14. Submissions are due by Monday, October 13.


Are you a student? There are grants available for the general conference and the tutorial program.

Are you a woman? As part of its ongoing commitment to encourage women to excel in this field, Usenix is pleased to announce the return of the Google Grants for Women to support female computer scientists interested in attending the LISA14 conference. All female computer scientists from academia or industry are encouraged to apply.

Applications are due by October 13.


Posted by Tom Limoncelli in Usenix

Tom will be the speaker at the October 8th meeting of The NYC DevOps Meetup which meets (I kid you not) at the office of MeetUp, Inc in New York City. I'll be talking about our new book, The Practice of Cloud System Administration.

For more info: http://www.meetup.com/nycdevops/events/208856642/

Concerning PICC

Today, Wednesday, October 8, 2014, we, Matt Simmons and Thomas Limoncelli, resigned from the board of Professional IT Community Conferences, Inc. also known as "PICC". PICC is the New Jersey non-profit business entity that has backed LOPSA-East and Cascadia since 2011. Those two conferences should be unaffected as it was already agreed that they would find new organization(s) to work with for their 2015 conferences.

As of June 10, 2014, PICC, Inc. had voted to and was in the process of being dissolved. However we feel this process has become impossible due to the remaining board member's foot-dragging and at times outright deceptive actions. We can not be on a board of an organization that conducts business in that way. We feel that the community deserves better and should request transparency from PICC, Inc. during its dissolution process.

We look forward to the future success of the organizations and events with which PICC has been affiliated.

Posted by Tom Limoncelli in Community

I'm honored to be a keynote at NLUUG's Autumn Conference, 20-Nov-2014, in The Netherlands. I don't get to Europe often, so this may be the last chance to see me there for a while. I'm also trying to arrange a book-signing while I'm there.

For more info, visit https://www.nluug.nl/events/nj14/

Register now! Registration is limited!

Even though the registration page is in Dutch, the talk will be in English. Google translate is your friend.

I'll be the speaker at the Wed, October 8th meeting of the NYCDEVOPS Meetup which meets (I kid you not) at the office of MeetUp, Inc. in New York City. I'll be talking about our new book, The Practice of Cloud System Administration.

For more info: http://www.meetup.com/nycdevops/events/208856642/

Shane Madden, a coworker of mine, recently re-engineered Stack Exchange's Puppet environment. It is now full of win. Read about it here: http://shanemadden.net/stackexchange-puppet-cleanup.html

Posted by Tom Limoncelli in Puppet

Fortune Magazine published an article called Why women leave tech: It's the culture, not because 'math is hard'

TL;DR version: We treat them like shit and are surprised when they leave. So, basically women leave tech because they have self-respect.

Good for them. Shame on our industry.

A few weeks ago I suggested that there aren't many women in tech because "women have good taste". Every woman that I've said this to has agreed... or at least laughed. However it is an uncomfortable laugh. A laugh that indicates that it is something we all know, but don't know how to talk about.

Let's talk about what we can do to change the industry's culture to not suck. The first step is identifying the (to be polite) off-putting behaviors.

Here are my top 3:

Interrupting people when speaking: I think most men don't realize how often they interrupt women and not men, or that they interrupt everyone but since men interrupt back it is ok. Let's give men the benefit of the doubt and assume that they interrupt everyone equally... why would anyone want to work in such an environment? Men put up with it, women have better taste.

Assumption of competency or lack thereof: A friend recently pointed out that as a professional services engineer, she has to prove herself to each new customer. She gets questioned until she demonstrates competency. Her male coworkers are assumed competent until proven otherwise. Her male coworkers confirm they see this behavior; it is not imagined. I've always felt that it is a character flaw of mine that I'm slow to trust people's technical skills until I see evidence. Do I do this with women more than men? I'm not sure. Lately I've worked at companies with brutal interview processes, so I can just assume that anyone who made it through is competent. That said, having to prove oneself every day is insulting and demoralizing. Why would anyone willingly put up with that? Industries with better gender balance don't have this problem. Women have better taste.

"Help" in the form of criticism: You can't make a technical proposal without the immediate reaction being everyone listing all the reasons why it won't work. What a shitty culture we have. Taking a moment to first say what you like about a proposal is a basic courtesy IMHO. In defense of the critics, I think engineers often feel they're being helpful by pointing out the trouble spots in a proposal so that the person can engineer around them... as if the person hasn't thought of these caveats already. (ProTip: It's always easy to say why a proposal might not work. Showing a demo avoids this and starts the conversation beyond a debate over whether something is possible.) Imagine how much more enjoyable a workplace would be if people acted collaboratively and cooperatively. Women have better taste.

I could list more reasons, and more anecdotes but these are the three cultural defects that I see as the most pressing.

Posted by Tom Limoncelli in Women in Computing

I'm excited to announce that I'm interviewed on the new episode of DevOps Cafe. We talk about the history of system administration leading up to DevOps, recent changes, how the Usenix LISA conference has changed this year, and more.

The live stream of Apple's announcement of the Apple Watch was marred by technical problems. Users saw messages about "could not load movie" and "you don't have permission to access".

As we read Dan Rayburn's excellent technical analysis of what went wrong, we couldn't help but think how easily preventable their problems were.

The problem was that Apple introduced a new feature that had unknown resource requirements and (oops!) they didn't have enough resources. For example, suppose a thousand website visitors require a certain number of computers (resources) to serve the website. Some websites are "heavier" and require the same work to be spread over more computers; others require fewer resources per thousand users.

For previous events the live stream page was a static page that embedded the live stream video. This made the page highly cacheable and required very few resources.

However this event added a new interactive feature: a live scrolling display of tweets which created a live summary of what was being presented. The idea was brilliant and did (when it was working) create a richer user experience.

The problem was that this "tweet feed" went from having 0 to millions of users in a matter of minutes. That's much too fast to add new capacity if the number of resources had been underestimated.

Normally a web service starts small and grows over many months. For example, you might start a new website for trading Beanie Babies. At first you get a few dozen or hundred visitors each day. Soon your site becomes more popular and you have a constant flood of visitors. You add capacity to handle the flood and all is good. Eventually you notice you are growing at, for example, 10% per month. That gives you a good sense of how much capacity you have to add each week to handle the additional users.
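To make the arithmetic concrete, here is a rough sketch in Python of turning a monthly growth rate into a weekly one and projecting capacity. Everything here is illustrative: the function names, the 1,000 users-per-server figure, and the starting user count are assumptions for the example, and it assumes growth compounds smoothly, which real traffic rarely does.

```python
import math

WEEKS_PER_MONTH = 52 / 12  # average number of weeks in a month


def weekly_growth_rate(monthly_rate):
    """Convert a compounding monthly growth rate to the weekly equivalent."""
    return (1 + monthly_rate) ** (1 / WEEKS_PER_MONTH) - 1


def servers_needed(current_users, users_per_server, monthly_rate, months):
    """Servers required after `months` of compounding growth."""
    projected_users = current_users * (1 + monthly_rate) ** months
    return math.ceil(projected_users / users_per_server)


if __name__ == "__main__":
    rate = weekly_growth_rate(0.10)
    print(f"10% per month is about {rate:.2%} per week")
    # Hypothetical site: 100,000 users today, 1,000 users per server.
    for months in (0, 6, 12):
        print(months, "months:", servers_needed(100_000, 1_000, 0.10, months))
```

The point is that 10% per month works out to a bit over 2% per week, and that compounding more than triples the required capacity in a year. That is manageable when you can watch it happen week by week.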

When growth is spread out over a long amount of time it is easy to stay ahead of the curve. However what if on "day 1" you were going to have a million users? You would have to accurately predict how much capacity will be required with zero prior experience; just engineering estimates.

You can do some tests. You can easily simulate 1,000 users and multiply it up, but that kind of projection is rarely accurate. When dealing with growth in the hundreds of thousands you can't predict what problems will crop up.

For example, Apple may have tested their new interactive tweet feed with thousands of users, but until they had millions of users there was no way to know what their weakest link was. It could have been bandwidth, a software thread locking problem, not enough CPU, or a myriad other issues.

So how do you handle this kind of situation? Companies like Google and Facebook have developed ways to perform tests that more accurately predict the resources needed. They had no choice. Google can't announce a new feature without millions of people trying the new feature the moment it is available.

One technique is to slow down the rate of growth artificially. When Gmail was new, Google required new users to get an invitation to join. They controlled the rate at which invitations were distributed, thus controlling the rate of growth. Did a new shipment of hard disks arrive late? Delay the next batch of invites. Did a code optimization make the system more efficient? Hand out more invitations.

Another way to artificially control growth is to enable the feature without announcing it. This is called a "soft launch." Word of mouth only spreads so fast. This may slow down the growth enough that it can be monitored by engineers who can fix the problems as they are discovered. In the worst case, you can turn off the feature since it hasn't been officially announced yet. Users will be disappointed but the truth is that removing the feature might just create more hype for when it does return.

Apple's situation was a little different. Literally zero to millions in just a few minutes. For that, you need to do a dark launch. As described in our new book, The Practice of Cloud System Administration:

The term dark launch was coined in 2008 when Facebook revealed the technique was used to launch Facebook Chat. The launch raised an important issue: How to go from zero to seventy million users overnight without scaling issues. An outage would be highly visible. Long before the feature was visible to users, Facebook pages were programmed to make connections to the chat servers, query for presence information and simulate message sends without a single UI element drawn on the page. This gave Facebook an opportunity to find and fix any issues ahead of time. If you were a Facebook user back then, you had no idea your web browser was sending simulated chat messages but the testing you provided was greatly appreciated.

Apple could have added some code to their homepage that would have queried the tweet stream but not displayed it. This would have enabled them to use the millions of people that visit their homepage to help them test this feature. In fact, they could have started small: enabling this hidden feature for 1% of all visitors and "turning up the dial" slowly over time to see how the tweet stream system reacted. Once it was at 100%, they'd have developed confidence in the system.

In the process, they would have probably found what is suspected to be a bug. The code was querying for updates 1000 times per second instead of a more reasonable ten times per second. This alone might have created 100 times the pressure on Apple's resources.

All of this could have been done weeks before the actual event.
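The "turning up the dial" idea can be sketched in a few lines of Python. This is a minimal illustration and not how Apple or Facebook actually implemented it: the function name, the "tweet-feed" feature label, and the visitor IDs are all made up for the example. Each visitor is hashed into a stable bucket, and the hidden feature is enabled only for buckets below the current dial setting.

```python
import hashlib


def in_dark_launch(visitor_id, percent, feature="tweet-feed"):
    """Decide whether this visitor silently exercises the hidden feature.

    The same visitor always lands in the same bucket, so raising
    `percent` only ever adds visitors; nobody flaps in and out.
    """
    key = f"{feature}:{visitor_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < percent * 100  # percent=1 enables buckets 0..99


if __name__ == "__main__":
    visitors = [f"visitor-{n}" for n in range(20_000)]  # hypothetical IDs
    for dial in (1, 10, 50, 100):
        enabled = sum(in_dark_launch(v, dial) for v in visitors)
        print(f"dial at {dial:3d}%: {enabled} of {len(visitors)} visitors")
```

Visitors selected by this check would have their browsers query the tweet stream without displaying anything. The dial value would live in a configuration system that engineers can change without a new code release, so it can be turned back down the moment something looks wrong.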

What makes this such a shame is that the media spent time discussing the technical problems of the launch, even though this had no real bearing on the new Apple Watch itself. This subtracted from the amount of positive press that Apple was trying to achieve.

We're not saying that if Apple's engineers had read The Practice of Cloud System Administration the entire fiasco would have been prevented. In fact, we have to assume someone at Apple knows these techniques. The problem is that the right people didn't know. This is why we wrote this book: to spread these kinds of ideas to more people.

It isn't just big companies like Apple that need to understand these techniques. Even small startups have unexpected success.

Big launches are high-stakes tests of a company's IT team. You have to get it right the first time or you'll end up like the successful restaurant Yogi Berra once described: "Nobody goes there anymore. It's too crowded."

For more information about The Practice of Cloud System Administration, please visit http://the-cloud-book.com.
