Awesome Conferences

March 2012 Archives

In 2007, when Peter H. Salus and I published all the April Fools RFCs in one book, we also included the poetry RFCs and the funny RFCs published outside of the April Fools timeframe.

Speaking of which... we included "RFC 2410: The NULL Encryption Algorithm and Its Use With IPsec" because, well, I thought it was funny. Specifying an encryption scheme for IPsec that does not encrypt the bytes is, well, funny. It turns out it wasn't published as a joke. Oops. No offense meant to the authors R. Glenn and S. Kent.

Nobody pointed this out to me until years after the book was printed. Sadly, because this book is printed on dead trees, we can't "take it back".

We don't have a new edition that includes the 2008-2013 RFCs but those are pretty easy to find. The book does include some commentary that isn't available on-line. That includes forewords by Mike O'Dell, Scott Bradner, and Brad Templeton. I re-read them today and was impressed at how they have stood the test of time.

More about the book here:

Order it on Amazon here:


Posted by Tom Limoncelli in Book News

I'll be giving a talk about Ganeti, the open source virtual cluster manager, on April 10th at 8:00pm at the Woodbury Campus of Cold Spring Harbor Lab, in the Woodbury Auditorium.

For more information visit:

See you there!

Posted by Tom Limoncelli

[Note: "Early-bird" price ends in 3 days! Don't lose the discount!]

The PICC committee is excited to announce our closing keynote speaker:

Rebecca Mercuri on "The Black Swan and Information Security"

Dr. Mercuri is the lead forensic expert at Notable Software, Inc. Her caseload has included matters ranging from contraband and murder to viruses, malware, and election recounts (most notably Bush v. Gore). She has testified at the federal, state, and local levels as well as to the U.K. Cabinet.

Talk abstract: The economic theories proposed by Nassim Nicholas Taleb in his book "The Black Swan" have strong parallels in information security. Indeed, the concepts of robustness and risk assessment mentioned in Taleb's writing are also well known to those who design software and systems intended to withstand attack. Such assaults on computers, networks and data are now so commonplace that if these threats all suddenly vanished, this would likely constitute a Black Swan Event. But whether a successful and novel attack should also be considered a Black Swan may be debatable. This talk will compare the shortcomings of bell curve (Mediocristan) and power law (Extremistan) event models. The idea that outlier occurrences should be considered more "normal" will shed insight on new methods for recovery mitigation. Attendees need no formal knowledge of statistics or economics in order to appreciate the concepts discussed in this talk.

Register now and avoid the rush!

Space is limited! Register now!

Note: The opening keynote speaker will be announced in a few days.

Posted by Tom Limoncelli in Conferences, LOPSA

Thanks to everyone who attended my tutorials and talk at Cascadia IT 2012. I finally got the timing right on both Intro to Time Management for Sysadmins and The Limoncelli Test.

I also gave a talk about the open source virtual cluster manager called Ganeti which I'm a part of via my job at Google. I'll be repeating this talk at CrabbyAdmins in Columbia, MD on Wed, April 4th.

After the conference I got email from a fan who wrote, "just an FYI, I've placed your book on a custom foam pedestal at my desk. Gave you the old WA state classiness." Here's a picture:

I look forward to seeing everyone at Cascadia in 2013!


P.S. On the east coast? Don't miss the 2012 LOPSA PICC conference, May 11-12, 2012, at the Hyatt Regency hotel in New Brunswick, NJ.

Posted by Tom Limoncelli in Conferences

As requested, I've made a printable version of The Limoncelli Test.

I'll be teaching a class based on this article on Fri, March 23, 2012, at the Cascadia IT Conference in Seattle, WA, and on Fri, May 11, 2012, at the LOPSA PICC 2012 Conference in New Brunswick, NJ. Seating is limited. Register soon!

I'll also be teaching my Time Management class and giving an invited talk on the Ganeti virtual server cluster management software.

See you there!

Posted by Tom Limoncelli in Limoncelli Test

If you are in the Pacific Northwest, I hope you are planning on attending the Cascadia IT conference, March 23-24, 2012. And if you are attending, I hope you have signed up for one or both of my tutorials. I'll be teaching "Intro to Time Management" and a new class, "The Limoncelli Test: Evaluating and improving sysadmin operations".

"The Limoncelli Test" is a tutorial I first did last December at LISA '11. It was kind of a half-baked idea and I got a lot of really excellent feedback. I've revamped a lot of it and I think the class is going to be much better. There are three changes I'm making:

  1. Order: I'm re-ordering the sections to be more logical.
  2. Emphasize making change: I'm putting an emphasis on how to convince your coworkers and managers to join you in making these changes. Attendees were easily convinced to adopt these practices but many pointed out that the hard part is how to get cooperation from others in your company. Thus, I'm emphasizing various techniques for influencing others. It's going to be about half the class.
  3. Homework: I'm asking all attendees to read "the test" as well as the many pages of explanation ahead of time. Grade your team. Come to the tutorial with questions. This will accelerate the class and, to be honest, let me pack a full day of tutorial information into the half-day allotted to the class.

If you are signed up, please post a comment (just an "I'm in!" would be nice). If you took the class in December, please feel free to post feedback.

And of course... if you haven't registered for the conference: DO IT NOW!


Register for Cascadia IT 2012 here.

See you there!

Posted by Tom Limoncelli in Conferences

Fear of Rebooting

I have two fears when I reboot a server.[1]

Every time I reboot a machine I fear it won't come back up.

The first cause of this fear is that some change made since the last reboot will prevent the machine from booting. If that last reboot was 4 months ago, the culprit could be any change made in the last 4 months. You spend all day debugging the problem. Is it some startup script with a typo? Is it an incompatible DLL? Sigh. This sucks.

The second cause of this fear is when I've made a change to a machine (say, added a new application service) and then rebooted it to make sure the service starts after reboot. Even if this reboot isn't required by the install instructions, I do it anyway because I want to make sure the service will restart properly on boot. I want to discover this now rather than after an unexpected crash or a 4am power outage. If there is going to be a problem, I want it to happen in a controlled way, on my schedule.

The problem, of course, is that if you've made a change to a machine and the reboot fails, you can't tell which category you are in. Early in my career I had bad experiences trying to debug why a machine wouldn't boot, only to find that the failure wasn't caused by my recent change but by some change made months earlier. That makes the detective work a lot more difficult.

Here are some thoughts on how I've eliminated or reduced these fears.

 1. Reboot, change, reboot.

If I need to make a big change to a server, first I reboot before making any changes. This is just to make sure the machine can reboot on its own, and it eliminates the first category of fears. If the machine can't boot after the change, then I know the cause can only be my most recent change.

Of course, if I do this consistently then there is no reason to do that first reboot, right? Well, I'm the kind of person that looks both ways even when crossing a one-way street. You never know if someone else made a change since the last reboot and didn't follow this procedure. More likely there may have been a hardware issue or other "externality".

Therefore: reboot, change, reboot.

Oh, and if that last reboot discovers a problem (the service didn't start on boot, for example) and requires more changes to make things right, then you have to do another reboot. In other words, a reboot to test the last change.
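The "reboot, change, reboot" procedure above can be sketched as a checklist generator. This is a minimal illustration of the discipline, not any real tool; the change names are invented:

```python
def reboot_change_reboot(changes, fix_needed=False):
    """Return the ordered steps for a maintenance window.

    A pre-change reboot proves the machine can boot on its own, so any
    failure after the change can only be blamed on that change. If the
    post-change reboot uncovers a problem that requires another fix,
    that fix gets its own verification reboot.
    """
    steps = ["reboot (verify the machine boots cleanly before touching it)"]
    for change in changes:
        steps.append(f"apply change: {change}")
    steps.append("reboot (verify services start after the change)")
    if fix_needed:
        steps.append("apply fix for problem found on boot")
        steps.append("reboot (verify the fix itself survives a boot)")
    return steps

# Example: one change, plus a fix discovered by the post-change reboot.
for step in reboot_change_reboot(["install payroll service"], fix_needed=True):
    print("-", step)
```

Note that every "apply" step is bracketed by reboots, so each boot failure has exactly one suspect.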

 2. Reduce the number of services on a machine.

The reboot problem is bigger on machines that serve multiple purposes. For example, that one machine that is the company DNS server, the DHCP server, and the file server, plus someone put the payroll app on it, and so on and so on. Now it has 15 purposes. I have less fear of rebooting a single-purpose server because there is less chance that a change to one service will break another service.

The problem is that machines are expensive, so having one machine for each service is very costly. It also leaves hardware mostly idle, since most applications are idle a lot of the time.

The solution here is to run many virtual machines (VMs) on a single physical box. While there is more overhead than, say, running the same services natively on the same box, the manageability is better. Isolating each application gives you better confidence when patching both the application and the underlying OS.

(As to which VM solution you should use, I'm biased since I work on The Ganeti Project, an open source virtualization management system that doesn't require big expensive SANs or other hardware. And since I'm plugging that, I'll also plug the fact that I'll be giving talks about Ganeti at the upcoming Maryland "Crabby Sysadmins" meeting, the Cascadia IT conference in Seattle, and the PICC conference in New Jersey.)

 3. Better testing.

Upgrades should never be a surprise. If you have to roll out (for example) a new DNS server patch you should have a DNS server in a test lab where you test the upgrade first. If successful you roll out the change to each DNS server one at a time, testing as you go.

Readers from smaller sites are probably laughing right now. A test lab? Who has that? I can't get my boss to pay for the servers I need, let alone a test lab. Well, that is, sadly, one of the ways that system administration just plain makes more sense when done at scale. At scale a test lab isn't just worth having; the risk of a failure that affects hundreds or thousands (or millions) of users is too great not to have one.

The best practice is to have a repeatable way to build a machine that provides a certain service. That way you can repeatably build the server with the old version of software, practice the upgrade procedure, and repeat if required. With VMs you might clone a running server and practice doing the upgrades on the clone.
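The rollout discipline described above (test lab first, then one production server at a time, verifying as you go) can be sketched like this. The `upgrade` and `healthy` callbacks are hypothetical stand-ins for your real deployment and health-check steps:

```python
def staged_rollout(lab_host, prod_hosts, upgrade, healthy):
    """Upgrade the lab machine first, then production machines one at a
    time. Stop immediately if any machine fails its post-upgrade check,
    so a bad patch never reaches the rest of the fleet."""
    done = []
    for host in [lab_host] + list(prod_hosts):
        upgrade(host)
        if not healthy(host):
            raise RuntimeError(f"{host} failed its post-upgrade check; stopping")
        done.append(host)
    return done

# Toy run: every check passes, so all three machines get upgraded in order.
upgraded = staged_rollout("lab-dns", ["dns1", "dns2"],
                          upgrade=lambda host: None,     # placeholder action
                          healthy=lambda host: True)     # placeholder check
```

The key property is that a failure in the lab (or on the first production box) halts the loop before the remaining servers are touched.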

 4. Better automation.

Of course, as with everything in system administration, automation is our ultimate goal. If your process for building a server for a particular service is 100% automated, then you can build a test machine reliably 100% of the time. You can practice the upgrade process many times until you absolutely know it will work. The upgrade process should be automated so that, once perfected, the exact procedure will be done on the real machines.

This is called "configuration management" or CM. Some common CM systems are CfEngine, Chef, and Puppet. These systems let you rapidly automate upgrades, deployments, and more. Most importantly, they generally work by having you specify the end result (what you want) while the system figures out how to get there (update this file, install this package, etc.).

In a well-administered system with good configuration management an upgrade of a service is a matter of specifying that the test machines (currently at version X) should be running version X+1. Wait for the automation to complete its work. Test the machines. Now specify that the production machines should be at version X+1 and let it do the work.
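The declarative idea behind that workflow can be shown with a toy convergence function. This is a sketch of the concept only, not how CfEngine, Chef, or Puppet actually work internally; the hostnames and versions are invented:

```python
def converge(current, desired):
    """Compare actual state to desired state and return the actions needed.

    You edit only the `desired` dict (the "what you want"); the engine
    figures out the "how": which hosts need an upgrade and which are
    already correct and should be left alone.
    """
    actions = []
    for host, want in desired.items():
        have = current.get(host)
        if have != want:
            actions.append(f"{host}: upgrade {have} -> {want}")
    return actions

# Declare that only the test machine should move to version X+1 first.
current = {"test-dns1": "9.7", "prod-dns1": "9.7", "prod-dns2": "9.7"}
desired = {"test-dns1": "9.8", "prod-dns1": "9.7", "prod-dns2": "9.7"}
print(converge(current, desired))
```

Once the test machine checks out, bumping the production entries in `desired` to "9.8" is the entire rollout step; the engine does the rest.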

Again, small sites often think that configuration management is something only big sites do. The truth is that every site, big and small, can use these configuration management tools. The truth is also that every site, big and small, has an endless number of excuses to keep doing things manually. That's why the biggest adopters of these techniques are web service farms: they are usually starting from a "green field" and don't have legacy baggage to contend with.

Which brings me to my final point. I'm sick of hearing people say, "we can't use [CfEngine/Chef/Puppet] because we have too many legacy systems." You don't have to manage every byte on every machine at the beginning. In fact, that wouldn't be prudent. You want to start small.

Even if you have the most legacy-encrusted old systems, a good start is to have your CM system keep /etc/motd updated on your Unix/Linux systems. That's it. It has business justification: there may be some standard message that should be there. Anyone claiming they are afraid you will interfere with the services on the machine can't possibly mean that modifying /etc/motd will harm their service. That reduces the objection to "we can't spare the RAM and CPU the CM system will require." That's a much more manageable problem.
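The heart of that low-risk first step is an idempotent file resource: write the file only if it differs from the desired content. Here is a minimal sketch; the message text is invented, and the demo writes to a temporary file rather than the real /etc/motd:

```python
import os
import tempfile
from pathlib import Path

def ensure_file(path, content):
    """Converge `path` to `content` idempotently.

    Returns True if the file was created or rewritten, False if it was
    already correct (in which case nothing is touched) -- the same
    "report only what changed" behavior CM tools aim for.
    """
    p = Path(path)
    if p.exists() and p.read_text() == content:
        return False              # already converged: do nothing
    p.write_text(content)
    return True

motd = "Authorized use only. Contact the helpdesk with questions.\n"
demo = os.path.join(tempfile.mkdtemp(), "motd")
first = ensure_file(demo, motd)   # creates the file, so it reports a change
second = ensure_file(demo, motd)  # already correct, so it reports no change
```

Run repeatedly, it converges and then goes quiet, which is exactly the property that makes it safe to unleash on legacy machines.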

Once you are past that, you can use the system to enforce security policies. Make sure /etc isn't world writable, disable telnetd, and so on. These are significant improvements in the legacy world.

Of course, now that you have the infrastructure in place, all your new machines can be built without this legacy baggage. That new web farm can be built by coding up CM modules that create your 3 kinds of machines: static web servers, database servers, and CGI/application servers. Using your CM system you build these machines. You now have all the repeatability and automation you need to scale (and as a bonus, /etc/motd contains the right message).

This is a "bottom up" approach: changing small things and working up to bigger things.

You can also go the other direction: Use CM to create your services "the right way", have your success be visible, and use that success to gain trust as you slowly, one step at a time, expand the CM system to include legacy machines.

Writing about my "fear of rebooting" brought back a lot of old memories. They are, let me be clear, memories. I haven't felt the "fear of rebooting" for years because I've been using the above techniques. None of this is rocket science. It isn't even trailblazing any more. These are proven techniques, more than 12 years old. The biggest problem in our industry is convincing people to enter the 21st century; they don't want to be reminded that they're a decade late.


[1] Especially servers that have multiple purposes. By the way, for the purposes of this article I'll use the term "service" to mean an application or group of cooperating processes that provide an application to users, "machine" to mean any computer, and "server" to mean a machine that provides one or more services.

Posted by Tom Limoncelli in Career Advice

I got email from someone who was having trouble convincing a boss to spend money on new PCs. The current ones are 5 years old (or older). It is a small company, owned by one man, and he runs every detail. Part of my advice to him was:

Use "undeniable value" to describe requests.

State things in terms of "undeniable value". The statement "we need a faster PC" doesn't do that. To you it has undeniable value: faster is better and will solve a list of problems. But a non-technical person can't guess all the things in your head that it will solve. In fact, a non-technical person might think, "you're just trying to spend my money". A statement with undeniable value is one that has an obvious return on investment... one that creates profits directly.

So, if his salespeople are spending 3 hours a day typing invoices into these slow computers, and a faster one would let them do it in 1 hour, they could be spending 2 additional hours each day on the phone selling instead of bothering with the computer. "I have a plan that will enable your salespeople to spend an additional 2 hours a day on the phone selling compared to today's workflow." That's undeniable value.

For a machine refresh policy: computers, like old cars, need more maintenance the older they get. I'd have more time for your most important projects if I wasn't spending 15 hours a week repairing old machines. (Then do the math: 15 hours/week is 35% of your salary, which is $XX,XXX/year down the drain. By spending $XX,XXX on new machines, you'll free up time for me to do projects you find more important.)

Note that the value can be profit or better efficiency. Your boss respects profit more than efficiency. Can you (or you and your CFO) work out a way to phrase things in terms of the profit it will bring? In fact, his priorities are probably (from highest to lowest):

  1. Revenue. Guaranteeing a financial return. Actually making money from customers
  2. Increasing scarce productivity. Most attractive if product demand exceeds supply
  3. Cutting costs. Most attractive in a struggling company
  4. Competitive advantage. Even more attractive if you're behind the competition
  5. Technology for the sake of technology. For pizzazz or to maintain "cutting edge" reputation

"We need a faster PC" sounds like #5. "Sales people will spend more time on the phone, less time waiting for their computer" sounds like #1. They're the same thing to you, but very different to your boss.

Take the time to sit down and think out how you are going to describe your request in terms of undeniable value. It is very difficult. It will take a lot of time. You might want to beta-test it on a fellow employee, or maybe the CFO himself. The statement should be "undeniable": Nobody dislikes more profit, or making better use of a rare resource.

Your boss's priorities might be different from that list simply because, as a small-business person, he's probably had to pave his own way, make his own rules, and do things differently. Maybe appealing to his ego or his sense of cheapness will be more successful. Maybe he's a wheeler-dealer and what would impress him most is that you've negotiated a "one time only" great price on these PCs; or maybe letting him do the price negotiations will stroke his ego enough to make it a worthy project.

Either way, good luck and let me know what happens. Also, I encourage people to post comments if they have thoughts and advice.

Posted by Tom Limoncelli in Business

I'm teaching Intro to Time Management for Sysadmins and a new class based on The Limoncelli Test. Register before the classes fill up!

My classes are both on Friday, March 23.

This is a rare opportunity to catch my classes in the PNW area.

The League of Professional System Administrators and the Seattle Area System Administrators Guild are proud to present the 2012 Cascadia IT Conference. Cascadia 2012 is a regional IT conference for all types of system administrators - computer, database, network, SAN, VMware, etc. It will take place on March 23 - 24th (Fri - Sat) of 2012 at Hotel DECA in Seattle's University District. We provide excellent training opportunities at a very reasonable price and it is a great way to meet other local system administrators.

Posted by Tom Limoncelli in Conferences