June 2012 Archives

Is the person that hand-crafts a bed out of wood he personally chopped from the forest, designed, and built doing the same job as someone that builds a bed factory that makes 100 beds a day?

I don't think so.

So why do we use the same job title for a person at a 10-person company who maintains one or two custom-built servers and spends 70% of his or her day answering user questions as for the person who maintains a massive 1,000-CPU cluster, using Cfengine/Puppet/Chef to orchestrate hundreds of web front-ends, dozens of database servers, and huge numbers of application servers, all mass-produced and automated?

Are those even the same job?

The latter has enough repetition that you can develop metrics and make data-driven decisions to constantly improve the quality using science. The former is an art form, and a labor of love and has quality that is not based on metrics and science.

That's my thought of the day.

What do you think?

P.S. I'm about to hop on an airplane. I'll reply to comments in a few days.

Posted by Tom Limoncelli

Rikki Endsley posted to Google Plus this week:

I saw this tweet today from a hiring manager: "Just interviewed for a sysadmin. I'm struggling since she has no social footprint. Is that wrong, or should social be key?" What are your thoughts on a 'social footprint' requirement for sysadmins? link

I'm very disturbed hearing a hiring manager say this. "Social Footprint" means how visible the person is on social networks like Facebook, G+, Twitter and so on. What does that have to do with whether or not the person is a good system administrator?

It could be a bad thing if it means the person is anti-social or doesn't keep up with the latest innovations. It could be a good thing if it means the person has privacy concerns. In fact, if someone has a background in security and has kept themselves invisible in light of all the social networking stuff that is out there, I'd say that indicates a particular skill. Guessing wrong in this area will result in a bad hiring decision.

The reason that this really struck me, however, is that the candidate is a "she". Is this a judgement we'd make about a male candidate? Take a moment to think about how you'd react differently to a woman saying she's not on Facebook vs. a man.

While discrimination in certain categories is illegal (this varies by state and country) let's talk about the broader definition of discrimination: Turning away a candidate because "they aren't like me".

The goal in hiring is to hire the absolute best person for the position. Discrimination is bad because it means you end up missing the best candidate. Put another way: Discrimination results in you hiring people that aren't as good as you could be hiring.

Let's look at some subtle ways that we discriminate that lead to bad hiring decisions:

Example 1: Candidate doesn't have a home network: I've heard this used as a "red flag". "How could they be a serious sysadmin if they don't have a network at home?" Here are a few reasons why this is a terrible criterion to use:

  • The candidate can't afford one. Why discriminate against someone for being poorer than you? For most of my career my "home network" was paid for by my employer (either partially or substantially... whether they knew it or not). Are you discriminating against someone for working for a cheap employer, or are you discriminating against them for being too broke to buy equipment and too honest to steal from their employer?
  • The candidate has a huge lab at work and doesn't need to experiment at home.
  • The candidate has children at home and doesn't want them to break things. Are you discriminating against someone for having children?
  • The candidate keeps a good separation between homelife and worklife which is something that many fine time management books recommend. Are you discriminating against someone for having good time management skills? A good "work-life balance"?
  • The candidate just doesn't need one. Not everyone does.
  • The candidate has one, but doesn't call it that. When I began writing this article my plan was to point out that I don't have a home network. I don't think of myself as having a home network. However, my Cable TV provider's box includes a WiFi base station: my laptops, phones, Tivos and Wii connect to it. ...that's not a "network", is it? Well, ok, I guess technically it is. I don't think of it as one. I guess you wouldn't have hired me.

The issue of whether or not a candidate has a home network comes from the days when having a home network was difficult: it meant the person had experience running wires, connecting hubs and switches, configuring routers, setting up firewalls, and, if this was before DHCP, it meant knowing a lot about IP addressing. That's a lot of knowledge. While it is a plus to see a candidate with such experience, it isn't a minus if the candidate doesn't have that experience. It just means they have an awesome internet provider or are smart enough to buy a damn pre-made WiFi base station so they can spend more time having fun.

In the chapter on hiring sysadmins in TPOSANA (yes, there is a chapter on that!) we make the point that some people (often women and minorities) downplay their own experience. Quote...

Asking candidates to rate their own skills can help indicate their level of self-confidence, but little else. You don't know their basis for comparison. Some people are brought up being taught to always downplay their skills. A hiring manager once nearly lost a candidate who said that she didn't know much about Macintoshes. It turned out that she used one 8 hours a day and was supporting four applications for her department, but she didn't know how to program one. Ask people to describe their experience instead.

Which leads me to the next example...

Example 2: Candidate didn't grow up using computers: I hadn't realized that was a requirement for being a sysadmin!

  • The most obvious reason this is invalid reasoning is that some candidates were born before having a home computer was possible. Age discrimination is illegal in all 50 states (though the age range is different).
  • Many people just plain weren't interested in computers until later in life. Two women I know both tell the same story: it wasn't until sophomore year in college that they took a computer class and realized they had an aptitude for it. Soon they had changed major and the rest is history.
  • Many people grew up too poor to have a computer. Discrimination against people for being poor is just stupid. Not hiring someone because they were poor or are poor is helping create the problem of poverty that you so obviously dislike! Duh!

There are many other ways we turn down perfectly good candidates because "they aren't like us". It is an easy trap to get into. It is our responsibility to be critically aware of our thinking when making hiring decisions and to do our best to hire based on criteria that relate to job performance and nothing else. Hire the best.

Posted by Tom Limoncelli in Management

MicroReview: Tarsnap

I started using Tarsnap to back up my personal server "to the cloud". I found it was quick to set up, easy to learn, and works pretty well.

And, yes, I've already made a wiki page that documents how my monthly restore tests will be done. The data is encrypted, which means that if you lose your crypto key you can't get your data back, so my restore test is done from a different machine to force me to keep a copy of the key stored safely off-line.

If you are looking to do backups over the internet, check this out.

Posted by Tom Limoncelli in Reviews

Someone recently asked me how to handle a vendor that wasn't being responsive: "Twice now I've sent the support team requests and received an automated response and little else. The first ticket took a month for them to answer. The second was closed with a note that they had tried to call me, but I didn't answer. Mind you, they never emailed me to say they had called."

I've found that when opening a "case" or "ticket" with a vendor you have to "stay on them" or, more accurately, "manage it ruthlessly until the issue is resolved". Very few vendors are good at follow-through on tickets. Here's what I do:

I call them every day, first thing in the morning, and always refer to the same ticket; and I keep good notes.

Let's break that down:

a. I call them... I don't rely on them to call me back. If they tell me they'll call me, I say that's fine but I let them know that if I don't hear from them by (for example) 10am, I'll call them.

b. ...every day. I call and ask for an update every day. If they say something will take N days, then I don't call back for N days, but otherwise I call them daily. If it is apparent that they are frustrated by my frequency, I either lie and claim that I have a mean boss who demands daily updates and graciously apologize ("I hope you understand") or I ask them when I should call next for a status update. I don't accept "we'll call you with a status". I just don't.

c. ...first thing in the morning. By calling them early in the day you are setting them up to spend the day working on your issue. If you call them at the end of the day, they'll decide to do your request "tomorrow morning" and now you are gambling that no other squeaky wheel will get their attention when they come into the office. Plus, you are giving them 12+ hours to forget your request. This is a Jedi mind trick to manage their time without them knowing.

d. ...always refer to the same ticket. Don't let them create a new ticket number if it is the same problem. If a problem re-occurs months later, request that they re-open that ticket. There are two main reasons for this: it takes away their excuse that "you just brought this to our attention" (this is important if the issue gets escalated) and as new people get involved it helps them to be able to read the complete history of the issue.

e. and I take notes. Every time an escalation goes wrong I wish I had taken notes ("evidence") from the start so now I always take notes. I ask the person their name (in a friendly "I'm just introducing myself" way), the date, and any commitments they make.

Two other tips:

f. Define the problem for them, and define it at a high level.

Defining the problem properly is actually quite difficult:

Once I opened a ticket that said "I can't ping server xyz" and ended up talking to a zillion very confused people. It turns out I was supposed to start accessing a different server and was never informed. To make matters worse, the server I had been using was decommissioned. When I escalated the issue, the company nearly took the old machine out of the trash. I should have reported the problem as "I'm unable to do lookups in your LDAP server". Stating the problem at a higher level of abstraction (what I'm trying to achieve, not what I think the problem is) lets them do their job better. In my case, what I was trying to achieve wasn't pinging a machine... it was doing an LDAP lookup. If I had reported it this way from the start, they would have addressed this as an LDAP issue, not a networking issue, and it would have been resolved quicker. As sysadmins we are most likely more technical than someone working customer support, so we tend to do our own diagnosis. It is actually better to give the high-level complaint, let them digest it a little, then feed them the diagnostic data you've accumulated so far.

Another time I asked the wrong question was when dealing with a salesperson. The question I asked was along the lines of "is there anything else I need to do?". A week later the equipment I had ordered hadn't arrived. I called and he again told me there was nothing else I needed to do. A week later (this is before I learned that daily calls work better) I called to complain that I hadn't received the equipment and that my deadline was now in jeopardy. He said "Of course you haven't gotten it yet! I can't even place the order into my system until your company establishes a credit line with us." You can imagine how angry I was. He, on the other hand, was confused why I would be so upset. He hadn't lied to me... there was nothing Tom Limoncelli needed to do: the credit line was something the CFO does. At this point my CFO hadn't been contacted. (Don't these people work on commission?) Well, that's the day I learned that the right question to ask is, "What day should I expect the box to arrive?" That's the "higher level of abstraction" that covers all the bases. If I get a date, fine. If I don't get a date, it starts a conversation about why they can't give a date yet. If I know the roadblocks, I can work them until they are removed. I can take responsibility for making sure they get resolved. Credit line not established? Ok, I can talk with the CFO. If all roadblocks are cleared and they still can't give me a date, then I ask for the date by which they'll be able to give me a date. Now I know when to call back and ask the right question.

Luckily my current position doesn't involve dealing with external providers. However, my previous job was at a very small company, so we were too small to do many things ourselves. We had to rely on external companies for a variety of things: Internet connection, web hosting, hardware depot repairs, and so on. Hopefully you can benefit from these lessons that I've learned.

A co-worker of mine recently noticed that I tend to use rsync in a way he hadn't seen before:

rsync -avP --inplace $FILE_LIST desthost:/path/to/dest/.

Why the "slash dot" at the end of the destination?

I do this because I want predictable behavior and the best way to achieve that is to make sure the destination is a directory that already exists. I can't be assured that /path/to/dest/ exists, but I know that if it exists then "." will exist. If the destination path doesn't exist, rsync makes a guess about what I intended, and I don't write code that relies on "guesses". I would rather the script fail in a way I can detect (shell variable $?) than have it "guess what I meant", which is difficult to detect.

What? rsync makes a guess? Yes. rsync changes its behavior depending on a number of factors:

  • is there one source file or multiple source files?
  • is the destination a directory, a file, or doesn't exist?

There are many permutations there. You can eliminate most of them by having a destination directory end with "slash dot".

For example:

  • Example A: rsync -avP file1 host:/tmp/file
  • Example B: rsync -avP file1 file2 host:/tmp/file

Assume that host:/tmp/file exists. In that case, Example A copies the file and renames it in the process. Example B will fail because rsync's author (and I think this is the right decision) decided that it would be stupid to copy file1 to /tmp/file and then copy file2 over it. This is the same behavior as the Unix cp command: If there are multiple files being copied then the last name on the command line has to be a directory otherwise it is an error. The behavior changes based on the destination.

Let's look at those two examples if the destination name doesn't exist:

  • Example C: rsync -avP file1 host:/tmp/santa
  • Example D: rsync -avP file1 file2 host:/tmp/santa

In these examples assume that /tmp/santa doesn't exist. Example C is similar to Example A: rsync copies the file to /tmp/santa, i.e. it renames it as it copies. In Example D, however, rsync will assume you want it to create the directory so that both files have some place to go. The behavior changes due to the number of source files.

Remember that debugging, by definition, is more difficult than writing code. Therefore, if you write code that relies on the maximum of your knowledge, you have, by definition, written code that is beyond your ability to debug.

Therefore, if you are a sneaky little programmer and use your expertise in the arcane semantics and heuristics of rsync, congrats. However, if one day you modify the script to copy multiple files instead of one, or if the destination directory doesn't exist (or unexpectedly does exist), you will have a hard time debugging the program.

How might a change like this happen?

  • Your source file is a variable $SOURCE_FILES and occasionally there is only one source file. Or the variable represents one file but suddenly it represents multiple.
  • The script you've been using for years gets updated to copy two files instead of one.
  • Over time the list of files that need to be copied shrinks and shrinks and suddenly there is just a single file that needs to be copied.
  • Your destination directory goes away. In the example that my coworker noticed, the destination was /tmp. Well, everyone knows that /tmp always exists, right? I've seen it disappear due to typos, human errors, and broken install scripts. If /tmp disappeared I would want my script to fail.

It is good rsync hygiene to end destinations with "/." if you intend it to be a directory that exists. That way it fails loudly if the destination doesn't exist since rsync doesn't create the intervening subdirectories. I do this in scripts and on the command line. It's just a good habit to get into.
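The fail-loudly habit can be sketched in a few lines of shell. This is only an illustration under stated assumptions: it uses cp on local temporary directories as a stand-in for rsync so that it runs without a remote host, and the directory and variable names (workdir, dest, missing, status_*) are made up for the example. cp is not rsync, but it shows the same contrast between "/." and a bare destination name.

```shell
#!/bin/sh
# Sketch only: cp stands in for rsync, and all paths are temporary.
workdir=$(mktemp -d)
echo "hello" > "$workdir/file1"

# Case 1: the destination directory exists, so "dest/." works.
mkdir "$workdir/dest"
status_existing=0
cp "$workdir/file1" "$workdir/dest/." || status_existing=$?

# Case 2: the destination directory is missing. With "/." the copy
# fails and we can detect it through the exit status.
status_missing=0
cp "$workdir/file1" "$workdir/missing/." 2>/dev/null || status_missing=$?

# Case 3: same missing destination without "/.". The command "guesses"
# that we wanted a file called "missing" and reports success.
status_guess=0
cp "$workdir/file1" "$workdir/missing" || status_guess=$?

if [ "$status_missing" -ne 0 ]; then
    echo "copy failed loudly: destination directory does not exist" >&2
fi

rm -rf "$workdir"
```

In a real script the cp lines would be the rsync command, and a non-zero exit status would be the cue to stop and alert someone rather than carry on.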


P.S. One last note. Much of the semantics described above changes if you add the "-R" option. They don't get more consistent, they just become different. If you use this option make sure you do a lot of testing to be sure you cover all these edge cases.

Posted by Tom Limoncelli in Technical Tips

Short version: Take this survey, you might win a $100 Amazon gift card but more importantly you'll be helping great research.

Long version:

Hello All,

Some of you may recognize my name - and some of you may recognize my
research. :) I study sysadmins and help organizations find ways to
understand the work of system administration better, in part, so they
can build better software. I conducted a study a few years ago that I
presented at LISA, and I'm working on extending it to a journal paper.
This extended publication would dramatically increase readership of
the results to include top researchers and executives, so I think it's
a worthy endeavor.

But... I have some reviewers absolutely incredulous at my claim that
sysadmins, as software or tech users, are any different from everyone
else. They think that sysadmins' expertise and view of the system
doesn't change the way we work. However, I have other reviewers that
believe in the work and my results. I have been asked to re-collect
this data to verify my findings - in effect, prove that my earlier
survey results weren't a fluke.

So: I would really appreciate it if you would take about 15 minutes
to complete the survey, even if it looks familiar to you from a few
years ago. Also, please feel free to forward this on to any other
sysadmins you may know. One participant (aka survey completer) will
be randomly selected to win a $100 gift card. AND if I get
lots of responses, I'll go ahead and add in extra gift cards: so one
person for every 100 completed will win. (Just so no one tanks their
odds by referring friends.) I would also be happy to share my results
with anyone who is interested.

The link is below in the official invitation, but I've also copied it

Thanks again!



Dear System Administrator,

My name is Nicole Forsgren Velasquez, and I am a researcher at
Pepperdine University. I am conducting a study to examine the factors
and qualities that motivate system administrators to use or not use
software applications, and we would really appreciate your help. You
are being invited to participate because as a system administrator,
you have a unique understanding of what makes a software application
(i.e., "tool") useful in your job.

The survey asks about the tools you use in your work and will take
about 15 minutes to complete. There are no right or wrong answers; we
are interested in your opinions. Approximately 200 system
administrators are being asked to participate in this survey. As a
way of thanking you for your participation, one participant will be
randomly chosen to win a $100 gift certificate. The survey
can be accessed here:

We would like to get the opinions of as many system administrators as
possible, so feel free to tell fellow system administrators about the

Thank you very much!

The author can be reached via her web site:

Posted by Tom Limoncelli in Industry

LOPSA Elections

The LOPSA board elections are happening. Turn-out so far is around 11%, which is pathetic. Folks, if you are a member, vote!

This mailing list post has more details:

Voting takes just a few minutes.

(And if you aren't a member, join up and vote!)

Posted by Tom Limoncelli in LOPSA

Website latency is a major issue. Jeff Dean from Google has given a presentation that, for the first time, reveals some of the techniques used at Google. Seeing the presentation reminded me of the "shock and amazement" I had when RAID was invented (yes, kids, RAID used to be a "new thing"). An abstract and slides are available here

The slides are well worth a read.

Posted by Tom Limoncelli

System Administration is maturing and, yet, there is no accepted standard curriculum. It is ironic, and somewhat scary, that a field that society is more and more dependent on has no formal, accepted, educational path. I propose a framework that is similar to that of the electrical/electronics industry.

To become a doctor there is a generally accepted educational path. Undergraduate "pre med" or biology program, medical school, internship, and so on. It gives me great comfort that the doctors that I see follow a formal path. Sysadmins, however, often "fall into" the career. I know many sysadmins whose formal education is in physics, for example, because it teaches them the rigors of mathematics, measurement, and thinking in terms of systems. I know many sysadmins who got their start with computers as a hobby by experimenting at home, possibly fixing friends' computers, and then "fell into" system administration as a job and are enjoying a highly successful career. Yet, I know of exactly zero doctors who got their start performing medical experiments at home. The medical profession went from "barbershops" to the scientific study and practice of medicine. System administration must make a similar journey.

Education of system administration is evolving into a 3-part framework similar to the electrical industry. If we are to mirror that industry it is important to first understand their framework.

The electrical industry has three tiers: the technician, the engineer, and the researcher.

The technician is someone you might hire to install a new electrical outlet in your home, or who installs the electrical infrastructure on a construction site. People in this role follow the accepted practices of the industry, called "building codes". A technician literally might not know Ohm's Law. They do know, however, that the building code says every n feet of this there has to be one of that. That every 15 Amp circuit can have a certain number of outlets. They might not know, or care, why these rules exist, but they are rules to be followed. They know that if such a rule is violated the work will not pass when the building inspector checks their work. Technician jobs generally do not require a college education.

The engineer generally has formal college education. They have a depth of knowledge that enables them to design the systems that technicians install. They understand not just what building codes exist but they understand the science behind them. They are responsible not just for small designs such as the wiring for a new home, but also for large designs such as the power of a stadium lighting grid. More senior engineers write new building codes. Some engineers have a general practice while others specialize. Some are involved in relatively mundane projects while others are on the cutting edge.

The researcher invents. While engineers may design something that has never been designed before, researchers create entirely new categories. They may have a design approach and invent new components or they may take a physics approach and invent entirely new paradigms.

The field of system administration would benefit from a similar approach.

The system administrator technician deploys and maintains the systems as designed by others. They may not know all the details of why a standard exists but they know how to stay within those bounds. This already exists in terms of "vendor certifications". A technician earning a Red Hat, Cisco, or Microsoft certification is equivalent to an electrician learning the building codes. Rather than coming from a government inspector providing a "certificate of occupancy" (C.O.), the pressure to follow the standards comes from the vendors, who withhold support from designs that do not follow certain best practices. When designing an MS-Exchange environment one could choose to not use ActiveDirectory but it would be against the vendor recommendation and would not be a supported configuration. I've been told by network engineers that they were choosing one design idea over another because Cisco would not "certify" designs of such stripe. While vendors use the "carrot" of the promise of support, it is as powerful as the "stick" of a building inspector's "C.O."

The system administration engineer is less well defined. There is a serious need for University level degrees to fill this void. There should be BA/BS level degrees as well as degrees at the Masters level.

The systems administration researcher is the Ph.D level. This does not need much explanation. However, it should be pointed out that one does not need a Ph.D to invent in the world of systems administration. The industry is moving too quickly to isolate the creation of new paradigms to an ivory tower. The entire DevOps paradigm is a "found pattern", i.e. it evolved organically and was given a name once many individuals had all reached the same conclusion.

What should our next steps be?

Educating technicians is being taken care of by vendors. That's fine and appropriate.

PhD level education is something that will come in time.

The gap is at the University level. That should be the focus. To be more specific, the ultimate goal should be to define a 4-year degree in systems. To that end, we should begin by finding who is currently teaching "system administration" at a University level, catalog what they are doing, and bring them together to flesh out standards for curriculum.

I would be interested in talking with university-level instructors that would like to join forces and do such a project.

Tom Limoncelli


Posted by Tom Limoncelli in Education

Someone recently asked me what language a sysadmin should learn.

If you are a sysadmin for Windows the answer is pretty easy: PowerShell.

The answer is more complicated for Unix/Linux sysadmins because there are more choices. Rather than start a "language war", let me say this:

I think every Unix/Linux sysadmin should know shell (sh or bash) plus one of Perl, Ruby, Python. It doesn't matter which.

The above statement is more important to me than whether I think Perl, Python or Ruby is better, or has more job openings, or whatever criteria you use. Let me explain:

It is really important to learn bash because it is so fundamental to so many parts of your job, whether it is debugging an /etc/init.d script or writing a little wrapper. Every Unix/Linux sysadmin should know: how to do a for loop, a while loop, if with [[ or [, how to use $1, $2, $3... $* and $@, case statements, how variable substitution works, and how to process simple command-line flags. With those basic things you can go very far. I'm surprised at how many people I meet with a lot of Unix/Linux years under their belt who can't do a loop in bash; when they learn how, they kick themselves for not learning earlier.
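To make that list concrete, here is a minimal sketch that touches most of those basics in one small function. Everything in it is made up for illustration: the function name greet_all, its -u flag, and the GREETING environment variable are all hypothetical. It sticks to POSIX sh constructs (single [ rather than [[, and getopts for flags) so it should run under bash as well as plain sh.

```shell
#!/bin/sh
# Hypothetical helper: greet each name given, optionally in upper case.
greet_all() {
    upper=no
    # Variable substitution with a default: use $GREETING if it is set,
    # otherwise fall back to "hello".
    greeting=${GREETING:-hello}

    # while loop + case statement: process simple command-line flags.
    OPTIND=1
    while getopts "u" flag; do
        case "$flag" in
            u) upper=yes ;;
            *) echo "usage: greet_all [-u] name..." >&2; return 2 ;;
        esac
    done
    shift $((OPTIND - 1))   # drop the flags; $1, $2, ... are now the names

    # if with [ : refuse to run with no arguments.
    if [ "$#" -eq 0 ]; then
        echo "no names given" >&2
        return 1
    fi

    # for loop over "$@", which keeps arguments with spaces intact.
    for name in "$@"; do
        if [ "$upper" = yes ]; then
            name=$(printf '%s' "$name" | tr '[:lower:]' '[:upper:]')
        fi
        echo "$greeting, $name"
    done
}

output=$(greet_all -u alice "bob smith")
echo "$output"
```

Quoting "$@" is the detail worth noticing: with an unquoted $* the second argument would be split into "bob" and "smith" and greeted twice.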

The choice of Perl/Python/Ruby is usually driven by what is already in use at your shop. Ruby and Python became popular more recently than Perl, so a lot of shops are Perl-focused. If you use Puppet, knowing Ruby will help you extend it. I work at Google, which is big on Python, so I learned it after coming here; it was a shock to the system after being a Perl person since 1991 (someone recently told me Perl didn't exist in 1991... I introduced him to a little something called Wikipedia).

From a career-management point of view, I think it is important to be really really really good at one of them and know a little of the others; even if that means just reading the first few chapters of a book on the topic. Being really really really good at one of them means that you have a deep understanding of how to use it and how it works "under the hood" so you can make better decisions when you design larger programs. The reason I suggest this as a career-management issue is that if you want to be hired by a shop that uses a different language, being "the expert that is willing to learn something else" is much more important than being the person that "doesn't know anything but has great potential" or "knows a little of this and that but never had the patience to learn one thing well".


P.S. Other thoughts on this topic: Joseph Kern has advice about the three languages every sysadmin should know and Phil Pennock has great advice and an interesting summary of the major scripting languages.

Posted by Tom Limoncelli in Career Advice