Awesome Conferences

Recently in Professionalism Category

Humans think in terms of mental models. In IT it is our responsibility to help them form accurate models as well as deal with inaccurate models that exist.

Humans use mental models of how things work to fill in context. If we are not given the model, we make one up. This made-up model may be unrelated to how things actually work, but if it is sufficient for us to get our job done then that's "good enough". I think this is evolutionary: we didn't know why the sun rose and set, but we made up a model that included a god riding across the sky... a model that was good enough to deal with the fact that "night" and "day" alternate.

As sysadmins we often skip the step of helping users create that model, and then face an uphill battle dealing with users who have created their own inaccurate model. We see users do things that seem insane but are actually completely appropriate for the mental model they have created.

I once saw a user plug two ports of his desktop Ethernet switch into wall jacks so that he could get twice the network speed. In his model, network bandwidth worked like electricity in a parallel circuit, so this seemed like a reasonable way to get more bandwidth. Instead he crashed the network because he had created an Ethernet loop (this was before loop detection/prevention mechanisms were common).

When possible we must give users an accurate mental model. However, those opportunities are rare.

When we answer technical support questions we must be on the lookout for users whose mental model is inaccurate. It may have served them in the past but is insufficient for their current situation.

Years ago I was at a company that was changing VPN software. Announcement after announcement went out telling people that if they used the VPN they had to stop by the helpdesk to be switched to the new software. They were warned that the old VPN would stop working on a specific date. On that date, the helpdesk was flooded with users that couldn't understand why they couldn't connect to work. "Have you seen the emails about the old VPN being replaced?" "Sure, but I don't use a VPN."

In their mental model they didn't. The icon was called "Network Connect". Why would they pay attention to an email about "the VPN"? They never saw an icon or menu with the phrase "VPN" in it.

Whose fault was this situation? Hint: The user can't be blamed for not knowing that "Network Connect" is a VPN.

"Network Connect" was the icon they clicked that connected their laptop to the work network so they could access "work things." Their mental model was more like an on-off switch. When the switch is "on", web sites inside the company work. When it is "off", those sites don't work. The fact that they're still using the same web browser and other apps helped create this inaccurate model. The switch is turned on and things magically work; turn it off and things magically didn't. Their model didn't include encrypted packets. It didn't need to. In fact, during this we learned that many users were connecting to the VPN even when they were in the building. This, again, makes sense if "Network Connect" was an on switch that made internal services accessible.

This mental model mismatch contributes to many of the ills of corporate IT. When people don't know I'm a sysadmin, I often hear complaints about their company's IT department. Often I hear about the IT department doing bizarre things "just to make it more difficult to work here". It is a sad state of affairs that people believe IT departments would do that. However, with incorrect mental models so commonplace, it makes total sense. Why would the IT department hide our web sites unless a magic on-switch is flipped? What's to stop bad people from just having an on-switch installed on their laptops too? Now, if enabling the VPN made their web browser display an "Ah! This is an internal site! Please wait while we connect through the VPN tunnel" message every time they accessed an internal web site, then the mental model would include some kind of tunnel analogy. Of course, that would be silly. Plus, the great thing about VPNs is that they are transparent to the applications.

The next time you send email to users, consider: "What am I doing to create an accurate mental model?" and "What mental model might they have that I should play to?" When helping a confused user, pause to ask, "What is their mental model?" Then either work within it or help them build a new, more accurate one.

Posted by Tom Limoncelli in Professionalism

Users tend to be concerned with what a system does (features, functionality) and sysadmins tend to be concerned with the operational aspects of a system. I just noticed this great Wikipedia page that lists "Non-functional requirements" of a system.

Broadly, functional requirements define what a system is supposed to do whereas non-functional requirements define how a system is supposed to be. Functional requirements are usually in the form of "system shall do <requirement>", while non-functional requirements are "system shall be <requirement>".

I could see myself using this as a tool for jogging my memory when I'm trying to think of all the aspects of a system that I need to be concerned with, either operationally or when writing requirements.

Check it out: http://en.wikipedia.org/wiki/List_of_system_quality_attributes

Posted by Tom Limoncelli in Professionalism

Something happened at home today that reminded me of something I used to do when I worked at Bell Labs.

My rule was simple. If a machine in the computer room wasn't labeled, I was allowed to power it off. No warning. Click. No power.

If I logged into a machine as root and the prompt didn't include the hostname, the only command I was interested in typing was "halt".

Both of these rules came from the same source: If sloppy system administration was going to lead to errors and downtime, I wanted that downtime to happen during the day when we can fix it instead of late at night when we should be asleep.

(Of course, if a machine didn't have the hostname in its root prompt, that also meant our configuration management system wasn't running on the machine, which is a security violation. Therefore halting the machine, as far as I was concerned, was solving a security issue. But I digress...)
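For the curious, here is a minimal sketch of the "hostname in the root prompt" test, written in Python. It assumes the prompt is a bash `PS1` string, where `\h` expands to the hostname up to the first dot and `\H` to the full hostname; `$HOSTNAME` is also accepted because bash expands variables in the prompt by default. The function name `prompt_shows_hostname` is hypothetical and not part of any real configuration management system.

```python
def prompt_shows_hostname(ps1: str) -> bool:
    r"""Return True if a bash PS1 string would display the machine's hostname.

    Hypothetical helper. Assumes bash prompt syntax: \h (short hostname)
    and \H (full hostname) are hostname escapes; \\ is a literal backslash.
    """
    i = 0
    while i < len(ps1):
        if ps1[i] == "\\" and i + 1 < len(ps1):
            if ps1[i + 1] in "hH":
                return True
            # Skip the whole escape pair (e.g. \u, \w, or a literal \\)
            # so that a literal backslash followed by "h" is not miscounted.
            i += 2
            continue
        i += 1
    # bash expands variables in PS1 by default, so $HOSTNAME also counts.
    return "$HOSTNAME" in ps1 or "${HOSTNAME}" in ps1


# Example prompts, written as they would appear in a .bashrc.
for prompt in [r"\u@\h:\w\$ ", r"[\u@\H \W]\$ ", "# "]:
    verdict = "ok" if prompt_shows_hostname(prompt) else "halt"
    print(f"{verdict:5} {prompt}")
```

The escape-pair skipping is the one subtle design choice: without it, a prompt containing a literal backslash followed by the letter h would be wrongly accepted.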

When I explained this rule to people, they often asked, "Would you do that even if it was an important machine?"

"Ah ha!", I would exclaim, "My definition of 'important machine' includes that it is properly labeled! You see, if something is important we take good care of it. We protect it. One way we do that is to label it front, back, and in the root prompt."

Sometimes I would get a look of horror.

I never actually had to turn off a machine. I presume that if I had, the owner would have come running soon after. I'd say something like, "Thank god you're here! I was able to make a label for this machine and I need to ask you what its name is!" If they asked, "How did you know it was my machine?" I would have said, "Well, once it is labeled I'll be able to send email to the owner and ask!"

I did once threaten to power off a machine. It was a new machine and someone was standing there loading the operating system. "Hey! No fair! This machine was brand new! I was going to label it!"

"Amazingly enough", I pointed out, "machines can be labeled before you load the OS too."

[I wouldn't let him type until the machine was labeled.]

Labels are a very basic safety precaution. They prevent human error.

Labeling machines is obvious, or is it? I visit other people's data centers (or "computer closets") and find tons of unlabeled equipment. "It's OK, I know which machine is which," they say. That's just an accident waiting to happen! Without properly labeled machines it is just too easy to accidentally power off the wrong machine, disconnect the wrong cable, and so on.

Isn't this simple professionalism?

You don't label things for Today You. Today You is smart, knows what's going on, and got a good night's rest. You label things for Tomorrow You plus Other People. Tomorrow You may be related to Today You but I assure you they are different people. Tomorrow You didn't get a good night's sleep. Tomorrow You was away for a few weeks and now can't believe how similar all those machines look. Tomorrow You left for a better job and is now someone else trying to figure out what the fark is going on.

Other People need labels on machines for all the obvious reasons so I won't bore you. However, when I visit other people's computer rooms (and I do get invited on many tours) they often say that the lack of labels is OK because "they're a solo sysadmin." Let's debunk that right now. Nobody is totally solo. When you call into the office from 3,000 miles away and ask the secretary, office manager, janitor, or CEO to go into the computer room to powercycle a machine, you are no longer a solo sysadmin. Things must be labeled.

In my time management classes I talk about delegating work to other people. Usually someone laughs. "Delegate? What planet are you on? I can't delegate anything!" Of course you can't if you haven't labeled things.

I prefer to write about big networks, big data centers, and big sysadmin teams. To them, all of the above is obvious. It's a waste of time to write this, right? Sadly something happened to me in my non-work life that reminded me about my "no label, no power" rule.

To be honest I haven't touched actual server hardware in years. All my servers are in remote data centers, often in countries I've never been to, with highly skilled datacenter techs doing all the physical work. I've never seen or touched them.

However, if I were to change jobs and found myself dealing with hardware and small computer rooms again, the first thing I would do is put a big sign on the wall that says:

"This computer room is for important computers only. Important computers are labeled front and back. Unimportant computers will be powered off with no warning. -The Management"

Isn't that reasonable?

Posted by Tom Limoncelli in Professionalism

Since I can not attend the LISA Workshop on Teaching System Administration (I'll be teaching system administration that day!), I'd like to take a moment to say something to the attendees.

Often we are in the thick of things and we lose sight of how valuable our work is.

What you are doing is incredibly important; maybe more important than you realize. IT isn't just important, it is scary-important. The usual old sayings about how important IT is are now obsolete. It isn't just that IT is part of how food gets from the farm to our plate; we, as a society, no longer know how to provide food without IT. Medicine isn't just billed and administered with the assistance of IT; we can't provide medical services without IT anymore. Sysadmins are not just "important": the existence of excellence in system administration is key to sustaining civilization as we know it.

Those teaching system administrators need to step up to the plate. Our world depends on you.

It is time for an organization to take a leadership role in defining a standard sysadmin curriculum and get it adopted at all 4-year and 2-year schools. The 2-year training is embarrassingly bad. The 4-year training is bad to mediocre.[1]

Students are graduating from 4-year programs without understanding the internals of systems, nor how they are used en masse in the real world. This would be like auto mechanics not being taught how an internal combustion engine works, or doctors somehow graduating medical school without knowing that patients are alive between office visits.

10% of us know the right way to do things. The other 90% don't. Why the uneven distribution of knowledge? The trouble this brings is far-reaching. Sarbanes-Oxley essentially says, "If you are going to be so unbelievably stupid as to do backups without testing them, create accounts without having a mechanism to make sure they are disabled when the employee leaves, and let developers have unrestricted raw access to live databases, then we're going to legislate how you have to do your job." HIPAA essentially says that our industry has proven itself too incompetent to be trusted with securing databases or WiFi networks in hospitals. Therefore how to do our jobs is being written into legislation.[2]

What's next? What will be the next example of rampant incompetence that leads to more legislation telling us how we have to do our jobs? What crap caused by the worst of us will ruin it for the rest of us? What other obvious best practice that sites somehow still successfully ignore will become required by law? "Have helpdesks that don't suck"? "Track your customer requests with a 'ticket' system"? "Buy load balancers in pairs"? "Ping a machine after you've unplugged it to make sure you unplugged the right one"? "Lock your screen when you leave your desk"? Many of these were "rocket science" 10 years ago. Now it's just embarrassing to see IT teams that are blind to these ideas.

This is a problem that is bigger than any one person can solve. You and I know this. We've written books to try to educate, but how much can one person do? This is the greatest challenge our industry has ever faced. This is the kind of thing that requires a group effort.

Creating such a curriculum would take a long time, and getting it widely adopted even longer. However, with the power of Usenix, the expertise of LOPSA, and the academic ubiquity of ACM, this could really happen.

I hope that the members of the workshop take the time to think big.

Things don't get better on their own.

Sincerely, Tom Limoncelli

[1] These are based on indirect experience. The truth is that we don't have a measure for how to quantify if a school is doing a good job. First we need a standard to measure institutions by, then we need to go around measuring institutions. Providing a self-evaluation kit would even be a major step forward.

[2] One might say that it is the executive management of hospitals that is to blame. I disagree. We are at fault for not being able to explain the issue in a way that gets executive attention. Worse, often we are at companies that are selling systems with known problems. Why do we even offer a known-bad solution? Is it our own ignorance, or is it like the consultant I once saw explain three options to a customer, one of which he said he recommended against? Of course the customer wanted the one he was recommending against. Why did he even mention that option? It wasn't an option. The customer wouldn't have thought of it on their own. It was a counter-example that he turned into an option. Knucklehead!

System administrators go by many different job titles. Network Admin, Network Engineer, Sysadmin, Computer Technician, etc.

When a campaign requests money from you and asks your title, what do you write?

I propose we all write the same exact thing: "Computer System Administrator"

It matters. It really, really matters. Campaigns data-mine the job titles people enter. A 10% increase in donations from "doctors" is meaningful. Donations from twice as many [insert job title] as [insert other job title] is meaningful. When we split our donations across different job titles, we dilute our political influence.

I know that in the past I've written everything from "System Administrator" to "author" to "Software Engineer".

This political season, and from now on, I will always record my title as "Computer System Administrator". It isn't a perfect description of what I do; some would say I spend more time writing code than doing real system administration. But I feel that to the people working on political campaigns, who aren't the most technical in the world, the phrase "Computer System Administrator" is clear and concise.

If we all consistently use the same phrase it will have an impact.

In the US the election season is starting to heat up. Yesterday was the last primary. Between now and election day you will undoubtedly receive many MANY many emails asking for money. Whether you agree with me politically or are one of the bad, bad people who disagree (I'm kidding! Really!), let's all agree to do this.

I'm Tom Limoncelli, a computer system administrator, and I vote!

Posted by Tom Limoncelli in Professionalism

We are sysadmins. We love numbers. They mean a lot to us. They are specific and clean.

We also like a lot of details. When someone asks what operating system we use, we rattle off all fifty we can think of. That includes the embedded OS we know is buried deep in our toaster. Why? Because when we discovered it has a serial port, we plugged in and watched the bootup messages. That's why.

However, when writing and speaking, the number of items in a list means something to the reader/listener, often more than the items themselves. Controlling the number of items in the list is more important than being complete.

Lists of length 1, 2, 3, and "4 or more" have particular meanings.

A list with one element means "Hey! Look at this! Remember it!". If you ask me what operating system I use at work, the complete answer is a list a mile long. If I want you to remember that I am a Linux sysadmin, you won't remember that if I list "Linux, Mac OS X, Windows, IOS, JunOS, Android and ChromiumOS". The word Linux gets lost in the noise, even if it is the first item of the list. If I simply say, "I administer Linux machines", then that is what people will remember. If you want someone to remember what you said, reduce the list to one item.

A list with two elements implies comparison. "I am knowledgeable about Windows and Linux." invites comparison. It implies that these are different things and emphasizes that I have two very different skill sets: the ability to run Windows, and the ability to run Linux. A reader unfamiliar with computers will understand that these are two different things and might ask questions about how they compare. It is actually jarring to list two items that you don't want the reader to compare in their minds. In fact, the more similar they are, the more someone will think about the differences. "I run Ubuntu 9.1 and 9.2" makes people wonder what is so different about them that I list them both. Think about how these phrases invite comparison: "at home and at work", "night and day", "HTML5 and Flash", "Ubuntu 9.x and 10.x", "apples and oranges". If your point isn't to emphasize differences (good or bad), make sure your list doesn't contain two items. If you want to emphasize differences, make sure your list has exactly two items.

A list with three elements implies (a) that you expect the reader/listener to hold all three in their head while you discuss them, (b) that you will discuss them in that order, and (c) that the order matters. A three-item list is short enough that the reader can hold all the items in their head for the duration of the discussion; you haven't made the statement so complex that it overloads them. When you "drill down" on the items in the list, cover each item in the same order as the original list. This parallel structure helps the reader/listener follow the flow. Lastly, order the items with great care. Often we put the most important item first, but I find that people remember the last item most, so put it last. If I want you to "reboot the machine, make sure it comes back up, and come to my desk when you are done", I am emphasizing the need for you to come back to me. When writing an article or giving a presentation, the last item often gets the most discussion. If you have one complicated and two short topics, end with the complicated item. This lets you cover the first two briefly and then focus on the third item for the remainder of your time.

A list with four or more elements implies that the point isn't the contents of the list, but that the list is very long. I might tell you that I use a lot of operating systems: MacOS, Ubuntu, Red Hat, Windows, Android, JunOS and IOS. The point I am making is that the list is very long. The contents of the list are not so important. The reader/listener walks away remembering "Tom knows a lot of operating systems". If this is not what you intend, reduce the list to fewer than 4 items. You may have to summarize ("Tom knows Linux, Windows, and some lesser-known operating systems."). If you don't want someone to focus on the details of the list, make sure there are 4 or more items on it.

We are sysadmins. Numbers are important to us. However, it is important to remember that the number of items in a list tells people a lot more than just what is on the list:

  1. Remember me.
  2. Comparison
  3. Things to keep in your head
  4. The quantity is more important than the details.

Posted by Tom Limoncelli in Professionalism

The biggest problem with transforming Art into Science is that people would rather be Artists than Scientists.  No, wait, you say, I love Science!  Yeah, now would you rather be a Rock Star or a Lab Tech?  Yes, you see the problem.

I recently read a New Yorker article that completely kicks ass in describing how medical science is poised on the cusp of a potential transformation into something that can save Even More Lives, but via a path that's difficult to take:  the humble, homely, not the science of the rocket, procedural checklist.   As the article states,

Tom Wolfe's "The Right Stuff" tells the story of our first astronauts, and charts the demise of the maverick, Chuck Yeager test-pilot culture of the nineteen-fifties. ... But as knowledge of how to control the risks of flying accumulated--as checklists and flight simulators became more prevalent and sophisticated--the danger diminished, values of safety and conscientiousness prevailed, and the rock-star status of the test pilots was gone.

Reading this, I was instantly transported into familiarity. This is the exact problem that I spent a decade banging my head against in Systems Administration, and that drove me to spend the next decade in Project Management trying to solve it. A number of us in the Usenix and LISA communities seemed to have a handle on this, but only the way the blind men had a handle on the elephant. We specialized in dealing with our rope, our fan, our spear, our wall, our tree, and, umm, whatever the sixth thing was that the elephant was like-- oh yes, our snake. We didn't have the problem space sharply defined. Author, and doctor, Atul Gawande describes the dilemma precisely:

We have the means to make some of the most complex and dangerous work we do--in surgery, emergency care, and I.C.U. medicine--more effective than we ever thought possible. But the prospect pushes against the traditional culture of medicine, with its central belief that in situations of high risk and complexity what you want is a kind of expert audacity--the right stuff, again. Checklists and standard operating procedures feel like exactly the opposite, and that's what rankles many people.

"Expert audacity."  Yes.  Absolutely.  It's what the cool kids do.  Indiana Jones meets skatepunk, and checklists ain't got the cool.

While I have been able to leverage automation and some ticketing systems to bring reproducible, higher levels of support to some of my clients, until recently I didn't Get It.  I did not see clearly enough that many people, even very well-meaning ones, will resist changes that reduce the intensity level of their daily jobs.   They fear becoming bored, unappreciated, less vital to the organization.   The addiction to the adrenaline cycle and the kind of "cult hero" status that goes with it is very, very difficult for an organization to break.   As Brent Chapman noted, discussing resistance to automated network management, everybody wants to be a hero.    

While I have always seen career mentoring as an important part of managing a team, I didn't realize how important it is to build up a vision of what people will be doing when they're no longer playing superhero.

 Systems people are keenly aware of projects that are languishing while they respond to interrupts.  It's rare to meet someone who doesn't have a "someday I'll get to this" list.    Stabilizing the network and systems environment and establishing strong processes, including checklists, is vital for scaling services and being responsive to the needs of the organization.   A decrease in emergent crises ("complications", in medical parlance) frees up cycles for complex projects that present true depth and scope challenges for individuals and teams.   

Being a Rock Star is fun-- as countless Guitar Hero and Rock Band fans, including myself, can attest.   Quiet, directed competence can be just as much fun, though, and allow personal and career growth with a bit less drama and a bit more sleep.   While networks, legacy applications, and odd emergent behaviors of client desktops aren't as complex (perhaps!) as a living organism, there is plenty in common.  As Dr. G says:

It's ludicrous, though, to suppose that checklists are going to do away with the need for courage, wits, and improvisation. The body is too intricate and individual for that: good medicine will not be able to dispense with expert audacity. Yet it should also be ready to accept the virtues of regimentation.

Sing it, brother.

[ This is still "first draft" quality but I'm posting it rather than keeping it bottled up. Feedback appreciated.]

There are those who believe that system administration will follow a path similar to that of electrical engineering. Broadly categorized, there are 3 types of careers in that area:

  • Electricians: People who have limited scientific education, but through apprenticeships and certifications do the majority of the work in buildings, both deployments and repairs. They "follow the building code" (the building and safety guidelines for their state or country) but couldn't write new building codes (and would never try). Inspectors are paid to check their work for conformance to the "building code". 80% of all electrical work is in this category, and it is usually thankless and boring.
  • Electrical engineers: People who have university degrees and understand both the theory and practice of what they do. They specialize in specific areas (construction, circuit design, chip design, etc.). They design new products. More advanced EEs write the building codes that electricians follow.
  • Researchers: People (typically with PhDs) that are advancing the science of electrical engineering. They may invent entirely new ways of doing things, rather than just new products.

The field of system administration is already following this kind of trajectory. There are people in that first category: they have Cisco, MS, and LPI (Linux) certifications, and they mostly deploy vendor-approved architectures and design patterns (known as "best practices"). When they get creative you should be as scared as you would be if an electrician installing a new circuit in your house told you he "got creative". We don't have the auditing or inspection system yet, but SOX is the closest we have.

System administration has that second category too. They usually are the senior sysadmins in a company, and often are employed by vendors to create the best practice documents and certifications used by the first category. Sadly they often have the same titles as people in the first category which creates confusion.

The third category is quite rare in system administration. How often in our lives will something be invented that radically changes the way we do IT? There are a few that I can think of: local storage vs. remote storage (NFS); individually managed accounts on each machine vs. NIS (later LDAP); waiting for users to complain vs. monitoring for outages; keeping machines in sync by hand vs. cfengine (later Puppet).

All of these were major changes to our industry (and I profess that 80% of the industry doesn't do most of those things yet, so there is plenty of work to do).

There are very few schools that have Masters or PhD programs in system administration. Some call it IT, and dilute it with a lot of research around what we used to call MIS. A lot of the innovation in system administration comes from industry, which is usually good, but sometimes taints the research.

I believe there are many interesting areas of research that need more effort:

  • Why are good practices so rarely adopted?
  • What prevents a constant number of sysadmins from administrating growing populations of machines or users?
  • Why is debugging so complicated?
  • How to organize teams of system administrators to maximize macro efficiency and personal efficiency?
  • How to delegate to users without expecting users to be system administrators?
  • What traits do successful system administration organizations share?
  • Are we asking the right questions?

These are the same questions we've always asked yet the need for research grows as system administration becomes more complicated and society becomes more dependent on technology.

Maybe we need to write less code and spend more time thinking.

Insulting your users

It can be easy to accidentally insult someone who comes to you for support. Saying something like, "Let me show you... it's so easy a child could do it." might be well-intentioned, but think about how insulting it can be to the customer. (We caution against that exact phrase in TPOSANA.)

Continental Airlines could use some training in this area. Today I spent 30 minutes on hold waiting to talk to a human (grrr...). The hold music included adverts for how great they were and while listening to them over and over and over I had a lot of time to analyze what they were saying. (What they weren't saying is, "We've reduced fares to the point that we can't afford to handle luggage properly and thus you are now on hold waiting 30 minutes to find out where your bag is", but I digress.)

Here's the insulting part...

The advertisement for their on-line check in service began, "Still haven't checked in online? Why not? It's child's play!" OK, you've insulted me. Can it get any worse? Well, the script ended with jazzy music playing as the announcer says, "So next time check in online! Or have your kid do it for you!"

Insulting to the max. Way to go. Now get me a human so I can find out where my luggage is!

Posted by Tom Limoncelli in Professionalism

LOPSA has started opening chapters around the country. New Jersey's chapter is big enough that on odd-numbered months they meet at two locations ("north" and "south"). Dossy took pictures and wrote up a little about the one I attended. Read about the March 2007 LOPSA North Jersey meeting. If you are in New Jersey, please join! Otherwise, join or find/start your own chapter.

Posted by Tom Limoncelli in Professionalism

Happy Birthday, LOPSA!

You are one year old and look how far you've come! Like most births you were born amid a lot of shouting and confusion, but look how far you've grown! You've formed the organization, built a web site, and had your first regional conference. Congrats! Now you are truly defining yourself, growing up, and becoming your own person.

For those of you who don't know, LOPSA is the League of Professional System Administrators. The goal is to be to sysadmins what the AMA is to doctors, or the APA is to shrinks: that is, to build the professionalism of our community. If you aren't a member, I highly recommend that you join. Heck, it's free to just register.

Two weeks ago I attended the first LOPSA regional conference in Phoenix, Arizona. I taught a full-day version of my Time Management for System Administrators class. What impressed me about this event was how different it was. Because it was regional, most of the speakers were local. There are experts everywhere (not just in California) and seeing them get some spotlight really made me happy. The fact that it was small also meant that it could be at a less expensive hotel, which was hungrier for LOPSA's business. They had a lot of creative ideas that I haven't seen at big hotels. For example, one of the snack-breaks had cookies and milk! I was psyched!

At night we had a lot of deep discussions about the future of system administration, professionalism, and the future of LOPSA. I consulted with some board members about how to get to the next milestone now that the organization is running. I hope to see more regional conferences announced soon. I also brainstormed on ways to reach out to the segments of the IT world that are currently unaddressed.

Why not celebrate the 1st birthday by buying a gift for yourself? The LOPSA CafePress store is ready to fulfill your need for swag, and raises money for a good cause. And if you haven't registered, do that too. They have some extremely useful mailing lists.