Awesome Conferences

Labeling Machines Is A Safety Precaution

Something happened at home today that reminded me of something I used to do when I worked at Bell Labs.

My rule was simple. If a machine in the computer room wasn't labeled, I was allowed to power it off. No warning. Click. No power.

If I logged into a machine as root and the prompt didn't include the hostname, the only command I was interested in typing was "halt".

Both of these rules came from the same source: if sloppy system administration was going to lead to errors and downtime, I wanted that downtime to happen during the day, when we could fix it, instead of late at night, when we should be asleep.

(Of course, a machine without the hostname in its root prompt also meant our configuration management system wasn't running on it, which was a security violation. Therefore halting the machine, as far as I was concerned, was solving a security issue. But I digress...)
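The prompt half of the rule is a one-liner. A minimal sketch, assuming bash for the root shell (the `\h` escape is bash's; the fallback line is for shells without prompt escapes):

```shell
# Keep the hostname in root's prompt (e.g. in /root/.bashrc) so every
# command is typed with the machine's name in plain view.
PS1='\h:\w# '               # bash: \h is the hostname up to the first dot

# Fallback for shells without prompt escapes:
PS1="$(hostname -s)# "
```

With something like this in place, a root shell on host db03 prompts as `db03:/root#` rather than a bare `#`.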

When I explained this rule to people, they would often ask, "Would you do that even if it was an important machine?"

"Aha!" I would exclaim. "My definition of 'important machine' includes being properly labeled! You see, if something is important, we take good care of it. We protect it. One way we do that is to label it front, back, and in the root prompt."

Sometimes I would get a look of horror.

I never actually had to turn off a machine. I presume that if I had, the owner would have come running soon after. I'd have said something like, "Thank god you're here! I was just about to make a label for this machine, and I need to ask you what its name is!" If they asked, "How did you know it was my machine?" I would have said, "Well, once it is labeled I'll be able to send email to the owner and ask!"

I did once threaten to power off a machine. It was brand new, and someone was standing there loading the operating system. "Hey! No fair! This machine is brand new! I was going to label it!"

"Amazingly enough", I pointed out, "machines can be labeled before you load the OS too."

[I wouldn't let him type until the machine was labeled.]

Labels are a very basic safety precaution. They prevent human error.

Labeling machines is obvious... or is it? I visit other people's data centers (or "computer closets") and find tons of unlabeled equipment. "It's OK, I know which machine is which," they say. That's an accident waiting to happen! Without properly labeled machines it is just too easy to accidentally power off the wrong machine, disconnect the wrong cable, and so on.

Isn't this simple professionalism?

You don't label things for Today You. Today You is smart, knows what's going on, and got a good night's rest. You label things for Tomorrow You plus Other People. Tomorrow You may be related to Today You, but I assure you they are different people. Tomorrow You didn't get a good night's sleep. Tomorrow You was away for a few weeks and now can't believe how similar all those machines look. Tomorrow You left for a better job and is now someone else trying to figure out what the fark is going on.

Other People need labels on machines for all the obvious reasons, so I won't bore you. However, when I visit other people's computer rooms (and I do get invited on many tours) they often say the lack of labels is OK because they're a "solo sysadmin." Let's debunk that right now. Nobody is totally solo. When you call into the office from 3,000 miles away and ask the secretary, office manager, janitor, or CEO to go into the computer room to power-cycle a machine, you are no longer a solo sysadmin. Things must be labeled.

In my time management classes I talk about delegating work to other people. Usually someone laughs. "Delegate? What planet are you on? I can't delegate anything!" Of course you can't if you haven't labeled things.

I prefer to write about big networks, big data centers, and big sysadmin teams. To them, all of the above is obvious. It's a waste of time to write this, right? Sadly, something happened to me in my non-work life that reminded me of my "no label, no power" rule.

To be honest I haven't touched actual server hardware in years. All my servers are in remote data centers, often in countries I've never been to, with highly skilled datacenter techs doing all the physical work. I've never seen or touched them.

However, if I were to change jobs and found myself dealing with hardware and small computer rooms again, the first thing I would do is put a big sign on the wall that says:

"This computer room is for important computers only. Important computers are labeled front and back. Unimportant computers will be powered off with no warning. -The Management"

Isn't that reasonable?

Posted by Tom Limoncelli in Professionalism

8 Comments

Does the service tag count as a label? (Assuming you can tie the service tag to a group/user/entity through your host management database.)

This doesn't change, while the hostname might change.

The other part of this is to label all the cables on a system. I'm working as an FSE at the moment, and pretty much every site I go to, the first 20 minutes are spent labeling cables.

Today You might know that port A goes to switch C and port B goes to switch A, but when you disconnect all the cables, they all look alike.

I think it's a rather drastic approach to just turn off the machine, although I can understand the notion. Labels make sense and a server should be identifiable.
But if you have over a hundred servers, a provisioning system like Puppet, and developers with fast-changing demands at your back, a server may have four or five different hostnames in its lifetime. I don't want to re-label them every time.
One could use an inventory system with unique identifiers like serial and model. The only downside is that you must run around with a tablet or a laptop in the server room.

Since I have over a hundred servers, let me say that you are 100% correct that you should use puppet/cfengine/chef and an automated provisioning system and so on. However, I respectfully say that if you change the names of machines "you're doing it wrong".

It is better to have a systematic naming scheme and add aliases if needed. For example, a scheme where every hostname follows a formula: XYYZZ.dom.main
X: a code for the datacenter (A-Z)
YY: a code for the rack (aa, ab, ac, ... or 01, 02, 03, ...)
ZZ: which RU the machine is installed in (1 through 40)

Now you never need to rename a machine.
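Under such a scheme the name is pure arithmetic on location. A hypothetical helper sketching the formula above (the `dom.main` domain comes from the formula; the numeric, zero-padded rack and RU encoding is an assumption):

```shell
# Build an XYYZZ.dom.main hostname from a datacenter code, rack number,
# and rack unit. Assumes numeric rack/RU codes, zero-padded to two digits.
mkhostname() {
    dc="$1"; rack="$2"; ru="$3"
    printf '%s%02d%02d.dom.main\n' "$dc" "$rack" "$ru"
}

mkhostname a 3 17    # prints: a0317.dom.main
```

The same formula, read in reverse, tells a datacenter tech exactly which rack and RU to walk to.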

If you are using Puppet you shouldn't care what the machine's name is. Puppet can configure xbb02.example.com just as well as mail-server2.example.com. If you absolutely need hostnames that are meaningful to humans set up additional AAAA records for it (or A if you use IPv4) or use CNAMEs for gosh sakes.

When you have hundreds of machines you probably have technicians in the datacenter who do all the repairs. They don't know what a machine is used for, just that "machine xbb02 needs the hard disk replaced". A formulaic naming scheme makes their job easier because it directs them to the right rack and RU. I couldn't imagine telling a tech to find "mail-server-2" in a data center with hundreds of machines. They'd have to look up the actual location in a database, and what if that database got out of sync? You'd have to do annual audits and... sheesh! Tons of extra work that would all go away with a simple naming scheme. At least that way you can label things once and never need to re-label them.

I label; I do it for myself as well as for other people (hey, maybe I want a vacation one of these years.)

Further, for things that break I put the necessary information right on the front, so when the fecal matter hits the rotating blades I'm not distracted trying to look all that stuff up.

I've started labeling shelves at home because I have enough tools that I'd like my wife to be able to find them. I've labeled every wall socket and light switch at home; those things are labeled at work by electricians. It's a good idea to understand what their labeling means and trace your wall sockets/whips back to their load centers now while you have the leisure time.

My definition of a "production" system is one which is documented (labelled, plus records of licensing, design, etc), monitored, and backed up. E.g. if the server is down and I don't get an alarm, it stays down.

I recently instituted a new policy: non-production systems are not run for more than 90 days. When something is provided as a "test" system, my team configures a scheduled task to shut it down by the date the "testing" should be done; since it's not monitored, it stays down until someone decides they need something done about it.

Boy did that ever shake the tree! Quite a few requests came in after that with "actually, this test system, we're now using it all the time for , please put it in production".
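The scheduled-shutdown policy above could be sketched many ways; here is one hedged version, not the commenter's actual tooling. The marker file, the 90-day threshold, and GNU date's `-r` option are all assumptions:

```shell
#!/bin/sh
# Hypothetical guard, run daily from cron on each "test" box.
# /etc/provisioned-on is an assumed marker file touched at install time.

age_in_days() {
    # Whole days since the mtime of the file given as $1 (GNU date -r).
    now=$(date +%s)
    born=$(date -r "$1" +%s)
    echo $(( (now - born) / 86400 ))
}

# The real cron job would end with something like:
#   [ "$(age_in_days /etc/provisioned-on)" -ge 90 ] && shutdown -h now
```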

I guess you are saying this, but a lack of labels isn't a problem, it's a symptom. I have consulted for companies that had servers with no labels, and I was never there to correct that problem; I was there to fix the shoddy craftsmanship of their SA(s) at the hardware/OS layer.

We had labels.
And we had MySQL.
Databases would get promoted, hostnames changed, but the labels weren't updated.
On multiple occasions, we power-cycled an important database master host because the label was inaccurate.

Since then, we took ops staff out of the datacenter. Reboots are handled through our host management system, which sends a request to the appropriate datacenter, asking that they power-cycle a machine based on rack location and serial number.

My other thought is that maybe hostnames should be boring and static, say a serial number. You then layer role-based CNAMEs from your host management database on top of that. Then you make sure your machines are delivered with their serial number on the front and the back, and you're done.

And yeah, any machine that doesn't have its hostname in the prompt has some issues. :)
