Recently in Best of Blog Category

Teams working through The Three Ways need an unbiased way to judge their progress. That is, how do you know "Are we there yet?"

Like any journey there are milestones. I call these "look-for's". As in, these are the things to "look for" to help you determine how a group is proceeding on their journey.

Since there are 3 "ways" one would expect there to be 4 milestones: the "starting off point" plus a milestone marking the completion of each "Way". I add an additional milestone partway through The First Way, because there is an obvious sequence point in the middle of The First Way where a team goes from total chaos to managed chaos.

The Milestones

DevOps Assessment Levels: Crayola Maraschino, Tangerine, Lemon, Aqua and Spring.

Posted by Tom Limoncelli in Best of Blog, DevOps

Recently on a mailing list sysadmins were describing horrible management they've experienced. Here is my reply:

First, I want to say that my heart goes out to all of you describing terrible working conditions, bad management, and so on. I have huge amounts of sympathy for you all.

Health is more important than anything else. If your job is driving you crazy and giving you high blood pressure, my prescription is, 'Try, try, then quit'. Try to change things, talk to management, work to create the workplace you desire. Try again. I'm sure you feel like you've tried a lot, but people aren't mind-readers... make sure you've had serious conversations with the right people. However, step three is quit. Send resumes and get the hell out of there.

It is vitally important that we don't feel any guilt about leaving a bad job, especially if we've made a "good faith effort" to turn things around (as I'm sure you have). Just like when people being laid off are told, heartlessly, "Sorry, it was a business decision" there are times you have to tell a company, "Sorry, it was a personal decision". (I want to acknowledge that not everyone is in a position where they can just up and leave. Being able to do so is quite a privilege, but I think people that work in IT are more likely to be in this position than most fields.)

There are two reasons we shouldn't feel guilt about leaving these kinds of "bad jobs". First, our health is more important than anything else. Second, it is important that we don't try to 'save' companies that are intrinsically bad at IT management. I say this not as a joke and I don't say it lightly. If you feel a company is incurably bad at IT, it makes the world a better place for that company to go out of business. IT is the lifeblood of companies. It is a requirement for nearly any facet of business to function in today's world. Companies that treat IT as an appendage are dinosaurs that need to be left to die.

IT is not a "speciality". It is a skill everyone should have. Any CEO, COO, or VP who doesn't understand IT and IT MANAGEMENT, and who ALSO thinks they don't need to understand it, is fooling themselves. Expecting IT and IT management skills to be found only in the IT department is insane. Companies don't have a 'math department' that people run to any time they need to add, subtract, multiply, and divide. They expect everyone to have basic math skills and only turn to mathematicians for advanced or specialized mathematics. Similarly, a modern company must expect that every staff person understands the basics of IT, and every manager, VP, and CxO executive should be expected to understand IT and IT management, as it is a fundamental, essential part of doing business.

IT and IT management is as essential to a business as accounting is. You don't expect your CEO and other managers to be experts at accounting, but you expect them to understand a lot more than just the basics. However if, during a job interview, you learned that the CEO didn't know that accountants existed, or thought financial statements "magically wrote themselves" you would run like hell as fast as possible, right? You would reject any job offers and hope, for the sake of the well-being of the economy, that such a company disappears as soon as possible.

Why wouldn't you do the same for a company that treats IT and IT management like that?

A co-worker watched me type the other day and noticed that I use certain Unix commands for purposes other than those they were intended for. Yes, I abuse Unix commands.

Posted by Tom Limoncelli in Best of Blog, Time Management

April Showers bring May Flowers. What does May bring? Three-day weekends that make A/C units fail!

This is a good time to call your A/C maintenance folks and have them do a check-up on your units. Check for loose or worn belts and other problems. If you've added more equipment since last summer, your unit may now be underpowered. Remember that if your computers consume 50 kW of power, your A/C units should be using about the same (or more) to cool those computers. That's the laws of physics speaking; I didn't invent that rule. The energy it takes to create heat equals the energy required to remove that much heat.
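As a rough sanity check on sizing, the heat load can be converted into the units that cooling equipment is usually rated in. The sketch below uses the standard conversions (1 kW is about 3412 BTU/hr, and 1 ton of refrigeration is 12,000 BTU/hr). The function names and the idea of comparing against a unit's rated tonnage are illustrative assumptions, not something from the original post, and a real sizing exercise should involve your A/C vendor.

```python
# Rough cooling-capacity check. Assumes essentially all electrical power
# drawn by the computers ends up as heat in the room.
BTU_PER_HR_PER_KW = 3412.14    # 1 kW of heat is about 3412 BTU/hr
BTU_PER_HR_PER_TON = 12000.0   # 1 ton of refrigeration = 12,000 BTU/hr


def heat_load_btu_per_hr(it_load_kw):
    """Heat produced by the equipment, in BTU/hr."""
    return it_load_kw * BTU_PER_HR_PER_KW


def cooling_tons_needed(it_load_kw):
    """Minimum cooling capacity, in tons, to remove that heat."""
    return heat_load_btu_per_hr(it_load_kw) / BTU_PER_HR_PER_TON


def is_underpowered(it_load_kw, rated_tons):
    """True if the unit's rated capacity (hypothetical input) is below the load."""
    return rated_tons < cooling_tons_needed(it_load_kw)


if __name__ == "__main__":
    load_kw = 50  # the example figure from the text
    print(f"{load_kw} kW is roughly {heat_load_btu_per_hr(load_kw):,.0f} BTU/hr")
    print(f"Cooling needed: about {cooling_tons_needed(load_kw):.1f} tons")
```

This ignores heat seeping in through the walls, lighting, and people, so treat the result as a floor rather than a target.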

Why do A/C units often fail on a 3-day weekend? During the week the office building has its own A/C. The computer room's A/C only has to remove the heat generated by the equipment in the room. On the weekends the building's A/C is powered off and now the 6 sides (4 walls, floor and ceiling) of the computer room are getting hot. Heat seeps in. Now the computer room's A/C unit has more work to do.

A 3-day weekend is 84 hours (Friday 6pm until Tuesday 6am). That's a lot of time to be running continuously. Belts wear out. Underpowered units overheat and die. Unlike a home A/C unit which turns on for a few minutes out of every hour, a computer-room A/C unit ("industrial unit") runs 12-24 hours out of every day. Industrial cooling costs more because it is an entirely different beast. Try waving your arms for 5 minutes per hour vs. 18 hours a day.
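The 84-hour figure is easy to verify. The sketch below uses an arbitrary example weekend (the dates are an assumption chosen only because they fall on a Friday and a Tuesday) and, for contrast, estimates how long a home unit would run over the same span if it cycled on for 5 minutes out of every hour, borrowing the 5-minute figure from the arm-waving analogy.

```python
from datetime import datetime


def hours_between(start, end):
    """Elapsed hours between two naive datetimes."""
    return (end - start).total_seconds() / 3600


# Example dates only: Friday 6pm through the following Tuesday 6am.
friday_6pm = datetime(2024, 5, 24, 18, 0)
tuesday_6am = datetime(2024, 5, 28, 6, 0)

weekend_hours = hours_between(friday_6pm, tuesday_6am)

# Hypothetical home-unit duty cycle: 5 minutes of running per hour.
home_unit_hours = weekend_hours * 5 / 60

if __name__ == "__main__":
    print(f"Unattended span: {weekend_hours:.0f} hours")
    print(f"Home unit at 5 min/hour would run about {home_unit_hours:.0f} hours")
```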

Most countries have a 3-day weekend in May. By the 2nd or 3rd day the A/C unit is working as much as a typical day during the summer. If it is about to break, this is the weekend it will break.

To prevent a cooling emergency, make sure that your monitoring system is also watching the heat and humidity of your room. There are many SNMP-accessible units for less than $100. Dell recommends machines shouldn't run in a room that is hotter than 35°C. I generally recommend that your monitoring system alert you at 33°C; if you see no sign of it improving on its own in the next 30 minutes, start powering down machines. If that doesn't help, power them all off. (The Practice of System and Network Administration has tips about creating a "shutdown list".) Having the ability to remotely power off machines can save you a trip to the office. Most Linux systems have a "poweroff" command that is like "halt" but does the right thing to tell the motherboard to literally power off. If the server doesn't have that feature (because you bought it in the 1840s?), shutting it down and leaving it sitting at a "press any key to boot" prompt often generates little heat compared to a machine that is actively processing. If powering off the non-critical machines isn't enough, shut down critical equipment but not the equipment involved in letting you access the monitoring systems (usually the network equipment). That way you can bring things back up remotely. Of course, as a last resort you'll need to power off those bits of equipment too.
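The escalation policy described above can be sketched as a small decision function plus an ordering for the shutdown list. This is a minimal illustration, not a monitoring integration: the `Reading` type, the tier names, the host names, and the definition of "improving" (the latest reading is cooler than the first reading of the current hot spell) are all assumptions made for the example. Fetching temperatures over SNMP and actually powering machines off are left out.

```python
from dataclasses import dataclass

ALERT_C = 33.0       # alert threshold recommended in the text
GRACE_MINUTES = 30   # how long to wait for the room to recover on its own


@dataclass
class Reading:
    minute: int      # minutes since monitoring started
    temp_c: float    # room temperature in degrees Celsius


def decide(readings):
    """Return 'ok', 'alert', or 'power_down' based on the newest reading."""
    if not readings:
        return "ok"
    latest = readings[-1]
    if latest.temp_c < ALERT_C:
        return "ok"
    # Walk backwards to find where the current unbroken hot spell began.
    start = latest
    for r in reversed(readings):
        if r.temp_c < ALERT_C:
            break
        start = r
    hot_for = latest.minute - start.minute
    improving = latest.temp_c < start.temp_c
    if hot_for >= GRACE_MINUTES and not improving:
        return "power_down"
    return "alert"


# Shutdown list: non-critical first, network gear last so that remote
# access to the monitoring system survives as long as possible.
TIER_ORDER = {"non-critical": 0, "critical": 1, "network": 2}


def shutdown_order(hosts):
    """Sort a {hostname: tier} mapping into power-off order."""
    return sorted(hosts, key=lambda name: TIER_ORDER[hosts[name]])


if __name__ == "__main__":
    history = [Reading(0, 33.0), Reading(15, 33.6), Reading(30, 34.0)]
    print(decide(history))
    # Hypothetical hosts, for illustration only.
    print(shutdown_order({"core-switch": "network",
                          "db1": "critical",
                          "build1": "non-critical"}))
```

Keeping the decision logic separate from the code that reads sensors or powers off hosts makes it easy to test the policy before the weekend you actually need it.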

Having a cooling emergency? Cooling units can be rented on an emergency basis to help you through a failed cooling unit, or to supplement a cooling unit that is underpowered. There are many companies looking to help you out with a rental unit.

If you have a small room that needs to be cooled (a telecom closet that now has a rack of machines) I've had good luck with a $300-600 unit available at Walmart. For $300-600 it isn't great, but I can buy one in less than an hour without having to wait for management to approve the purchase. Heck, for that price you can buy two and still be below the spending limit of a typical IT manager. The Sunpentown 1200 and the Amcor 12000E are models that one can purchase for about $600 that re-evaporate any water condensation and exhaust it with the hot air. Not having to empty a bucket of water every day is worth the extra cost. These units are intended for home use, so don't try to use one as a permanent solution. (Not that I didn't use one for more than a year at a previous employer. Ugh.) The one I used has one flaw... after a power outage it defaults to being off. I guess that is typical of a consumer unit. Be sure to put a big sign on it that explains exactly what to do to turn it back on after a power outage. (The sign I made says step by step what buttons to press, and what color each LED should be if it is running properly. I then had a non-system administrator test the process.)

In summary: test your A/C units now. Monitor them, especially on the weekends. Be ready with a backup plan if your A/C unit breaks. Do all this and you can prevent an expensive and painful meltdown.

Posted by Tom Limoncelli in Best of Blog, Technical Tips

The right answer

An iMac at work broke and AppleCare gave us the choice of bringing it to one of three places in the local area, or the local Apple Store. Since one of those choices was the CompUSA around the block from us, my co-worker brought it there. And waited. And waited. And was told they were waiting for the part. We called the Apple Store, which said that they could do most repairs in 24 hours. The question was, "How do we get it back?"

Posted by Tom Limoncelli in Best of Blog

I get a lot of questions about resume writing. Here are my resume tips. Please excuse the formatting. (a better formatted version is here)

Posted by Tom Limoncelli in Best of Blog
