May 2010 Archives

Lance Albertson wrote up a great description of how Ganeti Virtualization Manager performed under pressure during a power outage:

Nothing like a power outage gone wrong to test a new virtualization cluster. Last night we lost power in most of Corvallis and our UPS & Generator functioned properly in the machine room. However we had an unfortunate sequence of issues that caused some of our machines to go down, including all four of our ganeti nodes hosting 62 virtual machines went down hard. If this had happened with our old xen cluster with iSCSI, it would have taken us over an hour to get the infrastructure back in a normal state by manually restarting each VM.

But when I checked the ganeti cluster shortly after the outage, I noticed that all four nodes rebooted without any issues and the master node was already rebooting virtual machines automatically and fixing all of the DRBD block devices.

Ganeti is a management layer that makes it easy to set up large clusters of Xen or KVM (or other) virtualized machines. He has also written a great explanation of what Ganeti is and what its benefits are.

I use Ganeti for tons of projects at work.

Posted by Tom Limoncelli in Technical Tips

Dear readers in the United States,

I'm sorry. I have some bad news.  That tiny computer closet that has no cooling will overheat next weekend.

Remember that you aren't cooling a computer room, you are extracting the heat.  The equipment generates heat and you have to send it somewhere. If it stays there, the room gets hotter and hotter.

For the past few months you've been lucky.  That room benefited from the fact that the rest of the building was relatively chilly. The heat was drawn out to the rest of the building. During the winter, each weekend the heat was turned off (or down) and your uninsulated computer room leaked heat to the rest of the building. Now it's springtime, nearly summer.  The building A/C is on during the week. When it shuts down for the weekend the building is hot; hotter than your computer room.  The leaking that you were depending on is not going to happen.

Last weekend the temperature of your computer room got warm on Saturday and hot on Sunday. However, it was ok.

This weekend it will get hot on Saturday and very hot on Sunday. It will be ok.

However, next weekend is Memorial Day weekend. The building's cooling will be off for three days. Saturday will be hot. Sunday will be very, very hot.  Monday will be hot enough to kill a server or two.

If you have some cooling, Monday you'll discover that it isn't enough.  Or the cooling system will be overloaded and any weak, should-have-been-replaced, fan belts will finally snap.

How do we get into this situation?

Telecom closets traditionally have no cooling because they used to have no active components. A closet was just a room where wires connected to wires. That changed in the 1990s when phone systems changed. Now that telecom closet has a PBX and an equipment rack.  If there is an equipment rack, why not put some PC servers into it? If there is one rack, why not another rack? By adding one machine at a time you never realize how overloaded the system has gotten.

Even if you have proper cooling, I bet you have more computers in that room than you did last year.

So what can you do now to prevent this problem?
  • Ask your facilities person to check the belts on your cooling system.
  • Set up monitoring so you'll be alerted if the room gets above 33 degrees C. (You probably don't have time to buy an environmental monitor, but chances are your router and certain servers have a temperature gauge on or near the hottest part of the equipment. It is most likely hotter than 33 degrees C during normal operation, but you can detect if it goes up relative to a baseline.)
  • Clean (remove dust from) the air vent screens, the fans, and any drives. That dust makes every mechanical component work harder. More stress == more likely to break.
  • Inventory the equipment in the room and shut off the unused equipment. (I bet you find at least one server.)
  • Inventory the equipment and rank by priority what you can power off if the temperature gets too high.
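
The "relative to a baseline" idea from the monitoring item can be sketched in a few lines of Python. This is only a rough sketch: how you read the sensor (SNMP, IPMI, a vendor tool) depends on your equipment and is not shown, and the function names and the 5-degree margin are my own assumptions, not values from any particular product.

```python
from statistics import mean


def baseline(readings_c):
    """Average of readings (degrees C) taken during normal operation."""
    return mean(readings_c)


def should_alert(current_c, baseline_c, max_rise_c=5.0):
    """True when the sensor is more than max_rise_c above its baseline.

    max_rise_c is an assumed margin; tune it for your own equipment.
    """
    return (current_c - baseline_c) > max_rise_c


# Hypothetical readings from a router's internal sensor on a normal weekday.
normal = baseline([40.0, 42.0, 41.0])
print(should_alert(47.5, normal))  # well above baseline
print(should_alert(44.0, normal))  # within the margin
```

Comparing against a baseline matters because an internal sensor usually runs hotter than the room, so a fixed 33 degree threshold would alert all the time.
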
If you do have a system that overheats, remember that you can buy or rent temporary cooling systems very easily.

I don't generally make product endorsements, but at a previous company we had an overheating problem and it was cheaper and faster to buy a Sunpentown 9000 BTU unit at Walmart than to wait around for a rental. In fact, it was below my spending limit to purchase two and tell the CFO after the fact. I liked the fact that it self-evaporated the water that accumulated; I needed to exhaust hot air, not hot air and water.

Most importantly, be prepared. Have monitoring in place. Have a checklist of what to shut down in what order.
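
A shutdown checklist can be as simple as a sorted inventory. Here is a minimal sketch, assuming you have ranked each machine yourself (1 = most important) and noted which ones are unused. The machine names, the `Machine` class, and the "one more machine per 2 degrees of rise" rule are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    name: str
    priority: int        # 1 = most important, powered off last
    in_use: bool = True  # unused machines go first


def shutdown_order(machines):
    """Unused machines first, then the least important in-use machines."""
    return sorted(machines, key=lambda m: (m.in_use, -m.priority))


def to_power_off(machines, rise_c, step_c=2.0):
    """Names to power off for a given rise above baseline.

    Assumed rule: one more machine for every step_c degrees of rise.
    """
    count = max(0, int(rise_c // step_c))
    return [m.name for m in shutdown_order(machines)[:count]]


# Hypothetical inventory.
inventory = [
    Machine("mail", 1),
    Machine("build", 3),
    Machine("old-test", 2, in_use=False),
    Machine("wiki", 2),
]
print([m.name for m in shutdown_order(inventory)])
print(to_power_off(inventory, rise_c=4.5))
```

Print the ordered list and tape it to the door. When the alert comes at 2 AM you do not want to be deciding priorities from memory.
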

Good luck! Stay cool!

Tom

P.S. I wrote about this 2 years ago.

Posted by Tom Limoncelli in Technical Tips

Self-driving cars

Stanford's team (which won the DARPA contest) is doing some great stuff.

When jet engines were new, jet airplanes had to have 3 pilots. The third pilot did nothing but run the jet engines: constantly adjusting the settings, tuning them, and keeping them running manually. Eventually electronics were developed to the point that such controls could be automated, thus eliminating the third pilot. The electronic control system is not just less expensive; it also produces better fuel efficiency.

Cars today require a driver. The driver must be in good health, rested, and sober. Humans are not very good at optimizing fuel efficiency. Humans don't communicate very well between cars. Imagine a world where cars drove themselves. The computers could optimize for better fuel performance, people could relax during their commute, and the cars could network to get better performance. For example, if 10 cars were all driving to the same destination, they could get into a line and drive like a 'train', reducing wind resistance for each other. Who knows what other optimizations will be discovered: the lead car could take on different computational responsibilities than the other cars.

One problem with our current highway system is that we equate "safety" with "speed". What we want is a "safety limit", but that is hard to quantify, so we make do with a reasonable approximation: the speed limit. Computer-controlled cars could enable a true safety limit and be permitted to drive at any speed as long as their metric is maintained (super fast on straight roads, slowing down for curved roads or during rain). Wouldn't you prefer a driver that had a mathematical model of friction ratios based on sensors on the tires?

Of course, as the "driver" we humans could select from a wide menu of maneuvers that are humanly impossible. Like, parking a car James Bond style.

Eventually the cost and safety issues will be worked out. At that point, autonomous cars may be a big time management win. In the meantime, the bar association should advocate for more research in this area. I don't mean the legal organization, I mean the association of bar owners!


Posted by Tom Limoncelli in Media, Time Management

LOPSA PICC 2010 was a big success. Thanks to everyone that attended.

I was surprised when William asked people to raise their hand if they owned Time Management for System Administrators and nearly the entire room raised their hand.  Wow!

One project that was inspired by the conference was a new mentoring program. It is still being formulated, but people that are interested should sign up to receive more information by visiting http://lopsa.org/mentorship

I look forward to seeing you next year!

Posted by Tom Limoncelli in Community, Conferences

I'm spending a lot of time refining my keynote and updating slides for my Time Management and other tutorials.

The registration numbers look good.  We have people registered from all over the NY/NJ/PA area, with a bunch of people from as far as Boston and Virginia.  It will be great to meet everyone!

It isn't too late to register.  In fact, the organizers have announced a special rate of $99 for anyone that is unemployed (no questions asked... hint, hint).

PICC is for system administrators of all stripes, May 7-8, 2010 in New Brunswick, NJ.  It is easy to get there by train or car.  More info at http://picconf.org

See you there!

Tom

Posted by Tom Limoncelli in Conferences

 