Dear readers: I need your help. I feel like I've lost touch with what new sysadmins go through. I learned system administration 20+ years ago. I can't imagine what new sysadmins go through now.

In particular, I'd like to hear from new sysadmins about what their "rite of passage" was that made them feel like a "real sysadmin".

When I was first learning system administration, there was a rite of passage called "setting up an email server". Everyone did it.

This was an important project because it touches on so many different aspects of system administration: DNS, SMTP, Sendmail configuration, POP3/IMAP4, setting up a DNS server, debugging clients, and so on and so on. A project like this might take weeks or months depending on what learning resources you have, whether you have a mentor, and how many features you want to enable and experiment with.

Nowadays it is easier to do that: Binary packages and better defaults have eliminated most of the complexity. Starter documentation is plentiful, free, and accessible on the web. DNS domain registrars host the zone too, and make updates easy. Email addressing has become banal, mostly thanks to uniformity (and the end of UUCP).

More "nails in the coffin" for this rite of passage include the fact that ISPs now provide email service (this didn't used to be true), hosted email services like Google Apps have more features than most open source products, and ...oh yeah... email is passé.

What is the modern rite of passage for sysadmins? I want to know.

If you became a sysadmin in the last 10 years: What project or "rite of passage" made you feel like you had gone from "beginner" to being "a real sysadmin"?

Please tell me here.

Posted by Tom Limoncelli in Education

Someone asked me in email for advice about how to move many machines to a new corporate standard. I haven't dealt with desktop/laptop PC administration ("fleet management") in a while, but I explained this experience and thought I'd share it on my blog:

I favor using "the carrot" over "the stick". The carrot is making the new environment better for the users so they want to adopt it, rather than using management fiat or threats to motivate people. Each has its place.

The more people feel involved in the project the more likely they are to go along with it. If you start by involving typical users by letting them try out the new configuration in a test lab or even loaning them a machine for a week, they'll feel like they are being listened to and will be your partner instead of a roadblock.

Once I was in a situation where we had to convert many PCs to a corporate standard.

First we made one single standard PC. We let people try it out and find problems. We resolved or found workarounds to any problems or concerns raised.

At that point we had a rule: all new PCs would be built using the standard config. No regressions. The number of standard PCs should only increase. If we did that and nothing more, eventually everything would be converted as PCs only last 3 years.

That said, preventing any back-sliding (people installing PCs with the old configuration by mistake, out of habit, or wanting an "exception") was a big effort. The IT staff had to be vigilant. "No regressions!" was our battlecry. Management had to have a backbone. People on the team had to police ourselves and our users.

We knew waiting for the conversion to happen over 3 years was much too slow. However before we could accelerate the process, we had to get those basics correct.

The next step was to convert the PCs of people that were willing and eager. The configuration was better, so some people were eager to convert. Updates happened automatically. They got a lot of useful software pre-installed. We were very public about how the helpdesk was able to support people with the new configuration better and faster than the old configuration.

Did some people resist? Yes. However, there were enough willing and eager people to keep us busy. We let those "late adopters" have their way. Though, we'd mentally prepare them for the eventual upgrade by saying things like (with a cheerful voice), "Oh, we're a late adopter! No worries. We'll see you in a few months." Calling them "late adopters" instead of "resistors" or "hard cases" reframed the issue as them being "eventual", not "never".

Some of our "late adopters" volunteered to convert on their own. They got a new machine and didn't have a choice. Or, they saw that other people were happy with the new configuration and didn't want to be left behind. Nobody wants to be the only kid on the block without the new toy that all the cool kids have.

(Oh, did I mention the system for installing PCs the old way is broken and we can't fix it? Yeah, kind of like how parents tell little kids the "Frozen" disc isn't working and we'll have to try again tomorrow.)

Eventually those conversions were done and we had the time and energy to work on the long tail of "late adopters". Some of these people had verified technical issues such as software that didn't work on the new system. Each of these could take many hours or days of helping the user make the software work or finding replacement products. In some cases, we'd extract the user's disk into a Virtual Machine (p2v) so that it could run in the old environment.

However, eventually we had to get rid of the last few hold-outs. The cost of supporting the old configuration was $x, and if there are 100 remaining machines, $x/100 per machine isn't a lot of money. When there are 50 remaining machines the cost is $x/50 per machine. Eventually the cost is $x/1 and that makes that last machine very, very expensive. The faster we can get to zero, the better.
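The arithmetic above can be sketched in a few lines of Python. This is only an illustration, assuming the cost of keeping the old configuration supported is a fixed total (the $x above) that does not shrink as machines are converted. The function name and the 50,000 figure are hypothetical, not numbers from the project described here.

```python
def cost_per_machine(total_support_cost: float, machines_remaining: int) -> float:
    """Spread a fixed support cost over the machines still on the old configuration."""
    if machines_remaining <= 0:
        raise ValueError("machines_remaining must be positive")
    return total_support_cost / machines_remaining


# Hypothetical figure standing in for $x: the fixed cost of supporting the old setup.
TOTAL_SUPPORT_COST = 50_000.0

for remaining in (100, 50, 10, 1):
    per_machine = cost_per_machine(TOTAL_SUPPORT_COST, remaining)
    print(f"{remaining:>3} machines left: {per_machine:,.2f} per machine")
```

The total never changes; only the number of machines sharing it does, which is why the last hold-out carries the entire cost by itself.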

We announced that unconverted machines would be unsupported after date X, and would stop working (the file servers wouldn't talk to them) by date Y. We had to get management support on X and Y, and a commitment to not make any exceptions. We communicated the dates broadly at first, then eventually only the specific people affected (and their manager) received the warnings. Some people figured out that they could convince (trick?) their manager into buying them a new PC as part of all this... we didn't care as long as we got rid of the old configuration. (If I were doing this today, I'd use 802.1X to kick old machines off the network after date Z.)

One excuse we could not tolerate was "I'll just support it myself". The old configuration didn't automatically receive security patches and "self-supported machines" were security problems waiting to happen. The virtual machines were enough of a risk.

Speaking of which... the company had a loose policy about people taking home equipment that was discarded. A lot of kids got new (old) PCs. We were sure to wipe the disks and be clear that the helpdesk would not assist them with the machine once disposed. (In hindsight, we should have put a sticker on the machine saying that.)

Conversion projects like this pop up all the time. Sometimes it is due to a smaller company being bought by a larger company, a division that didn't use centralized IT services adopting them, or moving from an older OS to a newer OS.

If you are tasked with a similar conversion project, you'll find you need to adjust the techniques you use depending on many factors. Doing this for 10 machines, 500 machines, or 10,000 machines requires different techniques in each case.

If you manage server farms instead of desktop/laptop PC fleets, similar techniques work.

Posted by Tom Limoncelli in Technical Tips

Are you a software developer who is facing rapidly changing markets, technologies and platforms? This new conference is for you.

ACM's new Applicative conference, Feb. 25-27, 2015 in Midtown Manhattan, is for software developers who work in rapidly changing environments. Technical tracks will focus on emerging technologies in system-level programming and application development.

The list of speakers is very impressive. I'd also recommend sysadmins attend as a way to stay in touch with the hot technologies that your developers will be using (and demanding) soon.

Early bird rates through Jan. 28 at http://applicative.acm.org

Posted by Tom Limoncelli in Conferences

Hi Boston-area friends! I'll be giving my "Radical ideas from The Practice of Cloud System Administration" talk at the Back Bay LISA user group meeting on Wednesday, January 14, 2015. Visit bblisa.org for more info.

Short version: My mailing list server no longer generates bounce messages for unknown accounts, thus eliminating the email backscatter it generates.

Longer version:

I have a host set up exclusively for running mailing lists using Mailman, and battling spam has been quite a burden. I finally 'gave up' and made all the lists "members only". Luckily that is possible with the email lists being run there. If I had any open mailing lists, I wouldn't have been so lucky. The result of this change was that it eliminated all spam and I was able to disable SpamAssassin and the other measures I had put in place. SpamAssassin had been using more and more CPU time and was letting more and more spam through.

That was a few years ago.

However, then the problem became Spam Backscatter. Spammers were sending to nearly every possible username in hopes of getting through. Each of these attempts resulted in a bounce message being sent to the (forged) email address the attempt claimed to come from. It got to the point where 99% of the email traffic on the machine was these bounces. The host was occasionally being blocked as punishment for generating so many bounces. Zero of these bounces were "real"... i.e. the bounce was going to an address that didn't actually send the original message and didn't care about the contents of the bounce message.

These unwanted bounce messages are called "Spam Backscatter".

My outgoing mail queue was literally filled with these bounce messages, being re-tried for weeks until Postfix would give up. I changed Postfix to delete them after a shorter amount of time, but the queue was still getting huge.

This weekend I updated the system's configuration so that it just plain doesn't generate bounces to unknown addresses on the machine. While this is something you absolutely shouldn't do for a general purpose email server (people mistyping the addresses of your users would get very confused), doing this on a highly specialized machine makes sense.

I can now proudly say that for the last 48 hours the configuration has worked well. The machine is no longer a source of backscatter pollution on the internet. The mail queue is empty. It's a shame my other mail servers can't benefit from this technique.

Here are my predictions for 2015:

  1. Bloggers who make stupid, attention-getting, predictions will not be held accountable when those predictions don't come true.
  2. Windows-only enterprises have started buying Apple laptops to run Windows 10 due to the lower repair rate of the higher quality hardware. This trend will increase and Apple will run a marketing campaign to take advantage of the trend.
  3. The battle between Docker and CoreOS to define the container format of the future will stall the industry as it gets more and more nasty. If you thought VHS vs. Betamax was bad, or that AT&T vs. BSD Unix was bad, this will be 100x worse.
  4. The Microsoft container equivalent of Docker will create more confusion than anyone could have expected.
  5. Sadly the trend of encouraging girls to get interested in STEM at a young age will dissipate moments after the media spotlight goes away.
  6. At least one company will claim to sell a "DevOps Appliance" that you can plug in, press the "on" button, and "have devops at your company".
  7. I will not make fun of a company for marketing to my employer's WHOIS contact.
  8. The industry that makes and sells bloated, over-priced, shitty, software development tools (the ones that have caused many of the problems in our industry) will repackage those tools as "DevOps" and make a lot of money doing so.
  9. "DevOps is more than Release Engineering" will be the theme of at least one DevOps conference.
  10. Real products announced on April 1st will be dismissed as jokes... again.
  11. Google will cancel the following projects: Sites, Code, and... probably a lot more too.
  12. By December 2015, Tom Limoncelli will decide not to do another "Predictions for next year" blog post.

These are my predictions for 2015. I wish all my readers success and happiness in 2015.

Posted by Tom Limoncelli in Funny

Warning! Upgrade now! There is a security hole in the git client.

UNTIL YOU UPGRADE: Do not "git clone" or "git pull" from untrusted sources.

AFTER YOU UPGRADE: Do not "git clone" or "git pull" from untrusted sources. THE CODE YOU JUST DOWNLOADED IS UNTRUSTED AND SHOULD NOT BE RUN, YOU FOOL!

Posted by Tom Limoncelli in Security

InfoQ interviewed the authors of The Practice of Cloud System Administration and included it as part of their review of the book.

Read it here!

Win Treese interviewed me and my co-authors about the book.

An Interview with the authors of "The Practice of Cloud System Administration" on DevOps and Data Security

We discussed DevOps in the enterprise, trends in system administration, and at the end I got riled up and ranted about how terrible computer security has become.

ComputerWorld.com has published an excerpt from our book "The Practice of Cloud System Administration: Designing and Operating Large Distributed Systems Vol 2".

The article has a title that implies it is about capacity planning for data centers but it's really about capacity planning for any system or service.

Room to grow: Tips for data center capacity planning

If you like it, there are 547 more pages of good stuff like that in the book.

 