March 2016 Archives

CACM reprinted my article in the April edition. They don't usually publish April Fools articles, but I'll consider this the appropriate place for this article.

If you subscribe to CACM, you can read the article online, as a PDF, or as an ebook. You can also read it for free in the original publication, ACM Queue.

Posted by Tom Limoncelli in ACM Queue Column

The new season of Archer starts tonight (Mar 31, 2016) at 10pm!

  • The Powerpuff Girls 2016 Reboot starts on April 4th. Set your DVRs now!

  • The Detour starts on April 11. I have high hopes for this show. It is created by Samantha Bee and Jason Jones.

  • Speaking of Samantha Bee, her new weekly news program Full Frontal with Samantha Bee is my new favorite show. I think it shows that she should have replaced Jon Stewart.

  • Silicon Valley's season 3 premieres April 24. If you work at a startup, or just wish you did, you can't miss this show. ProTip: If your DVR can only set up new recordings 14 days in advance, set a reminder in your April 10 todo list.

And also remember my 3 tips for using a DVR to reduce how much time you waste watching TV:

  • Rule 1: If you watch all the way to the end of the program, you have to delete it. Don't give me any of that "Oh, I'll want to watch that again" logic. You don't have enough time to watch everything that gets recorded, let alone watch it a second time.
  • Rule 2: If you add anything to the list of shows that are automatically recorded (Season Passes), you have to delete something of equal length and frequency. Alternative: each month you have to delete at least one hour's worth of Season Passes.
  • Rule 3: If it's about to get old enough to be automatically deleted, let it expire. No extending the date. Archiving it to tape because "I'll find time to watch it later" isn't allowed (see Rule 1 about how much free time you have). Dude, ya just gotta learn to let it go. For me, the only exceptions to this rule are the three most important shows I'm watching.

These TiVo tips are from the "how to avoid wasting time" section of Time Management for System Administrators.

This HuffPo article is worth reading:

According to an Ernst & Young internal study, for "each additional 10 hours of vacation employees took, their year-end performance ratings improved 8 percent, and frequent vacationers also were significantly less likely to leave the firm".

By the way... using your vacation time one day here and a long weekend there is not a "vacation". Generally I find that your brain doesn't relax until day 3 of a vacation, especially if there is travel involved. If you don't take a long break, you never reach that point. If you take a lot of long weekends, you end up spending that time doing laundry and you rob yourself of the opportunity to actually relax.

That's why my time management book has an entire chapter just on vacations and relaxation. Taking a vacation isn't wasting time... it is an investment in yourself.

Posted by Tom Limoncelli in Time Management

This weekend is a good time to watch the video we'll be discussing on the next episode of LISA conversations: Caskey Dickson's talk from LISA '15 titled Why Your Manager LOVES Technical Debt and What to Do About It.

  • Homework: Watch his talk ahead of time.
    • Why Your Manager LOVES Technical Debt and What to Do About It
    • Recorded at LISA '15
    • Video and Slides

Then you'll be prepared when we record the episode on Tuesday, March 29, 2016 at 3:30-4:30 p.m. Pacific Time (convert). Register (optional) and watch via this link. Watching live makes it possible to participate in the Q&A.

The recorded episode will be available shortly afterwards on YouTube.

You won't want to miss this!

Posted by Tom Limoncelli in LISA Conversations

"Done" means "launched". It isn't "done" until it is launched. It annoys me to hear people say a project is "done... now I just have to launch it". It isn't done if it isn't in production.

There are a few reasons for this: people think that launch is "the last 5 percent of a project" but often 80 percent of your time will be consumed by this last 5 percent.

Also, you aren't "done" until other people are benefitting from your work (in business speak... "it is delivering value"). Written code has no business value. Launched code does.

You can rig this in your favor. Structure your project as an MVP (minimum viable product) launch followed by a series of mini-launches, one per feature. This way your written code stays unlaunched for the shortest amount of time. An MVP release might be just the main webpage and placeholders for every feature. However, it forces you to go through all the launch tasks: setting up the web servers, load balancers, databases, and so on. These things can take a lot of time. Oh, and if there is a separate dev team and ops team, your ops team can start developing their runbook now, not the day before launch. This makes operations suck less.

Which brings me to a story about wasting one million dollars...

I once saw a project with a plan to launch after 2 years of development. After 1.9 years the SREs were needed for a higher priority project. The incomplete project was abandoned and the efforts of 5 SREs for 1.9 years were forgotten. Do the math... that's about a million dollars that Xxxxxx wasted.
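
For anyone who wants to check that math, here is a rough sketch in Python. The headcount and duration come from the story; "about a million dollars" is treated as a round number, and the function name is made up for illustration. It works backwards to the cost per SRE-year that the estimate implies.

```python
def implied_cost_per_person_year(total_cost: float, people: int, years: float) -> float:
    """Return the per-person, per-year cost implied by a total project cost."""
    person_years = people * years
    return total_cost / person_years


# Figures from the story: 5 SREs, 1.9 years, roughly $1,000,000 written off.
person_years = 5 * 1.9
cost = implied_cost_per_person_year(1_000_000, people=5, years=1.9)
print(f"{person_years:.1f} SRE-years, about ${cost:,.0f} per SRE-year")
```

In other words, the estimate holds if each SRE costs a bit over $100,000 a year in salary and overhead. If the real figure is higher, the waste is more than a million.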

If they had launched an MVP after a few months and then kept building on it (as I had recommended) Xxxxxx would have seen some benefit of the system. However they ignored this advice (I think someone used the term "trouble-maker" to describe me) and they went off to build their new system.

The goal of the project was to replace a legacy system that was missing one important feature, then use it as a platform for a number of new features. I don't mean to gloat, but after my warnings were ignored, I spent a little time making a gross, hacky, quick-and-dirty version of the important feature and added it to the legacy system. I launched it, and the users were 90% happy. The 2-year project was going to fill in that last 10% of happiness... for a million dollars.

As far as I know the legacy system was used for a number of years after this.

Perhaps the success of my quick hack helped justify abandoning the bigger project. Management had to pick a project to kill so they could have 4-5 people for a higher priority project. Maybe the quick hack made the legacy system "good enough" and helped justify killing the project. Maybe this spared some other project from being killed. I wonder what that project was.

I'm sure the legacy system has become obsolete by now. I don't know or care. I do, however, care that a bunch of excellent SREs had their work thrown away... which must have been demoralizing.

Lately I've been thinking a lot about applying MVP-style project management everywhere. It just makes more sense. Once you've experienced it in one place, you can't help but want to do it everywhere: system administration, relationships, home repair, etc.

To that end I have one piece of advice: Rush to launch something... anything... and build on it. Reduce the scope to the minimum; avoid the temptation to add "just this one last thing" before you launch. Do this even if it is only usable by a small fraction of the users, or only helps a particular special case. People would rather have some features today than all the features tomorrow. Tomorrow may never come.

Posted by Tom Limoncelli in Management

Someone recently commented that with Github it is "a pain if you want to have a work and personal identity."

It is? I've had separate work and personal Github accounts for years. I thought everyone knew this trick.

When I clone a URL like git@github.com:TomOnTime/tomutils.git I simply change github.com to either github-home or github-work. Then I have my ~/.ssh/config file set with those two names configured to use different keys:

# TomOnTime
Host github-home
  HostName github.com
  User git
  IdentityFile ~/.ssh/id_rsa-githubpersonal
  PreferredAuthentications publickey
  PasswordAuthentication no
  IdentitiesOnly yes

# tlimoncelli
Host github-work
  HostName github.com
  User git
  IdentityFile ~/.ssh/id_rsa-githubwork
  PreferredAuthentications publickey
  PasswordAuthentication no
  IdentitiesOnly yes

I also have things set up so that if I leave the name alone, my work-owned machines default to the work key, and my personal machines default to my personal key.
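
If you would rather not edit clone URLs by hand, the rename is easy to script. This is only a sketch: the helper name use_ssh_alias is hypothetical, and it assumes the scp-style user@host:path form that the clone URL above uses, not https:// or ssh:// URLs.

```python
def use_ssh_alias(clone_url: str, alias: str) -> str:
    """Swap the host in an scp-style Git URL (user@host:path) for an ssh_config alias."""
    user_host, sep, path = clone_url.partition(":")
    if not sep or "@" not in user_host:
        raise ValueError(f"not an scp-style Git URL: {clone_url!r}")
    user, _, _old_host = user_host.partition("@")
    return f"{user}@{alias}:{path}"


# The alias must match a Host entry in ~/.ssh/config.
print(use_ssh_alias("git@github.com:TomOnTime/tomutils.git", "github-work"))
```

The result can be passed straight to git clone, and ssh picks the key from whichever Host entry matches the alias.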

As for the web user interface, rather than switching between accounts by logging out and logging back in all the time, I simply use Chrome's multi-user feature. Each user profile has its own cookie jar and maintains its own set of bookmarks, color themes, and so on. One user is my "work" profile. It is green (work==money==green), has bookmarks that are work-related, and is logged into my work Github account. The other is my "home" profile. It is blue (I live in a blue house), has my personal bookmarks, and is logged into my personal Github account.

Having each profile be a very different color makes it very easy to tell which profile I am in. This prevents me from accidentally using my work profile for personal use or vice-versa.

I know some people do something similar by using different browsers but I like this a lot more.

Once I set this up, using multiple accounts on Github was easy!

Posted by Tom Limoncelli in Technical Tips

Tavis Ormandy, Google security expert, is getting press for criticizing Meaningless Antivirus Excellence Awards. This is a good opportunity to mention some thoughts I've had about anti-malware software.

I believe that enterprise security defense software (anti-virus, anti-malware, host-based firewall, etc.) should have these qualities:

  • Silent Updating: The software should update silently. It does not need to pop up a window to ask if the new antivirus blacklist should be downloaded and installed. That decision is made by system administrators centrally, not by the user.
  • Hidden from view: The user should be able to determine that the software is activated, but it doesn't need an animated spinning status ball, nor popup windows to announce that updates were done. Such gimmicks slow down the machine and annoy users.
  • Negligible performance impact: Anti-malware software can have a significant impact on the performance of the machine. Examining every block read from disk, or snooping every network packet received, can use a lot of CPU, RAM, and other resources. When selecting anti-malware software, benchmark various products to determine the resource impact.
  • Centralized Control: Security defense software should be configured and controlled from a central point. The user may be able to make some adjustments but not disable the software.
  • Centralized Reporting: There should be a central dashboard that reports the status of all machines. This might include what viruses have been detected, when each machine last received its antivirus policy update, and so on. Knowing when the machine last checked in is key to knowing if the software was disabled.

Obviously "consumer" products can drop the last two requirements.

However, "consumer" products tend to violate the other items too! "Consumer" anti-malware products tend to be flashy and noisy. Why is this?

I have a theory.

Anti-malware software sold to the consumer needs to be visible enough so that the user feels like they're getting their money's worth. Imagine if the product ran silently, protecting the user flawlessly, only popping up once a year to ask the user to renew for the low, low price of $39.99. The company would go out of business. Nobody would renew, as it would appear to have done nothing for the last 12 months.

Profitable anti-malware companies make sure their customers are constantly reminded that they're being protected. Each week their software pops up to say there is a new update downloaded and asks them to click "protect me" to activate it. Firewall products constantly ask them to acknowledge that a network attack has been defended, even if it was just an errant ping packet. Yes, the animated desktop tray icon consumes CPU time and drains laptop batteries, but that spinning 3D ball reassures the user and validates their purchase decision.

Would it have been less programming to simply do the update, drop the harmful packet, and not display any popups? Certainly. But it would have reduced brand recognition.

All of this works because there is an information deficit. Bruce Schneier's blog post, "A Security Market for Lemons", explains that a typical consumer cannot judge the quality of a complex product such as security software and is therefore vulnerable to these shenanigans.

However, you are a system administrator with technical knowledge and experience. It is your job to evaluate the software and determine what works best for your organization. You know that it should rapidly update itself and be as unobtrusive to the users as possible. Whether or not you renew the software will be based on the actionable data made visible by the dashboard, not on the excitement generated by spinning animations.
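
As an illustration of the "when did it last check in" idea from the Centralized Reporting item, here is a minimal Python sketch of the check a dashboard might run. The function name, hostnames, and the three-day threshold are all assumptions made up for the example; a real product would pull check-in times from its own management server.

```python
from datetime import datetime, timedelta


def stale_machines(last_checkin: dict[str, datetime],
                   now: datetime,
                   max_age: timedelta) -> list[str]:
    """Return hosts whose last check-in is older than max_age, sorted by name."""
    return sorted(host for host, seen in last_checkin.items()
                  if now - seen > max_age)


# Hypothetical fleet data: host name -> time of last policy check-in.
fleet = {
    "laptop-alice": datetime(2016, 3, 28, 9, 0),
    "laptop-bob": datetime(2016, 3, 14, 17, 30),
    "desktop-carol": datetime(2016, 3, 29, 8, 15),
}
overdue = stale_machines(fleet, now=datetime(2016, 3, 29, 12, 0),
                         max_age=timedelta(days=3))
print(overdue)
```

A machine that shows up in this list is not necessarily compromised. It may simply be powered off, but it is the one worth a closer look.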

Posted by Tom Limoncelli in Security

Our next guest will be Caskey Dickson. We'll be discussing his talk from LISA '15 titled Why Your Manager LOVES Technical Debt and What to Do About It.

Watch live! We'll be recording the episode on Tuesday, March 29, 2016 at 3:30-4:30 p.m. Pacific Time. Participate in the live Q&A by submitting your questions during the broadcast. Pre-registration is recommended. Register and/or watch via this link.

  • Homework: Watch his talk ahead of time:

    • Why Your Manager LOVES Technical Debt and What to Do About It
    • Recorded at LISA '15
    • Video and Slides
  • Watch live!

    • LISA Conversations Episode #8
    • Co-hosts: Lee Damon and Thomas Limoncelli
    • Guest: Caskey Dickson
    • Will be recorded: Tuesday, March 29, 2016 at 3:30-4:30 p.m. Pacific Time (convert)

The recorded episode will be available shortly afterwards on YouTube.

You won't want to miss this!

Posted by Tom Limoncelli in LISA Conversations

Episode 7 of LISA Conversations features Kris Buytaert, who presented DevOps: The past and future are here. It's just not evenly distributed (yet) at LISA '11.

You won't want to miss this!

Posted by Tom Limoncelli in LISA Conversations

Step 1. Buy this for your boss or coworkers.

Step 2. Prepare for hijinks.



We're in the process of updating The Practice of System and Network Administration (read the drafts here) and I discovered an old section that was written with the assumption that DHCP was newish and readers would need encouragement to use it.

Of course, I ripped it out and replaced it with something more modern. However, I couldn't help but include an explanation for new sysadmins of what life was like before DHCP (see The Importance of DHCP).

Which leads me to this video of teens reacting to Windows 95 (and associated article). The best quote is, "How do you get Internet without WiFi?"

Also note that when they explain that a modem connects to the phone to get internet access, the teen looks at her cell phone. I think they should have clarified that modems work on land lines, but then maybe they'd have to explain what that is.

It sure makes you appreciate how good things are today. It was worth spending some time watching it, if only to see how youth today describe things.

By the way... one thing I remember about Windows 95 was that the DHCP client was very brittle. If it saw optional parameters it didn't understand, it would just ignore the packet. This is the opposite of Postel's Robustness Principle, which states, "Be conservative in what you do, be liberal in what you accept from others". I can't remember exactly what the problem was, but I do remember reconfiguring our DHCP server so the default configuration was acceptable to Windows 95 machines, and providing host-specific configurations for other machines.
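
To make the "liberal in what you accept" half concrete, here is a small Python sketch of a tolerant option parser. DHCP options are encoded as a one-byte code, a one-byte length, and then the value, with code 0 as padding and code 255 as the end marker (RFC 2132). Everything else here is an assumption for illustration: the function name, the idea of a "known" set, and the sample bytes. It is not how Windows 95 or any particular client is implemented, and it expects only the options bytes, not a whole packet.

```python
PAD, END = 0, 255  # special one-byte option codes from RFC 2132


def parse_dhcp_options(data: bytes, known: set[int]) -> tuple[dict[int, bytes], list[int]]:
    """Parse code/length/value options, keeping known codes and skipping the rest."""
    parsed: dict[int, bytes] = {}
    skipped: list[int] = []
    i = 0
    while i < len(data):
        code = data[i]
        if code == END:
            break
        if code == PAD:
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError("truncated option header")
        length = data[i + 1]
        value = data[i + 2 : i + 2 + length]
        if len(value) != length:
            raise ValueError("truncated option value")
        if code in known:
            parsed[code] = value
        else:
            skipped.append(code)  # unknown option: step over it, keep the packet
        i += 2 + length
    return parsed, skipped


# Sample options: message type (53), subnet mask (1), a site-specific option (224), end.
sample = bytes([53, 1, 2, 1, 4, 255, 255, 255, 0, 224, 2, 0xAB, 0xCD, END])
parsed, skipped = parse_dhcp_options(sample, known={1, 53})
print(parsed, skipped)
```

The unknown option 224 lands in the skipped list, while the message type and subnet mask are still returned. A brittle client would have thrown away the whole packet at that point.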

To this day I still hear about devices with crappy DHCP clients that get confused because they weren't tested very well. I imagine the vendors just tried it on their corporate network and, if it worked, assumed it was ok. Their corporate network, of course, uses an anemic configuration, and their code ends up untested in real-world conditions. Nine times out of ten these are IP phones, which is ironic because you'd think a device that is so network-centric would have an awesome IP stack.

Anyway... I think it would be interesting to make a video where old-timers watch these "Teens React" videos and pause them at certain points to go into more detail about what's going on, or the history of the device, and so on.

Posted by Tom Limoncelli

"Login", the Usenix Newsletter, has an excellent article about how Google manages oncall. Authors Andrea Spadaccini and Kavita Guliani did an excellent job of providing an overview of how Google seeks to balance oncall time with non-oncall time so that engineers have time for actual engineering.

While most of the article deals with how to prevent operations people from getting overloaded, they also raise the issue that operations underload is dangerous too. SREs get out of practice if they don't get paged enough. They describe games and simulations that SRE teams do to stay in practice.

The article is available for free to Usenix members and newsletter subscribers, or for a nominal charge to everyone else.

Being an On-Call Engineer: A Google SRE Perspective, Andrea Spadaccini and Kavita Guliani

(Side note: the article cites the Oncall chapter of TPOCSA for our analysis of various oncall rotation schemes. Read it for free on SBO.)

Posted by Tom Limoncelli in Usenix
