Awesome Conferences

March 2014 Archives

Good reads, March 2014

A summary of the interesting articles I've found this month.

Why Puppet/Chef/Ansible aren't good enough (and we can do better): This is mostly about the Nix package manager and the new Linux distro NixOS, which is entirely Nix-based down to the bone. I haven't used it yet, but I have to admit this is what I was trying to achieve back in the 1990s with the simple package management system I made... but I didn't go far enough. These people did. I'm looking forward to trying this out.

DEC64 is a new (proposed) floating point format. I fear that most people don't understand how floating point numbers are stored on computers, so this will be wasted on them. However, I'm fascinated by the implications of this new (proposed) format. Basically, 56 bits are used to store an integer coefficient and 8 bits are used to store the exponent. So, you know how big numbers are often written "1234E45"? Well, in this format you store "1234" in the 56-bit part and "45" in the 8-bit part. If two numbers have the same exponent, the math is just integer math (assuming no overflow).
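The coefficient/exponent idea can be sketched in a few lines of Python. This is a toy illustration of the concept only, not the actual DEC64 specification (which also defines NaN, normalization rules, and more):

```python
# Toy sketch of the DEC64 idea: a 56-bit two's-complement coefficient and an
# 8-bit two's-complement exponent packed into 64 bits, representing
# coefficient * 10**exponent. Illustration only; not the real DEC64 spec.

def dec64_pack(coefficient, exponent):
    """Pack coefficient*10**exponent into a single 64-bit integer."""
    assert -(2**55) <= coefficient < 2**55, "coefficient must fit in 56 bits"
    assert -128 <= exponent < 128, "exponent must fit in 8 bits"
    return ((coefficient & (2**56 - 1)) << 8) | (exponent & 0xFF)

def dec64_unpack(packed):
    """Recover (coefficient, exponent) from the packed 64-bit value."""
    coefficient = packed >> 8
    if coefficient >= 2**55:      # undo two's complement
        coefficient -= 2**56
    exponent = packed & 0xFF
    if exponent >= 128:
        exponent -= 256
    return coefficient, exponent

def dec64_add_same_exponent(a, b):
    """When exponents match, addition is plain integer math (barring overflow)."""
    ca, ea = dec64_unpack(a)
    cb, eb = dec64_unpack(b)
    assert ea == eb, "this sketch only handles matching exponents"
    return dec64_pack(ca + cb, ea)

x = dec64_pack(1234, 45)   # i.e. 1234E45
y = dec64_pack(8766, 45)   # i.e. 8766E45
print(dec64_unpack(dec64_add_same_exponent(x, y)))   # (10000, 45)
```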

Multipath TCP: I had misconceptions about this. It turns out this is a system for doing TCP over all your interfaces at the same time. For example, a mobile phone has a Wifi NIC and an LTE "modem". MPTCP lets you open a connection to a web site over Wifi and LTE at the same time, load balancing between the two and transparently switching between them as one has more errors or dropouts. I think this would make my mobile experience so much better that I plan on changing mobile platforms the moment someone supports it. Of course, it has to be supported on the website end also, but I can hope. Evil thought: The IPv6 people should convince kernel developers to implement this only for IPv6 and declare it "the killer feature of IPv6". Considering that LTE is IPv6, this isn't too far fetched.

Looking back on "Look Back" videos: Facebook is doing some interesting SRE and development work. This is an interesting look inside what they do.

Go Read: One Year with Money and App Engine: When Google Reader was cancelled, Matt made a clone called "Go Read". At the one-year anniversary, here's his look back at his experience building a business and making it profitable. It turns out a key part was optimizing not the code, but his usage of Google App Engine. Interesting quote: "App Engine charges for data stored in its amazing datastore (my favorite feature of App Engine and the only feature I'm aware of that has zero competitors in the cloud space. When you compare to AWS prices, no one mentions the datastore.)"

How We Make Trello: This is a great writeup of how Trello works ... on the inside. It turns out the web client is doing all the smarts in the browser and talks to their API just like the mobile app does. More web apps should be like that. If you aren't using Trello you should check it out. People love it so much that I get fanmail just for recommending it. One of my talks at Cascadia IT 2014 included 3 slides on Trello. The next week I got email that said, "I especially want to thank you for Trello - what a simply elegant app--wish I'd found this sooner--it's a breeze and SO HELPFUL! I've tried other PM tools that I like but that seemed to take too much setup and maintenance time (like Basecamp, etc.). Trello is about as perfect as it gets."

Why Roslyn is a big deal: I'm a total fanboy for reading about compiler internals. If reading about LLVM got you hot and bothered, check out Microsoft's new compiler project. By building the compiler out of re-usable components, it is going to make their IDEs and, heck, their entire tool chain a lot, lot better. Why aren't the LLVM people applying this kind of thinking to IDEs?

Posted by Tom Limoncelli in Good Reads

I'll be teaching my "Evil Genius 101" half-day class at LOPSA-East

You want to innovate: deploy new technologies such as configuration management (CfEngine, Puppet, etc.), set up a wiki, or standardize configurations. Your coworkers don't want change; they like things the way they are. Therefore, they consider you evil. But you aren't evil; you just want to make things better.

In this class you will learn how to:

  • Help your coworkers understand and agree to your awesome ideas
  • Convince your manager of anything. Really
  • Turn the most stubborn user into your biggest fan
  • Get others to trust you so they are more easily convinced
  • Decide which projects to do when you have more projects than time
  • Make decisions based on data and evidence
  • Drive improvements based on a methodology and planning instead of guessing and luck

LOPSA-East is a regional sysadmin conference in New Brunswick NJ, May 2-3, 2014. More info here:

This class also talks about the best DevOps techniques that you can steal for your organization.

The first half analyzes your organization and helps you do some "soul searching" to figure out which projects are most in need of your Evil Plans.

Sign up today!


P.S. I'll also be teaching my "Intro to Time Management for System Administrators" half-day class, which gives you the tools to better manage your time, and avoid the interruptions and distractions that keep you from getting work done.

Posted by Tom Limoncelli in LOPSA-East

I'll be teaching my "Intro to Time Management for System Administrators" class at LOPSA-East. I haven't taught this class in ages so this is a good opportunity to check it out.

The class covers the most important points of my O'Reilly book, Time Management for System Administrators. Sysadmins have a time management problem: too many projects, too many interruptions, too many distractions. This half-day class presents fundamental techniques for eliminating interruptions and distractions so you have more time for projects, plus prioritization techniques so the projects you do work on have the most impact. It wraps it all up in "The Cycle System", the easiest and most effective way to juggle all your tasks without dropping any.

LOPSA-East is a regional sysadmin conference in New Brunswick NJ, May 2-3, 2014. More info here:

Personally, my goal for this class is to give you the tools and a good start on the techniques so that you can mold and shape them to your own needs.

I'm surprised at how many people have taken this class multiple times. One person told me he takes it every few years "to brush up" on the basics.

Taking this class at LOPSA-East has the benefit that, due to the small class size, I have more time for Q&A.

Sign up today!


P.S. I'll also be teaching my "Evil Genius 101" half-day class, which focuses on how to convince coworkers and managers how to do "big changes" like adopt configuration management, DevOps techniques, and so on.

Posted by Tom Limoncelli in LOPSA-East

Strata Chalup, Christine Hogan, and I have been working on a new book titled, "The Practice of Cloud Administration".

This new book is all-new material focused on the design and operation of distributed systems, or "cloud" computing. The book is in two parts: "Building It" and "Running It". It is the sequel to our enterprise-focused book, "The Practice of System and Network Administration".

If you want to see a preview, the only conference where you'll be able to get a sneak peek is at LOPSA-East. Otherwise you'll have to wait until at least September 2014 (we don't have the exact shipping date yet, it could be as late as November).

The preview will be during this session: I'll be talking about what you can expect in the book, our writing process, and I hope to have some sample chapters that I can hand out.

Register today:

LOPSA-East is a regional sysadmin conference in New Brunswick NJ, May 2-3, 2014. Two days of world-class training on a diverse range of topics plus community-selected talks:

I hope to see you there!

--Tom Limoncelli

Posted by Tom Limoncelli in Book News


I'm planning on making a big announcement on Monday. Nothing earth-shattering, but watch this space.

Posted by Tom Limoncelli in Book News

This year the LISA CFP is different in both content and form. This represents a big change for LISA. There is less emphasis on academic talks and more emphasis on high-impact, cutting-edge talks on what sysadmins need to know today and in the coming 18-24 months. If you consider the changes over the last few years, soon LISA will be unrecognizable (in a good way) compared to the LISA of the past.

This year the focus is on 5 topics:

  • Systems Engineering (Large scale design challenges, Cloud and hybrid cloud deployments, Software Defined Networks (SDN); Virtualization; HA and HPC Clustering; Cost effective, scalable storage; Hadoop/Big Data; Configuration management)
  • Culture (Business communication and capital planning; Continuous delivery and product management; Distributed and remote worker challenges; On-call challenges; Standardization to support automation; Standards and regulatory compliance)
  • Devops (Site reliability engineering; Development frameworks for Ops; Release engineering; API-driven operations; Continuous deployment and fault resilience; The Ops side of DevOps)
  • Monitoring/Metrics (Monitoring, alerting, and logging systems; Analytics, interpretation, and application of system data; Visualization of system data)
  • Security (Network IDS and IPS; Incident management; Disaster resilience and mitigation; Security testing frameworks; Continuous release security; Current security challenges)

I'm excited that LISA is modernizing and updating (and I'm glad to be on the committee).

The form of the CFP is very different too. In past years it has been page after page of text that, to be honest, makes my eyes hurt after a while. Now it is succinct and focused with a "Submit your proposal" link at the end.

I'd like to point out that academic research papers, rather than being emphasized, aren't even mentioned until the end. People who think of LISA as an "ivory tower researcher" conference will be pleasantly surprised. Research papers are now a specific track, run by a separate committee, so that the main organizing committee can focus on bringing in the best talks and tutorials.

You should also notice that the term "invited talks" no longer appears in the CFP. Everyone is asked to submit their proposals and the committee will pick the best. (This was true in past years, but the term "invited" was left in place.) Of course, the committee will be chasing down particular people and topic experts, but if you don't hear from the committee, don't be shy! Reach out to us!

Proposals are due April 14, 2014. Please submit your proposals ASAP!

Posted by Tom Limoncelli in LISA

Just a reminder to everybody that the Early Bird Discount for LOPSA-East 2014 registration ends on Sunday, March 23rd at 11:59pm. Register now before it's too late!


LOPSA-East is a regional sysadmin conference in New Brunswick NJ, May 2-3, 2014. Two days of world-class training on a diverse range of topics plus community-selected talks on everything from Active Directory to Code Review for Sys Admins! We have a very exciting lineup of tutorials and talks this year; you can find all of the content at:

You can also take my "Intro to Time Management" half-day class, plus my "Evil Genius 101" half-day class.

Looking forward to seeing you all there!

Posted by Tom Limoncelli in LOPSA-East

Someone recently asked me if it was reasonable to expect their RelEng person to also be responsible for the load-balancing infrastructure and the locally-run virtualization system they have.

Sure! Why not! Why not have them also be the product manager, CEO, and the company cafeteria's chief cook?

There's something called "division of labor" and you have to draw the line somewhere. Personally I find that line usually gets drawn around skill-set.

Sarcasm aside, without knowing the person or the company, I'd have to say no. RelEng and Infrastructure Eng are two different roles.

Here's my longer answer.

A release engineer is concerned with building a "release". A release is the end result of source code, compiled and put into packages, and tested. Many packages are built. Some fail tests and do not become "release candidates". Of the candidates, some are "approved" for production.
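The flow above can be sketched as a filter pipeline. All the names here are made up for illustration; the point is that releases are the survivors of successive build/test/approve filters:

```python
# Hypothetical sketch of the release flow described above: packages are
# built, failing builds drop out, surviving builds become release
# candidates, and approved candidates become production releases.

def run_pipeline(builds, passes_tests, approved_ids):
    """Filter builds down to release candidates, then approved releases."""
    candidates = [b for b in builds if passes_tests(b)]   # failures drop out
    releases = [b for b in candidates if b["id"] in approved_ids]
    return candidates, releases

builds = [
    {"id": "v1.0-rc1", "tests_pass": True},
    {"id": "v1.0-rc2", "tests_pass": False},   # never becomes a candidate
    {"id": "v1.0-rc3", "tests_pass": True},
]

candidates, releases = run_pipeline(
    builds,
    lambda b: b["tests_pass"],
    approved_ids={"v1.0-rc3"},   # a human approved only this candidate
)
print([b["id"] for b in candidates])   # ['v1.0-rc1', 'v1.0-rc3']
print([b["id"] for b in releases])     # ['v1.0-rc3']
```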

Sub-question A: Should RelEng include pushing into production?

In some environments RelEng pushes the approved packages into production; in others, that's the sysadmin's job. Both can work, but IMHO sysadmins should build the production environment because they have the right expertise. Depending on the company's size and shape I can be convinced either way, but in general I think RelEng shouldn't have that responsibility. On the other hand, if you have Continuous Deployment set up, then the RelEng person should absolutely be involved in, or own, that aspect of the process.

Sub-question B: Should RelEng build the production infrastructure?

RelEng people are now expected to build AWS and Docker images, and therefore are struggling to learn things that sysadmins used to have a monopoly on. However you still need sysadmins to create the infrastructure under Docker or whatever virtual environment you are using.

Longer version: Traditionally sysadmins build the infrastructure that the service runs on. They know all the magic related to storage SANs, Cisco switches, firewalls, RAM/CPU specs for machines, OS configuration and so on. However this is changing. All of those things are now virtual: storage is virtual (SANs), machines are virtual (VMs), and now networks are too (SDN). So, you can now describe the infrastructure in code. The puppet/cfengine/whatever configs are versioned just like all other software. Thus, should they be the domain of RelEng or sysadmins?
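To make "describe the infrastructure in code" concrete, here is a hypothetical sketch. It mimics no real tool's format; the point is that once the description is plain data in a file, it can be version-controlled, diffed, reviewed, and sanity-checked like any other software:

```python
# Hypothetical "infrastructure as code": the description is plain data that
# lives in the same VCS as everything else. None of these field names
# correspond to a real tool's format; this is illustration only.

INFRA = {
    "vms": [
        {"name": "web1", "ram_gb": 8,  "cpus": 4, "network": "frontend"},
        {"name": "db1",  "ram_gb": 32, "cpus": 8, "network": "backend"},
    ],
    "networks": [
        {"name": "frontend", "cidr": "10.0.1.0/24"},
        {"name": "backend",  "cidr": "10.0.2.0/24"},
    ],
}

def validate(infra):
    """A 'test suite' for the infrastructure description itself."""
    vm_names = [vm["name"] for vm in infra["vms"]]
    assert len(vm_names) == len(set(vm_names)), "duplicate VM names"
    net_names = {net["name"] for net in infra["networks"]}
    for vm in infra["vms"]:
        assert vm["network"] in net_names, "VM references unknown network"
    return True

validate(INFRA)   # run in CI before any deployment tool consumes the file
```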

I think it is pretty reasonable to expect RelEng people to be responsible for building Docker images (possibly with some help from sysadmins) and AWS images (possibly with a lot of help from sysadmins).

But what about the infrastructure under Docker/VMware/etc.? It should also be "described in code" and therefore kept under revision control, driven by Jenkins/TeamCity/whatever, and so on. I think some RelEng people can do that job, but it is a lot of work and highly specialized, so the need for a "division of labor" outweighs whether or not a particular RelEng person has those skills. In general I'd have separate people doing that kind of work.

What do we do at StackExchange? Well, our build and test process is totally automated. Our process for pushing new releases into production is totally automated too, but requires a human to trigger it (possibly something we'll eliminate some day). So the only RelEng work we need a person for is maintaining the system and adding occasional new features. That role is done by Devs, and the SREs can back-fill. The infrastructure itself is designed and run by SREs. So, basically, the division of labor described above.

Obviously "your mileage may vary". If you are running entirely out of AWS or GCE you might not have any infrastructure of your own.


Posted by Tom Limoncelli in DevOps

There is often a debate between software developers about whether it is best to branch software, do development, then merge back into HEAD, or just work from HEAD.

Jez Humble and others claim that the latter is better. If you make your changes in "small batches" this works. In fact, it works better than branching: the bigger the merge when you bring your branch back in, the more likely it is to introduce problems.

Jez recently tweeted:

which caused a bit of debate among various twitterers (tweeters? twits?).

Jez co-wrote the definitive book on the subject, so he has a lot of authority in this area. If you haven't read Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation (by Jez Humble and David Farley), stop reading this blog post now and get it. Seriously. It is worth it.

Some things I'd like to point out about that slide from Google:

  • Yes, 10000 developers all working from HEAD actually works. People often say that this can't possibly scale, and yet here is an example of it working. There's a big difference between "it can't work", "I haven't gotten it to work", and "I'm conjecturing that it couldn't work".
  • Even though Google has one big monolithic code tree, each project can be checked out individually. That said, if your project is a library that other people use, compiling at timestamp T means getting the library as it is at timestamp T also.
  • Do some projects use a branch-and-merge methodology? When I started at Google some did, but their numbers were shrinking, and I'm sure a few still did for special edge cases. Not that I had visibility into every project at Google, but it was generally accepted as true that nearly everyone worked from HEAD.
  • "50% of code changes every month": A big part of why that is possible is that Google is very aggressive about deleting inactive code. It's still in the VCS's history if you need it, so why not delete it if it isn't being used? Being aggressive about deleting inactive code greatly reduces the maintenance tax. Making a global change (like changing a library interface) is much easier when you only have to do it for active projects.

Of course, what's really amazing about that slide is that the entire company has one VCS for all projects. That requires discipline you don't see at most companies. I've worked at smaller companies that had different VCS software and different VCS repositories for every little thing. I'm surprised at how many companies have entire teams that don't even use VCS! (If there were a Unicode codepoint for a scream of agony, I'd insert it here.)

By having one repo for the entire company you get leverage that is so powerful it is difficult to even explain. You can write a tool exactly once and have it be usable for all projects. You have 100% accurate knowledge of who depends on a library; therefore you can refactor it and be 100% sure of who will be affected and who needs to test for breakage. I could probably list 100 more examples. I can't express in words how much of a competitive advantage this is for Google.

In a literal sense, not all Google code is in that one tree. When I was there, Chrome, Android, and other projects had their own trees. Chrome and Android are open source projects and have very different needs. That said, they also work from HEAD, so the earlier point stands.


Disclaimer: This is all based on my recollection of how things were at Google. I have no reason to believe it hasn't changed, but I have no verification of it either.

Posted by Tom Limoncelli

Hey Seattle! This year's Cascadia IT conference has been great! Congrats to everyone that helped put it together and attended. I look forward to seeing you all next year!

(And since LISA is also in Seattle this year, I hope to see you all in November!)

Posted by Tom Limoncelli

I'll be teaching tutorials and giving a presentation. More info soon. Visit the conference site for details: