My open source project BlackBox is now available in the MacPorts collection. If you use MacPorts, simply type "sudo port install vcs_blackbox". There was already a package called "blackbox" so I had to call it something else.

BlackBox is a set of bash scripts that let you safely store secrets in a VCS repo (such as Git, Mercurial, or Subversion) using GNU Privacy Guard (GPG). For more info, visit the homepage: https://github.com/StackExchange/blackbox

I'm looking for volunteers to maintain packages for Brew, Debian, and other package formats. If you are looking to learn how to make packages, this is a good starter project and will help people keep their files secure. Interested? Contact me by email or open an issue on GitHub.

Posted by Tom Limoncelli in Blackbox

Usenix has announced the schedule for the second SREcon and the big surprise is that it is now 2 days long. The previous SREcon was a single day.

I wasn't able to attend last year's conference but I read numerous conference reports that were all enthusiastic about the presentations (you can see them online... I highly recommend the keynote).

I'm excited to also announce that my talk proposal was accepted. It is a case study of our experiences adopting SRE techniques at StackOverflow.com/StackExchange.com. The full description is here.

I've heard the hotel is nearly full (or full), so register fast and book your room faster. More info about the conference is on the Usenix web site: https://www.usenix.org/conference/srecon15

See you there!

Posted by Tom Limoncelli in Conferences, Speaking

I don't get to California often so I'm excited to announce that I'll be the speaker at the March meeting of BayLISA. For more info check out their MeetUp page: http://www.meetup.com/BayLISA/events/219854117/ I'll be talking about our new book, The Practice of Cloud System Administration.

[This is not directly about system administration but it is of interest to many system administrators.]

The Northeast Conference on Science and Skepticism (NECSS) announced that world-renowned science educator Bill Nye will headline NECSS 2015. In addition to giving the conference keynote address on Saturday afternoon, he will be the special guest star of Friday night's SGU Skeptical Extravaganza (a special show open to both conference attendees and the general public) and sign copies of his latest book, Undeniable: Evolution and the Science of Creation.

Bill will join the existing speaker lineup, which is quite impressive.

NECSS is a four-day celebration of science and critical thinking held each year in New York City. Speakers include leading scientists, educators, activists, and performers from a variety of disciplines.

I've been going for years. I've already registered for 2015. I hope to see you there!

More information here: http://necss.org/

Posted by Tom Limoncelli

My co-worker Peter Grace will be the speaker at the Thursday, March 5, 2015 LOPSA-NJ meeting. His topic will be: "Systems Log Aggregation using ElasticSearch/LogStash/Kibana"

For more info: http://www.meetup.com/LOPSA-NJ/events/220599343/

The meetings are near Princeton, in lovely Lawrenceville, NJ.

Don't forget to RSVP!

Posted by Tom Limoncelli

Registration is open for the Cascadia IT Conference 2015 in Seattle, WA on March 13-14, 2015. Cascadia is a regional conference, but people travel from all over to attend. Why? Because it is worth it.

I'll be teaching 2 tutorials on Friday and giving a talk on Saturday morning.

The tutorials are:

  • Time Management for Busy Devs and Ops
  • How To Not Get Paged: Managing Oncall to Reduce Outages

On Saturday morning I'll be giving a talk called "Live Upgrades on Running Systems: 8 Ways to Upgrade a Running Service with Zero Downtime". This will be a condensed version of what I taught at LISA 2014.

There are lots of other great talks too. View the complete schedule and register today.

Seating is limited and available on a first-come, first-served basis. If you are interested in attending either of my tutorials, please register early.

[Update 2015-02-20: The list of tutorials I'll be teaching has been revised.]

Posted by Tom Limoncelli

I'll be speaking at the Bucks County DevOps meetup this Wednesday. If you are in the New Hope, PA area, please don't miss this! I don't get out to Pennsylvania very often!

http://www.meetup.com/Bucks-County-DevOps/events/206523752/

Tom

The Queen of Code is a 17-minute documentary about Grace Hopper. It just came out today and I assure you that if you watch it, you'll be glad you did.

http://fivethirtyeight.com/features/the-queen-of-code/

On a personal note, Grace Hopper was going to be the graduation speaker when I graduated from Drew University in 1991 but she was ill. She passed away about 6 months later. I wish I could have met her.

Posted by Tom Limoncelli

SAGE-AU is doing some great media work, making the case that the proposed data-retention law in Australia would create a nightmare for businesses that use computers. They point out that every ill-defined or vague point in the law creates more and more problems.

"It's very immature legislation proposal. It's more holes than cheese. There's more questions around it than there are answers," he said.

Read the full article here: Australian sysadmins cop brunt of data-retention burden

Every country should have an organization that speaks for its IT workers. Go SAGE-AU!

Posted by Tom Limoncelli

Dear Tom,

I've been asked to document our company's System Integration process. Do you have any advice?

Sincerely,
A reader.

Dear Reader,

I get this question a lot, whether it is about system integration, setting up new computers, handling customer support calls, or just about anything else. Documenting a process is an important first step to clarifying what the process is. It is a prerequisite to improving it, automating it, or both.

My general advice is to find the process that exists and document exactly how it is done now. Only after that can you evaluate what steps work well and which need to be improved. Don't try to invent a new system to replace the existing chaos. That chaos works (for some definition of "works") and embodies a lot of knowledge about all the little things that have to happen, including a lot of "realities" that may be invisible to managers. This is similar to why it is bad to try to rewrite software systems from scratch.

Creating the document involves interviewing the people who do the tasks, taking notes, and building up a big document. If the process has branches and options, draw a diagram. Meet with people one at a time or in small groups and ask them to explain what they think the process is. Ask clarifying questions. Don't ask them what the process should be; ask what they currently do. If they start talking about improvements, write down what they say (so they feel listened to) but then get them back on the subject of what the process is, not what it should be.

If possible, you'll want to get to the point where you can do the process yourself, by following the document you wrote. The next step is to hand someone else the document and see if they can get through it without your help. If each step is done by a different team or department, you may need to get everyone in the same room and walk through the steps together.

When documenting the process (either by interviewing people or by working through the process solo), you'll find plenty of "issues":

  • Steps that are done differently depending on who does them. That needs to be reconciled. Get both people in the same room and help them work it out. Or, document both routes so that management is aware.
  • Steps that are undefined. If nobody can explain what happens at a certain step but the work is getting done somehow, it is better to document that the step has to be researched than to leave it out of your document.
  • Steps that are ill-defined. There may be steps that, for various reasons, one has to figure out in an ad hoc manner each time. If this is a 1-in-a-million edge case, that's ok. If it is in the main path, the actual steps need to be clarified. A good start is to define the end-goal and come back later to work out how it actually gets done.

Each of these "problem steps" should be marked in the document as an "area for improvement" or "TODO". A good process engineer will, over time, eliminate these TODOs. It will impress your management to track how many TODOs are remaining. If, for example, every Monday you add a line to a spreadsheet with the current count, eventually you can produce a graph that shows progress. It is also more professional to say "there are 40 remaining TODOs" than "OMG this project is f---ed!". Having the graph makes this more data-driven: it gives visibility to management about the actual amount of chaos in the project. They might not be technical, but they'll understand that 500 is worse than 100, that "progress" looks like decreasing numbers.
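The weekly count does not have to be done by hand. Here is a minimal sketch in Python, assuming the process document is plain text and that problem steps are tagged with the literal word "TODO". The function names, the CSV layout, and the file names in the usage note are all made up for illustration.

```python
import csv
import datetime
import re
from pathlib import Path

# Assumption: problem steps are marked with the standalone word "TODO".
TODO_PATTERN = re.compile(r"\bTODO\b")


def count_todos(text: str) -> int:
    """Return the number of TODO markers in the document text."""
    return len(TODO_PATTERN.findall(text))


def append_count(log_path: Path, count: int, day: datetime.date) -> None:
    """Append one 'date,count' row to a CSV log.

    A header row is written first if the log file does not exist yet.
    """
    is_new = not log_path.exists()
    with log_path.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["date", "todo_count"])
        writer.writerow([day.isoformat(), count])
```

Run every Monday, something like append_count(Path("todo_counts.csv"), count_todos(Path("process.txt").read_text()), datetime.date.today()) adds one row per week. Any spreadsheet program can open the resulting CSV and graph it.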

In DevOps terminology this is called getting the "flow" right. The First Way of DevOps is about flow. First you need to get the process to be repeatable (i.e. no more "TODO"s). Then you can focus on making the process better: eliminating duplicate work, replacing steps that are problematic, finding and fixing bottlenecks.

The Second Way of DevOps is about the communication between the people involved in the steps. If each step (or group of steps) is done by a different person or group of people, do they have a way to give feedback to each other? Do they attend each other's status meetings? If one team does something that causes problems for another, does that team muddle through it and suffer, or do they have a channel to raise the issue and get it fixed?

Here is some recommended reading:

Once the process is documented (defined), you'll want to improve it. Some general ways to do this are:

  • Tracking. If many sub-teams are involved, having a way to track which step is active and how things are handed off becomes critical. People need visibility into the entire system so they know what is coming to them, and who is waiting on them.
  • Identify and Fix Bottlenecks. Every system has a bottleneck. Chapter 12, Section 12.4.3 of The Practice of Cloud System Administration discusses this more.
  • Improve steps. Are there steps that are unreliable? The cause of most failures? Fix the biggest problems.
  • Automation. Automation generally reduces variation, improves speed, and saves labor. More important than saving labor, it frees people to do other work, which effectively multiplies the labor force.
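To make the bottleneck item concrete, here is a rough sketch in Python. It assumes a strictly serial process and that you have measured, or at least estimated, how many items each step can handle per week. The step names and numbers are invented for illustration and are not taken from any real process.

```python
def find_bottleneck(capacity_per_week: dict[str, float]) -> tuple[str, float]:
    """Return (step, capacity) for the step with the lowest capacity.

    In a strictly serial process, that step caps overall throughput.
    """
    if not capacity_per_week:
        raise ValueError("no steps given")
    step = min(capacity_per_week, key=capacity_per_week.get)
    return step, capacity_per_week[step]


# Hypothetical steps and weekly capacities, purely for illustration.
steps = {
    "order hardware": 20,
    "rack and cable": 8,
    "install OS": 15,
    "hand off to customer": 30,
}
print(find_bottleneck(steps))
```

With these made-up numbers the slowest step is "rack and cable", so speeding up any other step would not get more work through the whole process.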

The Practice of Cloud System Administration has lots of advice about all of these next steps.

Posted by Tom Limoncelli