I've been asked to document our company's System Integration process. Do you have any advice?
I get this question a lot, whether it is about system integration, setting up new computers, handling customer support calls, or just about anything else. Documenting a process is an important first step toward clarifying what the process is. It is a prerequisite to improving it, automating it, or both.
My general advice is to find the process that exists and document exactly how it is done now. Only after that can you evaluate which steps work well and which need to be improved. Don't try to invent a new system to replace the existing chaos. That chaos works (for some definition of "works") and embodies a lot of knowledge about all the little things that have to happen, including a lot of "realities" that may be invisible to managers. This is similar to why it is a bad idea to rewrite software systems from scratch.
Creating the document involves interviewing the people who do the tasks, taking notes, and building up a big document. If the process has branches and options, draw a diagram. Meet with people one at a time or in small groups and ask them to explain what they think the process is. Ask clarifying questions. Don't ask them what the process should be; ask what they currently do. If they start talking about improvements, write down what they say (so they feel listened to), but then get them back on the subject of what the process is, not what it should be.
If possible, you'll want to get to the point where you can do the process yourself, by following the document you wrote. The next step is to hand someone else the document and see if they can get through it without your help. If each step is done by a different team or department, you may need to get everyone in the same room and walk through the steps together.
When documenting the process (either by interviewing people or by working through the process solo), you'll find plenty of "issues":
- Steps that are done differently depending on who does them. That needs to be reconciled. Get both people in the same room and help them work it out. Or, document both routes so that management is aware.
- Steps that are undefined. If nobody can explain what happens at a certain step but the work is getting done somehow, it is better to document that the step has to be researched than to leave it out of your document.
- Steps that are ill-defined. There may be steps that, for various reasons, one has to figure out in an ad hoc manner each time. If this is a 1-in-a-million edge case, that's ok. If it is in the main path, actual steps need to be clarified. A good start is to define the end-goal and come back later to work out how it actually gets done.
Each of these "problem steps" should be marked in the document as an "area for improvement" or "TODO". A good process engineer will, over time, eliminate these TODOs. It will impress your management to track how many TODOs are remaining. If, for example, every Monday you add a line to a spreadsheet with the current count, eventually you can produce a graph that shows progress. It is also more professional to say "there are 40 remaining TODOs" than "OMG this project is f---ed!". Having the graph makes this more data-driven: it gives visibility to management about the actual amount of chaos in the project. They might not be technical, but they'll understand that 500 is worse than 100, that "progress" looks like decreasing numbers.
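The weekly count described above is easy to script. Here is a minimal sketch in Python, assuming the process document is a plain-text file named `process.md` and that problem steps are marked with the literal word `TODO`; the file names and the function names `count_todos` and `append_count` are made up for illustration.

```python
import csv
import re
from datetime import date
from pathlib import Path

# Assumption: problem steps are marked with the standalone word "TODO".
TODO_PATTERN = re.compile(r"\bTODO\b")


def count_todos(text: str) -> int:
    """Count TODO markers in the text of the process document."""
    return len(TODO_PATTERN.findall(text))


def append_count(csv_path: Path, day: date, count: int) -> None:
    """Append one (date, count) row, writing a header first if the file is new."""
    is_new = not csv_path.exists()
    with csv_path.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["date", "todo_count"])
        writer.writerow([day.isoformat(), count])


if __name__ == "__main__":
    doc = Path("process.md")  # hypothetical document name
    if doc.exists():
        append_count(Path("todo_counts.csv"), date.today(), count_todos(doc.read_text()))
```

Run it every Monday (by hand or from a scheduler) and the resulting CSV can be opened in any spreadsheet program to produce the graph.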
In DevOps terminology this is called getting the "flow" right. The First Way of DevOps is about flow. First you need to get the process to be repeatable (i.e., no more "TODO"s). Then you can focus on making the process better: eliminating duplicate work, replacing steps that are problematic, finding and fixing bottlenecks.
The Second Way of DevOps is about the communication between the people involved in the steps. If each step (or group of steps) is done by a different person or group of people, do they have a way to give feedback to each other? Do they attend each other's status meetings? If one team does something that causes problems for another, does that team muddle through it and suffer, or do they have a channel to raise the issue and get it fixed?
Once the process is documented (defined), you'll want to improve it. Some general ways to do this are:
- Tracking. If many sub-teams are involved, having a way to track which step is active and how things are handed off becomes critical. People need visibility into the entire system so they know what is coming to them and who is waiting on them.
- Identify and Fix Bottlenecks. Every system has a bottleneck. Chapter 12, Section 12.4.3 of The Practice of Cloud System Administration discusses this more.
- Improve steps. Are there steps that are unreliable? Steps that cause most of the failures? Fix the biggest problems first.
- Automation. Automation generally reduces variation, improves speed, and saves labor. More important than saving labor, it frees people to do other work, which multiplies the labor force.
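Once tracking data exists, even a crude analysis can point at a likely bottleneck. The sketch below is one possible approach, not the method from the book: it assumes a hypothetical tracking log of (item, step, start, finish) records and treats the step with the longest average duration as the first place to look. The step names, timestamps, and function names are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Hypothetical tracking log: (item, step, started, finished).
LOG = [
    ("order-1", "intake", "2024-03-04 09:00", "2024-03-04 10:00"),
    ("order-1", "configure", "2024-03-04 10:00", "2024-03-04 16:00"),
    ("order-1", "ship", "2024-03-05 09:00", "2024-03-05 11:00"),
    ("order-2", "intake", "2024-03-04 11:00", "2024-03-04 12:00"),
    ("order-2", "configure", "2024-03-05 08:00", "2024-03-05 16:00"),
    ("order-2", "ship", "2024-03-06 09:00", "2024-03-06 11:00"),
]


def hours_per_step(log):
    """Return the average number of hours items spend in each step."""
    durations = defaultdict(list)
    for _item, step, start, finish in log:
        started = datetime.strptime(start, FMT)
        finished = datetime.strptime(finish, FMT)
        durations[step].append((finished - started).total_seconds() / 3600)
    return {step: sum(hours) / len(hours) for step, hours in durations.items()}


def likely_bottleneck(log):
    """Pick the step with the longest average duration (a rough proxy only)."""
    averages = hours_per_step(log)
    return max(averages, key=averages.get)


if __name__ == "__main__":
    for step, hours in hours_per_step(LOG).items():
        print(f"{step}: {hours:.1f} hours on average")
    print("Look first at:", likely_bottleneck(LOG))
```

Average duration is only a starting point. A step can also be a bottleneck because work queues up in front of it, so time spent waiting between steps is worth tracking too.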
The Practice of Cloud System Administration has lots of advice about all of these next steps.