Awesome Conferences

Should RelEng also be responsible for infrastructure?

Someone recently asked me if it was reasonable to expect their RelEng person to also be responsible for the load balancing infrastructure and the locally-run virtualization system they have.

Sure! Why not! Why not have them also be the product manager, CEO, and the company cafeteria's chief cook?

There's something called "division of labor," and you have to draw the line somewhere. Personally, I find that the line is usually drawn around skill sets.

Sarcasm aside, without knowing the person or the company, I'd have to say no. RelEng and Infrastructure Eng are two different roles.

Here's my longer answer.

A release engineer is concerned with building a "release". A release is the end result of source code being compiled, put into packages, and tested. Many packages are built. Some fail tests and do not become "release candidates". Of the candidates, some are "approved" for production.

Sub-question A: Should RelEng include pushing into production?

In some environments RelEng pushes the approved packages into production. In other environments that's the sysadmin's job. Both can work, but IMHO sysadmins should build the production environment because they have the right expertise. Depending on the company's size and shape, I can be convinced either way, but in general I think RelEng shouldn't have that responsibility. On the other hand, if you have Continuous Deployment set up, then the RelEng person should absolutely be involved in, or even own, that aspect of the process.

Sub-question B: Should RelEng build the production infrastructure?

Short version: RelEng people are now expected to build AWS and Docker images, and are therefore struggling to learn things that sysadmins used to have a monopoly on. However, you still need sysadmins to create the infrastructure under Docker or whatever virtual environment you are using.

Longer version: Traditionally, sysadmins build the infrastructure that the service runs on. They know all the magic related to storage SANs, Cisco switches, firewalls, RAM/CPU specs for machines, OS configuration, and so on. However, this is changing. All of those things are now virtual: storage is virtual (SANs), machines are virtual (VMs), and now networks are too (SDN). So you can now describe the infrastructure in code. The Puppet/CFEngine/whatever configs are versioned just like all other software. So should they be the domain of RelEng or of sysadmins?

I think it is pretty reasonable to expect RelEng people to be responsible for building Docker images (possibly with some help from sysadmins) and AWS images (possibly with a lot of help from sysadmins).

But what about the infrastructure under Docker/VMware/etc.? It should also be "described in code" and therefore be kept under revision control, driven by Jenkins/TeamCity/whatever, and so on. I think some RelEng people can do that job, but it is a lot of work and highly specialized, so the need for a "division of labor" outweighs whether or not a particular RelEng person has those skills. In general I'd have separate people doing that kind of work.

What do we do at StackExchange? Well, our build and test process is totally automated. Our process for pushing new releases into production is totally automated too, but requires a human to trigger it (possibly something we'll eliminate some day). So the only RelEng work that needs a person is maintaining the system and adding occasional new features. That role is filled by Devs, though the SREs can backfill. The infrastructure itself is designed and run by SREs. In other words, it's basically the division of labor described above.

Obviously, "your mileage may vary". If you run entirely out of AWS or GCE, you might not have any infrastructure of your own.


Posted by Tom Limoncelli in DevOps
