
"Done" means "launched". It isn't "done" until it is launched. It annoys me to hear people say a project is "done... now I just have to launch it". It isn't done if it isn't in production.

There are a few reasons for this. First, people think launch is "the last 5 percent of a project," but often that last 5 percent consumes 80 percent of your time.

Also, you aren't "done" until other people are benefitting from your work (in business speak... "it is delivering value"). Written code has no business value. Launched code does.

You can rig this in your favor. Structure your project as an MVP (minimum viable product) launch followed by a series of mini-launches, one per feature. This way your written code stays unlaunched for the shortest amount of time. An MVP release might be just the main webpage and placeholders for every feature. However, it forces you to go through all the launch tasks: setting up the web servers, load balancers, databases, and so on. These things can take a lot of time. Oh, and if there is a separate dev team and ops team, your ops team can start developing their runbook now, not the day before launch. This makes operations suck less.

Which brings me to a story about wasting one million dollars...

I once saw a project with a plan to launch after 2 years of development. After 1.9 years the SREs were needed for a higher-priority project. The incomplete project was abandoned and the efforts of 5 SREs for 1.9 years were forgotten. Do the math... that's about a million dollars that Xxxxxx wasted.
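
The "do the math" aside works out roughly like this. Note that the ~$105k/year fully-loaded cost per SRE is my assumption for illustration; the post only gives the ~$1 million total:

```python
# Back-of-envelope cost of the abandoned project.
# annual_cost is an ASSUMED fully-loaded cost per engineer (salary,
# benefits, overhead); the post states only the ~$1M result.
engineers = 5
years = 1.9
annual_cost = 105_000  # USD/year, assumed

wasted = engineers * years * annual_cost
print(f"${wasted:,.0f}")  # roughly $997,500 -- about a million dollars
```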

If they had launched an MVP after a few months and then kept building on it (as I had recommended) Xxxxxx would have seen some benefit of the system. However they ignored this advice (I think someone used the term "trouble-maker" to describe me) and they went off to build their new system.

The goal of the project was to replace a legacy system that was missing one important feature, then use it as a platform for a number of new features. I don't mean to gloat, but after my warnings were ignored, I spent a little time making a gross, hacky, quick-and-dirty version of the important feature and added it to the legacy system. I launched it, and the users were 90% happy. The 2-year project was going to fill in that last 10% of happiness... for a million dollars.

As far as I know the legacy system was used for a number of years after this.

Perhaps the success of my quick hack helped justify abandoning the bigger project. Management had to pick a project to kill so they could free up 4-5 people for a higher-priority project; maybe the quick hack made the legacy system "good enough" to be the one that was cut. Maybe this spared some other project from being killed. I wonder what that project was.

I'm sure the legacy system has become obsolete by now. I don't know or care. I do, however, care that a bunch of excellent SREs had their work thrown away... which must have been demoralizing.

Lately I've been thinking a lot about applying MVP-style project management everywhere. It just makes more sense. Once you've experienced it in one place, you can't help but want to apply it to everything: system administration, relationships, home repair, etc.

To that end I have one piece of advice: Rush to launch something... anything... and build on it. Reduce the scope to the minimum; avoid the temptation to add "just this one last thing" before you launch. Do this even if it is only usable by a small fraction of the users, or only helps a particular special case. People would rather have some features today than all the features tomorrow. Tomorrow may never come.

Posted by Tom Limoncelli in Management

I can't take credit for this, as a co-worker recently introduced me to this point.

All outages are, at their core, a failure to plan.

If a dead component (for example, a hard drive) failed, then there was a lack of planning for failed components. Components fail. Hard disks, RAM chips, CPUs, motherboards, power supplies, even Ethernet cables fail. If a component fails and causes a visible outage, then there was a failure to plan for enough redundancy to survive the outage. There are technologies that, with prior forethought, can be included in a design to make any single component's failure a non-issue.

If a user-visible outage is caused by human error, it is still a failure to plan: someone failed to plan for enough training, failed to plan for the right staffing levels and competencies, failed to plan disaster exercises to verify the training, and failed to plan to validate how that training was constructed and executed.

What about the kind of outages that "nobody could have expected"? Also a failure to plan.

What about the kind of outages that are completely unavoidable? That is a failure to plan to have an SLA that permits a reasonable amount of downtime each year. If your plan includes up to 4 hours of downtime each year, those first 239 minutes are not an outage. If someone complains that 4 hours isn't acceptable to them, there was a failure to communicate that everyone should only adopt plans that are ok with 4 hours of downtime each year; or the communication worked but the dependent plan failed to incorporate the agreed-upon SLA. If someone feels they didn't agree to that SLA, there was a failure to plan how to get buy-in from all stakeholders.
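
The downtime budget above is just arithmetic on the SLA. As a rough sketch (the function names are mine, not from the post), here is the conversion between an availability percentage and a yearly downtime budget:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 (ignoring leap years)

def allowed_downtime_minutes(availability_pct):
    """Minutes of downtime per year permitted by an availability SLA."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

def availability_for_downtime(downtime_minutes):
    """Availability percentage implied by a yearly downtime budget."""
    return 100 * (1 - downtime_minutes / MINUTES_PER_YEAR)

# The post's example: a plan that permits 4 hours of downtime per year
# corresponds to roughly 99.954% availability.
print(round(availability_for_downtime(4 * 60), 3))

# For comparison, "three nines" (99.9%) permits about 525.6 minutes/year.
print(round(allowed_downtime_minutes(99.9), 1))
```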

If designs that meet the SLA are too expensive, then there was a failure to plan the budget. If the product cannot be made profitable at the expense required to meet the SLA, there was a failure to plan a workable business case.

If the problem is that someone didn't follow the plan, then the plan failed to include enough training, communication, or enforcement.

If there wasn't enough time to plan all of the above, there was a failure to start planning early enough to incorporate a sufficient level of planning.

The next time there is an outage, whether you are on the receiving end of it or not, think about the failure to plan at the root of the problem. I assure you the cause will have been a failure to plan.

Posted by Tom Limoncelli in Management

Rikki Endsley posted to Google Plus this week:

I saw this tweet today from a hiring manager: "Just interviewed for a sysadmin. I'm struggling since she has no social footprint. Is that wrong, or should social be key?" What are your thoughts on a 'social footprint' requirement for sysadmins? link

I'm very disturbed hearing a hiring manager say this. "Social Footprint" means how visible the person is on social networks like Facebook, G+, Twitter and so on. What does that have to do with whether or not the person is a good system administrator?

A missing social footprint could be a bad thing if it means the person is anti-social or doesn't keep up with the latest innovations. It could be a good thing if it means the person has privacy concerns. In fact, if someone with a background in security has kept themselves invisible despite all the social networking stuff that is out there, I'd say that indicates a particular skill. Guessing wrong in this area will result in a bad hiring decision.

The reason that this really struck me, however, is that the candidate is a "she". Is this a judgment we'd make about a male candidate? Take a moment to think about how you'd react differently to a woman saying she's not on Facebook vs. a man.

While discrimination in certain categories is illegal (this varies by state and country) let's talk about the broader definition of discrimination: Turning away a candidate because "they aren't like me".

The goal in hiring is to hire the absolute best person for the position. Discrimination is bad because it means you end up missing the best candidate. Put another way: Discrimination results in you hiring people that aren't as good as you could be hiring.

Let's look at some subtle ways we discriminate that lead to bad hiring decisions:

Example 1: Candidate doesn't have a home network: I've heard this used as a "red flag". "How could they be a serious sysadmin if they don't have a network at home?" Here are a few reasons why this is a terrible criterion to use:

  • The candidate can't afford one. Why discriminate against someone for being poorer than you? For most of my career my "home network" was paid for by my employer (either partially or substantially... whether they knew it or not). Are you discriminating against someone for working for a cheap employer, or for being too broke to buy equipment and too honest to steal from their employer?
  • The candidate has a huge lab at work and doesn't need to experiment at home.
  • The candidate has children at home and doesn't want them to break things. Are you discriminating against someone for having children?
  • The candidate keeps a good separation between homelife and worklife which is something that many fine time management books recommend. Are you discriminating against someone for having good time management skills? A good "work-life balance"?
  • The candidate just doesn't need one. Not everyone does.
  • The candidate has one, but doesn't call it that. When I began writing this article my plan was to point out that I don't have a home network. I don't think of myself as having a home network. However, my Cable TV provider's box includes a WiFi base station: my laptops, phones, Tivos and Wii connect to it. ...that's not a "network", is it? Well, ok, I guess technically it is. I don't think of it as one. I guess you wouldn't have hired me.

The issue of whether or not a candidate has a home network comes from the days when having a home network was difficult: it meant the person had experience running wires, connecting hubs and switches, configuring routers, setting up firewalls, and, if this was before DHCP, it meant knowing a lot about IP addressing. That's a lot of knowledge. While it is a plus to see a candidate with such experience, it isn't a minus if the candidate doesn't have that experience. It just means they have an awesome internet provider or are smart enough to buy a damn pre-made WiFi base station so they can spend more time having fun.

In the chapter on hiring sysadmins in TPOSANA (yes, there is a chapter on that!) we make the point that some people (often women and minorities) downplay their own experience. Quote...

Asking candidates to rate their own skills can help indicate their level of self-confidence, but little else. You don't know their basis for comparison. Some people are brought up being taught to always downplay their skills. A hiring manager once nearly lost a candidate who said that she didn't know much about Macintoshes. It turned out that she used one 8 hours a day and was supporting four applications for her department, but she didn't know how to program one. Ask people to describe their experience instead.

Which leads me to the next example...

Example 2: Candidate didn't grow up using computers: I hadn't realized that was a requirement for being a sysadmin!

  • The most obvious reason this is invalid reasoning is that some candidates were born before having a home computer was possible. Age discrimination is illegal in all 50 states (though the protected age range varies).
  • Many people just plain weren't interested in computers until later in life. Two women I know both tell the same story: it wasn't until sophomore year in college that they took a computer class and realized they had an aptitude for it. Soon they had changed major and the rest is history.
  • Many people grew up too poor to have a computer. Discriminating against people for being poor is just stupid. Not hiring someone because they were poor, or are poor, is helping create the very problem of poverty that you so obviously dislike! Duh!

There are many other ways we turn down perfectly good candidates because "they aren't like us". It is an easy trap to fall into. It is our responsibility to be critically aware of our thinking when making hiring decisions and to do our best to hire based on criteria that relate to job performance and nothing else. Hire the best.

Posted by Tom Limoncelli in Management