Awesome Conferences

Can my SLA rule work for networks? Yes.

Last week I mentioned that if you have a service that requires a certain SLA, it can't depend on things with a lesser SLA.

My networking friends balked and said that this isn't a valid rule for networks. I think violations of this rule are so rare that they are hard to imagine. Or, better stated, networking people follow this rule so naturally that it is hard to imagine violating it.

However, here are 3 violations from my experience:

  • Situation: A company whose Internet connection is a DSL modem. The modem is in the hallway near the computer room, but not in it. As a result, when someone knocks the modem over, the company's website goes down (the website depends on the router). Improvement: move the router into the computer room.
  • A computer room with excellent UPS and power infrastructure... but the router isn't on the UPS for weird historical reasons (so it depends on external power). Improvement: move the router onto the UPS.
  • An excellent computer room with fine Ethernet switches... but the router is in the lab one room over. Each VLAN reaches it over its own physical cable running to that other room. I was told, "the researchers are doing some experiments on the router so they wanted it in their lab". Improvement: move the router into the computer room.

3 true stories.
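If the dependencies are written down, the rule can be checked mechanically. Below is a minimal Python sketch; the `Component` class, the `weaker_dependencies` function, and all availability numbers are hypothetical, chosen only to mirror the UPS story. It walks everything a service depends on, directly or indirectly, and flags anything whose target is lower than the service's own.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    sla: float  # target availability, e.g. 0.999 for "three nines"
    depends_on: list[Component] = field(default_factory=list)


def weaker_dependencies(service: Component) -> list[str]:
    """Names of every direct or indirect dependency whose SLA target is below the service's."""
    weaker: list[str] = []
    seen: set[str] = set()
    stack = list(service.depends_on)
    while stack:  # iterative depth-first walk of the dependency graph
        dep = stack.pop()
        if dep.name in seen:
            continue
        seen.add(dep.name)
        if dep.sla < service.sla:
            weaker.append(dep.name)
        stack.extend(dep.depends_on)
    return sorted(weaker)


# Hypothetical targets modeled on the UPS story.
wall_power = Component("building power (no UPS)", sla=0.99)
ups_power = Component("UPS power", sla=0.9999)
router = Component("router", sla=0.9999, depends_on=[wall_power])
website = Component("website", sla=0.999, depends_on=[router, ups_power])

before = weaker_dependencies(website)  # flags the building power feeding the router
router.depends_on = [ups_power]        # the improvement: put the router on the UPS
after = weaker_dependencies(website)   # nothing left to flag
print(before, after)
```

Comparing every transitive dependency against the top-level service (rather than only neighbor against neighbor) matches the rule as stated: the website cares about the weakest thing anywhere beneath it.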

2 Comments

It is surprising how often this is ignored.

In reality, the downtime of each serial dependency that is necessary to meet an SLA must be added together. To make that problem easy, I tend to assume that there are about 10 serial dependencies in a typical application stack (switching, routing, firewalls, load balancers, app servers, database servers, SAN, power, cooling, etc.).

Then the seat-of-the-pants rough calculation is decimal-shiftingly easy. To meet three nines, you need to build each technology layer to about four nines; to meet four nines, each layer needs about five nines, and so on.

It's obviously more complicated than that, but it's a good place to start.
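The rule of thumb above can be checked with a few lines of Python. This is a sketch that assumes the layers fail independently (real layers often share power or cooling, which makes things worse); the function names are illustrative, not from any library. The exact availability of a serial chain is the product of the layer availabilities; adding up the downtime fractions, as the comment suggests, is a slightly pessimistic shortcut that lands in almost the same place.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760; ignores leap years


def serial_availability(availabilities):
    """Exact availability of a chain where every layer must be up.

    Assumes independent failures, so the probabilities multiply.
    """
    return math.prod(availabilities)


def serial_availability_estimate(availabilities):
    """Seat-of-the-pants version: add up each layer's downtime fraction.

    Never more optimistic than the exact product, and very close to it
    when every layer is highly available.
    """
    return 1 - sum(1 - a for a in availabilities)


def downtime_hours_per_year(availability):
    return (1 - availability) * HOURS_PER_YEAR


# Ten serial layers (switching, routing, firewalls, ...) each built to four nines.
layers = [0.9999] * 10
exact = serial_availability(layers)
rough = serial_availability_estimate(layers)
print(f"exact: {exact:.5f}, rough: {rough:.5f}")
print(f"downtime: about {downtime_hours_per_year(exact):.1f} hours/year")
```

Both figures come out at roughly 0.999, i.e. three nines, or under nine hours of downtime a year, which is the decimal shift the comment describes.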

I think one of the mistakes being made is that folks assume you can have a high-level SLA while one of its components has a low-level SLA. Fact of the matter is, your SLA is only as good as your lowest common denominator: the component with the weakest SLA.

Yeah, you've got redundant servers across redundant data centers across redundant UPSs, but only one guy has the key to the on/off switch. Guess what? You've turned that large investment into a pocket calculator, 'cause the SLA for the LCD (lowest common denominator) is M-F, 9-5, 50 weeks/year.

(I realize that this is what you're talking about but no one ever seems to say it in the plainest of language.)
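To put a number on the key-holder example, here is a small Python sketch. It makes a simplifying assumption: the key holder's coverage hours are treated as that component's availability, which only matters when a failure actually needs him. The infrastructure figures are hypothetical. Since every link in a serial chain must be up, the chain can never do better than its weakest link.

```python
HOURS_PER_YEAR = 24 * 365  # 8760; ignores leap years


def coverage(hours_per_day: float, days_per_week: float, weeks_per_year: float) -> float:
    """Fraction of the year in which a single person is on hand to act."""
    return hours_per_day * days_per_week * weeks_per_year / HOURS_PER_YEAR


key_holder = coverage(8, 5, 50)  # M-F, 9-5, 50 weeks/year -> 2000 hours
infrastructure = [0.9999, 0.9999, 0.9999]  # hypothetical: redundant servers, data centers, UPSs

# A serial chain can never be more available than its weakest link.
ceiling = min(infrastructure + [key_holder])
print(f"key holder covers {key_holder:.1%} of the year; chain ceiling {ceiling:.1%}")
```

2000 of 8760 hours is under a quarter of the year, so no amount of redundancy elsewhere lifts the whole chain above that.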

Where I used to work, we claimed to have a 2-hour SLA, but the fastest turnaround we could get on some of the vital components of the system was 4 hours. End result? We had a 4-hour SLA with a 50% success rate, 'cause we couldn't meet the 2-hour one whenever we needed one of these parts.
