Recently in Technical Tips Category

I have some PDFs that have to be reviewed in Adobe Reader, because they include "comments" that OS X "Preview" can't display and edit.

This alias has saved me hours of frustration:

alias reader='open -a /Applications/Adobe\ Reader.app'

Now I can simply type "reader" instead of, say, "cat", and view the PDF:

reader Limoncelli_Ch13_jh.pdf

For those of you that are unfamiliar with Adobe Acrobat Reader, it is Adobe's product for distributing security holes to nearly every computer system in the world. It is available for nearly every platform, which makes it a very convenient way to assure that security problems can be distributed quickly and globally. Recently Adobe added the ability to read and display PDFs. Previously I used Oracle Java to make sure all my systems were vulnerable to security problems, but now that Reader can display PDFs, they're winning on the feature war. I look forward to Oracle's response since, as I've always said, when it comes to security, the free market is oh so helpful.

Posted by Tom Limoncelli in Rants, Technical Tips

How not to use Cron

A friend of mine told me of a situation where a cron job took longer to run than usual. As a result, the next instance of the job started before the first had finished, and now there were two cron jobs running at once. The result was garbled data and an outage.

The problem is that they were using the wrong tool. Cron is good for simple tasks that run rarely. It isn't even good at that. It has no console, no dashboard, no dependency system, no API, no built-in way to have jobs on different machines run at staggered times, and it's a pain to monitor. All of these issues are solved by CI systems like Jenkins (free), TeamCity (commercial), or any of a zillion other similar systems. Not that cron is all bad... just pick the right tool for the job.

Some warning signs that a cron job will overrun itself: If it has any dependencies on other machines, chances are one of them will be down or slow and the job will take an unexpectedly long time to run. If it processes a large amount of data, and that data is growing, eventually it will grow enough that the job will take longer to run than you had anticipated. If you find yourself editing longer and longer crontab lines, that alone could be a warning sign.

I tend to use cron only for jobs that have few or no dependencies (say, they only depend on the local machine) and run daily or less often. That's fairly safe.

There are plenty of jobs that are too small for a CI system like Jenkins but too big for cron. So what are some ways to prevent this problem of cron job overrun?

It is tempting to use locks to solve the problem. Tempting but bad. I once saw a cron job that paused until it could grab a lock. The problem with this is that when the job overran, there was now an additional process waiting on the lock. They ended up with zillions of processes all waiting on the lock. Unless the job magically started taking less time to run, the backlog would never clear, and that wasn't going to happen. Eventually the process table filled and the machine crashed. Their solution (which was worse) was to check for the lock and exit if it existed. This solved the problem but created a new one: the lock got left behind by a dead job and now every new instance exited immediately. The processing was no longer being done. This was fixed by adding monitoring to alert if the process wasn't running. So, the solution added more complexity. Solving problems by adding more and more complexity makes me a sad panda.

The best solution I've seen is to simply not use cron when doing frequent, periodic, big processes. Just write a service that does the work, sleeps a little bit, and repeats.

while true ; do
   process_the_thing
   sleep 600
done

Simple. Yes, you need a way to make sure that it hasn't died, but there are plenty of "watcher" scripts out there. You probably have one already in use. Yes, it isn't going to run precisely n times per hour, but usually that's not needed.

You should still monitor whether the processing is being done. However, you should monitor whether results are being generated rather than whether the process is running. By checking for something at a high level of abstraction (i.e. "black box testing"), you will detect if the script stopped running, or the program has a bug, or there's a network outage, or any other thing that could go wrong. If you only monitor whether the script is running then all you know is whether the script is running.
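
For example, a black-box check might verify that the results file is fresh rather than that the script is alive. Here is a minimal sketch of such a check (my own illustration, not from the original incident; the file name and threshold are made up):

#!/bin/bash
# Alert if the results file hasn't been updated recently.
RESULTS=/var/data/output.csv
MAX_AGE=1800   # seconds; 3x the expected 600-second cycle

now=$(date +%s)
# GNU stat shown; on BSD/MacOS use: stat -f %m "$RESULTS"
mtime=$(stat -c %Y "$RESULTS" 2>/dev/null || echo 0)

if (( now - mtime > MAX_AGE )); then
  echo "ALERT: $RESULTS has not been updated in the last $MAX_AGE seconds" >&2
  exit 1
fi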

And before someone posts a funny comment like, "Maybe you should write a cron job that restarts it if it isn't running". Very funny.

Posted by Tom Limoncelli in Technical Tips

I write a lot of small bash scripts. Many of them have to run on MacOS as well as FreeBSD and Linux. Sadly MacOS comes with a bash 3.x which doesn't have many of the cooler features of bash 4.x.

Recently I wanted to use read's "-i" option, which doesn't exist in bash 3.x.

My Mac does have bash 4.x but it is in /opt/local/bin because I install it using MacPorts.

I didn't want to list anything but "#!/bin/bash" on the first line because the script has to work on other platforms and on other people's machines. "#!/opt/local/bin/bash" would have worked for me on my Mac but not on my Linux boxes, FreeBSD boxes, or friends' machines.

I finally came up with this solution. If the script detects it is running under an old version of bash it looks for a newer one and exec's itself with the new bash, reconstructing the command line options correctly so the script doesn't know it was restarted.

#!/bin/bash
# If an old bash is detected, exec under a newer version if possible.
if [[ $BASH_VERSINFO < 4 ]]; then
  if [[ $BASH_UPGRADE_ATTEMPTED != 1 ]]; then
    echo '[Older version of BASH detected.  Finding newer one.]'
    export BASH_UPGRADE_ATTEMPTED=1
    export PATH=/opt/local/bin:/usr/local/bin:"$PATH":/bin
    exec "$(which bash)" --noprofile "$0" """$@"""
  else
    echo '[Nothing newer found.  Gracefully degrading.]'
    export OLD_BASH=1
  fi
else
  echo '[New version of bash now running.]'
fi

# The rest of the script goes below.
# You can use "if [[ $OLD_BASH == 1 ]]" to
# write code that will work with old
# bash versions.

Some explanations:

  • $BASH_VERSINFO returns just the major release number; much better than trying to parse $BASH_VERSION.
  • export BASH_UPGRADE_ATTEMPTED=1 Note that the variable is exported. Exported variables survive "exec".
  • export PATH=/opt/local/bin:/usr/local/bin:"$PATH":/bin We prepend a few places that the newer version of bash might be. We postpend /bin because if it isn't found anywhere else, we want the current bash to run. We know bash exists in /bin because of the first line of the script.
  • exec $(which bash) --noprofile "$0" """$@"""
    • exec This means "replace the running process with this command".
    • $(which bash) finds the first command called "bash" in the $PATH.
    • "$(which bash)" By the way... this is in quotes because $PATH might include spaces. In fact, any time we use a variable that may contain spaces we put quotes around it so the script can't be hijacked.
    • --noprofile We don't want bash to source /etc/profile, .bash_profile, and other startup files.
    • "$0" The name of the script being run.
    • """$@""" The command line arguments will be inserted here with proper quoting so that if they include spaces or other special chars it will all still work.
  • You can comment out the "echo" commands if you don't want it to announce what it is doing. You'll also need to remove the last "else" since else clauses can't be empty.
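
As a hypothetical example of the $OLD_BASH fallback in use (this isn't from the original script; DEFAULT and the prompt are made up), the bash 4.x branch can use read -e -i to pre-fill the answer while the old-bash branch applies the default manually:

if [[ $OLD_BASH == 1 ]]; then
  # Old bash: no "-i", so show the default in the prompt and apply it by hand.
  read -p "Hostname [$DEFAULT]: " answer
  answer="${answer:-$DEFAULT}"
else
  # bash 4.x: "-i" pre-fills the input buffer (it requires "-e").
  read -e -p "Hostname: " -i "$DEFAULT" answer
fi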

Enjoy!

Posted by Tom Limoncelli in Technical Tips

People say things like, "Can you just send me a copy of data?"

If people are taking your entire database as a CSV file and processing it themselves, your API sucks.

(Overheard at an ACM meeting today)

Posted by Tom Limoncelli in Technical Tips

[This is a guest post from Dan O’Boyle, whom I met at a LOPSA-NJ meeting. I asked him to do a guest post about this subject because I thought the project was something other schools would find useful.]

I’m a systems engineer for a moderately sized school district in NJ. We own a number of different devices, but this article is specifically about the AcerOne line of netbooks. I was recently tasked with finding a way to breathe new life into about 500 of these devices. User complaints about these models ranged from “constant loss of wireless connectivity” to the ever-descriptive “slow”. The units have 1 GB of RAM, and our most recent image build had them domain-joined, running Windows 7N 32-bit.

These machines were already running a very watered-down Windows experience. I considered what the typical user experience was: they would boot the device, log in to Windows, log in to Chrome (via Google Apps for Education accounts) and then begin their browsing experience. Along the way they would lose their wireless connection (due to a possibly faulty Windows driver), experience CPU and memory bottlenecks due to antivirus and other background Windows processes, and generally have a bad time. The worst part was I couldn’t see a way to streamline this experience short of removing Windows. It turns out that was exactly the solution we needed.

Chromium OS is the open source version of Google’s ChromeOS operating system. The project provides instructions on how to build your own distro and a fairly responsive development community. Through the community, I was able to find information on two major build distributors: Arnold the Bat and Hexxah. Hexxah’s builds seem to get a bit less attention than Arnold’s, so after testing both I decided to use one of Arnold’s most recent builds.

The AcerOnes took the build without issue. A few gotchas to be aware of are hard drive size, unique driver needs, and method of deployment. Before I describe those problems, I’ll need to explain a bit about our planned method of deployment.

Individual Device configuration:

Configuring the OS on one device took about an hour from download to tweaking. After copying the build to a USB stick, I installed it to the local HDD of my AcerOne. I noticed that the wireless card was not detected by default. This is typically due to a driver issue, and can often be solved by adding drivers to the /lib/firmware directory. With the wireless card up and running, I added flash/java/PDF/mp3 support with this script. (Note that the script is listed as working with Hexxah’s builds but it also works with Arnold’s. The default password on Arnold’s builds is “password”.)

Deployment:

Finally, I was ready to try cloning my machine for distribution. My first successful attempt used Clonezilla to make a local Clonezilla repo on USB. This was effective, but it wasn’t pretty. To distribute this build out to multiple buildings I needed to boot the ISO created by Clonezilla over PXE, and given that some of my AcerOnes had 2 GB of RAM and some only had 1, many of the devices wouldn’t be able to load the ISO locally into RAM to perform the install.

The next attempt I made was using FOG. FOG was able to capture the image and store it on a PXE server. FOG boots machines into a small Linux kernel, then issues commands through that kernel to perform disk operations. This method would work even on my 1 GB machines. At this point I discovered the hard disk problem mentioned earlier: I had originally built my image on a 250 GB HDD, and some of my machines only had a 160 GB drive. Even though the image is much smaller than that (about 4 GB), FOG decided that the smaller HDD wouldn’t be able to handle the image and refused to deploy. This can be solved by ensuring that your build machine has a smaller HDD than any machine you intend to deploy to.

Final Deploy time:

Overall I was able to take the one hour of configuration time it took me to set up one machine and cut it down to about 5 minutes for a technician in the field. Stored information about the wireless networks I pre-configured on the master device seems to be in a protected area on the disk that FOG couldn’t read. The end result is that a technician must image a unit, then enter the wireless key information after it’s deployed.

The user experience on the new “ChromiumBooks” has been right on target so far. The devices boot in about 40 seconds. Most of that time is the hardware boot process. Once that is complete ChromiumOS still loads in under 8 seconds. Users are immediately able to login to their Google Apps for Education accounts and begin browsing.

The Linux driver for the wifi cards seems to be more stable than the Windows driver, and I get far fewer reports of “wifi drop-offs”.

Overall, getting rid of Windows has been great for these devices.

If you liked this story, or want to shoot me some questions feel free to find me at www.selfcommit.com.

Posted by Dan O'Boyle in Technical Tips

Hey fellow sysadmins! Please take 5 minutes to make sure your DNS servers aren't open to the world for recursive queries. They can be used as amplifiers in DDoS attacks.

https://www.isc.org/wordpress/is-your-open-dns-resolver-part-of-a-criminal-conspiracy/

The short version of what you need to do is here.
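
A quick way to test (my own sketch, not from the linked article; the server name is a placeholder) is to ask your server to recurse for a name it isn't authoritative for:

dig @ns1.example.com www.google.com A +recurse

If you get an answer back, the resolver is open. On BIND the usual fix is to restrict recursion to your own networks in named.conf, roughly like this (the network below is a placeholder):

options {
    // Only answer recursive queries from our own address space.
    allow-recursion { 127.0.0.1; 192.0.2.0/24; };
};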

Posted by Tom Limoncelli in Technical Tips

Reverting in "git"

I'm slowly learning "git". The learning curve is hard at first and gets better as time goes on. (I'm also teaching myself Mercurial, so let's not start a 'which is better' war in the comments).

Reverting a file can be a little confusing in git because git uses a different model than, say, SubVersion. You are in a catch-22 because to learn the model you need to know the terminology. To learn the terminology you need to know the model. I think the best explanations I've read so far have been in the book Pro Git, written by Scott Chacon and published by Apress. Scott put the entire book up online, and for that he deserves a medal. You can also buy a dead-tree version.

How far back do you want to revert a file? To how it was the last time you did a commit? The last time you did a pull? Or back to how it is on the server right now (which might be neither of those)?

Revert to like it was when I did my last "git commit":

git checkout HEAD -- file1 file2 file3

Revert to like it was when I did my last "pull":

git checkout FETCH_HEAD -- file1 file2 file3

Revert to like it is on the server right now:

git fetch
git checkout FETCH_HEAD -- file1 file2 file3

How do these work?

The first thing you need to understand is that HEAD is an alias for the last time you did "git commit".

FETCH_HEAD is an alias for the last time you did a "git fetch". "git fetch" pulls the latest changes from the server, but hides them away. It does not merge them into your workspace. "git merge" merges the recently fetched files into your current workspace. "git pull" is simply a fetch followed by a merge. I didn't know about "git fetch" for a long time; I happily used "git pull" all the time.

You can set up aliases in your ~/.gitconfig file. They act exactly like real git commands. Here are the aliases I have:

[alias]
  br = branch
  st = status
  co = checkout
  revert-file = checkout HEAD --
  revert-file-server = checkout FETCH_HEAD --
  lg = log --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)<%an>%Creset' --abbrev-commit --date=relative

This means I can do git br instead of "git branch", saving me a lot of typing. git revert-file file1 file2 file3 is just like the first example above. git revert-file-server is a terrible name, but it basically reverts files to how they were at the last fetch. git lg outputs a very pretty log of recent changes (I stole that from someone who probably stole it from someone else. Don't ask me how it works).

To add these aliases on your system, find or add an [alias] stanza in your ~/.gitconfig file and add them there.
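
Equivalently (just another way to do the same thing, not from the post), you can let git edit the file for you:

git config --global alias.revert-file 'checkout HEAD --'
git config --global alias.revert-file-server 'checkout FETCH_HEAD --'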

Posted by Tom Limoncelli in Technical Tips

Matt Simmons of the Standalone Sysadmin blog asked on the LOPSA-Tech mailing list about labeling network cables in a datacenter, which brought up a number of issues.

He wrote:

So, my current situation is that I'm working in a datacenter with 21 racks arranged in three rows, 7 racks long. We have one centralized distribution switch and no patch panels, so everything is run to the switch which lives in the middle, roughly. It's ugly and non-ideal and I hate it a bunch, but it is what it is. And it looks a lot like this.

Anyway, so given this really suboptimal arrangement, I want to be able to more easily identify a particular patch cable because, as you can imagine, tracing a wire is no fun right now.

He wanted advice as to whether the network cables should be labeled with exactly what the other end is connected to, including hostname and port number, or labeled with a unique ID on each cable so that cables don't have to be relabeled as they move around.

We write about this in the Data Centers chapter of The Practice of System and Network Administration but I thought I'd write a bit more for this blog.

My reply is after the bump...


Posted by Tom Limoncelli in Technical Tips

If you use the Ganeti command line you probably have used gnt-instance list and gnt-node list. In fact, most of the gnt-* commands have a list subcommand. Here are some things you probably didn't know.

Part 1: Change what "list" outputs

Unhappy with how verbose gnt-instance list is? The -o option lets you pick which fields are output. Try this to just see the name:

gnt-instance list -o name

I used to use awk and tail and other Unix commands to extract just the name or just the status. Now I use -o name,status to get exactly the information I need.

I'm quite partial to this set of fields:

gnt-instance list --no-headers -o name,pnode,snodes,admin_state,oper_state,oper_ram

The --no-headers flag means just output the data, no column headings.

What if you like the default fields that are output but want to add others to them? Prepend a + to the option:

gnt-node list --no-headers -o +group,drained,offline,master_candidate 

This will print the default fields plus the node group and the three main status flags nodes have: is it drained (no instances can move onto it), offline (the node is essentially removed from the cluster), and whether or not the node can be a master.

How does one find the list of all the fields one can output? Use the list-fields subcommand. For each gnt-* command it lists the fields that are available with that list command. That is, gnt-instance list-fields shows a different set of names than gnt-node list-fields.

Putting all this together I've come up with three bash aliases that make my life easier. They print a lot of information but (usually) fit it all on an 80-character wide terminal:

alias i='gnt-instance list --no-headers -o name,pnode,snodes,admin_state,oper_state,oper_ram | sed -e '\''s/.MY.DOMAIN.NAME//g'\'''
alias n='gnt-node list --no-headers -o +group,drained,offline,master_candidate | sed -e '\''s/.MY.DOMAIN.NAME//g'\'''
alias j='gnt-job list | tail -n 90 | egrep --color=always '\''^|waiting|running'\'''

(Change MY.DOMAIN.NAME to the name of your domain.)

Part 2: Filter what's output

The -F option has got to be the least-known about feature of the Ganeti command line tools. It lets you restrict what nodes or instances are listed.

List the instances that are using more than 3 virtual CPUs:

gnt-instance list -F 'oper_vcpus > 3'

List the instances that have more than 6G of RAM (otherwise known as "6144 megabytes"):

 gnt-instance list -F 'be/memory > 6144'

The filtering language can handle complex expressions. It understands and, or, ==, <, >, and all the operations you'd expect. The ganeti(7) man page explains it all.

Which nodes have zero primary instances? Which have none at all?

gnt-node list --filter 'pinst_cnt == 0'
gnt-node list -F 'pinst_cnt == 0 and sinst_cnt == 0'

Strings must be quoted with double-quotes. Since the entire formula is in single-quotes this looks a bit odd but you'll get used to it quickly.

Which instances have node "fred" as their primary?

gnt-instance list --no-header -o name  -F  'pnode == "fred" '

(I included a space between " and ' to make it easier to read. It isn't needed otherwise.)

Which nodes are master candidates?

gnt-node list --no-headers -o name -F 'role == "C" '

Do you find typing gnt-cluster getmaster too quick and easy? Try this command to find out who the master is:

gnt-node list --no-headers -o name -F 'role == "M" '

Like most gnt-* commands it must be run on the master, so be sure to use gnt-cluster getmaster to find out who the master is and run the command there.

If you use the "node group" feature of Ganeti (and you probably don't) you can find out which nodes are in node group foo:

gnt-node list -o name -F 'group == "foo" '

and which instances have primaries that are in group foo:

 gnt-instance list --no-header -o name -F 'pnode.group == "foo"'

It took me forever to realize that, since snodes is a list, one has to use in instead of ==. Here's a list of all the instances whose secondary is in node group "bar":

gnt-instance list --no-header -o name  -F '"bar" in snodes.group'

("snodes" is plural, "pnode" is singular")

To recap:

  1. The following commands have a list-fields subcommand, and their list subcommand accepts the -o and -F options: gnt-node, gnt-instance, gnt-job, gnt-group, gnt-backup.
  2. -o controls which fields are output when using the list subcommand.
  3. -F specifies a filter that controls which items are listed.
  4. The field names used with -o and -F are different for each gnt-* command.
  5. Use the list-fields subcommand to find out what fields are available for a command.
  6. The filtering language is documented in ganeti(7). i.e. view with: man 7 ganeti
  7. The man pages for the individual gnt-* commands give longer explanations of what each field means.
  8. In bash, filters have to be in single quotes so that the shell doesn't interpret <, >, double-quotes, and other symbols as bash operators.

Enjoy!

Posted by Tom Limoncelli in Ganeti, Technical Tips

What's wrong with this as a way to turn a hostname into a FQDN?

FQDN=$(getent hosts "$SHORT" | awk '{ print $2 }')

Answer: getent can return multiple lines of results. This only happens if the system is configured to check /etc/hosts before DNS and if /etc/hosts lists the hostname multiple times. There may be other ways this can happen too, but that's the situation that bit me. Of course, there shouldn't be multiple repeated lines in /etc/hosts but nothing forbids it.

As a result you can end up with FQDN="hostname.dom.ain hostname.dom.ain" which, and I'm just guessing here, is going to cause problems elsewhere in your script.

The solution is to be a bit more defensive and only take the first line of output:

FQDN=$(getent hosts "$SHORT" | head -1 | awk '{ print $2 }')

Of course, there is still error-checking you should do, but I'll leave that as an exercise to the reader. (Hint: You can check if $? is non-zero; you can also check if FQDN is the null string.)
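
Here is a minimal sketch of that error checking (my elaboration, not from the post). Note that after a pipeline, $? only reflects the last command (awk), so testing for an empty result is the more reliable check:

SHORT="$1"
FQDN=$(getent hosts "$SHORT" | head -1 | awk '{ print $2 }')
if [[ -z "$FQDN" ]]; then
  echo "ERROR: unable to resolve '$SHORT' to a FQDN" >&2
  exit 1
fi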

Technically the man page covers this situation. It says the command "gathers entries" which, being plural, is a very subtle hint that the output might be multiple lines. I bet most people reading the man page don't notice this. It would be nice if the man page explicitly warned the user that the output could be multiple lines long.

P.S. I'm sure the first comment will be a better way to do the same thing. I look forward to your suggestions.

Posted by Tom Limoncelli in Technical Tips

Here's a good strategy to improve the reliability of your systems: Buy the most expensive computers, storage, and network equipment you can find. It is the really high-end stuff that has the best "uptime" and "MTBF".

Wait... why are you laughing? There are a lot of high-end, fault-tolerant, "never fails" systems out there. Those companies must be in business for a reason!

Ok.... if you don't believe that, let me try again.

Here's a good strategy to improve the reliability of your systems: Any time you have an outage, find who caused it and fire that person. Eventually you'll have a company that only employs perfect people.

Wait... you are laughing again! What am I missing here?

Ok, obviously those two strategies won't work. System administration is full of examples of both. At the start of "the web" we achieved high uptimes by buying Sun E10000 computers costing megabucks because "that's just how you do it" to get high performance and high uptime. That strategy lasted until the mid-2000s. The "fire anyone that isn't perfect" strategy sounds like something out of an "old school" MBA textbook. There are plenty of companies that seem to follow that rule.

We find those strategies laughable because the problem is not the hardware or the people. Hardware, no matter how much or how little you pay for it, will fail. People, no matter how smart or careful, will always make some mistakes. Not all mistakes can be foreseen. Not all edge cases are cost effective to prevent!

Good companies have outages and learn from them. They write down those "lessons learned" in a post-mortem document that is passed around so that everyone can learn. (I've written about how to do a decent postmortem before.)

If we are going to "learn something" from each outage and we want to learn a lot, we must have more outages.

However (and this is important) you want those outages to be under your control.

If you knew there was going to be an outage in the future, would you want it at 3am Sunday morning or 10am on a Tuesday?

You might say that 3am on Sunday is better because users won't see it. I disagree. I'd rather have it at 10am on Tuesday so I can be there to observe it, fix it, and learn from it.

In school we did this all the time. It is called a "fire drill". During the first fire drill of the school year we usually did a pretty bad job. However, the second one was much better. The hope is that if there is a real fire, it will happen after we've gotten good at it.

Wouldn't you rather just never have fires? Sure, and when that is possible let me know. Until then, I like fire drills.

Wouldn't you rather have computer systems that never fail? Sure, and when that's possible let me know. Until then I like sysadmin fire drills.

Different companies call them different things. Jesse Robins at Twitter calls them "GameDay" exercises. John Allspaw at Etsy refers to "resilience testing" in his new article on ACM Queue. Google calls them something else.

The longer you go without an outage, the more rusty you get. You actually improve your uptime by creating outages periodically so that you don't get rusty. It is better to have a controlled outage than to wait for the next outage to find you out of practice.

Fire drills don't have to be visible to the users. In fact, they shouldn't be. You should be able to fail over a database to the hot spare without user-visible effects.

Systems that are fault tolerant should be periodically tested. Just like you test your backups by doing an occasional full restore (don't you?), you should periodically fail over that database server, web server, RAID system, and so on. Do it in a controlled way: plan it, announce it, make contingency plans, and so on. Afterwards, write up a timeline of what happened, what mistakes were made, and what can be done to improve things next time. For each improvement file a bug. Assign someone to hound people until all the bugs are closed. Or, if a bug is "too expensive to fix", have management sign off on that decision. I believe that being unwilling to pay to fix a problem ("allocate resources" in business terms) is equal to saying "I'm willing to take the risk that it won't happen." So make sure they understand what they are agreeing to.

Most importantly: have the right attitude. Nobody should be afraid to be mentioned in the "lessons learned" document. Instead, people should be rewarded, publicly, for finding problems and taking responsibility to fix them. Give a reward, even a small one, to the person that fixes the most bugs filed after a fire drill. Even if the award is a dorky certificate to hang on their wall, a batch of cookies, or getting to pick which restaurant we go to for the next team dinner, it will mean a lot. Receiving this award should be something that can be listed on the person's next performance review.

The best kind of fire drill tests cross-team communication. If you can involve 2-3 teams in planning the drill, you have the potential to learn a lot more. Does everyone involved know how to contact each other? Is the conference bridge big enough for everyone? If the managers of all three teams have to pretend to be unavailable during the outage, are the three teams able to complete the drill?

My last bit of advice is that fire drills need management approval. The entire management chain needs to be aware of what is happening and understand the business purpose of doing all this.

John's article has a lot of great advice about explaining this to management, what push-back you might expect, and so on. His article, Fault Injection in Production, is so well written even your boss will understand it. (ha ha, a little boss humor there)

[By the way... ACM Queue is really getting 'hip' lately by covering these kind of "DevOps" topics. I highly recommend visiting queue.acm.org periodically.]

A co-worker watched me type the other day and noticed that I use certain Unix commands for purposes other than what they are intended for. Yes, I abuse Unix commands.

Queue Magazine (part of ACM) has published my description of OpenFlow. It's basically "the rant I give at parties when someone asks me to explain OpenFlow and why it is important". I hope that people actually involved in OpenFlow standardization and development forgive me for my simplifications and possibly sloppy use of terminology but I think the article does a good job of explaining OF to people that aren't involved in networking:

OpenFlow: A Radical New Idea in Networking

I hope that OpenFlow is adopted widely. It has some cool things in it.

Enjoy.

Tom

Posted by Tom Limoncelli in Technical Tips

A co-worker of mine recently noticed that I tend to use rsync in a way he hadn't seen before:

rsync -avP --inplace $FILE_LIST desthost:/path/to/dest/.

Why the "slash dot" at the end of the destination?

I do this because I want predictable behavior, and the best way to achieve that is to make sure the destination is a directory that already exists. I can't be assured that /path/to/dest/ exists, but I know that if it exists then "." will exist inside it. If the destination path doesn't exist, rsync makes a guess about what I intended, and I don't write code that relies on "guesses". I would rather the script fail in a way I can detect (via the shell variable $?) than have it "guess what I meant", which is difficult to detect.

What? rsync makes a guess? Yes. rsync changes its behavior depending on a number of factors:

  • is there one source file or multiple source files?
  • is the destination a directory, a file, or doesn't exist?

There are many permutations there. You can eliminate most of them by having a destination directory end with "slash dot".

For example:

  • Example A: rsync -avP file1 host:/tmp/file
  • Example B: rsync -avP file1 file2 host:/tmp/file

Assume that host:/tmp/file exists. In that case, Example A copies the file and renames it in the process. Example B will fail because rsync's author (and I think this is the right decision) decided that it would be stupid to copy file1 to /tmp/file and then copy file2 over it. This is the same behavior as the Unix cp command: If there are multiple files being copied then the last name on the command line has to be a directory otherwise it is an error. The behavior changes based on the destination.

Let's look at those two examples if the destination name doesn't exist:

  • Example C: rsync -avP file1 host:/tmp/santa
  • Example D: rsync -avP file1 file2 host:/tmp/santa

In these examples assume that /tmp/santa doesn't exist. Example C is similar to Example A: rsync copies the file to /tmp/santa, i.e. it renames it as it copies. In Example D, however, rsync will assume you want it to create the directory so that both files have some place to go. The behavior changes due to the number of source files.

Remember that debugging, by definition, is more difficult than writing code. Therefore, if you write code that relies on the maximum of your knowledge, you have, by definition, written code that is beyond your ability to debug.

Therefore, if you are a sneaky little programmer and use your expertise in the arcane semantics and heuristics of rsync, congrats. However, if one day you modify the script to copy multiple files instead of one, or if the destination directory doesn't exist (or unexpectedly does exist), you will have a hard time debugging the program.

How might a change like this happen?

  • Your source file is a variable $SOURCE_FILES and occasionally there is only one source file. Or the variable represents one file but suddenly it represents multiple.
  • The script you've been using for years gets updated to copy two files instead of one.
  • Over time the list of files that need to be copied shrinks and shrinks and suddenly is just single file that needs to be copied.
  • Your destination directory goes away. In the example that my coworker noticed, the destination was /tmp. Well, everyone knows that /tmp always exists, right? I've seen it disappear due to typos, human errors, and broken install scripts. If /tmp disappeared I would want my script to fail.

It is good rsync hygiene to end destinations with "/." if you intend it to be a directory that exists. That way it fails loudly if the destination doesn't exist since rsync doesn't create the intervening subdirectories. I do this in scripts and on the command line. It's just a good habit to get into.
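
In a script that habit, plus a check of the exit status, looks roughly like this (a sketch with made-up names; here FILE_LIST is assumed to be a bash array):

# Fails loudly if /path/to/dest/ doesn't exist on the destination host.
if ! rsync -avP --inplace "${FILE_LIST[@]}" desthost:/path/to/dest/. ; then
  echo "ERROR: rsync failed -- does /path/to/dest/ exist on desthost?" >&2
  exit 1
fi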

Tom

P.S. One last note. Much of the semantics described above change if you add the "-R" option. They don't get more consistent, they just become different. If you use this option make sure you do a lot of testing to be sure you cover all these edge cases.

Posted by Tom Limoncelli in Technical Tips

I don't think I really understood SSH "Agent Forwarding" until I read this in-depth description of what it is and how it works:

http://www.unixwiz.net/techtips/ssh-agent-forwarding.html

In fact, I admit I had been avoiding using this feature because it adds a security risk and it is best not to use something risky without knowing the internals of why it is risky.

Now that I understand it and can use it, I find it saves me a TON of time. Highly recommended (when it is safe to use, of course!)
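
If you do decide to use it, the safest pattern I know of is to enable it per-host rather than globally; something like this in ~/.ssh/config (the hostname is a placeholder):

# Only forward the agent to hosts where you trust root completely;
# anyone with root there can use your agent while you are connected.
Host bastion.example.com
    ForwardAgent yes

Host *
    ForwardAgent no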

Tom

Posted by Tom Limoncelli in Technical Tips

QueueICPC_coercion.jpg

ACM Queue is hosting an online programming competition on its website from January 15 through February 12, 2012.

Using either Java, C++, C#, Python, or JavaScript, code an AI to compete against other participants' programs in a territory-capture game called "Coercion".

The competition is open to everyone.

Details at: http://queue.acm.org/icpc/

Posted by Tom Limoncelli in Technical Tips

Yesterday on the SysAdvent calendar Aleksey Tsalolikhin published an article about configuration management. It includes a comparison of how to do the same task in various languages: bash, CFEngine, Chef, and Puppet. Seeing how the languages differ is very interesting!

SysAdvent: December 19 - Configuration Management

Posted by Tom Limoncelli in Technical Tips

A great explanation about "yield" followed by a discussion of coroutines and more:

In the sequel, he goes into even more detail and then uses all that information to write an operating system in Python.

Posted by Tom Limoncelli in Python, Technical Tips

Fabric is a new tool for ssh'ing to many hosts. It has some nice properties, such as lazy execution. You write the description of what is to be done in Python and Fabric takes care of executing it on all the machines you specify. Once you've used it a bunch of times you'll accumulate many "fab files" that you can re-use. You can use it to create large systems too. The API is simple but powerful.

The tutorial gives you a good idea of how it works: http://docs.fabfile.org/en/1.2.2/tutorial.html

It is written using the Paramiko module which is my favorite way to do SSH and SSH-like things from Python.

The Fabric homepage is: http://www.fabfile.org

Thanks to Joseph Kern for this tip!

Posted by Tom Limoncelli in Technical Tips

The Google flags parser (available for Python and C++) is very powerful. I use it for all my projects at work (of course) and since it has been open sourced, I use it for personal projects too.

While I support open source 100% I rarely get to submit much code into other people's projects (I contribute to documentation more than code... go figure). So, even though it is only a few lines of new code, I do want to point out that the 1.6 release of the Python library has actual code from me.

One of the neat features of this flags library is that you can specify a file to read the flags from. That is, if your command line is too long, you can stick all or some of the flags in a file and specify "--flagfile path/to/file.flags" to have them treated as if you put them on the command line. Imagine having one flags file that you use in production and another one that points the server to a test database using a higher level of debug verbosity and enabling beta features. You can specify multiple files even with overlapping flags and it does the right thing, keeping the last value.
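
As a hypothetical example (the file names and flags are made up), a flag file is just flags listed one per line. A file called prod.flags might contain:

--db_host=prod-db.example.com
--verbosity=0

and you would run the server as:

./myserver --flagfile=prod.flags --flagfile=test-overrides.flags

with any flags repeated in the second file overriding the first, as described above.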

My patch was pretty simple. I discovered, through a painful incident, that flagfiles were silently skipped if they were not readable. No warning, no error message. (You can imagine that my discovery was during a frantic "why is this not working???" afternoon.) Anyway... now you get an error instead and the program stops (in Python terms... it raises an exception). I think the unit tests are bigger than the actual code but I'm glad the patch was accepted. I hope nobody was depending on this bug as a "feature". Seriously... nobody would turn off flags via "chmod 000 filename.flags", right? So far I haven't gotten any complaints.

Anyway... if you write code in C++ or Python I highly recommend you give gflags a try. Both are available under the New BSD License on Google Code:

Enjoy!

--Tom

Posted by Tom Limoncelli in Technical Tips

  1. On a Mac, if you SHIFT-CLICK the green dot on a window it opens it as wide and tall as possible (instead of the application-defined behavior)

  2. Even though "ls -l" displays a file's permissions as "-rw-r--r--", you can't use "-rw-r--r--" in a chmod command. This is probably one of the most obvious but overlooked UI inconsistencies in Unix that nobody has fixed after all these years. Instead we force people to learn octal and type 0644. Meanwhile every book on Unix/Linux spends pages explaining octal just for this purpose. Time would have been better spent contributing a patch to chmod.

  3. If a network problem always happens 300 seconds after an event (like a VPN coming up or a machine connecting to the network) the problem is ARP, which has to renew every 300 seconds. Similarly, if it times out after exactly 2 hours, the problem is your routing system which typically expires routes after 2 hours of not hearing them advertised.

  4. Git rocks. I should have converted from SubVersion to Git years ago. Sadly I like the name SubVersion better. I hear Hg / Mercurial is better than Git, but Git had better marketing.

  5. Keep all your Unix "dot files" in sync with http://wiki.eater.org/ocd (and I'm not just saying that because my boss wrote it).

  6. People that use advanced Python-isms should not complain when I use features that have been in bash forever and, in fact, were in /bin/sh before most of us knew how to read.

  7. Years ago IETF started telling protocol inventors to avoid using broadcasts and use "local multicast" instead because it will help LAN equipment vendors scale to larger and larger LANs. If your LAN network vendor makes equipment that goes south when there is a lot of multicast traffic because it is "slow path'ed" through the CPU, remind them that They're Doing It Wrong.

  8. The best debugging tool in the world is "diff". Save the output to /tmp/old. As you edit your code, write the output to /tmp/new then do "diff /tmp/old /tmp/new". When you see the change you want, you know you are done. Alternatively edit /tmp/old to look like the output you want. You've fixed the bug when diff has no output. (See the sketch after this list.)

  9. Attend your local sysadmin conference. Regional conferences are your most cost-effective career accelerator. You will learn technical stuff that will help you retain your job, do your job better, get promoted, or find a new job. Plus, you'll make local friends and contacts that will help you more than your average call to a vendor tech support line. There are some great ones in the Seattle and NJ/NY/Philly areas, all listed here.
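
The sketch promised in item 8, with a made-up program name:

./myprog > /tmp/old      # capture the current output once
# ... edit the code ...
./myprog > /tmp/new
diff /tmp/old /tmp/new   # no output means nothing changed; otherwise it shows exactly what your edit did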

True story:

At my first job out of college we made our own patch cables. Usually we'd make them "on demand" as needed for a new server or workstation. My (then) boss didn't want to buy patch cables even though we knew that we weren't doing a perfect job (we were software people, eh?). Any time we had a flaky server problem it would turn out to be the cable... usually one made by my (then) boss. When he left the company, the first policy change we made was to start buying pre-made cables.

That was during the days of Category 3 cables. You can make a Category 3 cable by hand without much skill. With Category 5 and 6 the tolerances are so tight that just unwinding a pair too far (for example, to make it easier to crimp) will result in enough interference that you'll see errors. It isn't just "having the right tools". An Ohm Meter isn't the right testing tool. You need to do a series of tests that are well beyond simple electrical connectivity.

That's why it is so important to make sure the cables are certified. It isn't enough to use the right parts; you need to test each cable to verify that it will really work. There are people that will install cable in your walls and not do certification. Some will tell you they certified it but they really just plug a computer in at each end; that's not good enough. I found the best way to know the certification was really done is to have them produce a book of printouts, one from each cable analysis. Put it in the contract: no book, no payment. (And as a fun trick... the next time you do have a flaky network connection, check the book and often you'll find that cable just barely passed. You might not know how to read the graph, but you'll see the line dip closer to the "pass" line than on the other graphs.)

If your boss isn't convinced, do the math. Calculate how much you are paid in 10 minutes and compare that to the price of the pre-made cable.

Posted by Tom Limoncelli in Technical Tips

I needed a way to backup a single server to a remote hard disk. There are many scripts around, and I certainly could have written one myself, but I found Duplicity and now I highly recommend it:

http://duplicity.nongnu.org

Duplicity uses librsync to generate incremental backups that are very small. It generates the backups, GPG encrypts them, and then sends them to another server by all the major methods: scp, ftp, sftp, rsync, etc. You can backup starting at any directory, not just at mountpoints and there is a full language for specifying files you want to exclude.

Installation: The most difficult part is probably setting up your GPG keys if you've never set them up before. (Note: you really, really need to protect the private key. It is required for restores. If you lose your machine due to a fire and don't have a copy of the private key somewhere, you won't be able to do a restore. Really. I burned mine on a few CDs and put them in various hidden places.)

The machine I'm backing up is a virtual machine in a colo. They don't offer backup services, so I had to take care of it myself. The machine runs FreeBSD 8.0-RELEASE-p4 and it works great. The code is very portable: Python, GPG, librsync, etc. Nothing digs into the kernel or raw devices or anything like that.

I wrote a simple script that loops through all the directories that I want backed up, and runs:

duplicity --full-if-older-than 5W --encrypt-key="${PGPKEYID}" "$dir" scp://myarchives@mybackuphost/"${BACKUPSET}${dir}"

The "--full-if-older-than 5W" means that it does an incremental backup, but a full back every 35 days. I do 5W instead of 4W because I want to make sure no more than 1 full backup happens every billing cycle. I'm charged for bandwidth and fear that two full dumps in the same month may put me over the limit.

My configuration: I'm scp'ing the files to another machine, which has a cheap USB2.0 1T hard disk. I set it up so that I can ssh from the source machine to the destination machine without need of a password ("PubkeyAuthentication yes"). In the example above "myarchives" is the username that I'm doing the backup to, and "mybackuphost" is the host. Actually I just specify the hostname and use a .ssh/config entry to set the default username to be "myarchives". That way I can specify "mybackuphost" in other shell scripts, etc. SSH aliases FTW!

Restores: Of course, I don't actually care about backups. I only care about restores. When restoring a file, duplicity figures out which full and incremental backups need to be retrieved and decrypted. You just specify the date you want (default "the latest") and it does all the work. I was impressed at how little thinking I needed to do.

After running the system for a few days it was time to do a restore to make sure it all worked.

The restore syntax is a little confusing because the documentation didn't have a lot of examples. In particular, the most common restore situation is not restoring the full backupset, but "I mess up a file, or think I messed it up, so I want to restore an old version (from a particular date) to /tmp to see what it used to look like."

What confused me: 1) You specify the path to the file (or directory) but you don't list the path leading up to the mountpoint (or directory) that was backed up. In hindsight that is obvious but it caught me. What saved me was that when I listed the files, they were displayed without the mountpoint. 2) You have to be very careful to specify where you put the backup set. You specify that on the command line as the source, and you specify the file to be restored in the "--file-to-restore" option. You can't specify the entire thing on the command line and expect duplicity to guess where to split it.

So that I don't have to re-learn the commands at a time when I'm panicking because I just deleted a critical file, I've made notes about how to do a restore. With some changes to protect the innocent, they look like:

Step 1. List all the files that are backed up in the "home/tal" area:

duplicity list-current-files scp://mybackuphost/directoryname/home/tal

To list what they were like on a particular date, add: --restore-time "2002-01-25"

Step 2. Restore a file from that list (not to the original place):

duplicity restore --encrypt-key=XXXXXXXX --file-to-restore=path/you/saw/in/listing scp://mybackuphost/directoryname/home/tal /tmp/restore

If the old file was in "/home/tal/path/to/file" and the backup was done of "/home/tal", you need to specify --file-to-restore as "path/to/file", not "/home/tal/path/to/file". You can list a directory to get all its files. The /tmp/restore should be a directory that already exists.

To restore the files as of a particular date, add: --restore-time "2002-01-25"

Conclusion: Duplicity is a great piece of engineering. It is very fast, both because it makes good use of librsync to keep the backups small, and because it stores indexes of what files were backed up so that the entire backup doesn't have to be read just to get a file list. The backup files are small, split across many small files so that not a lot of temp space is required on the source machine. The tools are very easy to use: they do all the machinations about full and incremental sets, so you can focus on what to back up and what to restore.

Caveats: Like any backup system, you should do a "firedrill" now and then and test your restore procedure. I recommend you encapsulate your backup process in a shell script so that you do it the same way every time.

I highly recommend Duplicity.

http://duplicity.nongnu.org

Posted by Tom Limoncelli in Technical Tips

Google Forms

Someone asked me how I did my survey in a way that the data went to a Google spreadsheet automatically. The forms capability is built into the spreadsheet system. You can even do multi-page forms with pages selected based on previous answers.

More info here

Posted by Tom Limoncelli in Technical Tips

A coworker debugged a problem last week that inspired me to relay this bit of advice:

Nothing happens at "random times". There's always a reason why it happens.

I once had an ISDN router that got incredibly slow now and then. People on the far side of the router lost service for 10-15 seconds at a time.

The key to finding the problem was timing how often the problem happened. I used a simple once-a-second "ping" and logged the times that the outages happened.
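
The logger can be as simple as this (a sketch; the target address and log path are placeholders):

#!/bin/sh
TARGET=192.0.2.1
while true ; do
  if ! ping -c 1 "$TARGET" >/dev/null 2>&1 ; then
    date '+%Y-%m-%d %H:%M:%S' >> /tmp/outages.log
  fi
  sleep 1
done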

Visual inspection of the numbers turned up no clues. It looked random.

I graphed how far apart the outages happened. The graph looked pretty random, but there were runs that were always 10 minutes apart.

I graphed the outages on a timeline. That's where I saw something interesting. The outages were exactly 10 minutes apart PLUS at other times. I wouldn't have seen that without a graph.

What happens every 10 minutes and at other times too? In this case, the router recalculated its routing table every time it got a route update. The route updates came from its peer router exactly every 10 minutes, plus any time an ISDN link went up or down. The times I was seeing a 10-minute gap were when we went an entire 10 minutes with no ISDN links going up or down. With so many links, and with home users intermittently using their connections, it was pretty rare to go the full 10 minutes with no updates. However, by graphing it, the periodic outages became visible.

I've seen other outages that happened 300 seconds after some other event: a machine connects to the network, etc. A lot of protocols do things in 300 second (5 minute) intervals. The most common is ARP: A router expires ARP entries every 300 seconds. Some vendors extend the time any time they receive a packet from the host, others expire the entry and send another ARP request.

What other timeouts have you found to be clues of particular bugs? Please post in the comments!

Posted by Tom Limoncelli in Technical Tips

CSS Positioning

I admit it. I use tables for positioning in HTML. It is easy and works.

However, I just read this excellent tutorial on CSS positioning and I finally understand what the heck all that positioning stuff means.

http://www.barelyfitz.com/screencast/html-training/css/positioning/

I promise not to use tables any more.

I highly recommend this tutorial.

Posted by Tom Limoncelli in Technical Tips

Being a long-time "vi" user I find that I am constantly surprised by the little (and not-so-little) enhancements vim has added. One of them is the "inner" concept.

Any vi user knows that "c" starts a change and the next keystroke determines what will be changed. "cw" changes from where the cursor is until the end of the word. "c$" changes from where the cursor is to the end of the line. Think of a cursor movement command, type it after "c", and you can be pretty sure that you will change from where the cursor is to... wherever you've directed.

"d" works the same way. "dw" deletes word. "d$" deletes to the end of the line. "d^" deletes to the beginning of the line ("^"? "$"? gosh, whoever invented this stuff must have known a lot about regular expressions).

VIM adds the concept of "inner text". Text is structured. We put things in quotes, in parentheses, between mustaches (that's "{" and "}"), and so on. The text between those two delimiters is the "inner text".

So suppose we have:

<span style="clean">Interesting quote here.</span>

but we want to change the style from "clean" to "unruly". Move the cursor anywhere between the quotes and type ci then a quote (read that as "change inner quote"). VIM will seek out the opening and closing quotes that surround the cursor and the next stuff you type will replace it.

It works for all three kinds of quotes (single, double, and backtick), it works for all the various braces: ( { and <. You can type the opening or the closing brace, they both do the same thing.

Therefore you can move the cursor to the word "style" in the above example and type "ci<" to change everything within that tag.

I find this particularly useful when editing python code. I'm often using ci' to change a single quoted string.

If there is an "inner", you'd expect there is an "outer" too, right? (How many of you tried typing co" to see if it worked?) Well, there is an there isn't.

In VIM the counterpart of "inner" is "a" (think "around", or "a block"). A block is kind of special. It doesn't just cover the inner text; it includes the opening and closing elements, plus sometimes the space or two that follow. Given this text:

  • The quick <span class="foo">brown</span> fox.

If the cursor is in the <span> element, "ca<" will replace the entire element from the < all the way to the >. For the text-related objects like change a word (caw) and change a sentence (cas), the whitespace after the object is also replaced.

Not having to move the cursor to the beginning of an element to change the entire thing is a great time saver. It is these little enhancements that make using VIM so much more pleasant than using VI.

Give it a try!

More information about this is in the "Text Objects" section of Michael Jakl's excellent VIM tutorial.

--Tom

P.S. My second favorite thing about VIM? gVIM (The graphical version of VIM) preserves TABs when you use the windowing system to cut and paste.

Posted by Tom Limoncelli in Technical Tips

Remember when you were a little kid and had a clubhouse? Did you let someone in only if they knew "the secret knock"? Lately people have talked about various implementations for doing that with ssh. The technique, called "Port Knocking", permits SSH only if someone has touched various ports recently. For example, someone has to ping the machine, then telnet to port 1234, and then for the next 2 minutes they can ssh in.

This can be difficult to implement securely, as this video demonstrates: http://www.youtube.com/watch?v=9IrCgCKrv8U

IBM's Developerworks recently posted an article about tightening SSH security. The topic also came up on the mailing list for the New Jersey LOPSA chapter.

I had an idea that I haven't seen published before.

I have a Unix (FreeBSD 8.0) system that is live on the open internet and it is so locked down that I don't permit passwords. To SSH to the machine you have to have pre-arranged SSH keys for "passwordless" connections. However, it does not run a firewall because it is literally running with no ports open (except ssh). There is nothing to firewall.

Problem: What if I am stuck and need to log in remotely with a password?

Most of the portknocking techniques I've seen leverage the firewall running on the system. I didn't want to enable a firewall, so I came up with this.

Idea #1: A CGI script to grant access.

Connect to a particular URL and it runs sshd on port 9999 with a special configuration file that permits passwords:

/etc/ssh/sshd_config:

PasswordAuthentication no
PermitEmptyPasswords no
PermitRootLogin no
UsePAM no

/etc/ssh/sshd_config-port9999:

Port 9999
AllowAgentForwarding no
AllowTcpForwarding no
GatewayPorts no
LoginGraceTime 30
MaxAuthTries 3
X11Forwarding no
PermitTunnel no
PasswordAuthentication no
PermitEmptyPasswords no
PermitRootLogin yes
UsePAM yes

Translation: If someone is going to get special access on port 9999, they can't use it to set up tunnels or gateways. It is just for quick access; enough to fix your SSH keys.

The CGI script essentially runs:

/usr/sbin/sshd -p 9999 -d

Which permits a single login session on port 9999.
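
A hypothetical version of that CGI wrapper (not the actual script) could be as small as:

#!/bin/sh
# knock.cgi -- start a one-shot sshd that allows a password login.
echo "Content-type: text/plain"
echo ""
echo "A one-shot sshd is now listening on port 9999."
/usr/sbin/sshd -f /etc/ssh/sshd_config-port9999 -p 9999 -d >/dev/null 2>&1 &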

Try #2:

FreeBSD defaults to an inetd that uses Tcpwrappers.

So, try #2 was similar to #1 but appends info to /etc/hosts.allow so that the person has to come from the same IP address as the recent web connection. The problem with that is sometimes people connect to the web via proxies, and adding the proxy to the hosts.allow list isn't going to help.

Try #3:

We all know that you can't run two daemons on the same port number, right? Wrong.

You can have multiple daemons listening on the same port number if they are listening on different interfaces. If two conflict, the connection goes to the "most specific" listening daemon.

What does that mean? That means you can have sshd with one configuration listening on *.22 (any interface, port 22) and another listening on 10.10.10.10.22 (port 22 of the interface configured for address 10.10.10.10). But I only have one interface, you say? I disagree. You have 127.0.0.1 plus your primary IP address, plus any IPv6 addresses. Heck, even if you really only had one IP address, "*" and a specific address can both be listening on port 22 at the same time.

That's what the "*" on "netstat -l" means. "Any interface."

So, back to our port knocking configuration.

The normal port 22 sshd runs with a configuration that disables all passwords (only permits SSH keys).

/etc/ssh/sshd_config:

Port 22
ListenAddress 0.0.0.0
ListenAddress ::
PAMAuthenticationViaKBDInt no
PasswordAuthentication no
PermitEmptyPasswords no
PermitRootLogin no
UsePAM no

And the CGI script enables an sshd with this configuration:

/etc/ssh/sshd_config-permit-passwords:

Port 22
ListenAddress 64.23.178.12
ListenAddress fe80::5154:ff:fe25:1234
PAMAuthenticationViaKBDInt no
PasswordAuthentication no
PermitEmptyPasswords no
PermitRootLogin no
UsePAM yes

The wrapper simply runs:

/usr/sbin/sshd -d -f /etc/ssh/sshd_config-permit-passwords

That's all there is to it!

Posted by Tom Limoncelli in Technical Tips

Google Chrome supports multiple profiles. The feature is just hidden until it is ready for prime time. It is really easy to set up on the Linux and Windows versions of Chrome. On the Mac it takes some manual work.

I'm sure eventually the Mac version will have a nice GUI to set this up for you. In the meanwhile, I've written a script that does it:

chrome-with-profile-1.0.tar.gz
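The heart of any such script is just launching Chrome with a separate profile directory, something like this (a sketch; the directory name is an arbitrary choice and the real script does a bit more):

  #!/bin/sh
  # Sketch: launch Chrome on the Mac with an alternate profile directory.
  PROFILE="$HOME/Library/Application Support/Google/Chrome-Work"
  mkdir -p "$PROFILE"
  "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
      --user-data-dir="$PROFILE" &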

Tom

xed 2.0.2 released!

xed is a perl script that locks a file, runs $EDITOR on the file, then unlocks it.

It also checks to see if the file is kept under RCS control. If not, it offers to make it so. RCS is a system that retains a history of a file. It is the predecessor to Git, Subversion, CVS and such. It doesn't store the changes in a central repository; it comes from a long-gone era before servers and networks. It simply stores the changes in a subdirectory called "RCS" in the same directory as the file. (If it can't find that directory, it puts the history right next to the original, named the same as the file with ",v" at the end.)
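The workflow xed automates can be sketched in a few lines of shell (this is the idea, not the actual Perl script, and it assumes the file is already under RCS):

  #!/bin/sh
  # Sketch of what xed automates: lock, edit, check in, unlock.
  f="$1"
  co -l "$f"            # check the file out with a lock
  ${EDITOR:-vi} "$f"    # edit it
  ci -u "$f"            # check the change in; -u leaves an unlocked, read-only copy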

[More about this little-known tool after the jump.]

Posted by Tom Limoncelli in Technical Tips

I wrote about upgrading to IPv6 in the past, but I have more to say.

The wrong way: I've heard a number of people say they are going to try to convert all of their desktops to IPv6. They are picking the wrong project. It is tempting, but it is a mistake (well-intentioned, yet not a good starter project). Don't try to convert all your desktops and servers to IPv6 as your first experiment. There's rarely any immediate value to it (annoys management), it is potentially biting off more than you can chew (annoys you), and mistakes affect people that you have to see in the cafeteria every day (annoys coworkers).

Instead, copy the success stories I've detailed below. Some use an "outside -> in" plan; others pick a "strategic value" project.

Story 1: Work from the outside -> in

The goal here is to get the path from your ISP to your load balancer to use IPv6; let the load balancer translate to IPv4 for you. The web servers themselves don't need to be upgraded; leave that for phase 2.

It is a small, bite-sized project that is achievable. It has a real, tangible value that you can explain to management without being too technical: "The coming wave of IPv6-only users will have faster access to our web site. Without this, those users will have slower access to our site due to the IPv4/IPv6 translators that ISPs are setting up as a band-aid." That is an explanation that a non-technical executive will understand.

It also requires only modest changes: a few routers, some DNS records, and so on. It is also a safe place to make changes because your external web presence has a good dev -> qa -> production infrastructure that you can leverage to test things properly (it does, right?).

Technically this is what is happening:

At many companies web services are behind a load balancer or reverse proxy.

ISP -> load balancer -> web farm

If your load balancer can accept IPv6 connections but send out IPv4 connections to the web farm, you can offer IPv6 service to external users just by enabling IPv6 on the first few hops into your network: the path to your load balancer. As each web server becomes IPv6-ready, the load balancer no longer needs to translate for that host. Eventually your entire web farm is native IPv6. Doing this gives you a throttle to control the pace of change. You can make small changes, one at a time, testing along the way.
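To make that concrete, here is a sketch of the translation step in HAProxy syntax; the addresses and server names are placeholders, and any load balancer or reverse proxy that can listen on IPv6 will do:

  # haproxy.cfg fragment (sketch): speak IPv6 to the world, IPv4 to the web farm
  frontend www
      mode http
      bind 0.0.0.0:80          # existing IPv4 service, unchanged
      bind :::80               # new IPv6 listener facing the ISP side
      default_backend webfarm

  backend webfarm
      mode http
      server web1 10.0.0.11:80 check   # back ends stay IPv4 until each is upgraded
      server web2 10.0.0.12:80 check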

The value of doing it this way is that it gives customers IPv6 service early, and requires minimal changes on your site. We are about 280 days away from running out of IPv4 addresses. Around that time ISPs will start to offer home ISP service where IPv6 is "normal" and attempts to use IPv4 will result in packets being NATed at the carrier level. Customers in this situation will get worse performance for sites that aren't offering their services over IPv6. Speed is very important on the web. More specifically, latency is important.

[Note: Depending on where the SSL cert lives, that load balancer might need to do IPv6 all the way to the frontends. Consult your load balancer support folks.]

Sites that are offering their services over IPv6 will be faster for new customers. Most CEOs can understand simple, non-technical, value statements like, "new people coming onto the internet will have better access to our site" or "the web site will be faster for the new wave of IPv6-only users."

Of course, once you've completed that and shown that the world didn't end, developers will be more willing to test their code under IPv6. You might need to enable IPv6 on the path to the QA lab or some other area. That's another bite-sized project. Another path will be requested. Then another. Then the desktop LAN that the developers use. Then it makes sense to do it everywhere. Small incremental roll-outs FTW!

During Google's IPv6 efforts we learned that this strategy works really well. Most importantly, we learned that it turned out to be pretty easy and not expensive. Is the IPv6 code in routers stable? Well, we're now sending YouTube traffic over IPv6. If you know of a better load test for the IPv6 code on a router, please let me know! (Footnote: "Google: IPv6 is easy, not expensive. Engineers say upgrading to next-gen Internet is inexpensive, requires small team.")

Story 2: Strategic Upgrades

In this story we are more "strategic".

Some people run into their boss's office and say, "OMG we have to convert everything to IPv6". They want to convert the routers, the DNS system, the DHCP system, the applications, the clients, the desktops, the servers.

These people sound like crazy people. They sound like Chicken Little claiming that the sky is falling.

These people are thrown out of their boss's office.

Other people (we'll call these people "the successful ones") go to their boss and say, "There's one specific thing I want to do with IPv6. Here's why it will help the company."

These people sound focused and determined. They usually get funding.

Little does the boss realize that this "one specific thing" requires touching many dependencies. That includes the routers, the DNS system, the DHCP system, and so on. Yes, the same list of things that the "crazy" person was spouting off about.

The difference is that these people got permission to do it.

According to a presentation I saw them give in 2008, Comcast found their "one thing" to be: set-top box management. Every set-top box needs an IP address so they can manage it. That's more IP addresses than they could reasonably get from ARIN. So, they used IPv6. If you get internet service from Comcast, the set-top box on your TV set is IPv6 even though the cable modem sitting next to it providing you internet service is IPv4. They had to get IPv6 working for anything that touches the management of their network: provisioning, testing, monitoring, billing. Wait, billing? Well, if you are touching the billing system, you are basically touching a lot of things. Ooh, shiny dependencies. (There used to be a paper about this at http://www.6journal.org/archive/00000265/01/alain-durand.pdf but the link isn't working. I found this interview with the author but not the original paper.)

Nokia found their "one thing" to be: power consumption. Power consumption, you say? Their phones waste battery power by sending out pings to "keep the NAT session alive". By switching to IPv6 they didn't need to send out pings. No NAT, no need to keep the NAT session alive. Their phones can turn off their antenna until they have data to send. That saves power. In an industry where battery life is everything, any CxO or VP can see the value. A video from Google's IPv6 summit details Nokia's success in upgrading to IPv6.

Speaking of phones, T-Mobile's next generation handset will be IPv6-only. Verizon's LTE handsets are required to do IPv6. If you have customers that access your services from their phone, you have a business case to start upgrading to IPv6 now.

In the long term we should be concerned with converting all our networks and equipment to IPv6. However the pattern we see is that successful projects have picked "one specific thing to convert", and let all the dependencies come along for the ride.

Summary:

In summary, IPv6 is real, and really important. We are about a year away from running out of IPv4 addresses, at which point ISPs will start offering IPv6 service with translation for access to IPv4-only web sites. IPv6 deployment projects seem to be revealing two successful patterns and one unsuccessful pattern. The unsuccessful pattern is to scream that the sky is falling and ask for permission to upgrade "everything". The successful patterns tend to be one of:

  • Find one high-value (to your CEO) reason to use IPv6: There are no simple solutions but there are simple explanations. Convert just that one thing and keep repeating the value statement that got the project approved. There will be plenty of dependencies and you will end up touching many components of your network. This will lead the way to other projects.
  • Work "from the outside -> in": A load balancer that does IPv6<->IPv4 translation will let you offer IPv6 to external customers now, gives you a "fast win" that will bolster future projects, and gives you a throttle to control the speed at which services get native support.

I'd love to hear from readers about their experiences with convincing management to approve IPv6 projects. Please post to the comments section!

-Tom

P.S. Material from the last Google IPv6 conference is here: http://sites.google.com/site/ipv6implementors/2010/agenda

Posted by Tom Limoncelli in Technical Tips

A friend of mine who is an old-time Unix/Linux user asked me for suggestions on how to get used to Mac OS X.

The first mistake that Unix users make when they come to OS X is that they try to use X Windows (X11) because it is what they are used to. My general advice: Don't use X windows. Switching between the two modes is more work for your hands. Stick with the non-X11 programs until you get used to them. Soon you'll find that things just "fit together" and you won't miss X11.

Terminal is really good (continued lines copy and paste correctly! resize and the text reformats perfectly!). I only use X11 when I absolutely have to. Oh, and if you do use X11 and find it to be buggy, install the third-party X replacement called XQuartz (sadly you'll have to re-install it after any security or other update).

Now that I've convinced you to stick with the native apps, here's why:

  1. pbcopy <file

Stashes the contents of "file" into your paste buffer.

  2. pbpaste >file

Copies the paste buffer to stdout.

  3. pbpaste | sed -e 's/foo/bar/g' | pbcopy

Changes all occurrences of "foo" to "bar" in the paste buffer.

  1. "open" emulates double-clicking on an icon.

    open file.txt

If you had double-clicked on file.txt, it would have brought it up in TextEdit, right? That's what happens with "open file.txt". If you want to force another application, use "-a":

open -a /Applications/Microsoft\ Office\ 2008/Microsoft\ Word.app file.txt

Wonder how to start an ".app" from Terminal? Double click it:

open /Applications/Microsoft\ Office\ 2008/Microsoft\ Word.app

Want to find a directory via "cd"ing around on the Terminal, but once you get there you want to use the mouse?

cd /foo/bar/baz
open .

I use this so much I have an alias in my .bash_profile:

alias here="open ."

Now after "find"ing and searching and poking around, once I get to the right place I can type "here" and be productive with the mouse.

  5. Want to use a Unix command on a file you see on the desktop? Drag the icon onto the terminal.

type: od (space) -c (space)

Then drag an icon onto that Terminal. The entire path appears on the command line. If the path has spaces or other funny things the text will be perfectly quoted.

  6. Dislike the File Open dialog? Type "/" and it will prompt you to type the path you are seeking. Name completion works in that prompt. Rock on, File Dialog!

  7. Word processors, spreadsheets, video players and other applications that work with a file put an icon of that file in the title bar. That isn't just to be pretty. The icon is useful. CMD-click on it to see the path to the file. Select an item in that path and that directory is opened on the Desktop.

That icon in the title bar is draggable too! Want to move the file to a different directory? You don't have to poke around looking for the source directory so you can drag it to the destination directory. Just drag the icon from the title bar to the destination directory. The app is aware of the change too. Lastly, drag the icon from the title bar into a Terminal window. It pastes the entire path to the file just like in tip 5.

  8. If you want to script the things that Disk Utility does, use "hdiutil" and "diskutil". You can script ripping a DVD and burning it onto another one with "hdiutil create", then "diskutil eject", then "hdiutil burn" (see the sketch after this list).

  9. rsync for Mac OS has an "-E" option that copies all the weird Mac OS file attributes including resource forks. ("rsync -avPE src host:dest")

  3. "top" has stupid defaults. I always run "top -o cpu". In fact, put this in your profile:

    alias top='top -o cpu'

  11. For more interesting ideas, read the man pages for:

    screencapture mdutil say dscl dot_clean /usr/bin/util pbcopy pbpaste open diskutil hdiutil
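Here is a rough illustration of tip 8; the device name is a guess ("diskutil list" shows the real one), so double-check the hdiutil and diskutil man pages before trusting it:

  #!/bin/sh
  # Sketch: image a DVD, eject it, then burn the image to a blank disc.
  hdiutil create -srcdevice /dev/disk2 ~/dvdcopy.dmg   # image the source disc
  diskutil eject /dev/disk2
  echo "Insert a blank DVD and press return."
  read junk
  hdiutil burn ~/dvdcopy.dmg                           # burn the image onto the blank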

Enjoy!

P.S. others have recommended this list: http://superuser.com/questions/52483/terminal-tips-and-tricks-for-mac-os-x

I try not to use this blog to flog my employer's products but I just used the open source "Google Command Line" program and I have to spread the word... this really rocked.

I wanted to upload a bunch of photos to Picasa. I didn't want to sit there clicking on the web interface to upload each one, and I didn't want to import them into iPhoto and then use the Picasa plug-in to upload them. I just wanted to get them uploaded.

Google CL to the rescue! It is a program that lets you access many google properties from the command line. It works on Mac, Linux and Windows. After a (rather simple) install process I was ready to give it a try.

Here's the command line that I typed:

$ google picasa create --title "2010-08-09-Hobart-Tasmania-SAGE-AU" ~/Desktop/PHOTOS-AU/*

I was expecting it to ask me for a username and password, but instead my web browser popped up and asked me to authorize this script to log in (just like third-party apps that authenticate against Google). Back at the command line I pressed "return" to continue. The upload began and finished a few minutes later.

In addition to picasa, the command can also access blogger, youtube, docs, contacts and calendar.

Posted by Tom Limoncelli in Technical Tips

Google App Inventor

At the SAAD-NYC event last night I explained how Google App Inventor lets you make apps for Android phones without knowing how to program. It was beta tested "mainly in schools with groups that included sixth graders, high school girls, nursing students and university undergraduates who are not computer science majors."

He said, "Why haven't you written about this amazing thing on your blog?"

I dunno! So here. I'm mentioning it now.

(I think the NY Times article is the best overview.)

Happy, Jim?

Posted by Tom Limoncelli in Technical Tips

Oh that's how they get such amazing speed on a web server! http://www.eecs.harvard.edu/~mdw/papers/seda-sosp01.pdf

In the future, all servers will work like this.

Well worth reading.

Posted by Tom Limoncelli in Technical Tips

(Reposting this announcement from Dan)

Fellow SysAds etc.-

First, I'd like to make sure you are all aware of the Configuration Management Summit next week in Boston on June 24 (details are at http://www.usenix.org/events/config10/). The first Configuration Management Summit aims to bring together developers, power users, and new adopters of open source configuration management tools for automating system administration. Configuration management is a growth area in the IT industry, and open source solutions, with cost savings and an active user community, are presenting a serious challenge to today's "big vendor" products. Representatives from Bcfg2, Cfengine, Chef, and Puppet will all be participating in the summit - this will be a valuable opportunity if you have been contemplating a configuration management solution for your systems.

There is also a special one-day training on Cfengine being taught by Mark Burgess on June 25 (details are at http://www.usenix.org/events/config10/#tut_cfengine). This class might be a review session for anyone on this mailing list, but it will also offer useful insights for people who are not new to Cfengine. Additionally, if you have colleagues who need to come up to speed on Cfengine quickly, this class will be an excellent opportunity for them to learn Cfengine directly from the author.

If you are interested in either event, you can register at http://www.usenix.org/events/confweek10/registration/ (and if you have questions, you can email me directly). I hope to see you in Boston!

Daniel Klein
Education Director
USENIX

Lance Albertson wrote up a great description of how Ganeti Virtualization Manager performed under pressure during a power outage:

Nothing like a power outage gone wrong to test a new virtualization cluster. Last night we lost power in most of Corvallis and our UPS & generator functioned properly in the machine room. However, we had an unfortunate sequence of issues that caused some of our machines to go down, including all four of our ganeti nodes, hosting 62 virtual machines, which went down hard. If this had happened with our old Xen cluster with iSCSI, it would have taken us over an hour to get the infrastructure back to a normal state by manually restarting each VM.

But when I checked the ganeti cluster shortly after the outage, I noticed that all four nodes rebooted without any issues and the master node was already rebooting virtual machines automatically and fixing all of the DRBD block devices.

Ganeti is a management layer that makes it easy to set up large clusters of Xen or KVM (or other) virtualized machines. He has also written a great explanation of what Ganeti is and its benefits.

I use Ganeti for tons of projects at work.

Posted by Tom Limoncelli in Technical Tips

Dear readers in the United States,

I'm sorry. I have some bad news.  That tiny computer closet that has no cooling will overheat next weekend.

Remember that you aren't cooling a computer room, you are extracting the heat.  The equipment generates heat and you have to send it somewhere. If it stays there, the room gets hotter and hotter.

For the past few months you've been lucky.  That room benefited from the fact that the rest of the building was relatively chilly. The heat was drawn out to the rest of the building. During the winter, each weekend the heat was turned off (or down) and your uninsulated computer room leaked heat to the rest of the building. Now it's springtime, nearly summer.  The building A/C is on during the week. When it shuts down for the weekend the building is hot; hotter than your computer room.  The leaking that you were depending on is not going to happen.

Last weekend the temperature of your computer room got warm on Saturday and hot on Sunday. However, it was ok.

This weekend it will get hot on Saturday and very hot on Sunday. It will be ok.

However, next weekend is Memorial Day weekend. The building's cooling will be off for three days. Saturday will be hot. Sunday will be very, very hot.  Monday will be hot enough to kill a server or two.

If you have some cooling, Monday you'll discover that it isn't enough.  Or the cooling system will be overloaded and any weak, should-have-been-replaced, fan belts will finally snap.

How do we get into this situation?

Telecom closets don't have any cooling because they have no active components. It's just a room where wires connect to wires. That changed in the 1990s when phone systems changed. Now that telecom closet has a PBX, and an equipment rack.  If there is an equipment rack, why not put some PC servers into it? If there is one rack, why not another rack? By adding one machine at a time you never realize how overloaded the system has gotten.

Even if you have proper cooling, I bet you have more computers in that room than you did last year.

So what can you do now to prevent this problem?
  • Ask your facilities person to check the belts on your cooling system.
  • Set up monitoring so you'll be alerted if the room gets above 33 degrees C. (You probably don't have time to buy an environmental monitor, but chances are your router and certain servers have a temperature gauge on or near the hottest part of the equipment. That gauge most likely reads hotter than 33 degrees C during normal operation, but you can detect whether it rises relative to a baseline.)
  • Clean (remove dust from) the air vent screens, the fans, and any drives. That dust makes every mechanical component work harder. More stress == more likely to break.
  • Inventory the equipment in the room and shut off the unused equipment (I bet you find at least one server).
  • Inventory the equipment and rank by priority what you can power off if the temperature gets too high.
If you do have a system that overheats, remember that you can buy or rent temporary cooling systems very easily.

I don't generally make product endorsements, but at a previous company we had an overheating problem and it was cheaper and faster to buy a Sunpentown 9000 BTU unit at Walmart than to wait around for a rental. In fact, it was below my spending limit to purchase two and tell the CFO after the fact. I liked the fact that it re-evaporated the water that accumulated and sent it out with the hot-air exhaust; there was no bucket of condensation to empty.

Most importantly, be prepared. Have monitoring in place. Have a checklist of what to shut down in what order.

Good luck! Stay cool!

Tom

P.S. I wrote about this 2 years ago.

Posted by Tom Limoncelli in Technical Tips

Previously I wrote about the Google Apps shortname service which lets you set up a tinyurl service for your enterprise.

The article implies that the service can be used without using the FQDN. This is not true. In other words, I had said that "go.example.com/lunch" could be shortened to "go/lunch".

There is a workaround that makes it work. It is difficult to configure, but I've set up a Community Wiki on ServerFault.com that explains all the steps. As a wiki, I hope people can fill in the items I left blank, particularly specific configuration snippets for ISC BIND, Windows DHCP server, Linux DHCP clients, and so on.

The new article is here: How to set up Google ShortName service for my domain, so that the FQDN isn't needed

Posted by Tom Limoncelli in Technical Tips

Update 2010-01-26: There is a follow-up article to this here

Update 2009-12-20: Enabling the service wasn't working for a few days. It is now working again. It does not require Premier service. Any Google Apps customer should be able to use it.

Where I work we have a service called "go" which is a tinyURL service. The benefit of it being inside our domain is huge. Since "go" (the shortname) is found in our DNS "search path", you can specify "go" links without entering the FQDN.

That means you can enter "go/payroll" in your browser to get to the payroll system and "go/lunchmenu" to find out what's for lunch today. That crazy 70-character URL that is needed to get to a third-party web-based system we use? I won't name the vendor, but let me just say that I now get there via "go/expense".
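The only magic is the resolver search list; a client's /etc/resolv.conf looks something like this (example.com and the nameserver address are placeholders):

  # /etc/resolv.conf (sketch)
  search example.com
  nameserver 10.0.0.53
  # With a host or CNAME named go.example.com in DNS, a request for "go"
  # gets completed via the search list, so http://go/payroll just works.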

Posted by Tom Limoncelli in Technical Tips

SysAdvent has begun!

SysAdvent has started its second year.  SysAdvent is a project to count down the 24 days leading to Christmas with a sysadmin tip each day.  Last year Jordan Sissel wrote all 24 days (amazing job, dude!). This year he has enlisted guest bloggers to help out. You might see a post of mine one of these days.

While I don't celebrate the holiday that the event is named after, I'm glad to participate.

Check out this and last year's postings on the SysAdvent Blog: sysadvent.blogspot.com


Last week I mentioned that if you have a service that requires a certain SLA, it can't depend on things with a lesser SLA.

My networking friends balked and said that this isn't a valid rule for networks. I think that violations of this rule are so rare they are hard to imagine. Or, better stated, networking people do this so naturally that it is hard to imagine violating this rule.

However, here are 3 from my experience:

Matt Simmons interviews me about "Design Patterns for System Administrators".

This is a tutorial that I've never taught before. You can see it first at LISA 2009 in November.

In case you missed it, Matthew Sacks interviewed me about my other LISA tutorial. That tutorial also has a lot of new material.

Sysadmins have a love-hate relationship with shared libraries. They save space, they make upgrades easier, and so on.  However, they also cause many problems.  Sometimes they cause versioning problems (Windows DLLs), security problems, and (at least when they were new) performance problems.  I won't go into detail, just mention them on a technical email list and you'll get an earful.

Here's one example that hits me a lot. On my Linux box, if I run an update of Firefox, my current Firefox browser keeps running. However, the next time it needs to load a shared library, it loads the upgraded version, which is incompatible, and my Firefox goes bonkers and/or crashes. On the Mac and Windows this doesn't happen because the installer waits for you to close any Firefox instances before continuing.

Google Chrome browser does its updates in the background while you use it. The user doesn't have to wait for any painful upgrade notification. Instead, the next time they run Chrome they are simply told that they are now running the newest release. I call this a "parent-friendly" feature because the last time I visited my mom much of her software had been asking to be upgraded for months.  I wish it could have just upgraded itself and kept my mom's computer more secure. ACM has an article by the Chrome authors about why automatic upgrades are a key security issue. (with graphs of security attacks vs. upgrade statistics)

However, if Google Chrome upgrades itself in place, how does it keep running without crashing? Well, it turns out they use a technique called the LinuxZygote.  The libraries they need are loaded at startup into a process which then fork()s any time they need, for example, a new renderer. The Zygote pattern is usually used in systems that have a slow startup time. However, they claim that in their testing there was no performance improvement; they do this to make the system more stable.

Read the (brief) article for more info: LinuxZygote


Posted by Tom Limoncelli in Technical Tips

(I'm setting up Debian PPC on an old PowerBook G4.)

The installation went really well.  I downloaded the stable 5.0.2 DVD image, burned it onto a DVD from my Mac (note: Safari warned that the file system might be corrupted, but I ran "md5" on the .iso and the output matched what the web site said it should be) and it booted without incident and I was able to go through the entire installation without fail.  I am cheating a little since I'm not doing a multi-boot.  I hear that is more difficult.

When the machine booted the first time I was able to log in!  Sadly, the touchpad wasn't working, and there was only so much I could do from the keyboard.

Using TAB and SPACEBAR I was able to navigate around a little.  Sometimes I would get into a corner where neither TAB nor SPACEBAR was really helpful.

Luckily you can always log out of an X11 session by pressing CTRL-OPTION-BACKSPACE. Warning: this zaps the entire X11 window session.  All your apps are instantly killed. You are logged out.  Don't press it unless you mean it.  (And, yes, the keyboard sequence is an homage to CTRL-ALT-DEL).  While this wasn't the best option, sometimes it was all I had.

To fix these problems I thought the best thing to do would be to SSH to it from another machine.  The default Debian configuration doesn't include openssh-server, just the -client.  This is wise from a security standpoint, but wasn't helping me fix the machine.

From the initial login screen I was able to set up a "Failsafe" xterm window.  From there I could become root.  "apt-get install ssh" tried to do the right thing, but it couldn't access the DVD drive.

"ls /dev" wasn't showing very much.  No /dev/sd* or hd* or sr0 (CD-ROM) at all.  This was distressing.  My touchpad wasn't working, my CD-ROM (well, DVD) wasn't showing up.

I couldn't load new packages if the DVD didn't work.  I couldn't fix the machine if I couldn't SSH in.  Ugh.

I searched a lot of web sites for information about how to fix this and nearly gave up.

Finally I remembered that in the old days zapping the "PRAM" fixed a heck of a lot of problems.  The PRAM is a battery-backed bit of RAM (or NVRAM) that stores a few critical settings like boot parameters and such.  To zap the PRAM, you boot while holding these four keys: Command, Option, P and R.  It takes some practice.

After zapping the PRAM Debian booted and the mouse and touchpad magically worked.  When I logged in, I could see that the DVD was working.  "apt-get install ssh" worked without a hitch.  The DVD had automatically been detected and mounted.  I was impressed!

"ls /dev" now showed many, many more devices.

Later I installed SSH ("apt-get install ssh"), configured my SSH keys so I can log in easily from my primary computer, and even added the Ethernet MAC address to my DHCP server so that the machine always gets the same IP address.

To be honest, I don't know if zapping the PRAM fixed it or it was the reboot.  udevd may not have started (I forgot to check).  Either way, I was very happy that things worked.  I started up a web browser, went to www.google.com and when it came up it felt like home.


Posted by Tom Limoncelli in Technical Tips

You know that here at E.S. we're big fans of monitoring.  Today I saw a mailing list post by Erinn Looney-Triggs, who wrote a module for Nagios that uses dmidecode to gather a Dell's serial number and then uses Dell's web API to determine if it is near the end of the warranty period.  I think that's an excellent way to prevent what can be a nasty surprise.

Link to the code is here: Nagios module for Dell systems warranty using dmidecode
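If you just want to see the serial number such a check starts from, dmidecode will print it directly (needs root; on Dell hardware this is the Service Tag):

  sudo dmidecode -s system-serial-number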

What unique things do you monitor for on your systems?

Posted by Tom Limoncelli in Technical Tips

Google has enabled IPv6 for most services but ISPs have to contact them and verify that their IPv6 is working properly before their users can take advantage of this.

I'm writing about this to spread the word.  Many readers of this blog work at ISPs and hopefully many of them have IPv6 rolled out, or are in the process of doing so.

Technically, here's what happens:  Currently DNS lookups of www.google.com return A records (IPv4) and no AAAA records (IPv6).  If you run an ISP that has rolled out IPv6, Google will add you (your DNS servers, actually) to a whitelist that controls Google's DNS servers.  After that, DNS queries of www.google.com will return both A and AAAA records.
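You can see which side of the whitelist your resolver is on with dig (ipv6.google.com makes a handy control query, since it has historically returned AAAA records for everyone):

  dig +short A www.google.com      # IPv4 addresses, returned to everyone
  dig +short AAAA www.google.com   # empty unless your resolver is whitelisted
  dig +short AAAA ipv6.google.com  # control: should return IPv6 addresses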

What's the catch?  The catch is that they are enabling it on a per-ISP basis. So, you need to badger your ISP about this.

Why not just enable it for all ISPs?  There are some OSs that have default configurations that get confused if they see an AAAA record yet don't have full IPv6 connectivity.  In particular, if you have IPv6 enabled at your house, but your ISP doesn't support IPv6, there is a good chance that your computer isn't smart enough to know that having local IPv6 isn't the same as IPv6 connectivity all the way across the internet.  Thus, it will send out requests over IPv6 which will stall as the packets get dropped by the first non-IPv6 router (your ISP).

Thus, it is safer to just send AAAA records if you are on an ISP that really supports IPv6.  Eventually this kind of thing won't be needed, but for now it is a "better safe than sorry" measure.  Hopefully if a few big sites do this then the internet will become "safe" for IPv6 and everyone else won't need to take such measures.

If none of this makes sense to you, don't worry. It is really more important that your ISP understands.  Though, as a system administrator it is a good idea to get up to speed on the issues.  I can recommend 2 great books:
The Google announcement and FAQ is here: Google announces "Google over IPv6". Slashdot has an article too.
Everyone from Slashdot to people I talk with on the street are shocked, shocked, shocked, by the report in the New York Times that TXTing costs carriers almost nothing, even though they've been raising the price dramatically.  (SMS is "Short Message Service", the technical name for what Americans call "TXTing" and what the rest of the world calls "SMS".)

People have asked me, "Is this true?" (it is) so I thought this would be a good time to explain how all of this works.

The phone system uses a separate network for "signaling", i.e. messages like "set up a phone call between +1-862-555-1234 and +353(1)555-1234".  The fact that it is a separate network is for security.  When signaling was "in band" it was possible for phone users to play the right tones and act just like an operator (see Phreaking).  It is also for speed; one wants absolute priority for signaling data.

The protocol is called "SS7" (Signaling System 7).  Like most telco protocols it is difficult to parse and ill-defined.  This is how telcos keep new competition from starting.  They hype SS7 as something so complicated that only rocket scientists could ever understand it.  Of course, it is an ITU standard, so it isn't a secret how it works.  You just have to pay a lot of money to get a copy of the standard. In fact, once Cisco had a working SS7 software stack the downfall of Lucent/AT&T/others was only years away.  Heck, Cisco published a book demystifying SS7.  It turns out the emperor had no clothes and Cisco wanted everyone to know.  SS7 is big and scary, but only as bad as most protocols. I guess SMTP or SNMP would be scary too if you had never seen a protocol before. (Remember that non-audio networks are still "new" to the telecom world, or at least to their executives.)

SS7 is all about setting up "connections".  When I dial a number, SS7 packets are sent out that query databases to translate the phone number I want to dial to a physical address to connect to, then an SS7 query goes out to request that all the phone switches from point A to point B allocate bandwidth and start letting audio through.  The nomenclature dates back to what was used when phone calls were set up by ladies sitting in front of switchboards.

What makes international dialing work is that there are SS7 gateways between all the carriers.  They don't charge each other for this bandwidth because it is just the cost of doing business.  The logs of what calls are actually made are used to create billing records, and the carriers do charge each other for the actual calls.  Thus, there is no charge for the SS7 packets between AT&T and O2 (O2 is a big cell provider in Europe), but O2 does back-bill AT&T for the phone call that was made. (This is called "Settlement" and my previous employer processed 80% of the world's settlement records on behalf of the phone companies.)

Setting up a connection for an SMS would be silly.  An entire connection for just a 160-byte message?  No way.  That's more trouble than it is worth.  Therefore, SMS is the only service where the actual service is provided over SS7.  The 160-byte limit comes from a limit in SS7 packet size.

However, the phone companies don't really do anything for free.  The SMS records are used to construct billing data and the companies certainly do back-bill each other for SMS carried by each other's networks.  If you SMS from AT&T to O2, there is settlement going on after the fact. However, SMS between two AT&T customers has no real cost.

"Multimedia SMS" (photos) are not sent over SS7, though SS7 is used to setup/teardown the connection just like a phone call.  If they were smart they'd use SS7 to just transmit an email address and then send the photo over the internet.  It would probably be cheaper.  (Though, when has a telco has a well-run email system?  Sigh.)

So, SMS is "free" because it rides on the back of pre-existing infrastructure.  The "cost" is due to the false economics created to "extract value" out of the system (i.e. "charge money").

If they were doing it all from scratch, they could probably run it all over the internet for "free" too.  Heck, it wouldn't be much bandwidth even if people learned to type 100x faster.

Why was SMS permitted to use SS7 unlike any other service? The real reason, I'm told, wasn't entirely technical.  It was due to the fact that the telcos thought that nobody would actually use the service. Little did they know that it would catch on among teens and then spread!

More info:

Posted by Tom Limoncelli in Technical Tips

Amazon's Kindle

I got a demo of Amazon's Kindle the other day and was very impressed. I hadn't realized that it had a built-in cellphone-based data connection so you could always download more content. The speed was a little slow, but for reading a book I think it was perfect. I'm considering getting one.

Today I got email from Amazon reminding me that if I shill for them on my blog, readers can get a $100 discount. You just have to apply for an Amazon credit card and use this link.

Do I feel bad about shilling for Amazon? Well, not if it gets my readers a $100 discount. It is a product that friends of mine are happy with and I'm impressed by the demos I've seen.

Posted by Tom Limoncelli in Technical Tips

April Showers bring May Flowers. What does May bring? Three-day weekends that make A/C units fail!

This is a good time to call your A/C maintenance folks and have them do a check-up on your units. Check for loose or worn belts and other problems. If you've added more equipment since last summer, your unit may now be underpowered. Remember that if your computers consume 50kW of power, your A/C units should be using about the same (or more) to cool those computers. That's the laws of physics speaking; I didn't invent that rule. The energy it takes to create heat equals the energy required to remove that much heat.

Why do A/C units often fail on a 3-day weekend? During the week the office building has its own A/C. The computer room's A/C only has to remove the heat generated by the equipment in the room. On the weekends the building's A/C is powered off and now the 6 sides (4 walls, floor and ceiling) of the computer room are getting hot. Heat seeps in. Now the computer room's A/C unit has more work to do.

A 3-day weekend is 84 hours (Friday 6pm until Tuesday 6am). That's a lot of time to be running continuously. Belts wear out. Underpowered units overheat and die. Unlike a home A/C unit which turns on for a few minutes out of every hour, a computer-room A/C unit ("industrial unit") runs 12-24 hours out of every day. Industrial cooling costs more because it is an entirely different beast. Try waving your arms for 5 minutes per hour vs. 18 hours a day.

Most countries have a 3-day weekend in May. By the 2nd or 3rd day the A/C unit is working as much as a typical day during the summer. If it is about to break, this is the weekend it will break.

To prevent a cooling emergency, make sure that your monitoring system is also watching the heat and humidity of your room. There are many SNMP-accessible units for less than $100. Dell recommends machines shouldn't run in a room that is hotter than 35C. I generally recommend that your monitoring system alert you at 33C; if you see no sign of it improving on its own in the next 30 minutes, start powering down machines. If that doesn't help, power them all off. (The Practice of System and Network Administration has tips about creating a "shutdown list".)

Having the ability to remotely power off machines can save you a trip to the office. Most Linux systems have a "poweroff" command that is like "halt" but does the right thing to tell the motherboard to literally power off. If the server doesn't have that feature (because you bought it in the 1840s?), shutting it down and leaving it sitting at a "press any key to boot" prompt generates little heat compared to a machine that is actively processing. If powering off the non-critical machines isn't enough, shut down critical equipment, but not the equipment involved in letting you access the monitoring systems (usually the network equipment). That way you can bring things back up remotely. Of course, as a last resort you'll need to power off those bits of equipment too.
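As an example of the kind of check I mean, here is a minimal sketch using Net-SNMP; the host name, community string, and OID are placeholders to replace with your own sensor's values:

  #!/bin/sh
  # Sketch: alert when a room sensor reports a temperature at or above 33C.
  HOST=roomsensor.example.com
  OID=.1.3.6.1.4.1.99999.1.1      # placeholder OID; use your unit's MIB
  LIMIT=33
  TEMP=$(snmpget -v2c -c public -Ovq "$HOST" "$OID")
  if [ "$TEMP" -ge "$LIMIT" ]; then
      echo "Computer room is at ${TEMP}C (limit ${LIMIT}C)" |
          mail -s "Computer room overheating" oncall@example.com
  fi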

Having cooling emergency? Cooling units can be rented on an emergency basis to help you through a failed cooling unit, or to supplement a cooling unit that is underpowered. There are many companies looking to help you out with a rental unit.

If you have a small room that needs to be cooled (a telecom closet that now has a rack of machines), I've had good luck with a $300-600 unit available at Walmart. For $300-600 it isn't great, but I can buy one in less than an hour without having to wait for management to approve the purchase. Heck, for that price you can buy two and still be below the spending limit of a typical IT manager. The Sunpentown 1200 and the Amcor 12000E are models that one can purchase for about $600 that re-evaporate any water condensation and exhaust it with the hot air. Not having to empty a bucket of water every day is worth the extra cost. The unit is intended for home use, so don't try to use it as a permanent solution. (Not that I didn't use one for more than a year at a previous employer. Ugh.) It has one flaw... after a power outage it defaults to being off. I guess that is typical of a consumer unit. Be sure to put a big sign on it that explains exactly what to do to turn it back on after a power outage. (The sign I made says step by step what buttons to press, and what color each LED should be if it is running properly. I then had a non-system administrator test the process.)

In summary: test your A/C units now. Monitor them, especially on the weekends. Be ready with a backup plan if your A/C unit breaks. Do all this and you can prevent an expensive and painful meltdown.

Posted by Tom Limoncelli in Best of BlogTechnical Tips

HostDB 1.002 released!

A few years ago I released HostDB, my simple system for generating DNS domains. The LISA paper that announced it was called: HostDB: The Best Damn host2DNS/DHCP Script Ever Written.

I just released 1.002 which adds some new features that make it easier to generate MX records for domain names with no A records, and not generate NS records for DNS masters. Other bug fixes and improvements are included.

HostDB is released under the GPL, supported on the HostDB-fans mailing list, and supported by the community. This recent release includes patches contributed by Sebastian Heidl.

HostDB 1.002 is now available for download.

Posted by Tom Limoncelli in Technical Tips

Managing Xen instances is a drag. So my buddies in the Google Zürich office built a system for managing them. Now life is great! The team I manage has put Xen clusters all over the world, all managed with Ganeti. It rocks. I'm proud to see it is available to everyone now under a GPLv2 license.

When I first heard the name, I thought it sounded like a new kind of Italian dessert. But what do you expect from a guy with a last name like "Limoncelli"?

Posted by Tom Limoncelli in Technical Tips

Hardware didn't used to have passwords. Your lawnmower didn't have a password, your car didn't have a password, and your waffle iron didn't have a password.

But now things are different. Hardware is much smarter and now often requires a password. Connecting to the console of a Cisco router asks for a password. A Toyota Prius has an all-software entry system.

Posted by Tom Limoncelli in Technical Tips

There is an anti-spam technique called "Grey Listing" which has almost completely eliminated spam from my main server. What's left still goes through my SpamAssassin and Amavis-new filters, but they have considerably much less work to do.

The technique is more than a year old, but I only installed a greylisting plug-in recently and I'm impressed at how well it works. I hope that by writing this article, other people who have procrastinated will decide to install a greylisting system.

Posted by Tom Limoncelli in Technical Tips

If you write to a file that is SUID (or SGID), the SUID (and SGID) bits on the file are removed as a security precaution against tampering (unless uid 0 is doing the writing).

(See FreeBSD 5.4 source code, sys/ufs/ffs/ffs_vnops.c:739)
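A quick way to watch it happen (as a regular, non-root user; the exact mode strings vary by system):

  cp /bin/ls /tmp/suid-demo
  chmod 4755 /tmp/suid-demo       # set the SUID bit; "ls -l" should show an "s"
  ls -l /tmp/suid-demo
  echo garbage >> /tmp/suid-demo  # any write by a non-root uid...
  ls -l /tmp/suid-demo            # ...and the "s" bit should be gone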

Posted by Tom Limoncelli in Technical Tips

The Jifty buzz

Everyone who has seen me speak knows that I love RT for tracking user requests. I was IMing with the author of RT today and he said that for his next product he realized he should first write a good tool that lets him make AJAXy applications without having to do all the work manually. He's done that, and it's called Jifty. Now he's building apps based on that. The first one has as many features as RT but is 1/10th the code base. Awesome! Sounds like Jifty is going to be a big hit! (You can find Jifty on CPAN already.)

Oh, and what's the new app called? Hiveminder.

Let the rumors fly! :-)

Posted by Tom Limoncelli in Technical Tips

It's obvious but I didn't think of one particular reason why until the end of this journey.

Read more...

Posted by Tom Limoncelli in Technical Tips

techtarget.com reports:
The problem is, directing cold air is like trying to herd cats. Air is unpredictable. Your cooling unit is sucking in air, cooling it and then throwing it up through a perforated floor. But you have little control over where that air is actually ending up.
Two different vendors are promoting more aggressive cooling systems for modern racks.

Posted by Tom Limoncelli in Technical Tips

Ars Technica has an excellent article about MSH.

If you love perl and/or bash, you'll be interested in reading this tutorial. It gives some excellent examples that explain the language.

Posted by Tom Limoncelli in Technical Tips

"When I see a person I don't recognize in the office, I always smile, stop, introduce myself, and ask for the person's name. I then ask to read it off his ID badge "to help me remember it. I'm a visual learner." New people think I'm being friendly. I'm really checking for trespassers."
This and other great tips can be found here.

Posted by Tom Limoncelli in Technical Tips

A while back I recommended BlastWave as a great source of pre-built binaries for Solaris. Their service has saved me huge amounts of time.

Sadly, they are running low on funds. It's expensive to keep a high-profile web site like this up and running. Corporate donors are particularly needed.

I just donated $50. I hope you consider donating to them too. Otherwise, in less than 48 hours, they may have to shut down.

Posted by Tom Limoncelli in Technical Tips

Solaris package tip

Since I'm more of an OS X/FreeBSD/Linux person lately, I've gotten a bit out of touch with Solaris administration. I was quite pleasantly surprised to find CSW - Community SoftWare for Solaris, which includes hundreds of pre-built packages for Solaris. More importantly, it provided the three I really needed and didn't have time to build. :-)

The system is really well constructed. I highly recommend it to everyone. Give this project your support!

Posted by Tom Limoncelli in Technical Tips