I needed a way to back up a single server to a remote hard disk. There are many scripts around, and I certainly could have written one myself, but I found Duplicity and now I highly recommend it:
http://duplicity.nongnu.org
Duplicity uses librsync to generate incremental backups that are very small. It generates the backups, encrypts them with GPG, and then sends them to another server by any of the major methods: scp, ftp, sftp, rsync, etc. You can back up starting at any directory, not just at mountpoints, and there is a full file-selection language for specifying which files you want to exclude.
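As a rough illustration of that exclusion language, the bash function below builds a duplicity command with a few --exclude patterns. The pattern list, the EXCLUDES and DRY_RUN variable names, and the function name are made up for this example. In duplicity's shell-style patterns "**" matches any run of characters including "/", and selection options are checked in order, with the first match deciding.

```bash
#!/usr/bin/env bash
# Example exclusion list (an assumption; adjust to taste). Because "**" also
# matches "/", '**/.cache' excludes a .cache directory at any depth.
EXCLUDES=('**/.cache' '**/*.tmp' '/home/tal/Downloads')

# Build (and optionally run) a duplicity backup that skips everything in EXCLUDES.
# Set DRY_RUN=1 to print the command instead of executing it.
backup_with_excludes() {
  local src="$1" url="$2" pattern
  local -a cmd=(duplicity)
  for pattern in "${EXCLUDES[@]}"; do
    cmd+=(--exclude "$pattern")   # duplicity checks these in order; first match wins
  done
  cmd+=("$src" "$url")
  if [[ -n "${DRY_RUN:-}" ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, DRY_RUN=1 backup_with_excludes /home/tal scp://mybackuphost/directoryname/home/tal prints the full command without touching the network.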
Installation: The most difficult part is probably setting up your GPG keys if you've never done it before. (Note: you really, really need to protect the private key. It is required for restores. If you lose your machine in a fire and don't have a copy of the private key somewhere else, you won't be able to do a restore. Really. I burned mine onto a few CDs and put them in various hidden places.)
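One way to make that offline copy is to export the secret key in ASCII-armored form and burn the resulting file. The sketch below assumes GnuPG's gpg command is installed; the function name and destination path are hypothetical, and depending on your gpg version it may prompt for the key's passphrase during the export.

```bash
#!/usr/bin/env bash
# Write an ASCII-armored copy of a GPG secret key to DEST, readable only by
# the owner. Refuses to overwrite an existing file.
export_secret_key() {
  local keyid="$1" dest="$2"
  if [[ -e "$dest" ]]; then
    echo "refusing to overwrite $dest" >&2
    return 1
  fi
  # The subshell limits umask 077 to this export, so the file is created 0600.
  # gpg can exit 0 yet export nothing (e.g. unknown key ID), so also check size.
  if ! ( umask 077 && gpg --armor --export-secret-keys "$keyid" > "$dest" ) ||
     [[ ! -s "$dest" ]]; then
    echo "export of $keyid failed or produced no data" >&2
    rm -f "$dest"
    return 1
  fi
}
```

Usage would be something like export_secret_key "${PGPKEYID}" /path/to/cd-staging/secret-key.asc, then burning that file.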
The machine I'm backing up is a virtual machine in a colo. They don't offer backup services, so I had to take care of it myself. The machine runs FreeBSD 8.0-RELEASE-p4, and Duplicity works great on it. The code is very portable (Python, GPG, librsync, etc.); nothing digs into the kernel or raw devices or anything like that.
I wrote a simple script that loops through all the directories that I want backed up and, for each directory $dir, runs:
duplicity --full-if-older-than 5W --encrypt-key="${PGPKEYID}" "$dir" "scp://myarchives@mybackuphost/${BACKUPSET}${dir}"
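A minimal version of such a loop might look like the following. It is a sketch, not the exact script: the DIRS list, the placeholder PGPKEYID and BACKUPSET values, and the DRY_RUN switch are assumptions for illustration. Because each $dir starts with "/", the remote path comes out as, for example, directoryname/home/tal.

```bash
#!/usr/bin/env bash
# Placeholders (assumptions): set these to your own key ID, set name and dirs.
PGPKEYID="${PGPKEYID:-XXXXXXXX}"
BACKUPSET="${BACKUPSET:-directoryname}"
DIRS=(/home/tal /etc /usr/local/etc)

# Back up each directory in DIRS to its own target on the backup host.
# With DRY_RUN=1 the commands are printed instead of run.
backup_all() {
  local dir status=0
  local -a cmd
  for dir in "${DIRS[@]}"; do
    cmd=(duplicity --full-if-older-than 5W --encrypt-key="$PGPKEYID"
         "$dir" "scp://myarchives@mybackuphost/${BACKUPSET}${dir}")
    if [[ -n "${DRY_RUN:-}" ]]; then
      echo "${cmd[*]}"
    else
      # Keep going if one directory fails, but report it in the exit status.
      "${cmd[@]}" || { echo "backup of $dir failed" >&2; status=1; }
    fi
  done
  return "$status"
}
```

Running DRY_RUN=1 backup_all first is a cheap way to eyeball the generated commands before letting cron run the real thing.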
The "--full-if-older-than 5W" option means that each run does an incremental backup, except that it does a full backup whenever the last full backup is more than 5 weeks (35 days) old. I use 5W instead of 4W to make sure that no more than one full backup happens per billing cycle. I'm charged for bandwidth and fear that two full dumps in the same month might put me over the limit.
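The 5W versus 4W choice is plain arithmetic: two days in the same calendar month are at most 30 days apart (the 1st and the 31st), while consecutive full backups are at least N*7 days apart. The hypothetical helper below encodes that check.

```bash
#!/usr/bin/env bash
# Could two full backups made with --full-if-older-than <weeks>W land in the
# same calendar month? Days within one month are at most 30 days apart, and
# consecutive full backups are at least weeks*7 days apart.
# Returns 0 (yes, possible) if weeks*7 <= 30, and 1 (no) otherwise.
fulls_can_share_month() {
  local weeks="$1"
  (( weeks * 7 <= 30 ))
}
```

So fulls_can_share_month 4 succeeds (28 <= 30, two fulls can fit in one month), while fulls_can_share_month 5 fails (35 > 30).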
My configuration: I'm scp'ing the files to another machine, which has a cheap USB 2.0 1 TB hard disk. I set it up so that I can ssh from the source machine to the destination machine without a password, using public-key authentication ("PubkeyAuthentication yes"). In the example above, "myarchives" is the username on the backup host and "mybackuphost" is the host. Actually, I just specify the hostname and use a ~/.ssh/config entry to set the default username to "myarchives". That way I can use plain "mybackuphost" in other shell scripts, etc. SSH aliases FTW!
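The ~/.ssh/config entry can be as small as a Host line plus a User line. The hypothetical helper below appends one; generating the key pair (ssh-keygen) and installing the public key in the backup host's authorized_keys are assumed to be done already.

```bash
#!/usr/bin/env bash
# Append an SSH alias so that plain "mybackuphost" implies the right username.
# The config path defaults to ~/.ssh/config; the names are placeholders.
add_ssh_alias() {
  local alias="$1" user="$2" config="${3:-$HOME/.ssh/config}"
  mkdir -p "$(dirname "$config")"
  cat >> "$config" <<EOF

Host $alias
    User $user
    PubkeyAuthentication yes
EOF
  chmod 600 "$config"   # ssh rejects a config file that others can write to
}
```

For the setup described here that would be add_ssh_alias mybackuphost myarchives, after which "ssh mybackuphost" and "scp://mybackuphost/..." both log in as myarchives.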
Restores: Of course, I don't actually care about backups. I only care about restores. When restoring a file, duplicity figures out which full and incremental backups need to be retrieved and decrypted. You just specify the date you want (default "the latest") and it does all the work. I was impressed at how little thinking I needed to do.
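To illustrate the idea (this is not duplicity's own code), choosing what to fetch for a restore amounts to: take the most recent full backup at or before the requested date, then every incremental after it up to that date. Here is a bash sketch that assumes a hypothetical input of "DATE TYPE" lines with ISO dates, sorted oldest first.

```bash
#!/usr/bin/env bash
# Read "YYYY-MM-DD full|inc" lines (oldest first) on stdin and print the
# backup sets a restore as of TARGET would need. ISO dates sort correctly
# as plain strings, so string comparison is enough here.
sets_needed() {
  local target="$1" date type
  local -a chain=()
  while read -r date type; do
    [[ "$date" > "$target" ]] && break   # ignore sets newer than the restore date
    if [[ "$type" == full ]]; then
      chain=("$date full")               # a new full backup starts a fresh chain
    else
      chain+=("$date inc")               # incrementals stack on the current full
    fi
  done
  printf '%s\n' "${chain[@]}"
}
```

For example, feeding it 2010-01-01 full, 2010-01-08 inc, 2010-02-05 full, 2010-02-12 inc and 2010-02-19 inc with a target of 2010-02-15 prints "2010-02-05 full" and "2010-02-12 inc": the January chain is skipped entirely.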
After running the system for a few days, it was time to do a restore to make sure it all worked.
The restore syntax is a little confusing because the documentation doesn't have many examples. In particular, the most common restore situation is not restoring the full backup set, but "I messed up a file, or think I messed it up, so I want to restore an old version (from a particular date) to /tmp to see what it used to look like."
What confused me: 1) You specify the path to the file (or directory) relative to the directory that was backed up (which may be a mountpoint); you don't include the path leading up to it. In hindsight that is obvious, but it caught me. What saved me was that when I listed the files, they were displayed without that leading path. 2) You have to be very careful to specify where the backup set is. You give its location on the command line as the source, and you give the file to be restored with the "--file-to-restore" option. You can't put the entire path on the command line and expect duplicity to guess where to split it.
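The first point boils down to stripping the backed-up directory off the front of the absolute path. A small helper (hypothetical, not part of duplicity) makes that explicit and refuses paths that aren't under the backed-up directory.

```bash
#!/usr/bin/env bash
# Turn an absolute path into the relative path that --file-to-restore expects,
# given ROOT, the directory that was backed up. A trailing "/" on ROOT is ignored.
relpath_for_restore() {
  local root="${1%/}" path="$2"
  case "$path" in
    "$root"/*) printf '%s\n' "${path#"$root"/}" ;;   # drop "ROOT/" prefix
    *) echo "$path is not under $root" >&2; return 1 ;;
  esac
}
```

For example, relpath_for_restore /home/tal /home/tal/path/to/file prints path/to/file, while /home/talented/x is rejected because the match requires the "/" after the root.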
So that I don't have to relearn the commands when I'm panicking because I just deleted a critical file, I've made notes on how to do a restore. With some changes to protect the innocent, they look like this:
Step 1. List all the files that are backed up in the "home/tal" area:
duplicity list-current-files scp://mybackuphost/directoryname/home/tal
To list the files as they were on a particular date, add: --restore-time "2002-01-25"
Step 2. Restore a file from that list (not to the original place):
duplicity restore --encrypt-key=XXXXXXXX --file-to-restore=path/you/saw/in/listing scp://mybackuphost/directoryname/home/tal /tmp/restore
If the old file was "/home/tal/path/to/file" and the backup was done on "/home/tal", you need to specify --file-to-restore as "path/to/file", not "/home/tal/path/to/file". You can also give a directory there to restore all the files under it. The target, /tmp/restore, should be a directory that already exists.
To restore the files as of a particular date, add: --restore-time "2002-01-25"
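Those notes can also live in a tiny shell library so the exact syntax doesn't have to be remembered under pressure. This is a sketch: BACKUP_URL, PGPKEYID and DRY_RUN are assumed variable names, and the two functions only wrap the duplicity commands from the steps above.

```bash
#!/usr/bin/env bash
# Assumed settings: adjust the URL and key ID to match your backup set.
BACKUP_URL="${BACKUP_URL:-scp://mybackuphost/directoryname/home/tal}"
PGPKEYID="${PGPKEYID:-XXXXXXXX}"

# Run a command, or just print it when DRY_RUN is set.
run() {
  if [[ -n "${DRY_RUN:-}" ]]; then echo "$*"; else "$@"; fi
}

# Step 1: list backed-up files, optionally as of a date (YYYY-MM-DD).
list_files() {
  local when="${1:-}"
  local -a cmd=(duplicity list-current-files)
  if [[ -n "$when" ]]; then cmd+=(--restore-time "$when"); fi
  cmd+=("$BACKUP_URL")
  run "${cmd[@]}"
}

# Step 2: restore RELPATH (as shown by list_files) into DEST, optionally as of a date.
restore_file() {
  local relpath="$1" dest="$2" when="${3:-}"
  local -a cmd=(duplicity restore --encrypt-key="$PGPKEYID" --file-to-restore="$relpath")
  if [[ -n "$when" ]]; then cmd+=(--restore-time "$when"); fi
  cmd+=("$BACKUP_URL" "$dest")
  run "${cmd[@]}"
}
```

With this sourced, the panic-time sequence is list_files 2002-01-25, then restore_file path/to/file /tmp/restore 2002-01-25.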
Conclusion: Duplicity is a great piece of engineering. It is very fast, both because it makes good use of librsync to keep the backups small and because it stores indexes of which files were backed up, so the entire backup doesn't have to be read just to get a file list. The backup is split across many small files, so not much temp space is required on the source machine. The tools are very easy to use: they handle all the machinations of full and incremental sets, so you can focus on what to back up and what to restore.
Caveats: As with any backup system, you should do a "fire drill" now and then to test your restore procedure. I recommend encapsulating your backup process in a shell script so that you do it the same way every time.
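For the fire drill, one simple check is to restore a file that rarely changes into a scratch location using the restore steps, then compare it with the live copy. The helper below (a hypothetical name) uses diff -rq, which handles both single files and directory trees. A mismatch can also just mean the file changed since the last backup, so pick something stable.

```bash
#!/usr/bin/env bash
# Compare a restored file or directory with the live copy.
# diff -rq exits 0 only when the two are identical; it exits non-zero when
# they differ or when one of them is missing.
check_restored() {
  local live="$1" restored="$2"
  if diff -rq "$live" "$restored" >/dev/null; then
    echo "OK: $restored matches $live"
  else
    echo "MISMATCH: $restored differs from $live (or is missing)" >&2
    return 1
  fi
}
```

A drill could then end with something like check_restored /home/tal/path/to/file /tmp/restore/file, adjusted to wherever your restore actually put the file.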
I highly recommend Duplicity.
http://duplicity.nongnu.org