Setting up backups with tarsnap

Having already outlined my reasons for using tarsnap for online backups, this post details exactly how I'm using it.

The instructions on the tarsnap site are very easy to follow. I was momentarily caught out by not importing the code-signing key, but after sorting that out it was fine. I did need to use sha256sum rather than the suggested sha256. Installation went well, and I then had a little play with creating, listing, deleting and recovering data from backups. At this point my only real gripes with the software became obvious: you can't humanize the data-size figures when using --list-archives, and there is no shortcut for --list-archives. As gripes go these are fairly minor, and everything else works nicely.
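The sha256sum/sha256 detail comes down to GNU versus BSD names for the same tool. As an illustration (using a stand-in file and a digest generated on the spot, not the real tarsnap release values), checking a download looks like this:

```shell
# Illustrative sketch: verify a file against an expected SHA256 digest,
# as one would for the tarsnap source tarball. The file and digest are
# stand-ins, not real release values.
printf 'example data\n' > /tmp/example.tgz
expected=$(sha256sum /tmp/example.tgz | awk '{print $1}')
# sha256sum -c reads "digest  filename" lines and reports OK or FAILED.
echo "$expected  /tmp/example.tgz" | sha256sum -c -
```

On a BSD system the equivalent check would use sha256 instead.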

With the tarsnap client running on my server it was time to automate my backups. I put together a small script that dumps my database and then creates a new backup with the tarsnap client.

#!/bin/bash
dateString=$(date +%F)
echo "Beginning backup for $dateString" >> /home/streety/sources/backup/tarsnap.log
# Dump the MySQL database
rm -f /home/streety/mysql-backup.sql
mysqldump --user=backup -ppassword --all-databases > /home/streety/mysql-backup.sql
# Back up to tarsnap
tarsnap -c -f linode-jscom-$dateString /home/streety /etc/apache2
echo "Backup complete for $dateString" >> /home/streety/sources/backup/tarsnap.log
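One caveat worth noting (my observation, not from the original post): passing -ppassword on the command line exposes the password to other local users via ps. MySQL tools can read credentials from an option file instead; a sketch of such a file, written to a temporary path purely for illustration, with an illustrative password:

```shell
# Sketch only: a MySQL option file holding the backup user's credentials,
# so the script could run `mysqldump --defaults-extra-file=... --all-databases`
# with no -p on the command line. Path and password are illustrative.
cat > /tmp/example-my.cnf <<'EOF'
[mysqldump]
user=backup
password=yourpassword
EOF
grep -c '=' /tmp/example-my.cnf
```

In a real setup the file would live somewhere permanent with mode 600.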

The script worked fine when I ran it from the shell, but cron didn't seem to be running it: cron couldn't find the tarsnap binary, so I needed to set PATH in the crontab. Easily enough done.

PATH=/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/bin
MAILTO=jonathan@jonathanstreet.com
# m h  dom mon dow   command
5 0 * * * /home/streety/sources/backup/backup.sh >> /home/streety/sources/backup/output.log 2>&1
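The failure mode here, as I understand it, is that cron runs jobs with a minimal default PATH, while an interactive shell typically also searches /usr/local/bin, where tarsnap installs by default. A quick demonstration with a dummy command standing in for tarsnap:

```shell
# Demonstration with a dummy tool: a command installed outside cron's
# default PATH is found interactively but not by cron. "mytool" is a
# stand-in for the tarsnap binary.
mkdir -p /tmp/fakebin
printf '#!/bin/sh\necho ok\n' > /tmp/fakebin/mytool
chmod +x /tmp/fakebin/mytool
# With a cron-like PATH the tool is not found:
PATH=/usr/sbin:/usr/bin:/sbin:/bin command -v mytool || echo "not found"
# Adding its directory, as the PATH line in the crontab does, fixes it:
PATH=/usr/sbin:/usr/bin:/sbin:/bin:/tmp/fakebin command -v mytool
```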

With everything working, I wanted to set up permissions. Again, this was very easy.

tarsnap-keymgmt --outkeyfile /root/limited-tarsnap.key -r -w /root/tarsnap.key

The -r and -w flags authorize reading and writing; with no -d flag, the new limited key can create and read backups but not delete them. The original key is then removed from the server and kept in a secure place.

streety@jonathanstreet:~$ tarsnap -c -f anothertestbackup /home/streety
tarsnap: fopen(/root/tarsnap.key): Permission denied
tarsnap: Cannot read key file: /root/tarsnap.key
streety@jonathanstreet:~$ sudo !!
sudo tarsnap -c -f anothertestbackup /home/streety
[sudo] password for streety:
tarsnap: Removing leading '/' from member names
                                       Total size  Compressed size
All archives                           1804387231        685263319
  (unique data)                         481384333        178645934
This archive                            746610352        296516102
New data                                   721055           196300
streety@jonathanstreet:~$ tarsnap --list-archives
tarsnap: fopen(/root/tarsnap.key): Permission denied
tarsnap: Cannot read key file: /root/tarsnap.key
streety@jonathanstreet:~$ sudo !!
sudo tarsnap --list-archives
testbackup
anothertestbackup
linode-jscom-2009-11-30
streety@jonathanstreet:~$ sudo tarsnap -d -f anothertestbackup
tarsnap: The delete authorization key is required for -d but is not available

As you can see I keep forgetting to use sudo, but it all works. I can create backups and list the existing ones, but I can't delete them - at least not from this server. Success.

I've been running this script for a little more than a month now and so far I'm very happy with it.

Ditching the custom wheel in backups

I've heard enough horror stories about lost data to know that backups are important. For the files that mainly live on my laptop I use Jungledisk to automatically back up the important ones daily. At the time I signed up the program cost $20 and my storage costs are about $0.50 a month. Today you have to pay at least $2/month plus the storage fees. Not bad, but after a couple of years those fees add up.

When I used shared hosting I sporadically backed up the files and emailed a database dump to my Gmail account daily. This worked perfectly well: the files rarely changed, and Gmail could hold several hundred copies of the database for my little blog. When I needed to, I could simply go in and delete last year's backup emails.

Recently, though, I've started renting a VPS from Linode, and now both the files and the database are frequently changing. I need a way to back up both, and as I'm lazy I want it automated. I started looking around for information on how other people were handling this.

The Plan

I came across a post from John Eberly discussing how he automates his backups to Amazon S3. This looked like a good place to start, but I was sceptical about how rsync would work with Amazon S3 as described, and there was only a single backup.

Based on this I formulated the following plan: at the start of each week, copy the directories to be backed up to a temporary directory using rsync and encrypt the result with GnuPG, then push the resulting file to Amazon S3. On each subsequent day, make a differential backup using rsync's batch mode, encrypt it, and push it to S3. Repeat for the start of the next week.

After putting together a surprisingly short script I had a working approach - except nothing was actually being pushed to S3. I still need to investigate why, but it isn't at the top of my list of things to do as I have since found a far better way to handle my backups.
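The weekly-full versus daily-differential decision in that plan can be sketched as follows. The backup functions here are placeholders I've written to stand in for the rsync + GnuPG + S3 steps (which I abandoned before debugging); only the day-of-week logic is meant literally:

```shell
# Sketch of the abandoned plan's scheduling logic. The two functions are
# placeholders for the real rsync/gpg/S3 commands.
do_full_backup()         { echo "full: rsync copy, gpg encrypt, push to S3"; }
do_differential_backup() { echo "diff: rsync --write-batch, gpg encrypt, push to S3"; }

day=$(date +%u)  # 1 = Monday ... 7 = Sunday (GNU date)
if [ "$day" -eq 1 ]; then
    do_full_backup
else
    do_differential_backup
fi
```

Run daily from cron, this produces one full backup per week and a chain of differentials against it.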

Tarsnap

I'm not an expert at backups. Nor am I a security expert. Nor am I interested in becoming an expert at either. This means someone has likely already built a better backup utility than I could, and I believe I have found it in Tarsnap. Below is a list of what tarsnap offers; several of these features take it above and beyond my own approach.
  • Multiple backups
  • Backups on my schedule
  • Files are encrypted
  • Utility pricing - pay only for what you use with no standing charges
  • Open source - I can check that only what I want to happen is really happening
  • Efficient - backups take up no more space than my full+differential strategy and yet each backup can be manipulated independently of any other backup
  • Permissions - with tarsnap I can allow my server to create and read backups but not delete them
The efficiency is nice, but the difference between $0.50/month and $0.60/month isn't a massive deal. What is a big deal is the permissions. Backing up my files anywhere with an online connection has always made me slightly uneasy. Email works well because once an email is sent it can't be recalled. To back up to Amazon S3, though, you have to grant unrestricted access to read, edit and delete, which means it is possible to lose all your backups. Tarsnap is not vulnerable to this weakness, and that is a big deal. It's one less thing to worry about, which is certainly worth the $0.15/GB premium over S3 alone. My next post will detail how I have implemented backups using tarsnap.