I've heard enough horror stories about lost data to know that backups are important. For the files that mainly live on my laptop I use Jungledisk to automatically back up the important ones daily. At the time I signed up the program cost $20 and my storage costs are about $0.50 a month. Today you have to pay at least $2/month and then the storage fees as well. Not bad, but after a couple of years those fees are going to add up.
When I used shared hosting I sporadically backed up the files and emailed a database dump to my Gmail account daily. This worked perfectly well as the files rarely changed and Gmail was able to hold several hundred copies of the database for my little blog. When I needed to, I could simply go in and delete last year's backup emails.
Recently though I've started renting a VPS from Linode, and I'm now in the position where both the files and the database are changing frequently. I need a way to back up both, and as I'm lazy I want it to be automated. I started looking around for information on how other people were handling this.
I came across a post from John Eberly discussing how he automates his backups to Amazon S3. This looked like a good place to start, but I was sceptical about how rsync would work with Amazon S3 as described, and it kept only a single backup. Based on this I formulated the following plan: at the start of each week, copy the directories to be backed up to a temporary directory using rsync, encrypt the result using gnupg, and push the resulting file to Amazon S3. On each subsequent day, make a differential backup using the batch mode of rsync, encrypt it, and push it to S3. Repeat from the start of the next week. After putting a surprisingly short script together I had a working approach, except that nothing was actually being pushed to S3. I still need to investigate why this was happening, but it isn't at the top of my list of things to do as I have since found a far better way to handle my backups.
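The weekly full plus daily differential plan above can be sketched roughly as follows. This is not the script mentioned in the post, only an illustration under stated assumptions: the function name `backup_commands`, all paths, the GnuPG recipient and the bucket name are hypothetical, Monday is assumed to be the start of the week, `tar` is assumed for bundling the copied directory before encryption, and `s3cmd` is assumed as the upload tool since the post does not say which one was used. The function only builds the command lines; it does not run anything.

```python
import datetime


def backup_commands(day, source, mirror, workdir, recipient, bucket):
    """Return the list of commands (as argv lists) for one day's backup.

    All arguments are illustrative placeholders, not values from the post.
    """
    stamp = day.isoformat()
    if day.weekday() == 0:
        # Assumed start of week (Monday): refresh the local mirror with rsync,
        # then bundle it into a single file so it can be encrypted.
        artifact = f"{workdir}/full-{stamp}.tar"
        build = [
            ["rsync", "-a", "--delete", f"{source}/", f"{mirror}/"],
            ["tar", "-cf", artifact, "-C", mirror, "."],
        ]
    else:
        # Other days: rsync batch mode. --only-write-batch records the changes
        # needed to bring the mirror up to date WITHOUT applying them, so each
        # batch file stays relative to the weekly full copy (a differential).
        artifact = f"{workdir}/diff-{stamp}.batch"
        build = [
            ["rsync", "-a", "--delete", f"--only-write-batch={artifact}",
             f"{source}/", f"{mirror}/"],
        ]
    encrypted = artifact + ".gpg"
    return build + [
        # Encrypt to a public key so the server never needs the private key.
        ["gpg", "--output", encrypted, "--recipient", recipient,
         "--encrypt", artifact],
        # Upload step; s3cmd is an assumption, any S3 client would do.
        ["s3cmd", "put", encrypted, f"s3://{bucket}/"],
    ]


if __name__ == "__main__":
    today = datetime.date.today()
    for cmd in backup_commands(today, "/var/www", "/tmp/backup/mirror",
                               "/tmp/backup", "backup@example.com",
                               "example-backups"):
        print(" ".join(cmd))
```

Restoring a given day would mean fetching that week's full archive plus that one day's batch file, which is the main attraction of a differential scheme over a chain of incrementals.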
I'm not an expert at backups. Nor am I a security expert. Nor am I interested in becoming an expert at either backups or security. This means someone has likely already built a better backup utility than I could. I believe I have found it in Tarsnap. Below is a list of what tarsnap does. The last two items are the features which take it above and beyond my approach.
- Multiple backups
- Backups on my schedule
- Files are encrypted
- Utility pricing - pay only for what you use with no standing charges
- Open source - I can check that only what I want to happen is really happening
- Efficient - backups take up no more space than my full+differential strategy and yet each backup can be manipulated independently of any other backup
- Permissions - With tarsnap I can allow my server to create and read backups but not delete them
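As an illustration of the permissions item, a restricted key can be derived from the full key with the `tarsnap-keymgmt` utility that ships with tarsnap, using `-r`, `-w` and `-d` to select read, write and delete rights. The sketch below only builds that command line; the helper name `restricted_key_command` and both key file paths are hypothetical, and this is an assumed setup rather than the one described in this post.

```python
def restricted_key_command(master_key, out_key, read=True, write=True,
                           delete=False):
    """Build a tarsnap-keymgmt command that writes a key with limited rights.

    The defaults match the setup described in the list: the server may
    create (write) and read archives but may not delete them.
    """
    cmd = ["tarsnap-keymgmt", "--outkeyfile", out_key]
    if read:
        cmd.append("-r")   # allow reading archives
    if write:
        cmd.append("-w")   # allow creating archives
    if delete:
        cmd.append("-d")   # allow deleting archives (left out by default)
    cmd.append(master_key)  # the full key, ideally kept off the server
    return cmd


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(" ".join(restricted_key_command("/root/tarsnap-master.key",
                                          "/root/tarsnap-rw.key")))
```

The idea is that only the restricted key lives on the server, so a compromised server cannot be used to wipe existing archives.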
The efficiency is nice, but the difference between $0.50/month and $0.60/month isn't a massive deal. What is a big deal is the permissions. Backing up my files anywhere with an online connection has always made me slightly uneasy. Email works well because once an email is sent it can't be called back. If you want to back up to Amazon S3 you have to give unrestricted access to read, edit and delete, which means it is possible to lose all your backups. Tarsnap is not vulnerable to this weakness, and this is a big deal. It's one less thing to worry about, which is certainly worth the $0.15/GB premium over S3 alone. My next post will detail how I have implemented backups using tarsnap.