Photo by godog – found on Flickr tagged “backup” :)
These are some draft plans for a lightweight local backup device to support off-site backups for my mom and dad’s MS Windows machines. I think it works around a typical problem with “online backup” (limited upload bandwidth), a service that is otherwise very cost-effective if you only have a few GBs to store (like mom and dad…).
But let’s start with some background blabla. Eons ago I said I was going to review Jungle Disk (a backup utility that stores your data at Amazon Simple Storage Service (S3)). It didn’t happen, but not because I forgot. I have been keeping a draft around, but I felt I hadn’t actually spent enough time with the tool to make any fair comments about it. By now Jungle Disk have moved on to version 2.0 of their software – so my notes have passed their expiry date :)
Learning the hard way
Let me summarise what happened: I installed the Jungle Disk utility on my mom’s Windows machine, and when I visited about a month later I looked at the results. Much to my surprise, not a single backup session had finished successfully – and my mom uses her laptop every day.
It wasn’t Jungle Disk’s fault – in fact, Jungle Disk had done just what should be expected from it: it wrote a log entry for every backup that was incomplete. Looking through the logs, I found three problems:
- Despite the use of an incremental backup scheme, daily backup volumes were very large. This was mainly due to the email-client data. The client stores all email in a single (huge) file, which thus changes every time you receive email. Simple incremental schemes just send all changed files, and so effectively your whole email archive ends up in the daily increments.
- Combining the large backup volumes with a “slow” 256 kbit/s ADSL upload rate made things worse. My mom didn’t keep her laptop online long enough each day for the backups to finish.
- The version of Jungle Disk used didn’t support resuming backup sessions after the laptop had been in stand-by mode. Thus, the unfinished backups were really just lost efforts. According to the release notes, the present version is capable of resuming backup sessions – a welcome improvement.
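To put the bandwidth problem in numbers – assuming, say, a 1 GB mailbox file (the actual size will of course vary) – a single full upload of that file is hopeless on this line:

```shell
# Back-of-the-envelope: how long does one full mailbox upload take
# over the 256 kbit/s ADSL uplink? (The 1 GB mailbox is an assumption.)
BYTES=$((1024 * 1024 * 1024))    # 1 GiB mailbox file
RATE=$((256 * 1000 / 8))         # 256 kbit/s = 32000 bytes/s
NEEDED=$((BYTES / RATE))         # seconds for one full upload
echo "$((NEEDED / 3600)) hours"  # roughly 9 hours
```

No wonder the sessions never finished before the laptop went back into stand-by.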
Essentially, I learned then that there’s no such thing as unattended backup. Don’t pay attention for a few days and you might just find a long list of failed sessions in the logs. So now I’m looking for a backup solution that provides the following:
- Fast backups. If, say, you have to leave and catch a train, you want to be able to hit “backup” quickly without having to wait for the data to drip through the ADSL bottleneck.
- Off-site backups. That’s what I thought was cool about Amazon S3: off-site storage used to be out of reach for home users, but at Amazon S3 prices it has suddenly become affordable (unless you have a serious amount of data to store).
- Sending only the changed portions of files (a la rdiff/rsync). In the end, sending data costs time and money, so let’s minimise it. Alternatively I could find an email client that supports storing every email in a file of its own, but that’s a user-bullying alternative…
- Freedom: as I mentioned before, I’d prefer to use e.g. duplicity, but I was worried about the surprises I might run into when running it on MS Windows.
- Remote manageability: now that I’ve let go of the illusion of unattended backup, I’d like to be able to check on the backup system and do some remote troubleshooting when needed.
Up for take two
The solution (I hope) is a storage device named the NSLU2. At about 55 GBP / 65 EUR it’s inexpensive (it has to be – I won’t spend several hundred euros on a serious NAS just to back up 10 GB or so of data) and it’s good (unlike the typical cheap, dodgy NAS). And it runs Debian.
So the little NSLU2, with an old and smallish hard disk, will sit next to the home router, always ready for the Windows machines to connect. Backing up over the local network is fast (I suppose we’ll just use Samba file sharing), and the subsequent off-site backups no longer involve the Windows machines – so it’s fine if those take all day. Power consumption should be minimal.
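The Samba side could be as simple as the following smb.conf fragment (a sketch – the share name, path and user names are made up):

```
# /etc/samba/smb.conf (fragment) – hypothetical backup share
[backup]
   comment = Staging area for the Windows machines' backups
   path = /srv/backup
   valid users = mom dad
   read only = no
   browseable = yes
```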
I suppose it shouldn’t be a problem to run duplicity on the little Debian box – though I don’t know its hardware requirements (maybe the encryption will take forever on the XScale CPU?). Duplicity conveniently supports the Amazon S3 API, uses a standard and open file format, and does the rdiff magic, too.
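The nightly off-site run could then be a cron job on the NSLU2 along these lines (a sketch only – the bucket name and paths are hypothetical, and the credentials would of course live somewhere safer than the crontab):

```shell
# /etc/cron.d/offsite-backup (sketch): push the Samba staging area
# to Amazon S3 every night at 03:00, encrypted by duplicity/GnuPG.
# Bucket name and paths are hypothetical placeholders.
0 3 * * * root AWS_ACCESS_KEY_ID=... AWS_SECRET_ACCESS_KEY=... \
    PASSPHRASE=... duplicity /srv/backup s3+http://my-backup-bucket
```

And since the box runs Debian, checking the logs or troubleshooting a failed run could simply happen over SSH – which would cover the remote-manageability wish, too.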
I’m still asking myself if this is “overkill” for the task at hand, but actually I can hardly wait to order an NSLU2… wait, did you just point out it can also serve as a print server? Of course!… Maybe the decision has just been made…