Posts Tagged 'backup'

Fully restricting rsync options server-side

Let me tell you up front: the following is a respin of information I found elsewhere. And it was very well written. Normally, then, I wouldn’t blog this, and would rather add a link in my RSS feed in the sidebar – after all, that’s the most compact form of “code reuse”, and of upping the PageRank for a good site I found. What makes things different today? Well, it took me a crazy number of search-engine queries to find the info.

Maybe I’m just stoopid, but let’s assume I’m not. I hope this respin ranks a bit better for keywords, so I can help some other lost souls find the site that I found. If you want the expert story, click straight through to the source – actually, that whole site is simply excellent, and well worth browsing thoroughly if you’re looking to learn cool sysadmin stuff and more.

I learned (at least) two new things today:

  • how to extract the rsync command that your locally-executed rsync sends to the remote machine’s ssh-server
  • how to make the ssh-server execute that exact command, regardless of what someone tries to feed it from the local machine

But first, let me explain (to myself) why I wanted to know these things.

The issue with remote backups

You want off-site backups, because, well, that’s rule #3. But, by rule #2, you also want to backup often, and there’s only one way to guarantee that that will work out: automation. There’s a problem with that though: to attain automation, your remote backups will need some unprotected authentication token, e.g. an ssh-certificate with an empty passphrase.
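Generating such a token is a one-liner, by the way; here’s a sketch (the file name restricted_key is just my choice, use whatever you like):

```shell
# Generate an RSA key pair with an empty passphrase (-N "") for unattended use.
# The file name "restricted_key" is my own naming, nothing special.
mkdir -p "$HOME/.ssh"
ssh-keygen -q -t rsa -N "" -f "$HOME/.ssh/restricted_key"

# The public half (restricted_key.pub) then goes into authorized_keys
# on the server, e.g. with:
#   ssh-copy-id -i "$HOME/.ssh/restricted_key.pub" remote_host
```

We’ll come back to that authorized_keys entry below, because that’s where the interesting restrictions go.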

Obviously, you want to restrict what that dangerous key lets you do on the remote system. Simply put, you don’t want someone that managed to break into your backup client to be able to erase both your backups and the originals. The solutions I had seen so far included creating a separate backup-user on the server, and providing a restricted shell of some sort. That’s one way of doing things, but it’s not easy to set up:

  • you only want to allow a select few commands, say rsync for transport, and perhaps some scripts to prepare the backup
  • ideally you want to have read-only access so that the client performing the backup cannot damage files, which might even occur without malicious intent, say by a wrong string of rsync options
  • but maybe you want to run some sort of hotcopy command on some database you’re using, and this does require write access
  • do you create yet another user for that?
  • and are you sure your shell is really as restricted as you think? No tricks to break out of it?
  • aaaaaahhrghh…..

Right. I’m *that* mistrusting, and especially when it comes to my own competence. I’d definitely bork that restricted shell setup. Please give me something dead simple.

Figuring out what your local rsync needs from the remote rsync

Okay. Assume we’ll always call the server with the exact same rsync command, perhaps something like

bash$ rsync -avz remote_host:/var/backup/ /var/remote_host_backup

(On a side-note, I’m still doubting myself every time: trailing slash or no trailing slash? Terrible.)

Now, you can see what rsync command will get executed on the remote host if you add another -v:

bash$ rsync -avvnz remote_host:/var/backup/ /var/remote_host_backup

where I also added an -n to have a dry-run. The first line of output reads something like

opening connection using ssh remote_host rsync --server --sender -vvnlogDtprz . /var/backup

…which runs off the page here because I didn’t pay WordPress for my own custom CSS yet, but you can try this yourself anyway. What we’re interested in is the part that starts at “rsync”: this is what is executed on the remote host.

Using sshd with the command="" option

Remember we’re using a passphrase-less ssh-certificate for the sake of automation. On the server, that requires an entry like this in $HOME/.ssh/authorized_keys:

ssh-rsa AAYourVeryLongPublicKeyThingy== plus arbitrary comment/username at the end

The sshd manpage tells you that you can insert quite a few options at the start of this line. You should really consider all of those options, but the cool one for now is the command="" option. Between the quotes we put the result of the previous section, minus the -n (or you’ll have only dry runs…).

command="rsync --server --sender -vlogDtprz . /var/backup" ssh-rsa AAYourVeryLongPublicKeyThingy== plus arbitrary comment/username at the end

…that’s probably running off the page big time now. Sorry. And I didn’t even add all the other restrictive options you ought to consider.
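Just so this post is self-contained: those other restrictive options from the sshd manpage are prepended to the same line, comma-separated. An entry with the lot might look something like this (still one long line – the from="" pattern is a hypothetical client hostname, adjust to taste):

command="rsync --server --sender -vlogDtprz . /var/backup",from="backup_client_host",no-pty,no-port-forwarding,no-X11-forwarding,no-agent-forwarding ssh-rsa AAYourVeryLongPublicKeyThingy== plus arbitrary comment/username at the end

from="" restricts which hosts may even present the key, and the no-* options switch off pseudo-terminals and the various forwarding features that an rsync-only key has no business using.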

The beauty of this is that sshd will now ignore whatever abuse you’re feeding it from the ssh client. Whenever you authenticate using this specific certificate, it will only run that exact command.

Let me put this yet another way. The only way to successfully talk to the server with that certificate is to say what it expects you to say: you can only run the matching local rsync command, or the two rsync instances will not understand each other. All the options are fixed, client-side and server-side.

This is what you want. Or, it is what I wanted, anyway.

What about running scripts before the actual rsync?

Okay, I learned a third thing. This was in the rsync manpage: your remote rsync “can be any program, script, or command sequence you’d care to run, so long as it does not corrupt the standard-in & standard-out that rsync is using to communicate”.

In other words: you can run any database hotcopy command on the server, as long as it cares to shut up, so that to the client, it looks as if only rsync was called. Your authorized_keys entry now looks somewhat like this:

command="do_db_hotcopy >> /var/log/hotcopy.log 2>&1 ; rsync --server --sender -vlogDtprz . /var/backup" ssh-rsa AAYourVeryLongPublicKeyThingy== plus arbitrary comment/username at the end

… where you’re being careful to make sure the only output sent comes from rsync. This works for me; I could imagine a long script might cause your local rsync to time-out in some way, so ymmv.

One more thing

I’ll shut up soon, too, but there was actually also a fourth thing… how do you make sure your local rsync command uses the restricted, passphrase-less key under all circumstances? When I’m actually logged in myself, often ssh-agent is keeping my less-restricted key available. The problem is that ssh will prefer that key, and when it does, my fancy hotcopy (from the previous section) never gets called.

To fix this, my backup script on the client contains an extra -e option to rsync, which is self-explanatory, but that’s not enough: ssh still prefers the key held by ssh-agent. The full solution (as the ssh-agent manpage more or less documents) is thus:

#!/bin/bash
# Note the double quotes, so that $HOME gets expanded by the shell.
rsync -avz -e "ssh -i $HOME/.ssh/restricted_key" remote_host:/var/backup/ /var/remote_host_backup

Sometime soon I might respin this whole thing with rdiff-backup (…you want to keep multiple states of your backup, because, well, that’s rule #4 :P). I just need to figure out how client-server communication works for that.


Systematically backing up a WordPress blog

I’ve been meaning to write about this forever, but now someone else has done all the work already! The script presented there works wonderfully. I’ve just made a tiny addition, because I don’t like ending up with a directory full of huge XML files. Those of you who have been here before will have probably guessed already: the addition is to check the backup in to a revision control system.

To begin with, this is what my patch to the script looks like:

--- wordpress_backup.perl	2008-11-12 22:24:57 +0000
+++ wordpress_backup.perl	2008-11-12 23:05:32 +0000
@@ -13,14 +13,13 @@
 my $path_to_file="/path/to/file/";
 my $url="https://$";
 my $author="all";
-#Filename format:
+#Filename format: fileprefix.xml
 my $fileprefix="wordpress";

 #Change that if you want
 my $agent="unixwayoflife/1.0";  

-my $date=((localtime)[5] +1900)."-".((localtime)[4] +1)."-".(localtime)[3];
 my $mech = WWW::Mechanize->new( agent => $agent );
 $mech->get( $url."wp-login.php" );  

@@ -38,7 +37,7 @@

 ##Download the file
 $mech->save_content( $path_to_file.$file_name );
 print ("Download \t\t\t[OK]\n");  

@@ -50,4 +49,5 @@
 print ("Login out \t\t\t[OK]\n");  

 print("File successfully saved in $path_to_file$file_name\n");
+system("bzr commit -m \"* wordpress_backup.perl was here\"");
 exit 0;

As you can see, only four lines changed – the maximum I could manage, since I haven’t ever edited a perl script before :P

All you then need to do is put in your blog’s details at the top, and run it. The last step – the “bzr commit” – will fail, but running the script once gets you the initial XML file to check in to revision control (if you happen to be a native English speaker, can you tell me if that should be “check-in to” or “check into”, or yet something else?). Having the file, say “wordpress.xml”, and assuming you have bazaar, you can now bring the backups under version control with a simple

bzr init #this will create a .bzr tree under your current working directory
bzr add wordpress.xml
bzr commit -m "some happy message about the first commit"

The next time you run the perl script it will overwrite wordpress.xml and commit changes to the repository. If you have a big weblog, this could save you quite some disk space…
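To make the whole thing properly systematic, the script can then be handed to cron. A crontab entry along these lines (the path is hypothetical – it should be the directory holding the .bzr tree, since the bzr commit at the end of the script operates on the current working directory) runs it nightly at 3:00:

0 3 * * * cd /path/to/backups && perl wordpress_backup.perl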

Draft plan: online backup with a local buffer

Photo by godog – found on Flickr tagged “backup” :)

These are just some draft plans to use a lightweight local backup device to support off-site backups for my mom and dad’s MS Windows machines. I think it solves some typical problems with “online backup” (limited bandwidth), which is otherwise very cost-effective if you only have a few GBs to store (like mom and dad…).

But let’s start with some background blabla. Eons ago I said I was going to review Jungle Disk (a backup utility that stores your data at Amazon Simple Storage Service (S3)). It didn’t happen, but not because I forgot. I have been keeping a draft around, but I felt I hadn’t actually spent enough time with the tool to make any fair comments about it. By now Jungle Disk have moved on to version 2.0 of their software – so my notes have passed their expiry date :)

Learning the hard way

Let me summarise what happened: I installed the Jungle Disk utility on my mom’s Windows machine, and when I visited about a month later I looked at the results: much to my surprise, not a single backup session had finished successfully. And my mom uses her laptop every day.

It wasn’t Jungle Disk’s fault – in fact, Jungle Disk had done just what should be expected from it: it wrote a log entry for every backup that was incomplete. Looking through the logs, I found three problems:

  • Despite the use of an incremental backup scheme, daily backup volumes were very large. This was mainly due to the email-client data. The client stores all email in a single (huge) file, which thus changes every time you receive email. Simple incremental schemes just send all changed files, and so effectively your whole email archive ends up in the daily increments.
  • Combining the large backup volumes with a “slow” 256kbit/s ADSL upload rate was even more problematic. My mom didn’t have her laptop online long enough per day for the backups to finish.
  • The version of Jungle Disk used didn’t support resuming backup sessions after the laptop had been in stand-by mode. Thus, the unfinished backups were really just lost efforts. According to the release notes, the present version is capable of resuming backup sessions – a welcome improvement.

Essentially, I learned then that there’s no such thing as unattended backup. Don’t pay attention for a few days and you might just find a long list of failed sessions in the logs. So now I’m looking for a backup solution that provides the following:

  • Fast backups. If, say, you have to leave and catch a train, you want to be able to hit “backup” quickly without having to wait for the data to drip through the ADSL bottleneck.
  • Off-site backups. That’s what I thought was cool about Amazon S3. It used to be a feature that wasn’t for home users, but at Amazon S3 prices it’s suddenly become affordable (unless you have a serious amount of data to store).
  • Sending only the changed portions of files (a la rdiff/rsync). In the end, sending data costs time and money, so let’s minimise it. Alternatively I could find an email client that supports storing every email in a file of its own, but that’s a user-bullying alternative…
  • Freedom: as I mentioned before I’d prefer to use e.g. duplicity, but I was worried about the surprises I might have when running that in MS Windows.
  • Remote manageability: having let go of the illusion of unattended backup now, I would like to be able to check on the backup system and perhaps do some remote troubleshooting.

Up for take two

The solution (I hope) is in a storage device named NSLU2. At about 55GBP or 65Euro it’s inexpensive (it has to be, I won’t spend several hundred Euros on a serious NAS just to back up 10GB or so of data) and it’s good (unlike the typical cheap dodgy NASes). And it runs Debian.

So the little NSLU2 with an old and smallish hard disk will sit next to the home router, always ready for the Windows machines to connect. Backing up data over the local network is fast (I suppose we’ll just have samba file sharing), and subsequent off-site backups no longer involve the Windows machines – so it’s fine if they take all day. Power consumption should be minimal.

I suppose it shouldn’t be a problem to use duplicity on the little Debian box – I’m however not aware of its hardware requirements (maybe the encryption part will take forever with the XScale CPU?). Duplicity conveniently supports the Amazon S3 API, uses a standard and open file format, and does the rdiff magic, too.

I’m still asking myself if this is “overkill” for the task at hand, but actually I can hardly wait to order an NSLU2… wait, did you just point out it can also serve as a print server? Of course!… Maybe the decision has just been made…

Flyback? Rdiff-backup!

I just ran into Flyback, which is presented as “Apple’s Time Machine for Linux”. To be precise, I read about it here on WordPress. Now, I’ve been using rdiff-backup for a while, so the first thing I thought was: what’s new about it?

Well, flyback is basically a GUI frontend for rsync, and rdiff-backup is a command-line utility based on librsync. Both programs are written with the same idea in mind. So: the new thing is the GUI. I guess that’s good, because setting up a backup system is daunting enough even with the help of a good GUI. On the other hand, we won’t interact with it daily (I hope) – so there’s no real need for it to look pretty. Both points seem valid to me. Edited: see comment below.

Nonetheless, I can see flyback overshadowing its command-line family with its good looks, so I’ll try and give rdiff-backup some blogging attention again soon (in fact there’s a special use of it that I want to run some tests with).

One good reason to prefer rdiff-backup over flyback? Here you go:

$ sudo aptitude install flyback
Reading package lists… Done
Building dependency tree… Done
Reading extended state information
Initializing package states… Done
Reading task descriptions… Done
Building tag database… Done
Couldn’t find any package whose name or description matched “flyback”



Ok, I moved this weblog to (it was on Blogger before). I wouldn’t even have been looking for something else if it wasn’t for Blogger lacking any decent form of backup options. The “advanced use” section of the help pages has a slightly bizarre option that involves changing your template settings. And then there is a Windows-only tool. I’m sure this will change sometime soon, since Blogger is owned by Google and I believe they support, but it wasn’t soon enough for me – having backup plans in the future doesn’t protect your data (as I also just wrote on my new personal blog)…

WordPress not only allows you to create backups (using an export operation), but also to import your Blogger weblog (how convenient!). I can go on to sum up a lot of other nice features, but in short: I created an account, played around for a bit in the powerful Dashboard interface, and was sold.

Another charming feature is its GPL licensing, which makes your blog even more portable – not only can you take your posts and comments with you, you can simply set all of them up on a server of your own. In fact, a wordpress package is available in the Etch repository, as I found while exploring and comparing alternatives for weblog services. I also found that it almost didn’t make it into Etch – the linked thread has a well-balanced discussion about security issues, should you be interested in that sort of thing.

Now that I’ve moved, I’ll set up a complete backup mechanism (doing the xml export is of course only a first, albeit essential, step) and will post about that here soon.

Backup using dump

Following up on my earlier post today about setting up my USB backup drive, here are some pointers on using dump and restore to back up my whole file system. All the information is in this how-to. dump should be able to run with the whole system online, but to make life easier for it – and since I don’t have anything running that has to stay up – I dropped into single-user mode (init 1). Then

dump -0uf /media/usbdisk/{insert/filename} /
restore -C -f /media/usbdisk/{insert/filename}

makes a full backup of the file system into the specified file (in this case mounted on /media/usbdisk), and tests it afterwards (that’s what restore -C does). In my case, a roughly 6.5G backup took 12 minutes to complete (over USB2.0) – I didn’t keep track of the test time.
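On subsequent runs you don’t have to repeat the full level-0 dump, by the way: since the -u flag updates /etc/dumpdates, a higher dump level only saves files changed since the last dump of a lower level. An untested sketch, with the same placeholder filenames:

dump -1uf /media/usbdisk/{insert/filename} /
restore -tf /media/usbdisk/{insert/filename}

where -1 is an incremental relative to the level-0 dump, and restore -t merely lists the archive’s contents without extracting anything.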

Of course, for single-file or single-directory backups, you need something else. I haven’t completely settled on a structural solution; at the moment I’m using a mix of rsync (to save to a network drive) and bzr (to keep revisions of my work without needing a server).

With thanks to our very knowledgeable Computer Officer at work for pointing me to dump.