The Adventures of Systems Boy!

Confessions of a Mac SysAdmin...

Backing Up with RsyncX

In an earlier post I talked generally about my backup procedure for large amounts of data. In the post I discussed using RsyncX to back up staff Work drives over a network, as well as my own personal Work drive data, to a spare hard drive. Today I'd like to get a bit more specific.

Installing RsyncX
I do not use, nor do I recommend, the version of rsync that ships with Mac OS X 10.4. In my own tests I've found it extremely unreliable, and unreliability is the last thing you want in a backup program. Instead I use RsyncX, and have been using it without issue for years now. RsyncX is a GUI wrapper for a custom-built version of the rsync command that's made to deal properly with HFS+ resource forks. So the first thing you need to do is get RsyncX, which you can do here. To install RsyncX, simply run the installer. This will place the resource-fork-aware version of rsync in /usr/local/bin/. If all you want to do is run rsync from the RsyncX GUI, then you're done. But if you want to run it non-interactively from the command line (which ultimately we do), you should put the newly installed rsync command in the standard location, which is /usr/bin/.¹ Before you do this, it's always a good idea to make a backup of the OS X version. So:

sudo cp /usr/bin/rsync /usr/bin/rsync-ORIG
sudo cp /usr/local/bin/rsync /usr/bin/rsync

Ah! Much better! Okay. We're ready to roll with local backups.²
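If you want to confirm that the copy took, one rough approach is to look for the HFS+ banner that RsyncX prints in its --version output. The sketch below assumes that banner contains the text "HFS+ filesystem support"; the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: decide whether an rsync binary looks like the RsyncX build by
# searching its --version output for the HFS+ banner.
# Both function names are hypothetical; only the banner text is RsyncX's.

# Returns 0 if the given --version text contains the RsyncX banner.
is_rsyncx_banner() {
    printf '%s\n' "$1" | grep -q 'HFS+ filesystem support'
}

# Check the binary at a given path (default: /usr/bin/rsync).
check_rsync() {
    bin=${1:-/usr/bin/rsync}
    if is_rsyncx_banner "$("$bin" --version 2>&1)"; then
        echo "$bin: looks like RsyncX"
    else
        echo "$bin: not RsyncX (copy it again from /usr/local/bin/rsync)"
        return 1
    fi
}
```

Running check_rsync with no arguments after a system update is a quick way to see whether the update put Apple's version back.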

Local Backups
Creating local backups with rsync is pretty straightforward. The RsyncX version of the command acts almost exactly like the standard *NIX version, except that it has an option to preserve HFS+ resource forks. This option must be provided if you're interested in preserving said resource forks. Let's take a look at a simple rsync command:

/usr/bin/rsync -a -vv /Volumes/Work/ /Volumes/Backup --eahfs

This command will back up the contents of the Work volume to another volume called Backup. The -a flag stands for "archive": it recurses through the source and preserves permissions, ownership, timestamps and symbolic links. On its own it copies everything that's new or changed, and leaves alone any files on the target that have since been deleted from the source. It's usually what you want. The -vv flag sets the verbosity and will print what rsync is doing to standard output. The level of verbosity is variable, so "-v" will give you only basic information and "-vvvv" will give you everything it can. I like "-vv." That's just the right amount of info for me. The next two entries are the source and target directories, Work and Backup. The --eahfs flag tells rsync that you want to preserve resource forks. It only exists in the RsyncX version. Finally, pay close attention to the trailing slash in your source and target paths. The source path contains a trailing slash, meaning we want the command to act on the drive's contents, not the drive itself, whereas the target path contains no trailing slash. Without the trailing slash on the source, a folder called "Work" will be created inside the Backup drive. This trailing slash behavior is standard in *NIX, but it's important to be aware of when writing rsync commands.
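To make the trailing slash rule concrete, here's a tiny sketch that models where the files end up. It doesn't call rsync at all; effective_target is a made-up helper that just applies the rule described above to two paths.

```shell
#!/bin/sh
# Sketch: model rsync's trailing-slash rule for a directory source.
# A trailing slash on the source means "the contents of this directory";
# without it, a directory named after the source is created inside the target.
# effective_target is a hypothetical name; it only manipulates strings.
effective_target() {
    src=$1
    dst=${2%/}      # drop a trailing slash on the target so paths join cleanly
    case "$src" in
        */) printf '%s\n' "$dst" ;;
        *)  printf '%s/%s\n' "$dst" "$(basename "$src")" ;;
    esac
}

effective_target /Volumes/Work/ /Volumes/Backup   # prints /Volumes/Backup
effective_target /Volumes/Work  /Volumes/Backup   # prints /Volumes/Backup/Work
```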

That's pretty much it for simple local backups. There are numerous other options to choose from, and you can find out about them by reading the rsync man page.

Network Backups
One of the great things about rsync is its ability to perform operations over a network. This is a big reason I use it at work to back up staff machines. The rsync command can perform network backups over a variety of protocols, most notably SSH. It also can reduce the network traffic these backups require by only copying the changes to files, rather than whole changed files, as well as using compression for network data transfers.

The version of rsync used by the host machine and the client machine must match exactly. So before we proceed, copy rsync to its default location on your client machine. You may want to back up the Mac OS X version on your client as well. If you have root on both machines you can do this remotely on the command line:

ssh -t root@mac01.systemsboy.com 'cp /usr/bin/rsync /usr/bin/rsync-ORIG'
scp /usr/bin/rsync root@mac01.systemsboy.com:/usr/bin/
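Since the two versions have to match, it can be worth comparing them before the first real run. This is only a sketch: same_rsync_version is a hypothetical helper that compares two chunks of --version output, and the commented usage assumes root SSH access to the client as in the commands above.

```shell
#!/bin/sh
# Sketch: succeed only when two `rsync --version` outputs are identical
# (and not empty). same_rsync_version is a made-up name.
same_rsync_version() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Hypothetical usage against the client from the examples above:
#   local_v=$(/usr/bin/rsync --version 2>&1)
#   remote_v=$(ssh root@mac01.systemsboy.com '/usr/bin/rsync --version' 2>&1)
#   same_rsync_version "$local_v" "$remote_v" || echo "rsync versions differ"
```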

Backing up over the network isn't too much different or harder than backing up locally. There are just a few more flags you need to supply. But the basic idea is the same. Here's an example:

/usr/bin/rsync -az -vv -e ssh mac01.systemsboy.com:/Volumes/Work/ /Volumes/Backups/mac01 --eahfs

This is pretty similar to our local command. The -a flag is still there, and we've added the -z flag as well, which compresses the data in transit (to ease network traffic). We now also have an -e flag, which tells rsync which remote shell program to use for the network connection, in this case SSH. Next we have the source, as usual, but this time our source is a computer on our network, which we specify just like we would with any SSH connection: hostname:/Path/To/Volume. Finally, we have the --eahfs flag for preserving resource forks. The easiest thing to do here is to run this as root (either directly or with sudo), which will allow you to sync data owned by users other than yourself.

Unattended Network Backups
Running backups over the network can also be completely automated and can run transparently in the background even on systems where no user is logged in to the Mac OS X GUI. Doing this over SSH, of course, requires an SSH connection that does not interactively prompt for a password. This can be accomplished by establishing authorized key pairs between host and client. The best resource I've found for learning how to do this is Mike Bombich's page on the subject. He does a better job explaining it than I ever could, so I'll just direct you there for setting up SSH authentication keys. Incidentally, that article is written with rsync in mind, so there are lots of good rsync resources there as well. Go read it now, if you haven't already. Then come back here and I'll tell you what I do.

I'd like to note, at this point, that enabling SSH authentication keys, root accounts and unattended SSH access is a minor security risk. Bombich discusses this on his page to some extent, and I want to reiterate it here. Suffice to say, I would only use this procedure on a trusted, firewalled (or at least NATed) network. Please bear this in mind if you proceed with the following steps. If you're uncomfortable with any of this, or don't fully understand the implications, skip it and stick with local backups, or just run rsync over the network by hand and provide passwords as needed. But this is what I do on our network. It works, and it's not terribly insecure.

Okay, once you have authentication keys set up, you should be able to log into your client machine from your server, as root, without being prompted for a password. If you can't, reread the Bombich article and try again until you get it working. Otherwise, unattended backups will fail. Got it? Great!
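One way to test this without sitting at the keyboard is to ask ssh to fail rather than prompt. The sketch below uses OpenSSH's BatchMode option for that. The function name and the SSH_CMD override are made up for illustration, and I'm assuming your ssh is new enough to support both options.

```shell
#!/bin/sh
# Sketch: confirm that key-based root login works before scheduling anything.
# BatchMode=yes makes ssh give up instead of prompting for a password, so a
# missing or broken key shows up as a non-zero exit status.
# SSH_CMD is a hypothetical override, mainly useful for testing the function.
SSH_CMD=${SSH_CMD:-ssh}

can_login_unattended() {
    "$SSH_CMD" -o BatchMode=yes -o ConnectTimeout=10 "root@$1" true
}

# Example:
#   can_login_unattended mac01.systemsboy.com || echo "keys not working yet"
```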

I enable the root account on both the host and client systems, which can be done with the NetInfo Manager application in /Applications/Utilities/. I do this because I'm backing up data that is not owned by my admin account, and using root gives me the unfettered access I need. Depending on your situation, this may or may not be necessary. For the following steps, though, it will simplify things immensely if you are root:

su - root

Now, as root, we can run our rsync command, minus the verbosity, since we'll be doing this unattended, and if the keys are set up properly, we should never be prompted for a password:

/usr/bin/rsync -az -e ssh mac01.systemsboy.com:/Volumes/Work/ /Volumes/Backups/mac01 --eahfs

This command can be run either directly from cron on a periodic basis, or it can be placed in a cron-run script. For instance, I have a script that pipes verbose output to a log of all rsync activity for each staff machine I back up. This is handy to check for errors and whatnot, every so often, or if there's ever a problem. Also, my rsync commands are getting a bit unwieldy (as they tend to do) for direct inclusion in a crontab, so having the scripts keeps my crontab clean and readable. Here's a variant, for instance, that directs the output of rsync to a text file, and that uses an exclude flag to prevent certain folders from being backed up:

/usr/bin/rsync -az -vv -e ssh --exclude "Archive" mac01.systemsboy.com:/Volumes/Work/ /Volumes/Backups/mac01 --eahfs > ~/Log/mac01-backup-log.txt

This exclusion flag will prevent backup of anything called "Archive" on the top level of mac01's Work drive. Exclusion in rsync is relative to the source directory being synced. For instance, if I wanted to exclude a folder called "Do Not Backup" inside the "Archive" folder on mac01's Work drive, my rsync command would look like this:


/usr/bin/rsync -az -vv -e ssh --exclude "Archive/Do Not Backup" mac01.systemsboy.com:/Volumes/Work/ /Volumes/Backups/mac01 --eahfs > ~/Log/mac01-backup-log.txt
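For what it's worth, a cron-run script along the lines I described might look something like the sketch below. This is not my actual script. HOSTS, LOG_DIR and DRY_RUN are names invented for the example, and the crontab path in the comment is made up too. DRY_RUN defaults to 1 so that running it as-is only prints the commands.

```shell
#!/bin/sh
# Sketch of a cron-run backup script: one rsync per client, one log per client.
# HOSTS, LOG_DIR and DRY_RUN are illustrative names, not part of rsync.
HOSTS=${HOSTS:-"mac01.systemsboy.com"}   # space-separated list of clients
LOG_DIR=${LOG_DIR:-"$HOME/Log"}
DRY_RUN=${DRY_RUN:-1}                    # 1 = only print commands, 0 = run them

backup_host() {
    host=$1
    name=${host%%.*}                     # mac01.systemsboy.com -> mac01
    # Build the command as positional parameters so quoting survives.
    set -- /usr/bin/rsync -az -vv -e ssh --exclude "Archive" \
        "$host:/Volumes/Work/" "/Volumes/Backups/$name" --eahfs
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        mkdir -p "$LOG_DIR"
        # Append, so the log keeps a history of every run.
        "$@" >> "$LOG_DIR/$name-backup-log.txt" 2>&1
    fi
}

for h in $HOSTS; do
    backup_host "$h"
done

# Hypothetical crontab entry for root (path and time are made up):
#   0 2 * * * DRY_RUN=0 /usr/local/sbin/backup-daily.sh
```

Note that this appends to the log with >> where the one-liners above overwrite it with >. Use whichever suits you.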

Mirroring
The above uses of rsync, as I mentioned before, will not delete files from the target that have been deleted from the source. They will only propagate changes that have occurred on the existing files, but will leave deleted files alone. They are semi-non-destructive in this way, and this is often useful and desirable. Eventually, though, rsync backups will begin to consume a great deal of space, and after a while you may begin to run out. My solution to this is to periodically mirror my sources and targets, which can be easily accomplished with the --delete option. This option will delete any file from the target not found on the source. It does this after all other syncing is complete, so it's fairly safe to use, but it will require enough drive space to do a full sync before it does its thing. Here's our network command from above, only this time using the --delete flag:

/usr/bin/rsync -az -vv -e ssh --exclude "Archive/Do Not Backup" mac01.systemsboy.com:/Volumes/Work/ /Volumes/Backups/mac01 --delete --eahfs > ~/Log/mac01-backup-log.txt

Typically, I run the straight rsync command every other day or so (though I could probably get away with running it daily). I create the mirror at the end of each month to clear space. I back up about a half dozen machines this way, all from two simple shell scripts (daily and weekly) called by cron.
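If you'd rather keep the two kinds of run in one place, a single helper can pick the flags by mode. Again, this is just a sketch: rsync_flags and the crontab lines in the comments are invented for the example, and cron has no simple "last day of the month," so the mirror entry uses the 1st instead.

```shell
#!/bin/sh
# Sketch: choose rsync flags by mode. "sync" is the additive backup;
# "mirror" adds --delete so the target matches the source exactly.
# rsync_flags is a hypothetical helper name.
rsync_flags() {
    case "$1" in
        sync)   echo "-az -vv -e ssh --eahfs" ;;
        mirror) echo "-az -vv -e ssh --eahfs --delete" ;;
        *)      echo "usage: rsync_flags sync|mirror" >&2; return 2 ;;
    esac
}

# Hypothetical crontab entries (script path and times are made up):
#   0 2 */2 * *  /usr/local/sbin/backup.sh sync     # every other day
#   0 4 1 * *    /usr/local/sbin/backup.sh mirror   # once a month
```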

Conclusion
I realize that this is not a perfect backup solution. But it's pretty good for our needs, given what we can afford. And so far it hasn't failed me in four years. That's not a bad track record. Ideally, we'd have more drives and we'd stagger backups in such a way that we always had at least a few days' worth of backups available for retrieval. We'd also probably have some sort of backup to a more archival medium, like tape, for more permanent or semi-permanent backups. We'd also probably keep a copy of all this in some offsite, fireproof lock box. I know, I know. But we don't. And we won't. And thank god, 'cause what a pain in the ass that must be. It'd be a full-time job all its own, and not a very fun one. What this solution does offer is a cheap, decent, short-term backup procedure for emergency recovery from catastrophic data loss. Hard drive fails? No trouble. We've got you covered.

Hopefully, though, this all becomes a thing of the past when Leopard's Time Machine debuts. Won't that be the shit?

1. According to the RsyncX documentation, you should not need to do this, because the RsyncX installer changes the command path to its custom location. But if you'll be running the command over the network or as root, you'll either have to change that command path for the root account and on every client, or network backups will fail. It's much easier to simply put the modified version in the default location on each machine.

2. Updates to Mac OS X will almost always overwrite this custom version of rsync. So it's important to remember to replace it whenever you update the system software.


2:39 AM

hmmm. i would leave non-standard bins in /usr/local, and call bin with full path, instead of replacing original.

i wish apple would fix apple supplied bins. rsync and ditto should be all u need    



2:49 AM

Yeah, I can see the logic. The problem is that if I did that with the 8 machines I back up, I'd have to set the path for rsync on the root account on every machine. And we both know I'm way too lazy to do that.

I was so happy when I heard that Tiger would have resource aware rsync. And I was so bummed when it just plain old didn't work. I wish they'd fix 'em too.

-systemsboy    



10:50 AM

I was under the impression, perhaps wrong, that the current rsync was, like tar and cp, resource fork and extended attribute aware. Is it not? My tests seemed to show that if I used
rsync -Ea
that the transferred files came across unharmed.
Perhaps someone can clear this up.
The downside of using -E is that it seems to detect that a lot more files have changed. I suspect that this may be something like the access time attribute being changed (perhaps by Spotlight or rsync itself) and the file thus having to be copied.
Doesn't rsyncX have this problem too?
Finally, in the past I stopped using rsyncX because it had a known bug having to do with deep directory trees and long names causing a malloc error. The original rsync had this too but it was fixed, but since the fix had not migrated into rsyncX I stopped using it. Perhaps this is now fixed.

In the meantime I find that RdiffBackup is a much better program for the purpose of backups and not simply mirrors. My understanding is that it is based on rsync but is even more parsimonious about transferring only the parts of a file that changed and taking care of the extended attributes. It also handles the snapshotting for reverting backups to any point in time.

The other problem I had with rsyncX was that it was not cross-platform compatible. That is, if I was backing up to a Linux server from my Mac the chances that the two rsyncs were compatible seemed to be nil.

For backups I can't think of a good reason to use RsyncX when one can use rdiffbackup. it's crossplatform and actually does the backup part right.

Rsyncx is best for mirrors not backups.

If you do want to use rsyncX for backups then you can also do the following after you backup

find -d ./ | cpio -dpl snapshot1

to create a snapshot that preserves everything except permissions but takes up no space on the hard disk unless you change a file.    



2:48 PM

Charlie,

Thanks for all the interesting information. I totally agree that rsync is wrong for cross-platform backups. Good point.

The current version of rsync in Tiger is supposed to be resource fork aware. But resource fork awareness seems to be the least of its problems. I could never even get a complete backup with it.

RsyncX may have a similar problem with resource forks and changes, and yeah, it makes sense that that would add another level of change to files. I honestly don't know the answer to that for sure, but my incremental backups complete in a very reasonable time frame, so I'm not too concerned.

I'm not aware of the malloc error you speak of. But it does cause me some concern. I've thought of looking into rdiffbackup, but just never got around to it. And RsyncX has worked well for me thus far, so I haven't been super motivated. My hope is that Leopard's Time Machine is robust enough to handle our backups in the future, and that I can offload some of that to our staff members. From what I've read about it, it would actually be much better for our needs, providing staff the ability to backup and retrieve lost files on their own. Which would be brilliant. If it works well, of course. My fingers are crossed. If it doesn't, I may look into rdiffbackup.

That find command trick sounds interesting too. I'll have to check that out.

Thanks again for the comment!

-systemsboy    



11:52 AM

mike bombich released the first public beta of CarbonCopyCloner 3 today. this beta offers the ability to make backups over the network using rsync. since there's no rsync binary included in the package i assume it uses the standard osx-binary. therefore all issues related to rsync on osx still exist. this site claims to solve some of the problems with rsync. http://www.lartmaker.nl/rsync/. i haven't tested it though.
also the interface of ccc seems pretty limited regarding options for rsync. the main advantage of the new ccc is the fact that it has the ability to set up everything needed - i.e. public keys - for an unattended operation of rsync. my attempt for a deployment would be:
1. build the patched version of rsync -> install(/backup of original rsync) on the clients using ARD.
2. setup authentication on the clients using ccc.
3. schedule a hand-coded rsync-command on the clients using launchd/cron through ARD    



12:26 PM

That's funny. I was just taking the new CCC for a test spin when I got this comment.

Yes, CCC does now ease the pain associated with creating SSH keys. (Which is also funny, 'cause everything I know about creating SSH keys I learned from Bombich's site.) If you're not comfortable creating keys on the command-line, or if it's just too big a pain, I recommend giving CCC a try. Please keep in mind, it's still in beta. But the new version, once finalized, sounds like it will be well worth the wait.

Regarding rsync in CCC, in theory, if you followed my instructions and replaced the native OSX rsync binary with the rsyncx version, CCC will use the rsyncx version. In fact, it should use whatever version you wish to supply, as long as it's in the default location. So, again in theory, CCC could be used to do the basics of what is outlined in this article, assuming the scheduler runs without a user logged in to the GUI. I'm not sure how much flexibility CCC offers when it comes to setting up remote clones. Exclusions and other fine-tuning might better be done from the command line. Or not. I just haven't tested it yet.

In any case, your method sounds perfectly good to me. Have at it! Let me know how it goes.

-systemsboy    



6:50 AM

Hi system boy. Thanks for the hints on rsync X.... but

I do exactly what you say :

* install the latest build of RsyncX on both my local and remote machines with the installer
* create a script which is :

*******
rsync -a -v -z -e ssh "/Users/laurentades/Movies/Famille/" "cactusbunker.local:/Volumes/ChouRave/Backup/test" --eahfs; exit
*******

i get this error :

*******
rsync: on remote machine: --eahfs: unknown option
rsync error: syntax or usage error (code 1) at /SourceCache/rsync/rsync-24/rsync/main.c(1099)
rsync: connection unexpectedly closed (0 bytes read so far)
rsync error: error in rsync protocol data stream (code 12) at io.c(189)
logout
*******

What the heck ... does that actually mean that the remote version of rsync does not recognise the hfs option ... even though it's the same as the one i installed locally ?

any idea ?

thanks...

laurent    



1:06 PM

Laurent,

Like the error says, the version of rsync on the remote machine is wrong. I.e., it's not rsyncX. I usually see this error if I've left the original version of rsync installed in /usr/bin. Which is why I recommend putting a copy of rsyncX in /usr/bin and renaming the original rsync to rsync-ORIG. This needs to be done on both the client and the server.

Also, it's usually good form to use full file paths for commands in shell scripts, and in this particular instance, where there are a couple of different versions of the command installed, it's an even better idea. So instead of using "rsync" in your command, try:
/usr/bin/rsync
or /usr/local/bin/rsync.

But the error you're seeing is definitely because the client machine version is not rsyncX. To see what version of rsync you're using, run:
/usr/bin/rsync --version

If you have rsyncX installed in /usr/bin, you'll see the following line in the output:
HFS+ filesystem support for OSX (C)2004 Kevin A. Boyd

If you don't see that somewhere, go get the right version and try again.

Hope that helps. Good luck!

-systemsboy    



4:08 PM

hi,

Indeed.. you were right... and just because i did not read your instructions correctly (---shame---) !!!

Bonus questions about rsync : i have gathered a few posts in different forums + articles here and there and there seems to be a number of issues around rsync on OS X apart from the resource fork one. I've read about one around Finder metadata.. + another one about compatibility with non OS X target hosts + ....

At the end of the day i'm quite confused : is there one good way to use an rsync solution to do :

* Automated unattended backups (ssh)
* from OS X box to OS X and/or linux box
* without any kind of "data loss" (res fork / metadata and whatever else)
* using tiger as clients and OS X server 10.4

???

A lot of the articles i've found seem to date back to 2005 and i'm not sure whether they are still relevant, not knowing what build of rsync was originally used and not knowing if this patch or that badger have been combined with whatever other trick.... ;(?!?!?!?!?!?!?!)

What are your thoughts... and recommendations ?

thx a lot

ciao

laurent    



11:30 PM

Laurent,

The short answer to your question is no, there is no way to do cross-platform, data-lossless backups with rsync. Going cross-platform is a problem, of course, because the versions must match between server and client, which is impossible across platforms. The solution to this would be to mount the Linux machines using NFS or some such, and then using rsync to perform the sync as if the Linux mount were local to the system. This should be possible, and not overly complicated. And it should work, though I've never tried it myself.

All versions of rsync suffer from some kind of something missing on the Mac, mainly because the Mac crams in all this extra data that rsync was never built to handle. In theory, the rsync included with the Mac OS should handle all that metadata, but in practice, using that version causes all kinds of other problems and still messes up some of the metadata.

You can, however, use rsync to do backups between OSX machines, if you don't mind losing some of the metadata (and frankly, this just refers to things like Spotlight tags and such, that lots of people don't even use). To do so, use rsync (the rsyncX version) as outlined in this article. It can be scripted and run automatically, and unattended, via cron. If you're unfamiliar with cron, there's plenty on the internet. Look it up. It's easy.

Cheers!

-systemsboy    



2:24 AM

Hi... Great article!

If you have not switched over to TimeMachine then you could have a look at LBackup?

Posted by a member of the LBackup team.    



1:22 PM

Hi,

I'm looking at LBackup now, and it looks pretty cool. It's something I wish I'd been aware of years ago. At this point I already have a few scripts that do what LBackup appears to do. I'm sure it would have been much easier to use your implementation than it was to write my scripts. But... They're done now...

Oh well.

I will say, I love that you're doing "Rotations," which I assume will create backups that go back in time, akin to Time Machine backups. Nice! This is something I haven't done, and if I needed it I'd definitely try your product.

One thing I do not see in your implementation (maybe I'm just missing it) is a place to configure version-specific rsync options. For instance, rsyncx and the Mac version of rsync each have a flag for backing up HFS+ resource forks (--eahfs and -E, respectively). For me to use a product such as LBackup, I'd need a way to specify these options.

In any case, I think this is a really cool idea, and a nice implementation. Good work!

-systemsboy    



6:13 PM

LBackup has been designed to handle the rsync options automatically. Therefore, in the current version if you wish to change these options then you are required to edit the backup script which is bundled with LBackup.

However, if you are interested in having control over the rsync options from within the backup configuration files then visit the LBackup web site and let us know.

It would be great to work with you to implement the features you require in your environment.

You are probably not the only person who will find these additional options useful.

Looking forward to working with you. If you have time of course.    



1:59 PM

Much as I'd love to help you guys out, I'm pretty swamped for the foreseeable future. And my backup system is in pretty good shape.

If I end up getting some free time, I'd be happy to work with you. But this seems unlikely at this point.

However, if I happen to run across anyone who could use something like this (and this is somewhat more likely) I'll DEFINITELY send them your way.

How's that, LBackup Team?

Cool.

-systemsboy    



4:20 AM

Sounds Good!    


