rsync and bzip2-compressed data

Because it only transfers deltas between source and destination files, rsync is a great backup tool when working with uncompressed data. The structure of compressed data, however, can change drastically between backups, defeating the benefits of rsync. I’d read somewhere recently that bzip2’s “blocking” design might make it a viable compression format to use with rsync, so I ran an ad hoc experiment this morning to check this out.

Uncompressed Data

Here are the results from rsync’ing an uncompressed MySQL database with a few minor record changes.

total: matches=1634 hash_hits=2136 false_alarms=0 data=21227
sent 6.73K bytes received 9.92K bytes 6.66K bytes/sec
total size is 2.69M speedup is 161.44

Nice. Only about 7K transferred. Roughly the size of the change.

bzip2 Compressed Data

And here are the results from rsync’ing the same MySQL database, compressed in advance with bzip2.

total: matches=596 hash_hits=17533 false_alarms=1 data=876602
sent 876.99K bytes received 7.73K bytes 353.89K bytes/sec
total size is 1.64M speedup is 1.85
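
As a sanity check on those two summaries: rsync's speedup figure is the total file size divided by the bytes sent plus received. Here is a minimal Python sketch, assuming the K and M suffixes are powers of 1000 and working from the rounded numbers printed above, so it should land near rsync's own 161.44 and 1.85 rather than exactly on them.

```python
# Figures copied from the two rsync summaries above.
# Assumption: K = 1e3 and M = 1e6 (rsync's rounded, human-readable output).
runs = {
    "uncompressed": {"sent": 6.73e3, "received": 9.92e3, "total_size": 2.69e6},
    "bzip2": {"sent": 876.99e3, "received": 7.73e3, "total_size": 1.64e6},
}


def speedup(run):
    """Total file size divided by the bytes that actually crossed the wire."""
    return run["total_size"] / (run["sent"] + run["received"])


for name, run in runs.items():
    print(f"{name}: speedup ~{speedup(run):.2f}")

# How much more data the bzip2 run had to send than the uncompressed run.
sent_ratio = runs["bzip2"]["sent"] / runs["uncompressed"]["sent"]
print(f"bzip2 run sent ~{sent_ratio:.0f}x as many bytes")
```

The last line compares the bytes sent in the two runs, which works out to roughly 130 to 1.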

Whoa! What amounts to about a 7K change now results in roughly 130x the data transfer: 877K sent, more than half of the compressed file.

Which makes sense: digging into the details of bzip2, I see that the algorithm compresses data in blocks of 100K to 900K. A small change re-encodes the whole block it lands in, and the compressed bytes after it may no longer line up with the old file. So I suppose that using bzip2 might make sense if you have an incrementally growing data store that adds about 100K of data between backups, and where the older data rarely, if ever, changes. Barring that, to achieve the benefits of rsync, uncompressed data is probably the way to go.
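
A rough way to see the effect without rsync itself: the sketch below is not the experiment above, just a Python approximation of it. It builds some made-up table data (`fake_table` is a hypothetical helper), changes a single byte in the middle, and counts how many fixed-size chunks of the old file still turn up somewhere in the new one, which is loosely what rsync's block matching looks for. The 700-byte chunk size and the 100k block level are assumptions picked for illustration.

```python
import bz2
import random

CHUNK = 700  # bytes; a stand-in for rsync's block size, chosen for illustration


def fake_table(rows, seed=0):
    """Build deterministic, database-ish CSV bytes (hypothetical test data)."""
    rng = random.Random(seed)
    return "".join(
        f"{i},{rng.randrange(10**9)},{rng.choice('abcdefgh')}\n" for i in range(rows)
    ).encode()


def reusable_chunks(old, new, size=CHUNK):
    """Count fixed-size chunks of `old` found at any byte offset in `new`."""
    chunks = [old[i:i + size] for i in range(0, len(old) - size + 1, size)]
    return sum(chunk in new for chunk in chunks), len(chunks)


old = fake_table(30_000)
edited = bytearray(old)
edited[len(edited) // 2] = ord("X")  # one-byte "record change" mid-file
new = bytes(edited)

# compresslevel=1 selects bzip2's smallest (100k) block size.
old_bz = bz2.compress(old, compresslevel=1)
new_bz = bz2.compress(new, compresslevel=1)

raw_hits, raw_total = reusable_chunks(old, new)
bz_hits, bz_total = reusable_chunks(old_bz, new_bz)
print(f"uncompressed: {raw_hits}/{raw_total} chunks reusable")
print(f"bzip2:        {bz_hits}/{bz_total} chunks reusable")
```

On the uncompressed bytes, every chunk but the edited one should be reusable. On the bzip2 output, expect the chunks ahead of the edit to survive and few of the later ones to. bzip2 appears to pack its blocks back to back at bit rather than byte boundaries, so once one block changes size the rest of the stream no longer lines up.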

That said, there seems to be a version of gzip with an --rsyncable switch for Debian.  The BeezNest has a great article on this here.

Practicing Safe Emacs

This has been bugging me for years and I finally got around to looking up a solution.

Everyone knows that Emacs is the greatest editor on the planet, superior in all ways to vi. Unfortunately, Emacs has a nasty habit of scattering tilde-terminated backup files around whatever directory you happen to be editing in. Not only does this clutter up the file system, it can also be something of a security risk, especially if you’re working on web servers. And really, a snapshot of your most recent save is not so useful if, like me, you pathologically ctrl-x ctrl-s your code every few characters or so.

After a bit of hunting I discovered that Emacs’ built-in versioning could solve both of these problems. Cons the following two commands into your local .emacs:

;; Enable versioning with default values.
(setq version-control t)

;; Save all backup files into the designated directory.
(setq backup-directory-alist (quote ((".*" . "~/.emacs.d/backups/"))))

and not only will Emacs keep numbered backups (it makes one the first time you save a file after opening it):

$ ls .emacs.d/backups/
!home!gates!GPLv3.txt.~1~ !home!gates!GPLv3.txt.~2~

but, as you see, it also tucks your backups safely out of harm’s way.