rsync and bzip2-compressed data

Because it only transfers the deltas between source and destination files, rsync is a great backup tool for uncompressed data. The structure of compressed data, however, can change drastically between backups, defeating the benefits of rsync. I’d read somewhere recently that bzip2’s “blocking” design might make it a viable compression format to use with rsync, so I ran an ad hoc experiment this morning to check this out.

Uncompressed Data

Here are the results from rsync’ing an uncompressed MySQL database with a few minor record changes.

total: matches=1634 hash_hits=2136 false_alarms=0 data=21227
sent 6.73K bytes received 9.92K bytes 6.66K bytes/sec
total size is 2.69M speedup is 161.44

Nice. Only about 7K sent, roughly the size of the change.

bzip2 Compressed Data

And here are the results from rsync’ing the same MySQL database, compressed in advance with bzip2.

total: matches=596 hash_hits=17533 false_alarms=1 data=876602
sent 876.99K bytes received 7.73K bytes 353.89K bytes/sec
total size is 1.64M speedup is 1.85

Whoa! What amounts to about a 7K change results in roughly 130 times the data transfer (877K sent versus 6.73K).
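For anyone who wants to check the numbers, here is a small sketch of the arithmetic. rsync reports speedup as the total file size divided by the bytes sent plus the bytes received. The function name `speedup` is mine, and I am assuming the K and M suffixes in the output are powers of 1000, which may not match every rsync build or flag combination.

```python
def speedup(total_size, sent, received):
    """Ratio of total file size to bytes actually moved over the wire."""
    return total_size / (sent + received)

# Assumption: K = 1000 bytes and M = 1000000 bytes in the rsync output above.
K, M = 1_000, 1_000_000

# Figures copied from the two rsync runs in this post.
uncompressed = speedup(2.69 * M, 6.73 * K, 9.92 * K)
compressed = speedup(1.64 * M, 876.99 * K, 7.73 * K)

# How many times more data was sent for the bzip2-compressed file.
sent_ratio = (876.99 * K) / (6.73 * K)

print(f"uncompressed speedup: {uncompressed:.2f}")
print(f"compressed speedup:   {compressed:.2f}")
print(f"sent ratio:           {sent_ratio:.1f}")
```

The results land close to what rsync printed (161.44 and 1.85); the small differences come from the rounded figures in the summary lines.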

Which makes sense: digging into the details of bzip2, I see that the algorithm compresses data in independent blocks of 100K to 900K, so even a tiny change rewrites the entire compressed block that contains it. So I suppose that using bzip2 might make sense if you have an incrementally growing data store that adds about 100K of data between backups, and where the older data rarely, if ever, changes. Barring that, to achieve the benefits of rsync, uncompressed data is probably the way to go.
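The block effect can be reproduced without rsync. The sketch below uses Python's standard `bz2` module on a made-up, dump-like byte string (not the MySQL data from the experiment), flips a single bit in the middle, and measures how many leading bytes the two compressed outputs still share. The helper names `make_rows` and `common_prefix_len` are hypothetical. One caveat I am fairly sure of but did not measure here: bzip2 packs blocks back to back without byte alignment, so blocks after the changed one may shift and stop matching byte for byte as well.

```python
import bz2

def make_rows(n):
    """Build a deterministic, dump-like byte string (a stand-in for a database file)."""
    return b"".join(b"%d,%d\n" % (i, (i * 2654435761) % 1000003) for i in range(n))

def common_prefix_len(a, b):
    """Count the leading bytes that two byte strings have in common."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return min(len(a), len(b))

original = make_rows(200_000)      # a few MB, so it spans several 900K blocks

edited = bytearray(original)
edited[len(edited) // 2] ^= 0x01   # flip one bit of a single byte in the middle
edited = bytes(edited)

# Compression level 9 means 900K blocks, the bzip2 default.
packed_original = bz2.compress(original, 9)
packed_edited = bz2.compress(edited, 9)

shared = common_prefix_len(packed_original, packed_edited)
print(f"uncompressed size:      {len(original)} bytes")
print(f"compressed size:        {len(packed_original)} bytes")
print(f"identical leading part: {shared} bytes")
```

Everything before the block holding the change comes out identical; the outputs diverge from that block onward, which is the part rsync would have to resend.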

That said, there seems to be a version of gzip with an --rsyncable switch for Debian. The BeezNest has a great article on this here.

One thought on “rsync and bzip2-compressed data”

  1. I was aware of gzip's rsyncable flag, but was not aware that bzip2 was natively compatible with rsync. I'll need to test it out and give it a try. Why not do a test with 100k of changes in sql database to see if your hypothesis is correct? Nice blog 🙂
