rsync and bzip2-compressed data

Because it only transfers deltas between source and destination files, rsync is a great backup tool when working with uncompressed data. The structure of compressed data, however, can change drastically between backups, defeating the benefits of rsync. I’d read somewhere recently that bzip2’s “blocking” design might make it a viable compression format to use with rsync, so I ran an ad hoc experiment this morning to check this out.

Uncompressed Data

Here are the results from rsync’ing an uncompressed MySQL database with a few minor record changes.

total: matches=1634 hash_hits=2136 false_alarms=0 data=21227
sent 6.73K bytes received 9.92K bytes 6.66K bytes/sec
total size is 2.69M speedup is 161.44

Nice. Only about 7K transferred. Roughly the size of the change.
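
The exact command isn't recorded here, but output of this shape comes from a doubly verbose, human-readable rsync run. A minimal sketch, assuming hypothetical source and destination paths (sync_with_stats is just an illustrative wrapper, not something from the original run):

```shell
# Sketch only: the paths and the wrapper name are assumptions.
sync_with_stats() {
    # $1: source, $2: destination
    # -a  archive mode (recurse, preserve times and permissions)
    # -vv extra verbosity, which includes the "total: matches=..." line
    # -h  human-readable sizes such as 6.73K
    rsync -avvh "$1" "$2"
}

# Usage (hypothetical paths):
#   sync_with_stats /var/backups/db/ backuphost:/srv/backups/db/
```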

bzip2 Compressed Data

And here are the results from rsync’ing the same MySQL database, compressed in advance with bzip2.

total: matches=596 hash_hits=17533 false_alarms=1 data=876602
sent 876.99K bytes received 7.73K bytes 353.89K bytes/sec
total size is 1.64M speedup is 1.85

Whoa! What amounts to about a 7K change is resulting in well over 100x the data transfer (roughly 877K sent versus 7K).
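
The speedup figure rsync prints is simply the total file size divided by everything that crossed the wire (bytes sent plus bytes received). A quick sketch to reproduce it from the numbers above; speedup is a hypothetical helper, all three arguments must use the same unit, and 1M is assumed to be 1000K:

```shell
speedup() {
    # $1: total size, $2: bytes sent, $3: bytes received (same unit for all three)
    awk -v total="$1" -v sent="$2" -v recv="$3" \
        'BEGIN { printf "%.2f\n", total / (sent + recv) }'
}

speedup 1640 876.99 7.73    # bzip2 run, in K: prints 1.85
speedup 2690 6.73 9.92      # uncompressed run, in K
```

The second call lands close to the reported 161.44 rather than exactly on it, since the inputs here are already rounded.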

Which makes sense: digging into the details of bzip2, I see that the algorithm chunks data into 100K to 900K blocks. So I suppose that using bzip2 might make sense if you have an incrementally growing data store that adds about 100K of data between backups, and where the older data rarely, if ever, changes. Barring that, to achieve the benefits of rsync, uncompressed data is probably the way to go.

That said, there seems to be a version of gzip with an --rsyncable switch for Debian.  The BeezNest has a great article on this here.
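
For reference, a minimal sketch of how that switch would be used, assuming a gzip build that accepts --rsyncable. The function name and paths are hypothetical:

```shell
compress_rsyncable() {
    # $1: file to compress; writes "$1.gz" and leaves the original in place
    # --rsyncable periodically resets the compressor so a local change
    # only disturbs nearby compressed output, at a small cost in size
    gzip --rsyncable -c "$1" > "$1.gz"
}

# Usage (hypothetical dump file and host):
#   compress_rsyncable /var/backups/db.sql
#   rsync -avh /var/backups/db.sql.gz backuphost:/srv/backups/
```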

Configuring Debian for UTF-8

Easier to configure than I thought it would be. The de facto document on this lists a series of convoluted steps to get UTF-8 working. All you really need to do is as follows:

  1. Launch the locale configurator via # dpkg-reconfigure locales
  2. Select the locales you would like to support. I need Japanese support so I selected ja_JP.EUC, ja_JP.UTF-8, and en_US.UTF-8 in addition to already available locales. Click “Ok”.
  3. Choose the default locale from the list; make sure it is a UTF-8 locale. Click “Ok”.
  4. Log in to a new shell so the new default locale takes effect.

That’s it!

# locale charmap should now return “UTF-8”.
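
A small sketch for scripting that last check, for example in a provisioning script. is_utf8_charmap is a hypothetical helper that just compares the string it is given:

```shell
is_utf8_charmap() {
    # $1: a charmap name as printed by `locale charmap`
    [ "$1" = "UTF-8" ]
}

if is_utf8_charmap "$(locale charmap 2>/dev/null)"; then
    echo "locale is UTF-8"
else
    echo "locale is not UTF-8; log in again or re-run dpkg-reconfigure locales"
fi
```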

Scratch that WordPress Etch

This post comes via notes taken from another blog… seemingly gone AWOL. Here for posterity.

Etch naturally has an older version of WordPress. To upgrade to a more recent (and possibly, though not necessarily, more secure) version, the easiest way is to temporarily change your tracking to testing, install WordPress, and then change back. How to do it:

1.  Edit your /etc/apt/sources.list to track testing. If your sources.list says etch or stable, change that to testing. For example, if your sources.list has:

deb http://debian.lcs.mit.edu/debian etch main contrib non-free
deb-src http://debian.lcs.mit.edu/debian etch main contrib non-free

change those to:

deb http://debian.lcs.mit.edu/debian testing main contrib non-free
deb-src http://debian.lcs.mit.edu/debian testing main contrib non-free

2.  After you change your tracking to testing do an apt-get update.

apt-get update

3.  Install WordPress

apt-get install wordpress

This may pull in a few new PHP libraries, such as libphp-phpmailer. Let it.

(Note that you do not want to do an “upgrade”, but rather “install” as listed above.   “upgrade” will try to upgrade a ridiculous number of packages, whereas install will focus only on packages required for installing and/or upgrading WordPress.)

4.  Change your tracking back to Etch. Just reverse what you did in step 1; that is, change testing back to etch.

5.  Clean everything up.

apt-get clean && apt-get update

After all this is done, log back in to WordPress as admin, and it will tell you that you have to update your WordPress database tables. Do that and you’re done.
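
Steps 1 and 4 are easy to fumble by hand, so here is a rough sketch of the suite swap as a shell function. switch_suite is a hypothetical helper, not part of apt; it rewrites only the suite field of deb and deb-src lines and keeps a .bak copy of the file:

```shell
switch_suite() {
    # $1: path to sources.list, $2: suite to replace, $3: suite to use instead
    sed -i.bak -E \
        "s/^(deb(-src)?[[:space:]]+[^[:space:]]+[[:space:]]+)$2([[:space:]])/\1$3\3/" \
        "$1"
}

# The whole procedure, run as root (commented out so nothing runs by accident):
#   switch_suite /etc/apt/sources.list etch testing
#   apt-get update
#   apt-get install wordpress
#   switch_suite /etc/apt/sources.list testing etch
#   apt-get clean && apt-get update
```

Because the pattern requires whitespace right after the suite name, an entry such as etch/volatile is left alone.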

clamav crashing on Debian

I’d seen this happen occasionally over the last couple of months, but it seemed to get really bad on Friday.

Fri Mar 21 21:31:45 2008 -> SelfCheck: Database modification detected. Forcing reload.
Reading databases from /var/lib/clamav
ERROR: reload db failed: Unable to lock database directory (try 1)
ERROR: reload db failed: Unable to lock database directory (try 2)
ERROR: reload db failed: Unable to lock database directory (try 3)
ERROR: reload db failed: Unable to lock database directory
Terminating because of a fatal error.
Socket file removed.
Pid file removed.
--- Stopped at Fri Mar 21 21:38:16 2008

Not entirely sure what the problem is, but it seems like clamav is choking on recent updates from freshclam.

And apparently I’m not the only one. Took advice from this thread and updated clamav to the version in debian-volatile. The official ClamAV documentation also recommends using the volatile repositories.

I’m new to Debian and almost took this to mean that I should use etch. Good to know that Debian maintains a volatile repository. To pull packages from volatile, just add:

deb http://volatile.debian.net/debian-volatile etch/volatile main contrib non-free
(though preferably use a mirror)

to /etc/apt/sources.list. Running apt-get update followed by apt-get install clamav (or the aptitude equivalents) will find and install the appropriate volatile updates. Nice.
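
A sketch of the same change as a script, assuming root access. add_apt_source is a hypothetical helper that appends a line only if it is not already present, so re-running it is harmless:

```shell
add_apt_source() {
    # $1: path to sources.list, $2: the full "deb ..." line to add
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Usage, as root (commented out so nothing runs by accident):
#   add_apt_source /etc/apt/sources.list \
#       "deb http://volatile.debian.net/debian-volatile etch/volatile main contrib non-free"
#   apt-get update
#   apt-get install clamav
```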

How to segfault Apache 2+ with mod_security

Build mod_security against the wrong set of headers, and Apache 2 will mysteriously begin to segfault in a persistent manner. Check which version you’re running with dpkg --get-selections | grep apache2.

Seems my shiny new Debian distro running the prefork version of Apache had the threaded (worker) headers installed against it. Duh. apt-get install apache2-prefork-dev reinstalled the correct prefork headers and Apache is happy again.
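
A rough sketch for spotting this mismatch before it bites, assuming the etch-era package names: apache2-mpm-* for the server, and apache2-prefork-dev or apache2-threaded-dev for the headers. check_apache_headers is a hypothetical helper that reads dpkg --get-selections output on stdin:

```shell
check_apache_headers() {
    # Exit status: 0 = headers match the MPM, 1 = mismatch, 2 = could not tell
    awk '
        $2 != "install" { next }
        $1 ~ /^apache2-mpm-/ { mpm = substr($1, 13) }   # text after "apache2-mpm-"
        $1 == "apache2-prefork-dev"  { dev = "prefork" }
        $1 == "apache2-threaded-dev" { dev = "threaded" }
        END {
            if (mpm == "" || dev == "") { print "unknown"; exit 2 }
            # Assumption: every MPM other than prefork wants the threaded headers
            want = (mpm == "prefork") ? "prefork" : "threaded"
            if (dev != want) {
                print "mismatch: " mpm " MPM with " dev " headers"
                exit 1
            }
            print "ok: " mpm " MPM with " dev " headers"
        }'
}

# Usage:
#   dpkg --get-selections | check_apache_headers
```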

Mathiew Dessus has a great article about installing mod_security on Debian for those interested.