Character encoding translation breaks when upgrading WordPress

Recent versions of WordPress like to second-guess the internal and external encodings used by mbstring when assembling data from the database and spitting it out to blog pages.  Probably this works fine for most, but if you have an older database storing text as something other then UTF-8, probably you have a custom chunk of mbstring configuration in an .htaccess file or similar; perhaps like this:

php_value output_buffering 1
php_value output_handler mb_output_handler
php_value mbstring.language Japanese
php_value mbstring.internal_encoding EUC-JP
php_value mbstring.http_input auto
php_value mbstring.http_output SJIS
php_value mbstring.encoding_translation 1

WordPress’s second-guessing will break this, resulting in a bunch of garbled posts. To fix, comment out this chunk of code in wp-settings.php:

 * In most cases the default internal encoding is latin1, which is of no use,
 * since we want to use the mb_ functions for utf-8 strings
if (function_exists('mb_internal_encoding')) {
        if (!@mb_internal_encoding(get_option('blog_charset')))

With this gone, WordPress will stop second-guessing and custom encoding translations should flow through just fine.

3 thoughts on “Character encoding translation breaks when upgrading WordPress

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.