Rails 3.2.12 not ready for MySQL 5.5 utf8mb4

I'm working on a Rails app that stores UTF-8 strings. It turns out that MySQL's utf8 character set only supports characters of up to 3 bytes, while UTF-8 itself allows characters of up to 4 bytes. The encodings for the client connection and the database can all be set correctly, and an insert will still fail as soon as a 4-byte UTF-8 character is sent.

This is the error you'll see:


Incorrect string value: '\xF0\x9F\x91\x88' 
ActiveRecord::StatementInvalid: Mysql2::Error: Incorrect string value: '\xF0\x9F\x91\x88'
for column 'text' at row 1: 
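
To see why this value is rejected, here is a minimal Ruby sketch that decodes the bytes from the error and picks out characters that need four bytes in UTF-8. The helper name `four_byte_chars` is made up for illustration; it is not part of Rails or mysql2.

```ruby
# Hypothetical helper (not part of Rails or mysql2): returns the characters
# in str that take 4 bytes in UTF-8, i.e. the ones a 3-byte utf8 column
# cannot store.
def four_byte_chars(str)
  str.each_char.select { |c| c.bytesize == 4 }
end

# The bytes from the error message, interpreted as UTF-8.
rejected = [0xF0, 0x9F, 0x91, 0x88].pack("C*").force_encoding("UTF-8")

puts rejected.bytesize                          # 4
puts format("U+%X", rejected.ord)               # U+1F448
puts four_byte_chars("abc#{rejected}").length   # 1
```

A check like this could be run on user input before saving, to find out which records would trigger the error.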

MySQL 5.5.3 added a new character set, utf8mb4, to support 4-byte characters. The utf8mb3 alias for utf8 was also created to describe the old encoding more accurately. This creates a problem for Rails (3.2.12 as of this writing).

The first compatibility problem is the mysql2 driver itself. The current release is 0.3.11, committed on 2011-12-06. Driver support for utf8mb4 was committed on 2011-12-20.

So to even begin using utf8mb4, use the git HEAD version of mysql2 in your Gemfile:


gem 'mysql2', :git => "https://github.com/brianmario/mysql2.git"

Then add this to config/database.yml:


development:
  adapter: mysql2
  encoding: utf8mb4
  collation: utf8mb4_unicode_ci

If you create a new database, you'll run into this error:


Mysql::Error: Specified key was too long; max key length is 767 bytes:
 CREATE UNIQUE INDEX `unique_schema_migrations` ON `schema_migrations` (`version`)

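This comes from a limitation that utf8mb4 exposes: index keys are capped at 767 bytes, so at up to 4 bytes per character an indexed column can hold at most 191 characters, and the version column of schema_migrations is a varchar(255). (see http://dev.mysql.com/doc/refman/5.6/en/charset-unicode-upgrading.html)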
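
The arithmetic behind the 191-character figure can be checked with a short Ruby sketch. The constant and method names are my own, and the 767-byte limit is taken from the error message.

```ruby
# Assumed figures: the 767-byte key limit reported in the error, and the
# worst-case bytes per character for each character set.
MAX_KEY_BYTES = 767
BYTES_PER_CHAR = { "utf8" => 3, "utf8mb4" => 4 }

# Largest number of characters an indexed string column can hold
# (illustrative helper, not a Rails or MySQL API).
def max_indexed_chars(charset)
  MAX_KEY_BYTES / BYTES_PER_CHAR.fetch(charset)
end

puts max_indexed_chars("utf8")         # 255
puts max_indexed_chars("utf8mb4")      # 191
puts 255 * BYTES_PER_CHAR["utf8mb4"]   # 1020, over the 767-byte limit
```

This is why a varchar(255) index fits under utf8 but not under utf8mb4.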

If you absolutely need 4-byte UTF-8 characters in a text column that was set up with utf8, you can add utf8mb4 support for that single column with the following SQL:

alter table notes modify text varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
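
If you would rather keep that change in version control, the same statement could be run from a migration. This is only a sketch: the helper `utf8mb4_column_sql` and the migration class name are hypothetical, and the table and column names are the ones from my example.

```ruby
# Hypothetical helper: builds the ALTER TABLE statement that converts one
# column to utf8mb4. Identifiers are backtick-quoted but not escaped, so
# pass trusted names only.
def utf8mb4_column_sql(table, column, type = "varchar(255)")
  "ALTER TABLE `#{table}` MODIFY `#{column}` #{type} " \
    "CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci"
end

puts utf8mb4_column_sql("notes", "text")

# Inside a Rails migration the statement could be run with `execute`:
#
#   class ConvertNotesTextToUtf8mb4 < ActiveRecord::Migration
#     def up
#       execute utf8mb4_column_sql("notes", "text")
#     end
#   end
```

The migration part is left as a comment so the snippet runs without Rails loaded.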

That was enough to get me going again. Until the index creation code becomes aware of the MySQL index limitations with utf8mb4, I'd avoid MySQL on Rails.
