I'm working on a Rails app that stores UTF-8 strings. It turns out that MySQL's utf8 character set only supports characters of up to 3 bytes, while UTF-8 itself allows 4-byte characters. You can set every encoding for the client connection and the database correctly and still crash as soon as a 4-byte UTF-8 character is sent.
This is the error you'll see:
Incorrect string value: '\xF0\x9F\x91\x88'
ActiveRecord::StatementInvalid: Mysql2::Error: Incorrect string value: '\xF0\x9F\x91\x88'
for column 'text' at row 1:
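To make the failure easier to reproduce, here is a minimal Ruby sketch that checks whether a string contains any 4-byte UTF-8 character. The helper name `four_byte_utf8?` is made up for illustration; it is not part of Rails or mysql2.

```ruby
# Hypothetical helper: true when the string contains a character that needs
# 4 bytes in UTF-8 (code points above U+FFFF), which a 3-byte utf8 column rejects.
def four_byte_utf8?(str)
  str.each_char.any? { |ch| ch.bytesize == 4 }
end

pointing = "\u{1F448}" # encodes to the same bytes shown in the error message
puts pointing.bytes.map { |b| format('%02X', b) }.join(' ') # F0 9F 91 88
puts four_byte_utf8?(pointing)     # true
puts four_byte_utf8?("caf\u00E9")  # false, the accented e takes only 2 bytes
```

A check like this could run before saving user input, if rejecting or stripping such characters is acceptable for the app.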
MySQL 5.5.3 added a new character set, utf8mb4, to support 4-byte characters. The utf8mb3 alias for utf8 was also created to describe the old encoding more accurately. This creates a problem for Rails (3.2.12 as of this writing).
The first compatibility problem is the mysql2 driver itself. The current release is 0.3.11, committed on 2011-12-06. Driver support for utf8mb4 was committed on 2011-12-20.
So to even begin using utf8mb4, you need the git head version of mysql2 in your Gemfile:
gem 'mysql2', :git => "https://github.com/brianmario/mysql2.git"
Then add this to config/database.yml:
development:
  adapter: mysql2
  encoding: utf8mb4
  collation: utf8mb4_unicode_ci
If you create a new database, you'll run into this error:
Mysql::Error: Specified key was too long; max key length is 767 bytes:
CREATE UNIQUE INDEX `unique_schema_migrations` ON `schema_migrations` (`version`)
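This comes from a limit that utf8mb4 runs into: an index key can be at most 767 bytes, so at 4 bytes per character an indexed column can hold at most 191 characters, and the version column of schema_migrations is a varchar(255). (See http://dev.mysql.com/doc/refman/5.6/en/charset-unicode-upgrading.html)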
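The arithmetic behind the 191-character figure can be sketched in a few lines of Ruby. The constant and method names are illustrative only; the 767-byte figure is the one quoted in the error message.

```ruby
# Maximum index key length in bytes, as reported by the MySQL error.
MAX_KEY_BYTES = 767

# Longest fully indexable varchar, in characters, for a given worst-case
# number of bytes per character (integer division rounds down).
def max_indexable_chars(bytes_per_char)
  MAX_KEY_BYTES / bytes_per_char
end

puts max_indexable_chars(3) # 255, which is why varchar(255) worked under utf8
puts max_indexable_chars(4) # 191, the ceiling under utf8mb4
```

This is why a varchar(255) index that was fine under the 3-byte utf8 charset no longer fits once the table defaults to utf8mb4.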
If you absolutely have to have 4-byte UTF-8 characters in a text column that was set up with utf8, you can add utf8mb4 support for just that column by changing its character set in SQL.
That was enough to get me going again. Still, I'd avoid MySQL on Rails until the index creation code becomes aware of the MySQL index limits under utf8mb4.
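The exact statement is not reproduced here, so the following is only a sketch of what a per-column conversion might look like, assuming a hypothetical `posts` table with a `text` column of type TEXT. The helper `utf8mb4_column_sql` is invented for illustration; `ALTER TABLE ... MODIFY ... CHARACTER SET ... COLLATE` is standard MySQL syntax.

```ruby
# Hypothetical helper: builds an ALTER TABLE statement that switches a single
# column to utf8mb4. The caller supplies the table, column and column type.
# Note: MODIFY redefines the column, so any NOT NULL or DEFAULT clause the
# column already has must be restated in `type` or it will be lost.
def utf8mb4_column_sql(table, column, type)
  "ALTER TABLE `#{table}` MODIFY `#{column}` #{type} " \
  "CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci"
end

# 'posts' is an assumed table name; 'text' is the column from the error above.
puts utf8mb4_column_sql('posts', 'text', 'TEXT')
```

Inside a Rails migration the resulting string could be passed to `execute`, for example `execute utf8mb4_column_sql('posts', 'text', 'TEXT')`. The connection encoding still has to be utf8mb4 for the 4-byte characters to reach the column intact.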