I'm working on a Rails app that stores UTF-8 strings. It turns out that MySQL's utf8 character set only supports characters of up to 3 bytes, while UTF-8 itself can encode characters of up to 4 bytes. The encodings for the client connection and the database can all be set correctly and the app will still crash when a 4-byte UTF-8 character is sent.
This is the error you'll see:
ActiveRecord::StatementInvalid: Mysql2::Error: Incorrect string value: '\xF0\x9F\x91\x88' for column 'text' at row 1:
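The bytes in that message, \xF0\x9F\x91\x88, are the 4-byte UTF-8 encoding of a single character (U+1F448). Here is a minimal Ruby sketch for spotting such strings before they reach the database. The helper name four_byte_utf8? is my own, not part of Rails or mysql2, and it assumes the input is a valid UTF-8 string.

```ruby
# encoding: utf-8

# Hypothetical helper (not part of Rails or mysql2): returns true if the
# string contains any character that takes 4 bytes in UTF-8, i.e. one that
# MySQL's 3-byte utf8 character set cannot store.
def four_byte_utf8?(str)
  str.each_char.any? { |ch| ch.bytesize == 4 }
end

pointing = "\u{1F448}"  # the character from the error message above

# Show the raw bytes in the same \xNN form MySQL uses in its error.
puts pointing.bytes.map { |b| format('\x%02X', b) }.join

puts four_byte_utf8?(pointing)       # 4-byte character present
puts four_byte_utf8?("plain ascii")  # nothing above 1 byte
```

A check like this could run in a validation or a before_save callback to reject or strip such characters, if changing the column's character set is not an option.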
MySQL 5.5.3 added a new character set, utf8mb4, to support 4-byte characters. The utf8mb3 alias for utf8 was also created to describe the older encoding more accurately. This creates a problem for Rails (3.2.12 as of this writing).
The first compatibility problem is the mysql2 driver itself. The current release is 0.3.11, committed on 2011-12-06. Driver support for utf8mb4 was committed on 2011-12-20.
So to even begin using utf8mb4, use the git head version of mysql2 in your Gemfile:
gem 'mysql2', :git => "https://github.com/brianmario/mysql2.git"
Then add this to config/database.yml:
development:
  adapter: mysql2
  encoding: utf8mb4
  collation: utf8mb4_unicode_ci
If you create a new database, you'll run into this error:
Mysql::Error: Specified key was too long; max key length is 767 bytes: CREATE UNIQUE INDEX `unique_schema_migrations` ON `schema_migrations` (`version`)
This comes from a limitation that utf8mb4 exposes: an index key can be at most 767 bytes, which at 4 bytes per character is 191 characters, and the version column of schema_migrations is a varchar(255). (See http://dev.mysql.com/doc/refman/5.6/en/charset-unicode-upgrading.html)
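The 191-character figure is just the 767-byte key limit from the error message divided by the maximum bytes per character. The arithmetic can be sketched in Ruby as follows. The constant and method names are mine, for illustration only.

```ruby
# Key length limit quoted in the MySQL error message above.
MAX_KEY_BYTES = 767

# Longest indexable varchar, in characters, for a charset that needs
# up to bytes_per_char bytes per character.
def max_indexable_chars(bytes_per_char)
  MAX_KEY_BYTES / bytes_per_char
end

# Does a varchar(length) column fit in a single index key?
def index_fits?(length, bytes_per_char)
  length * bytes_per_char <= MAX_KEY_BYTES
end

puts max_indexable_chars(3)  # utf8 (utf8mb3)
puts max_indexable_chars(4)  # utf8mb4
puts index_fits?(255, 3)     # varchar(255) under utf8
puts index_fits?(255, 4)     # varchar(255) under utf8mb4
```

With 3-byte utf8, a varchar(255) needs at most 765 bytes and squeezes under the limit. With utf8mb4 the same column could need 1020 bytes, which is why the unique index on schema_migrations fails.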
If you absolutely have to have 4-byte UTF-8 characters in a text column that was set up with utf8, you can add utf8mb4 support for a single column with the following SQL:
alter table notes modify text varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
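If several columns need the same treatment, the statement can be generated rather than typed by hand. This is a rough sketch: utf8mb4_column_sql is a hypothetical helper of my own, and it does no escaping, so it should only be fed trusted table and column names.

```ruby
# Hypothetical helper: builds the ALTER TABLE statement that switches one
# column to utf8mb4. Identifiers are wrapped in backticks but NOT escaped,
# so pass only trusted names. The type must be repeated because MODIFY
# redefines the whole column.
def utf8mb4_column_sql(table, column, type, collation = "utf8mb4_unicode_ci")
  "ALTER TABLE `#{table}` MODIFY `#{column}` #{type} " \
    "CHARACTER SET utf8mb4 COLLATE #{collation};"
end

puts utf8mb4_column_sql("notes", "text", "varchar(255)")
```

The resulting string could then be run from a migration with execute, assuming the column is not part of an index that would exceed the 767-byte limit.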
That was enough to get me going again. I'd avoid MySQL on Rails until the index creation code becomes aware of the MySQL index limitations with utf8mb4.