In this blog post, we’ll look at options for migrating database charsets to utf8mb4.
Migrating charsets is, in my opinion, one of the most tedious tasks in a DBA’s life. So many things can go wrong with our data that getting the migration right is always hard. What looks like a trivial task can easily turn into a nightmare and keep us working far longer than expected.
I recently worked on a case that required a lot of testing because of some existing schema designs that made InnoDB suffer. I decided to write this post to put together a guide for performing a charset conversion with minimal downtime and pain.
My approach involves at least one slave for failover and logical/physical backup operations to make sure that data is loaded properly using the right charset.
In this case, we are moving from latin1 (the default up to MySQL 8.0.0) to utf8mb4 (the new default as of 8.0.1). In this post, Lefred refers to this change and some safety checks for upgrading. For our change, an important thing to consider: latin1 stores one byte per character, while utf8mb4 can store up to four bytes per character. This not only affects disk usage, it can also push us into some limits that I describe later in the plan.
So let’s get our hands dirty. First, let’s create a slave using a fresh (non-locking) backup. Remember that these operations are designed to minimize downtime and reduce any potential impact on our production server.
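Before going further, it helps to know how much data is affected. The query below is only a sketch (it relies on the approximate sizes reported by information_schema) to list the tables that still default to latin1 and how big they are:

```sql
-- Rough inventory: tables still defaulting to latin1 and their approximate size.
SELECT t.table_schema,
       t.table_name,
       ROUND((t.data_length + t.index_length) / 1024 / 1024, 1) AS approx_size_mb
FROM information_schema.tables t
JOIN information_schema.collation_character_set_applicability a
  ON t.table_collation = a.collation_name
WHERE a.character_set_name = 'latin1'
  AND t.table_schema NOT IN ('mysql', 'performance_schema', 'information_schema');
```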
If you already have a slave that can act as a master replacement then you can skip this section. In our source server, configure binlog_format and flush logs to start with fresh binary logs:
```sql
set global binlog_format=MIXED;
flush logs;
```
Start a streaming backup using Percona XtraBackup through netcat. On the destination server, start listening:
```bash
nc -l 9999 | cat - > /dest/folder/backup.tar
```
and on our source server, start the stream:
```bash
innobackupex --stream=tar ./ | nc dest_server_ip 9999
```
Once the backup is done, untar and restore the backup. Then set up the slave:
```bash
tar -xif /dest/folder/backup.tar
innobackupex --apply-log /dest/folder/
/etc/init.d/mysql stop
rm -rf /var/lib/mysql/
mv /dest/folder/* /var/lib/mysql/
chown -R mysql:mysql /var/lib/mysql
/etc/init.d/mysql start
cat /var/lib/mysql/xtrabackup_binlog_info
# Then, in the mysql client, point the slave at the master using the
# coordinates printed in xtrabackup_binlog_info:
# change master to master_host='master_host', master_user='master_user',
#   master_password='master_password',
#   master_log_file='file_printed_in_xtrabackup_binlog_info',
#   master_log_pos=pos_printed_in_xtrabackup_binlog_info;
# start slave;
```
Now that we have the slave ready, we prepare our dataset by running two mysqldump processes so that we get schemas and data in separate files. You could also do this with MyDumper or mysqlpump, but I will keep it simple. On the slave, stop replication first:
```sql
STOP SLAVE;
SHOW SLAVE STATUS;
```
Write down this output, as it may be needed later. Then take both dumps:
```bash
mysqldump --skip-set-charset --no-data --databases `mysql --skip-column-names -e "SELECT GROUP_CONCAT(schema_name SEPARATOR ' ') FROM information_schema.schemata WHERE schema_name NOT IN ('mysql','performance_schema','information_schema');"` > schema.sql
mysqldump --skip-set-charset -n -t --databases `mysql --skip-column-names -e "SELECT GROUP_CONCAT(schema_name SEPARATOR ' ') FROM information_schema.schemata WHERE schema_name NOT IN ('mysql','performance_schema','information_schema');"` > data.sql
```
Notice that I’m passing a command as an argument to --databases, to dump all databases except mysql, performance_schema and information_schema (a hack stolen from this post, with credit to Ronald Bradford). It is very important to keep replication stopped, as we will resume it only after fully converting our charset.
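For clarity, the back-ticked mysql command just produces a space-separated list of schema names, so the commands mysqldump actually ends up running look roughly like this (db1 and db2 are placeholder schema names):

```bash
# Hypothetical expansion of the back-ticked sub-command above
mysqldump --skip-set-charset --no-data --databases db1 db2 > schema.sql
mysqldump --skip-set-charset -n -t --databases db1 db2 > data.sql
```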
Now we have to convert our data to utf8mb4. This is easy, as we only need to modify the schema.sql file with a few commands:
```bash
sed -i -e "s/DEFAULT CHARACTER SET latin1/DEFAULT CHARACTER SET utf8mb4/g" schema.sql
sed -i -e "s/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci/g" schema.sql
sed -i -e "s/SET character_set_client = utf8/SET character_set_client = utf8mb4/g" schema.sql
```
Can this be a one-liner? Yes, but I’m not a good basher. 🙂
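If you do want a single command, here is one possible way to combine the three substitutions into one sed pass, followed by a quick check for leftover latin1 references (for example column-level definitions that the table-level patterns above don’t touch):

```bash
# One-liner alternative: apply all three substitutions in a single sed pass
sed -i -e 's/DEFAULT CHARACTER SET latin1/DEFAULT CHARACTER SET utf8mb4/g' \
       -e 's/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci/g' \
       -e 's/SET character_set_client = utf8/SET character_set_client = utf8mb4/g' schema.sql
# Should print 0 if no latin1 references remain (e.g. per-column charsets)
grep -c latin1 schema.sql
```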
Now we are ready to restore our data using the new encoding:
```bash
mysql -e "set global innodb_large_prefix=1;"
mysql < schema.sql
mysql < data.sql
```
Notice I’ve enabled the variable innodb_large_prefix. This is important because InnoDB limits index key prefixes to 767 bytes by default. If you have an index on a varchar(255) column, you will get an error because with the new charset that index can need up to 1020 bytes (255 characters × 4 bytes), which exceeds the limit unless you shorten the index prefix. To avoid issues during the data load, we enable this variable to extend the limit to 3072 bytes.
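If you want to spot the risky indexes up front, a query along these lines is one possible sketch (it assumes any prefix length is reported in information_schema.statistics.sub_part). It lists index parts on string columns longer than 191 characters, the point at which four-byte characters can exceed the 767-byte limit:

```sql
-- Index parts on char/varchar columns that may exceed the 767-byte prefix limit
-- once each character can take up to 4 bytes (i.e. more than 191 characters).
SELECT s.table_schema, s.table_name, s.index_name, s.column_name,
       COALESCE(s.sub_part, c.character_maximum_length) AS indexed_chars
FROM information_schema.statistics s
JOIN information_schema.columns c
  ON  s.table_schema = c.table_schema
  AND s.table_name   = c.table_name
  AND s.column_name  = c.column_name
WHERE c.data_type IN ('char', 'varchar')
  AND COALESCE(s.sub_part, c.character_maximum_length) > 191
  AND s.table_schema NOT IN ('mysql', 'performance_schema', 'information_schema');
```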
Finally, let’s configure our server and restart it to make sure the new defaults are applied properly. In the my.cnf file, add:
```ini
[client]
default-character-set=utf8mb4

[mysqld]
skip-slave-start
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
innodb_large_prefix=1
```
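Once the server is back up, a quick check confirms the new defaults took effect:

```sql
SHOW VARIABLES LIKE 'character_set_server';
SHOW VARIABLES LIKE 'collation_server';
SHOW VARIABLES LIKE 'innodb_large_prefix';
```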
Let’s resume replication after the restart, and make sure everything is ok:
```sql
START SLAVE;
SHOW SLAVE STATUS;
```
OK, at this point we should be fine and our data should already be converted to utf8mb4. So far so good. The next step is to fail applications over to the new server, and rebuild the old server from a fresh backup taken with XtraBackup as described above.
There are a few things we need to consider before converting this slave into the master. One of them is any column that still carries an explicit latin1 definition (the sed commands above only rewrite the table-level defaults); those can be converted in place with a double ALTER TABLE:
```sql
ALTER TABLE big_table MODIFY latin1_column varbinary(250);
ALTER TABLE big_table MODIFY latin1_column varchar(250) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```
It’s important to understand why we need the double conversion from latin1 to varbinary to utf8mb4. This post from Marco Tusa explains it in detail.
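As a sanity check, you can compare the raw bytes of a few sample rows before and after the two ALTERs; if the column really contained UTF-8 data stored under a latin1 definition, the bytes should come out unchanged (the table, column and id values here are just placeholders):

```sql
-- Run before and after the double ALTER TABLE; the HEX() output should match.
SELECT id, HEX(latin1_column) AS raw_bytes
FROM big_table
WHERE id IN (1, 2, 3);
```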
I wrote this guide based on my experience working on this type of project. If you Google a bit, you’ll find plenty of resources and different solutions for this kind of work. What I’ve tried to present here is a guide to help you deal with these projects. Normally, we have to perform these changes on existing datasets that are sometimes big enough to rule out doing the work with plain ALTER TABLE commands. Hopefully, you find this useful!