October 22, 2014

Enabling crash-safe slaves with MySQL 5.6

Being able to configure slaves to be crash-safe is one of the major improvements of MySQL 5.6 with regards to replication. However we noticed confusion on how to enable this feature correctly, so let’s clarify how it should be done.

In short

1. Stop MySQL on slave
2. Add relay_log_info_repository = TABLE and relay_log_recovery = ON in my.cnf
3. Restart MySQL and relax

The gory details

To fully understand why you should change the above settings if you want crash-safe slaves, let’s first look at the reasons why replication can break when a slave crashes.

On a slave, replication involves 2 threads: the IO thread which copies the binary log of the master to a local copy called the relay log and the SQL thread which then executes the queries written in the relay log. The current position of each thread is stored in a file: master.info for the IO thread and relay-log.info for the SQL thread.

So far, so good. The first problem is that these files are not synced to disk each time they are written to: whenever there is a crash, positions that are stored are likely to be incorrect. MySQL 5.5 has a fix for this: you can set sync_master_info = 1 and sync_relay_log_info = 1 to make sure both files are written and synced to disk after each transaction. Syncing is not free of course but if you have write-back cache, these settings can be valuable.

But wait, even with sync_master_info = 1 and sync_relay_info = 1, bad things can happen. The reason is that replication information is written after the transaction is committed. So if a crash occurs after the transaction is committed and before the replication information is updated, replication information will be wrong when the server restarts and a transaction could executed twice. The effect will depend on the transaction: replication may still run fine or it may be broken or inconsistencies can even be silently created.

MySQL 5.6 tackles this problem by letting us store replication information in tables instead of files (mysql.slave_relay_log_info table is created when relay_log_info_repository = TABLE and mysql.slave_master_info table is created with master_info_repository = TABLE). The idea is simple: we can include the update of the replication information inside the transaction, making sure it is always in sync with the data.

In pseudo-code, instead of having:

The server now behaves as if we had:

Unfortunately, this is not as straightforward as it may appear. For the SQL thread, it works well because the server can update the slave_relay_info_info table at the same time it commits a transaction. However for the IO thread, the update of the table is not related to any transaction execution, so how can the server know when to update the table?

The answer is: it is controlled by sync_master_info. The default is 10,000 meaning that the IO thread position is only updated every 10,000 transactions. This is obviously not good to make the slave crash-safe. One solution is to set sync_master_info = 1, but as mentioned, it may have a performance impact (this is why 1 is not the default setting).

However there is a more elegant solution by using relay_log_recovery = ON, which will require a MySQL restart. This setting makes sure that when the server starts up, position for the IO thread is recovered from the slave_relay_log_info table, which is always up-to-date. Thus you do not even need to store IO thread information in a table for the slave to be crash-safe. In other words, setting master_info_repository = TABLE is not necessary.

As a final sidenote, once relay_log_info_repository = TABLE, the sync_relay_log_info setting becomes irrelevant as the table will always be updated at each transaction commit whatever the value of the setting. So you can safely remove it from your configuration file.

I hope this post will help you benefit from this great feature!

About Stephane Combaudon

Stéphane joined Percona in July 2012, after working as a MySQL DBA for leading French companies such as Dailymotion and France Telecom.

In real life, he lives in Paris with his wife and their twin daughters. When not in front of a computer or not spending time with his family, he likes playing chess and hiking.

Comments

  1. Laurynas says:

    Is –master-info-repository=TABLE required too? “In short” does not mention it, and “Gory details” mentions it.

  2. Stephane Combaudon says:

    master-info-repository=TABLE is not required, as relay_log_recover = ON will take care of updating the position of the IO thread in case of a crash.

    I’ve updated the post to clarify.

  3. Laurynas says:

    Stephane –

    Thanks. The MySQL docs at https://dev.mysql.com/doc/refman/5.6/en/slave-logs.html seem to say that master-info-repository=TABLE is required too, see https://dev.mysql.com/doc/refman/5.6/en/slave-logs.html:

    “In order for replication to be crash-safe when using tables for logging status and relay information, the tables must use a transactional storage engine, such as InnoDB.”. Note the plural “tables”.

    Should that be reported as a doc bug?

  4. Stephane Combaudon says:

    The whole paragraph is confusing for sure, because it’s not clearly explained what you need to do to make replication crash-safe on a slave. I’ll try to make a doc bug for it to be fixed.

    If you don’t use multi-threaded slaves, it’s easy to see that master-info-repository=TABLE is no required: stop the slave, manually edit master.info, restart the server.

    If relay_log_recovery = ON, it will restart correctly (with a message like ‘2013-09-13 10:28:53 2755 [Warning] Recovery from master pos 120 and file mysql-bin.000012.’ in the error log).

    If relay_log_recovery = OFF, replication breaks.

  5. Marcelo Altmann says:

    How does relay_log_info_repository=TABLE behaves when is dealing with MyISAM Tables? do we have the same problem when using the repository in disk and sync it each transaction (execute the Query then update repository) as MyISAM does not support transactions?

  6. Stephane Combaudon says:

    Mauricio: enabling crash-safe slaves requires that all tables use InnoDB. There’s no way to make things safe with MyISAM unfortunately.

  7. Flavian says:

    Hi Stephane, good post… What is the over head of using
    relay_log_info_repository=’TABLE’
    relay_log_recovery=’ON’
    master_info_repository=’TABLE’

    Compared to File based and sync disabled for performance.

  8. Stephane Combaudon says:

    Flavian,

    There’s an overhead when using crash-safe slaves. I don’t have specific numbers to share, but you can find some interesting thoughts in this article from Yoshinori:
    http://yoshinorimatsunobu.blogspot.fr/2013/12/single-thread-performance-regression-in.html

  9. Flavian says:

    Stephane,

    What i noticed today, even though i had set repo to TABLE & recovery ON still when the 5.6 crashed it dint recover proper point and got DUPLICATE insert error etc. Also since the the default value of sync_master_info, sync_relay_log,sync_relay_log_info is 10000 so i guess it will update only after 10,000 transaction. Previously in 5.5 File based setting setting sync_% to 0 used to depend on OS flush time and till date i have lost only max 1 or 2 transaction with very good performance. Now The default being 10000 i will now lose 10,000 transaction during crash? Also in 5.6 if repo is FILE and sync_% is set to 0 it will flush during OS timing. But If repo is table and sync_% is 0 it will disable completely so forcefully i have to be depended on lowering sync_% value hence losing performance.

  10. Stephane Combaudon says:

    Flavian,

    As long as you have relay_log_recovery = ON, the position in the slave_master_info table is overwritten by the position stored in slave_relay_info so it doesn’t matter if you have sync_master_info = 10000.

    You should also make sure that innodb_flush_log_at_trx_commit = 1 otherwise InnoDB is not guaranteed durable, which breaks the purpose of storing replication metadata in tables.

Speak Your Mind

*