Enabling crash-safe slaves with MySQL 5.6

Posted on:



Share Button

Being able to configure slaves to be crash-safe is one of the major improvements of MySQL 5.6 with regards to replication. However we noticed confusion on how to enable this feature correctly, so let’s clarify how it should be done.

In short

1. Stop MySQL on slave
2. Add relay_log_info_repository = TABLE and relay_log_recovery = ON in my.cnf
3. Restart MySQL and relax

The gory details

To fully understand why you should change the above settings if you want crash-safe slaves, let’s first look at the reasons why replication can break when a slave crashes.

On a slave, replication involves 2 threads: the IO thread which copies the binary log of the master to a local copy called the relay log and the SQL thread which then executes the queries written in the relay log. The current position of each thread is stored in a file: master.info for the IO thread and relay-log.info for the SQL thread.

So far, so good. The first problem is that these files are not synced to disk each time they are written to: whenever there is a crash, positions that are stored are likely to be incorrect. MySQL 5.5 has a fix for this: you can set sync_master_info = 1 and sync_relay_log_info = 1 to make sure both files are written and synced to disk after each transaction. Syncing is not free of course but if you have write-back cache, these settings can be valuable.

But wait, even with sync_master_info = 1 and sync_relay_info = 1, bad things can happen. The reason is that replication information is written after the transaction is committed. So if a crash occurs after the transaction is committed and before the replication information is updated, replication information will be wrong when the server restarts and a transaction could executed twice. The effect will depend on the transaction: replication may still run fine or it may be broken or inconsistencies can even be silently created.

MySQL 5.6 tackles this problem by letting us store replication information in tables instead of files (mysql.slave_relay_log_info table is created when relay_log_info_repository = TABLE and mysql.slave_master_info table is created with master_info_repository = TABLE). The idea is simple: we can include the update of the replication information inside the transaction, making sure it is always in sync with the data.

In pseudo-code, instead of having:

The server now behaves as if we had:

Unfortunately, this is not as straightforward as it may appear. For the SQL thread, it works well because the server can update the slave_relay_info_info table at the same time it commits a transaction. However for the IO thread, the update of the table is not related to any transaction execution, so how can the server know when to update the table?

The answer is: it is controlled by sync_master_info. The default is 10,000 meaning that the IO thread position is only updated every 10,000 transactions. This is obviously not good to make the slave crash-safe. One solution is to set sync_master_info = 1, but as mentioned, it may have a performance impact (this is why 1 is not the default setting).

However there is a more elegant solution by using relay_log_recovery = ON, which will require a MySQL restart. This setting makes sure that when the server starts up, position for the IO thread is recovered from the slave_relay_log_info table, which is always up-to-date. Thus you do not even need to store IO thread information in a table for the slave to be crash-safe. In other words, setting master_info_repository = TABLE is not necessary.

As a final sidenote, once relay_log_info_repository = TABLE, the sync_relay_log_info setting becomes irrelevant as the table will always be updated at each transaction commit whatever the value of the setting. So you can safely remove it from your configuration file.

I hope this post will help you benefit from this great feature!

Share Button

Stephane Combaudon

Stéphane joined Percona in July 2012, after working as a MySQL DBA for leading French companies such as Dailymotion and France Telecom. In real life, he lives in Paris with his wife and their twin daughters. When not in front of a computer or not spending time with his family, he likes playing chess and hiking.



Insight for DBAs, MySQL

  • Is –master-info-repository=TABLE required too? “In short” does not mention it, and “Gory details” mentions it.

  • Stephane Combaudon Post author

    master-info-repository=TABLE is not required, as relay_log_recover = ON will take care of updating the position of the IO thread in case of a crash.

    I’ve updated the post to clarify.

  • Stephane –

    Thanks. The MySQL docs at https://dev.mysql.com/doc/refman/5.6/en/slave-logs.html seem to say that master-info-repository=TABLE is required too, see https://dev.mysql.com/doc/refman/5.6/en/slave-logs.html:

    “In order for replication to be crash-safe when using tables for logging status and relay information, the tables must use a transactional storage engine, such as InnoDB.”. Note the plural “tables”.

    Should that be reported as a doc bug?

  • Stephane Combaudon Post author

    The whole paragraph is confusing for sure, because it’s not clearly explained what you need to do to make replication crash-safe on a slave. I’ll try to make a doc bug for it to be fixed.

    If you don’t use multi-threaded slaves, it’s easy to see that master-info-repository=TABLE is no required: stop the slave, manually edit master.info, restart the server.

    If relay_log_recovery = ON, it will restart correctly (with a message like ‘2013-09-13 10:28:53 2755 [Warning] Recovery from master pos 120 and file mysql-bin.000012.’ in the error log).

    If relay_log_recovery = OFF, replication breaks.

  • How does relay_log_info_repository=TABLE behaves when is dealing with MyISAM Tables? do we have the same problem when using the repository in disk and sync it each transaction (execute the Query then update repository) as MyISAM does not support transactions?

  • Stephane Combaudon Post author

    Mauricio: enabling crash-safe slaves requires that all tables use InnoDB. There’s no way to make things safe with MyISAM unfortunately.

  • Hi Stephane, good post… What is the over head of using

    Compared to File based and sync disabled for performance.

  • Stephane Combaudon Post author


    There’s an overhead when using crash-safe slaves. I don’t have specific numbers to share, but you can find some interesting thoughts in this article from Yoshinori:

  • Stephane,

    What i noticed today, even though i had set repo to TABLE & recovery ON still when the 5.6 crashed it dint recover proper point and got DUPLICATE insert error etc. Also since the the default value of sync_master_info, sync_relay_log,sync_relay_log_info is 10000 so i guess it will update only after 10,000 transaction. Previously in 5.5 File based setting setting sync_% to 0 used to depend on OS flush time and till date i have lost only max 1 or 2 transaction with very good performance. Now The default being 10000 i will now lose 10,000 transaction during crash? Also in 5.6 if repo is FILE and sync_% is set to 0 it will flush during OS timing. But If repo is table and sync_% is 0 it will disable completely so forcefully i have to be depended on lowering sync_% value hence losing performance.

  • Stephane Combaudon Post author


    As long as you have relay_log_recovery = ON, the position in the slave_master_info table is overwritten by the position stored in slave_relay_info so it doesn’t matter if you have sync_master_info = 10000.

    You should also make sure that innodb_flush_log_at_trx_commit = 1 otherwise InnoDB is not guaranteed durable, which breaks the purpose of storing replication metadata in tables.

  • hi Stephane Combaudon : 

    set relay_log_info_repository=’TABLE’ at slave , every transaction will update this table . In a busy oltp system , this will be a problem , like all transaction on slave into a serial 。 the lag between master and slave will became very large . AM I RIGHT ?

  • Stephane Combaudon Post author


    If you’re not using multithreaded slaves, transactions are anyway serialized so performance will not suffer a lot.

  • Patrice Levesque

    Followed your simple recipe, but it appears it doesn’t work for my Percona-5.6 install. I get this in the error log:

    [Warning] It is not possible to change the type of the relay log repository because there are workers repositories with possible execution gaps. The value of –relay_log_info_repository is altered to one of the found Worker repositories. The gaps have to be sorted out before resuming with the type change.

    Search engines give me MySQL source code but no explanation whatsoever about what the problem is and how I sort out gaps. Any clue? Might it be linked to multi-threaded replication?

  • Stephane Combaudon Post author


    I’ve not been able to reproduce the issue, but the warning indeed suggests that it’s related to multithreaded replication.
    So after STOP SLAVE, run START SLAVE UNTIL SQL_AFTER_MTS_GAPS. Then you should be able to change relay_log_info_repository to TABLE.

  • Patrice Levesque


    Thanks, it worked. For the record, the switch to relay_log_info_repository needed to be applied while the database server was running – the change in my.cnf plus a database server restart didn’t suffice. So this sequence of commands was needed:

    — Wait until gaps are resolved – a message will appear in the server’s error log
    SET GLOBAL relay_log_info_repository = ‘TABLE’;

    After that, a database server stop/start confirmed the change was really applied — no more warnings.

Leave a Reply