Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

Enabling crash-safe slaves with MySQL 5.6

September 13, 2013

Author

Stephane Combaudon

Insight for DBAs

MySQL

Share this Post:

Being able to configure slaves to be crash-safe is one of the major improvements of MySQL 5.6 with regards to replication. However we noticed confusion on how to enable this feature correctly, so let’s clarify how it should be done.

In short

1. Stop MySQL on slave
2. Add relay_log_info_repository = TABLE and relay_log_recovery = ON in my.cnf
3. Restart MySQL and relax

The details

To fully understand why you should change the above settings if you want crash-safe slaves, let’s first look at the reasons why replication can break when a slave crashes.

On a slave, replication involves 2 threads: the IO thread which copies the binary log of the master to a local copy called the relay log and the SQL thread which then executes the queries written in the relay log. The current position of each thread is stored in a file: master.info for the IO thread and relay-log.info for the SQL thread.

So far, so good. The first problem is that these files are not synced to disk each time they are written to: whenever there is a crash, positions that are stored are likely to be incorrect. MySQL 5.5 has a fix for this: you can set sync_master_info = 1 and sync_relay_log_info = 1 to make sure both files are written and synced to disk after each transaction. Syncing is not free of course but if you have write-back cache, these settings can be valuable.

But wait, even with sync_master_info = 1 and sync_relay_info = 1, bad things can happen. The reason is that replication information is written after the transaction is committed. So if a crash occurs after the transaction is committed and before the replication information is updated, replication information will be wrong when the server restarts and a transaction could executed twice. The effect will depend on the transaction: replication may still run fine or it may be broken or inconsistencies can even be silently created.

MySQL 5.6 tackles this problem by letting us store replication information in tables instead of files (mysql.slave_relay_log_info table is created when relay_log_info_repository = TABLE and mysql.slave_master_info table is created with master_info_repository = TABLE). The idea is simple: we can include the update of the replication information inside the transaction, making sure it is always in sync with the data.

In pseudo-code, instead of having:

START TRANSACTION;
-- Statement 1
-- ...
-- Statement N
COMMIT;

-- Update replication info files

START TRANSACTION;

-- Statement 1

-- ...

-- Statement N

COMMIT;

-- Update replication info files

The server now behaves as if we had:

START TRANSACTION;
-- Statement 1
-- ...
-- Statement N
-- Update replication info
COMMIT;

START TRANSACTION;

-- Statement 1

-- ...

-- Statement N

-- Update replication info

COMMIT;

Unfortunately, this is not as straightforward as it may appear. For the SQL thread, it works well because the server can update the slave_relay_info_info table at the same time it commits a transaction. However for the IO thread, the update of the table is not related to any transaction execution, so how can the server know when to update the table?

The answer is: it is controlled by sync_master_info. The default is 10,000 meaning that the IO thread position is only updated every 10,000 transactions. This is obviously not good to make the slave crash-safe. One solution is to set sync_master_info = 1, but as mentioned, it may have a performance impact (this is why 1 is not the default setting).

However there is a more elegant solution by using relay_log_recovery = ON, which will require a MySQL restart. This setting makes sure that when the server starts up, position for the IO thread is recovered from the slave_relay_log_info table, which is always up-to-date. Thus you do not even need to store IO thread information in a table for the slave to be crash-safe. In other words, setting master_info_repository = TABLE is not necessary.

As a final sidenote, once relay_log_info_repository = TABLE, the sync_relay_log_info setting becomes irrelevant as the table will always be updated at each transaction commit whatever the value of the setting. So you can safely remove it from your configuration file.

I hope this post will help you benefit from this great feature!

0 0 votes

Article Rating

17 Comments

Oldest

Newest Most Voted

Laurynas

12 years ago

Is –master-info-repository=TABLE required too? “In short” does not mention it, and “Gory details” mentions it.

Author

Stephane Combaudon

12 years ago

master-info-repository=TABLE is not required, as relay_log_recover = ON will take care of updating the position of the IO thread in case of a crash.

I’ve updated the post to clarify.

Laurynas

12 years ago

Stephane –

Thanks. The MySQL docs at https://dev.mysql.com/doc/refman/5.6/en/slave-logs.html seem to say that master-info-repository=TABLE is required too, see https://dev.mysql.com/doc/refman/5.6/en/slave-logs.html:

“In order for replication to be crash-safe when using tables for logging status and relay information, the tables must use a transactional storage engine, such as InnoDB.”. Note the plural “tables”.

Should that be reported as a doc bug?

Author

Stephane Combaudon

12 years ago

The whole paragraph is confusing for sure, because it’s not clearly explained what you need to do to make replication crash-safe on a slave. I’ll try to make a doc bug for it to be fixed.

If you don’t use multi-threaded slaves, it’s easy to see that master-info-repository=TABLE is no required: stop the slave, manually edit master.info, restart the server.

If relay_log_recovery = ON, it will restart correctly (with a message like ‘2013-09-13 10:28:53 2755 [Warning] Recovery from master pos 120 and file mysql-bin.000012.’ in the error log).

If relay_log_recovery = OFF, replication breaks.

Marcelo Altmann

12 years ago

How does relay_log_info_repository=TABLE behaves when is dealing with MyISAM Tables? do we have the same problem when using the repository in disk and sync it each transaction (execute the Query then update repository) as MyISAM does not support transactions?

Author

Stephane Combaudon

12 years ago

Mauricio: enabling crash-safe slaves requires that all tables use InnoDB. There’s no way to make things safe with MyISAM unfortunately.

Flavian

12 years ago

Hi Stephane, good post… What is the over head of using
relay_log_info_repository=’TABLE’
relay_log_recovery=’ON’
master_info_repository=’TABLE’

Compared to File based and sync disabled for performance.

Author

Stephane Combaudon

12 years ago

Flavian,

There’s an overhead when using crash-safe slaves. I don’t have specific numbers to share, but you can find some interesting thoughts in this article from Yoshinori:
http://yoshinorimatsunobu.blogspot.fr/2013/12/single-thread-performance-regression-in.html

Flavian

11 years ago

Stephane,

What i noticed today, even though i had set repo to TABLE & recovery ON still when the 5.6 crashed it dint recover proper point and got DUPLICATE insert error etc. Also since the the default value of sync_master_info, sync_relay_log,sync_relay_log_info is 10000 so i guess it will update only after 10,000 transaction. Previously in 5.5 File based setting setting sync_% to 0 used to depend on OS flush time and till date i have lost only max 1 or 2 transaction with very good performance. Now The default being 10000 i will now lose 10,000 transaction during crash? Also in 5.6 if repo is FILE and sync_% is set to 0 it will flush during OS timing. But If repo is table and sync_% is 0 it will disable completely so forcefully i have to be depended on lowering sync_% value hence losing performance.

Author

Stephane Combaudon

11 years ago

Flavian,

As long as you have relay_log_recovery = ON, the position in the slave_master_info table is overwritten by the position stored in slave_relay_info so it doesn’t matter if you have sync_master_info = 10000.

You should also make sure that innodb_flush_log_at_trx_commit = 1 otherwise InnoDB is not guaranteed durable, which breaks the purpose of storing replication metadata in tables.

ruiaylin

11 years ago

hi Stephane Combaudon ：　

set relay_log_info_repository=’TABLE’ at slave , every transaction will update this table . In a busy oltp system , this will be a problem , like all transaction on slave into a serial 。 the lag between master and slave will became very large . AM I RIGHT ?

Author

Stephane Combaudon

11 years ago

ruiaylin,

If you’re not using multithreaded slaves, transactions are anyway serialized so performance will not suffer a lot.

Patrice Levesque

11 years ago

Followed your simple recipe, but it appears it doesn’t work for my Percona-5.6 install. I get this in the error log:

[Warning] It is not possible to change the type of the relay log repository because there are workers repositories with possible execution gaps. The value of –relay_log_info_repository is altered to one of the found Worker repositories. The gaps have to be sorted out before resuming with the type change.

Search engines give me MySQL source code but no explanation whatsoever about what the problem is and how I sort out gaps. Any clue? Might it be linked to multi-threaded replication?

Author

Stephane Combaudon

11 years ago

Patrice,

I’ve not been able to reproduce the issue, but the warning indeed suggests that it’s related to multithreaded replication.
So after STOP SLAVE, run START SLAVE UNTIL SQL_AFTER_MTS_GAPS. Then you should be able to change relay_log_info_repository to TABLE.

Patrice Levesque

11 years ago

Stephane,

Thanks, it worked. For the record, the switch to relay_log_info_repository needed to be applied while the database server was running – the change in my.cnf plus a database server restart didn’t suffice. So this sequence of commands was needed:

STOP SLAVE;
START SLAVE UNTIL SQL_AFTER_MTS_GAPS;
— Wait until gaps are resolved – a message will appear in the server’s error log
STOP SLAVE;
SET GLOBAL relay_log_info_repository = ‘TABLE’;
START SLAVE;

After that, a database server stop/start confirmed the change was really applied — no more warnings.

Olu

8 years ago

Stephane,
This is very useful information. I am using mariadb 10.2.6 and have a master-slave setup. I recently turned on relay_log_recovery option (but didn’t know about the relay_log_info_repository as you explained in your post). Consequently I started noticing this entry in error log which slave server eventually restarts after several hours and breaks replication. The error message is InnoDB: page_cleaner: 1000ms intended loop took 10776ms. The settings might not be optimal. (flushed=0 and evicted=0, during the time.). So i began on the journey to find a fix until i landed on your blog post. Now, unfortunately i am finding out that mariadb does not have the relay_log_info_repository` option. See: https://mariadb.com/kb/en/mariadb/system-variable-differences-between-mariadb-100-and-mysql-56/
How do you suggest keeping the slave fully crash-safe in this regard?
Thanks

Olu