MySQL 5.7 multi-source replication

Recently Oracle announced several new features for the latest available development version of MySQL that is 5.7.2 at the time of writing this article. Most of them are performance and replication related that show us how incredible the new release will be.

In this post I’m going to try to explain in some easy steps how the new multi-source replication works and how we can configure it for our own tests. It is important to mention that this is a development release, so it is not production ready. Therefore this post is intend to people that want to test the new feature and see how it works with their application, always in a staging environment.

What is multi-source replication?

First, we need to have clear that multi-master and multi-source replication are not the same. Multi-Master replication is the usual circular replication where you can write on any server and data gets replicated to all others.

Multi master replication
Multi-source is different. MySQL replication had a limitation, fixed with this new release, that said that one slave could have only one master. That is a limiting factor when we are designing our replication environment. There were some “hacks” to make it work, but now there is an official way. So, in a nutshell, Multi-Source means that a slave can have more than one master. Now, replication environments like these one are possible:

Multi source replication
This will help us create some replication hierarchies that were impossible in the past. For example, you can have a slave in your office replicating the data from all servers that you have in the offices spread around the world.

How does it work?

Now we have the concept of communication channels. Each communication channel is a connection from slave to master to get the binary log events. That means we will have one IO_THREAD for each communication channel. We will need to run different “CHANGE MASTER” commands, one for each master, with the “FOR CHANNEL” argument that we will use to give a name to that channel.

Pretty easy. There is one single pre-requisite. The slave should have been configured first with the crash-safe feature of MySQL 5.6. That means that info usually included in master.info or relay-log.info should be on a table. Let’s start with the configuration.

Show me an example!

First you need to download the lab version of mysql from this link.

We have a sandbox environment with 2 masters and 1 slave servers. I won’t go over the details of how to configure the server_id, binary logs or replication users. I assume they are well configured. If you need a howto, you can follow this one.

First, we have to enable the crash safe feature on the slave:

After a restart of the slave we can start creating the channels with the names “master1” and “master2”:

To start the slave processes you need to specify what channel are you referring to:

Now, we want to check the status of the slave:

Oh, it is empty. We have to specify again which channel we want to check:

and we can also check that the IO_THREAD and SQL_THREADS are running:

Let’s test it:

It works, that easy!

Conclusion

The new multi-source feature allow us to create new replication environments that were impossible in the past without some complex “hacks”. Of course, your application should be designed and developed with this new architecture in mind. Like with multi-master, multi-source needs special care to not end up with your data messed up.

MySQL replication is getting better on every release giving us more configuration, performance and design possibilities. And all those new features can be combined. Your replication environment can be even better if you mix some of the new (and old) features added recently to the replication. For example, you can configure GTID or enable multi-threaded slave per schema.

Share this post

Comments (54)

  • Marc Castrovinci

    Great feature! It looks helpful for when you want to funnel sharded data into a single location so you can run OLAP type operations over all the data.

    October 2, 2013 at 10:05 am
    • Nick

      Marc? As in SOLN, Marc Castrovinci?

      January 18, 2016 at 6:08 pm
  • Jonatas

    Very nice feature. Keeping eye on 5.7…

    October 2, 2013 at 10:40 am
  • shiv

    Hi Miguel,
    You can use the new replication performance schema tables to look at all the channels or choose which channel you want…

    October 2, 2013 at 11:32 am
  • Miguel Angel Nieto

    shiv, yes, it is possible to get that info from P_S tables:

    replication_connection_configuration
    replication_execute_configuration

    but… in my personal opinion that should be in I_S. That info is not performance related, it just show us configuration information so it should be available via SHOW (for example SHOW SLAVE STATUS CHANNEL ALL) or via I_S tables. Maybe I’m wrong, but I’ve opened a feature request anyway:

    http://bugs.mysql.com/bug.php?id=70434

    October 2, 2013 at 1:22 pm
  • Tariq

    I trying to imagine the practical implementation of this new ‘multi source replication’ and how does it really help having a slave getting replicated from multiple sources when the data_set_from_every_master is the same ?

    Consider the M-M replication master1 and master2 having the dataset X and I could easily run a slave1 either from Master1 or Master2 and possibly have many other slaves from slave1 or master[1-2] and so on, why would one need to replicate from multiple masters if data is all the same unless for some arcane reason i want to replicate a particular db from master 1 and another db2 from another master which is not necessarily in the M-M replication chain .

    October 2, 2013 at 2:25 pm
  • Miguel Angel Nieto

    Tariq, the data from masters don’t need to be the same. The masters on the second graph they are not in M-M replication. There are lot of examples where multi-source can be applied:

    – maybe you have hundred of apache servers, logging accesses to a mysql. in that case, having all those mysql replicating to a single slave could help you to