Percona XtraDB Cluster Feature 2: Multi-Master replication

This post covers the second great feature you get with Percona XtraDB Cluster: Multi-Master replication.

It is recommended you get familiar with the general architecture of the cluster, described in the previous post.

By Multi-Master I mean the ability to write to any node in your cluster without worrying that the nodes will eventually get out of sync, as regularly happens with regular MySQL replication if you imprudently write to the wrong server.

This is a long-awaited feature; I have seen growing demand for it for the last two years or more.
Percona XtraDB Cluster provides it, so let's see how it works.

With our Cluster you can write to any node, and the Cluster guarantees consistency of writes. That is, a write is either committed on all nodes or not committed at all.

Let me illustrate this with a diagram. For simplicity I will use a two-node example, but the same logic applies when you have N nodes.

As you can see, all queries are executed locally on the node, and there is special handling only on COMMIT.
When you issue COMMIT, your transaction has to pass certification on all nodes. If it does not pass, you
will receive an ERROR in response to that query. If it passes, the transaction is applied on the local node.
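To make this concrete, here is a hypothetical session on one node (the accounts table and values are made up for illustration); note that the conflict error arrives on COMMIT rather than on the UPDATE itself:

```sql
-- Hypothetical example: node1 and node2 update the same row concurrently.
-- On node1:
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- executes locally, no cluster traffic yet
COMMIT;  -- the writeset is replicated and certified on all nodes at this point

-- If node2 committed a conflicting change first, node1's COMMIT fails with:
-- ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
```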

That is, the response time of COMMIT consists of several parts:

  • Network round trip
  • Certification time
  • Local apply time

Please note that applying the transaction on the remote nodes does not affect the response time of COMMIT,
as it happens in the background after the certification response.

This architecture has two important consequences.

  • First: we can have several appliers working in parallel, which gives us true parallel replication. A slave can have many parallel threads, tunable with the variable wsrep_slave_threads.
  • Second: there might be a small period of time when a slave is out of sync with the master. This happens because the master may apply events faster than the slave, so if you read from the slave, you may read data that has not been updated yet; you can see this in the diagram. You can change this behavior with the variable wsrep_causal_reads=ON, in which case a read on the slave will wait until the event is applied (this, however, will increase the response time of the read). This gap between slave and master is the reason this replication is called “virtually synchronous replication” rather than true “synchronous replication”.
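Both variables can be set like any other MySQL variable; the values below are illustrative, not tuning advice:

```sql
-- Illustrative settings, adjust for your workload
SET GLOBAL wsrep_slave_threads = 8;    -- number of parallel applier threads
SET SESSION wsrep_causal_reads = ON;   -- make this session's reads wait until replication catches up
```

wsrep_slave_threads can also be set in my.cnf under [mysqld] so it survives restarts.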

The described behavior of COMMIT also has a second serious implication.
If you run write transactions on two different nodes, the cluster will use an optimistic locking model.
That means a transaction does not check for possible locking conflicts during individual queries, but rather at the COMMIT stage, so you may get an ERROR response to your COMMIT. I am highlighting this because it is one of the incompatibilities with regular InnoDB that you may experience: in InnoDB, DEADLOCK and LOCK TIMEOUT errors usually happen in response to a particular query, not on COMMIT. If you follow good practice, you already check the error code after the COMMIT query, but I have seen many applications that do not.

So, if you plan to use the Multi-Master capabilities of XtraDB Cluster and run write transactions on several nodes, you need to make sure you handle the response to the COMMIT query.
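A minimal sketch of such handling, assuming a DB-API style connection object (the helper name, retry limit, and backoff are my own choices): since a certification conflict surfaces as a deadlock error (SQLSTATE 40001), the safe reaction is to roll back and retry the whole transaction.

```python
import time

DEADLOCK_SQLSTATE = "40001"  # certification failure is reported as a deadlock

def commit_with_retry(run_transaction, conn, retries=3, backoff=0.05):
    """Run a transaction; if COMMIT fails certification, retry the whole thing.

    run_transaction(conn) issues the queries; conn.commit() is assumed to raise
    an exception carrying .sqlstate == '40001' on a certification conflict.
    """
    for attempt in range(retries):
        try:
            run_transaction(conn)
            conn.commit()
            return True
        except Exception as exc:
            # Retry only deadlock/certification conflicts; re-raise everything else.
            if getattr(exc, "sqlstate", None) != DEADLOCK_SQLSTATE:
                raise
            conn.rollback()
            time.sleep(backoff * (attempt + 1))
    return False
```

The key point is that the retry must re-run the queries, not just re-issue COMMIT: after the rollback the transaction's changes are gone.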


Comments (24)

  • Peter Zaitsev

Do I understand correctly that if the master fails on commit after the slave has committed, it will fail and drop out of the cluster?

    January 19, 2012 at 6:29 pm
  • Vadim Tkachenko

That is the idea, I guess.
If the master fails to make the commit, it will commit an act of suicide and throw itself out of the cluster.
The same goes for a slave, if it fails to commit a certified transaction.

    January 19, 2012 at 7:17 pm
  • Alex

Peter, yes, that's the case. The only exception is DDL statements, which are allowed to fail.

    January 21, 2012 at 7:44 pm
  • Peter Zaitsev

Why would DDL be allowed to fail?

    January 22, 2012 at 8:39 am
  • Alex

DDLs are not transactional, so we need to deal with them differently: first replicate, then execute (this is the only way to ensure that they are executed in the same order on all of the nodes). So at the point of execution we don't know whether this DDL would have failed on the master or not. Perhaps you're trying to drop a table which does not exist any more (some other client dropped it while your statement was being replicated).

    January 22, 2012 at 8:58 am
  • Doug

I use Percona Server 5.1 in a two-master, no-slave MMM configuration. The read/write ratio of our application is 90/10. How do I distribute the reads and writes in a 3-node Percona XtraDB Cluster? My method today is via MySQL MMM read and write role virtual IPs through a JDBC connection.

    January 25, 2012 at 8:33 am
  • Vadim Tkachenko

You actually do not need to worry about read-write splitting now.
You can just send both kinds of queries to any available node.

If you would still like to have a dedicated write server, then you pick one of the three nodes for read-write, and the remaining two for reads.

    January 25, 2012 at 11:16 am
  • Alex

Depending on how much control you want over R/W splitting, you may try the internal multi-master JDBC load balancer (developed for NDB; it sends writes everywhere):

    January 25, 2012 at 12:27 pm
  • Alex

Oops, comment formatting ate my text. Check the full version here:

    To blog maintainer: when comments are formatted by the engine, a preview button is a must. Or at least a warning.

    January 25, 2012 at 12:34 pm
  • Brian Lovett

You could always use HAProxy to distribute your requests evenly across the cluster nodes, or even distribute them based on server load. We currently do that for our non-SQL front-end servers, so adding the database cluster would be trivial. I'm really excited about this development and waiting to see it in production. We were researching Cassandra and contemplating a switch as our system and dataset grow, but this is a far easier solution for us (and we already know how to deal with most MySQL issues, so that's an added bonus).

    January 31, 2012 at 10:23 am
  • Eitan Ben-Noach

    Hi Alex/Vadim,

We have installed mysql-server-wsrep-5.5.20-23.4 with galera-23.2.0.

When starting the second node and higher, we get a warning that IST will be unavailable (the logs are below).
Other than that, the replication works fine. We are concerned about the possibility that incremental state transfer is not working.
Does it mean that the cluster copies all data from the main node to the others on each transaction?

    Can you please help us trace this problem?

    Apr 1 14:14:26 ubuntu-server-129 mysqld: 120401 14:14:26 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (8630f2a8-7be8-11e1-0800-7e1ae413f999): 1 (Operation not permitted)
    Apr 1 14:14:26 ubuntu-server-129 mysqld: #011 at galera/src/replicator_str.cpp:prepare_for_IST():429. IST will be unavailable.

    Apr 1 14:14:26 ubuntu-server-129 mysqld: 120401 14:14:26 [Note] WSREP: Node 1 (ubuntu-server-129) requested state transfer from ‘*any*’. Selected 0 (ubuntu-server-128)(SYNCED) as donor.
    Apr 1 14:14:26 ubuntu-server-129 mysqld: 120401 14:14:26 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 5010)
    Apr 1 14:14:26 ubuntu-server-129 mysqld: 120401 14:14:26 [Note] WSREP: Requesting state transfer: success, donor: 0
    Apr 1 14:14:27 ubuntu-server-129 mysqld: 120401 14:14:26 [Warning] IP address ‘’ could not be resolved: Name or service not known
    Apr 1 14:14:27 ubuntu-server-129 mysqld: 120401 14:14:27 [Note] WSREP: wsrep_start_position var submitted: ‘8630f2a8-7be8-11e1-0800-7e1ae413f999:5010’
    Apr 1 14:14:27 ubuntu-server-129 mysqld: 120401 14:14:27 [Note] WSREP: Received SST: 8630f2a8-7be8-11e1-0800-7e1ae413f999:5010
    Apr 1 14:14:27 ubuntu-server-129 mysqld: 120401 14:14:27 [Note] WSREP: SST received: 8630f2a8-7be8-11e1-0800-7e1ae413f999:5010
    Apr 1 14:14:27 ubuntu-server-129 mysqld: 120401 14:14:27 [Note] WSREP: 1 (ubuntu-server-129): State transfer from 0 (ubuntu-server-128) complete.
    Apr 1 14:14:27 ubuntu-server-129 mysqld: 120401 14:14:27 [Note] WSREP: Shifting JOINER -> JOINED (TO: 5010)
    Apr 1 14:14:27 ubuntu-server-129 mysqld: 120401 14:14:27 [Note] WSREP: Member 1 (ubuntu-server-129) synced with group.
    Apr 1 14:14:27 ubuntu-server-129 mysqld: 120401 14:14:27 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 5010)
    Apr 1 14:14:27 ubuntu-server-129 mysqld: 120401 14:14:27 [Note] WSREP: Synchronized with group, ready for connections
    Apr 1 14:14:27 ubuntu-server-129 mysqld: 120401 14:14:27 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
    Apr 1 14:14:27 ubuntu-server-129 mysqld: 120401 14:14:27 [Note] WSREP: 0 (ubuntu-server-128): State transfer to 1 (ubuntu-server-129) complete.
    Apr 1 14:14:27 ubuntu-server-129 mysqld: 120401 14:14:27 [Note] WSREP: Member 0 (ubuntu-server-128) synced with group.
    Apr 1 14:17:01 ubuntu-server-129 CRON[3512]: (root) CMD ( cd / && run-parts –report /etc/cron.hourly)

    April 1, 2012 at 7:17 am
  • Eitan Ben-Noach

My thought is that the warning appears because the node is not yet in the cluster, and in that case the only way to sync is SST.

Am I right?

I came to this conclusion because when I dropped the second node for a short while, made some changes to the main node, and restarted the second node, this time I could see some log messages relating to IST:

    120401 19:31:44 [Note] WSREP: Prepared IST receiver, listening at: tcp://

    120401 19:31:45 [Note] WSREP: Received SST: 8630f2a8-7be8-11e1-0800-7e1ae413f999:5011
    120401 19:31:45 [Note] WSREP: SST received: 8630f2a8-7be8-11e1-0800-7e1ae413f999:5011
    120401 19:31:45 [Note] WSREP: Receiving IST: 1 writesets, seqnos 5011-5012

    April 1, 2012 at 8:07 am
  • Alex

    Hi Eitan,

    You’re absolutely correct!

    April 1, 2012 at 8:20 am
  • Eitan Ben-Noach

    Thanks a lot Alex!

One more question, relating to the (write) performance of Galera.

We have compared a standalone MySQL with a one-node Galera cluster (inserting new records).
The one-node cluster is sometimes slower by about 20% than the standalone MySQL.
Is that expected behavior?


    April 1, 2012 at 8:36 am
  • erkules

Eitan, what have you compared?
With Galera, one approach is to compare against semisync replication, even though Galera offers much more than semisync. Have a look; it is written in German, but the graphs speak for themselves.
You can see Galera is faster than MySQL 5.5.21 semisync (blue line); MySQL dropped out of semisync after a concurrency of 64.
The green line is MariaDB 5.5.20 with group commit and XtraDB, even if they are faster.
Galera performs awesomely, and you don't have the data only in the relay log.
And don't forget, you can write to the ‘slave’ as well 🙂

    April 2, 2012 at 11:54 am
  • Alex

Depending on the configuration you used, it may be. Understandably, Galera adds some overhead even in a single-node cluster, since it still has to prepare and pack the writesets. The reason for this is keeping track of the global transaction ID and caching the writesets for potential IST. This overhead can be turned off by either unloading the Galera wsrep provider or setting wsrep_on=0 globally, but that of course will effectively turn it into a non-Galera node.

So if you want a single-node comparison, you should probably compare a single Galera node against plain MySQL with the binlog turned on.

Having said that, 20% may not really be justified (so that is our bad), but 10% is probably unavoidable.

    April 2, 2012 at 1:28 pm
  • Saeed

    Could you please point to any online tutorial for setting up master-master replication with XtraDB Cluster?


    June 19, 2012 at 4:48 am
  • Eugene

I understand that Percona XtraDB Cluster helps with high availability, but how does it help with scalability? As I understand it, replication just copies, or makes sure all database servers have, the same information (data), so that every server acts as a master. To me this also means that all servers will likely fill up at the same time. Does Percona XtraDB Cluster have some kind of sharding, such that each database server (MySQL) added into the Percona XtraDB Cluster will help increase storage space?

    July 2, 2012 at 8:52 am
  • alex

Sure, replication “just copies” the changes made by a transaction. Copying changes is much faster than executing the transaction, so this leaves the slave with spare cycles to execute more transactions.

    July 2, 2012 at 9:41 am
  • san

How do I separate read and write traffic in a typical JDBC web application? Is mysql-proxy the only solution? I would like to use HAProxy, because it is highly recommended for Percona XtraDB Cluster.

    July 24, 2012 at 12:21 pm
  • gpfeng.cs

Does the master need to get the slave's certification result before the real physical COMMIT (or ROLLBACK)?

What about when the master and slave get different certification results? In that case, which one commits suicide?

It seems that local trxs (before broadcast) will be aborted/rolled back.

    March 3, 2013 at 11:38 pm
  • gpfeng.cs

Correction to my comment above:
it seems that ONLY local trxs (before broadcast) will be aborted/rolled back.

    March 3, 2013 at 11:40 pm
  • Ashok

I want to sync a Percona XtraDB Cluster with another Percona XtraDB Cluster.

Please let us know how we can do it.


    October 26, 2013 at 6:54 am
  • Tom Diederich

    Hi Ashok,

Thanks for the comment, but 1) this is an old post, and 2) the place to ask this question is on our discussion forums, under Percona XtraDB Cluster. Here's the URL:

    That’s the place to connect with your peers and many times with Percona experts, too.

    October 27, 2013 at 7:57 pm
