Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

Resolving the MySQL Active-Active Replication Dilemma

April 28, 2021

Author

Przemysław Malkowski

Insight for DBAs

Insight for Developers

MySQL

Share this Post:

MySQL Active-Active Replication Dilemma Multi-writer replication has been a challenge in the MySQL ecosystem for years before truly dedicated solutions were introduced – first Galera (and so Percona XtradDB Cluster (PXC)) replication (around 2011), and then Group Replication (first GA in 2016).

Now, with both multi-writer technologies available, do we still need traditional asynchronous replication, set up in active-active topology? Apparently yes, there are still valid use cases. And you may need it not only when for some reason Galera/PXC or GR are not suitable, but also when you actually use them. Of course, the most typical case is to have a second cluster in a different geographic location, as Disaster Recovery. If you still wonder why you would need it, just read how a whole data center can disappear in the news a few weeks ago, about the OVH incident.

So, a DR site needs to replicate online from the primary cluster and be able to take over the workload very fast if needed. But also it is expected to be able to switch back effortlessly, hence very often the async replication channels are set up in both directions.

A very good writeup about this can be found here: How Not to do MySQL High Availability: Geographic Node Distribution with Galera-Based Replication Misuse.

Now, after seeing repeating problems with active-active setups falling over and over for years, I thought there are still too few warnings out there about the risks, so I decided to add one more little stone to the stack.

Before I continue, I have to mention this great webinar made last year by my colleague Sveta. You should definitely watch if you are interested in the subject: How Safe is Asynchronous Master-Master Setup in MySQL?.

Failure Test

So, let me demonstrate a simple test case, which may be an eye-opener to some.

First, let’s use the great dbdeployer tool to launch two MySQL instances with active-active replication in just one command:

$ dbdeployer deploy replication --topology=all-masters --nodes=2 --concurrent 8.0.23
all-masters directory installed in $HOME/sandboxes/all_masters_msb_8_0_23
run 'dbdeployer usage multiple' for basic instructions'
$HOME/sandboxes/all_masters_msb_8_0_23/initialize_ms_nodes
# server: 1 
# server: 2

$ dbdeployer deploy replication --topology=all-masters --nodes=2 --concurrent 8.0.23

all-masters directory installed in $HOME/sandboxes/all_masters_msb_8_0_23

run 'dbdeployer usage multiple' for basic instructions'

$HOME/sandboxes/all_masters_msb_8_0_23/initialize_ms_nodes

# server: 1

# server: 2

Now, create a very simple table with one example data row:

node1 (test) > create table t1 (id int primary key auto_increment, a int);
Query OK, 0 rows affected (0.04 sec)

node1 (test) > insert into t1 set a=500;
Query OK, 1 row affected (0.01 sec)

node1 (test) > create table t1 (id int primary key auto_increment, a int);

Query OK, 0 rows affected (0.04 sec)

node1 (test) > insert into t1 set a=500;

Query OK, 1 row affected (0.01 sec)

So, at this point both nodes have the same data:

node1 (test) > select * from test.t1;
+----+------+
| id | a    |
+----+------+
|  1 | 500  |
+----+------+
1 row in set (0.00 sec)

node2 (test) > select * from test.t1;
+----+------+
| id | a    |
+----+------+
|  1 | 500  |
+----+------+
1 row in set (0.00 sec)

node1 (test) > select * from test.t1;

+----+------+

| id | a |

+----+------+

| 1 | 500 |

+----+------+

1 row in set (0.00 sec)

node2 (test) > select * from test.t1;

+----+------+

| id | a |

+----+------+

| 1 | 500 |

+----+------+

1 row in set (0.00 sec)

In the next step, let’s simulate some little replication lag by introducing a delay of one second, as it will allow reproducing the problem at will:

node1 (test) > stop replica sql_thread; CHANGE REPLICATION SOURCE TO SOURCE_DELAY=1 FOR CHANNEL "node2"; start replica sql_thread;
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)

node2 (test) > stop replica sql_thread; CHANGE REPLICATION SOURCE TO SOURCE_DELAY=1 FOR CHANNEL "node1"; start replica sql_thread;
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)

node1 (test) > stop replica sql_thread; CHANGE REPLICATION SOURCE TO SOURCE_DELAY=1 FOR CHANNEL "node2"; start replica sql_thread;

Query OK, 0 rows affected (0.01 sec)

Query OK, 0 rows affected (0.00 sec)

node2 (test) > stop replica sql_thread; CHANGE REPLICATION SOURCE TO SOURCE_DELAY=1 FOR CHANNEL "node1"; start replica sql_thread;

Query OK, 0 rows affected (0.01 sec)

Query OK, 0 rows affected (0.00 sec)

OK, so what if we send an update, very close in time, to both nodes, where the same row gets a different value:

$ all_masters_msb_8_0_23/use_all -e "update test.t1 set a=@@server_id where id=1"
# server: 1 
# server: 2

$ all_masters_msb_8_0_23/use_all -e "update test.t1 set a=@@server_id where id=1"

# server: 1

# server: 2

As a result, both nodes have different column values!

node1 (test) > select * from test.t1;
+----+------+
| id | a    |
+----+------+
|  1 | 200  |
+----+------+
1 row in set (0.00 sec)

node2 (test) > select * from test.t1;
+----+------+
| id | a    |
+----+------+
|  1 | 100  |
+----+------+
1 row in set (0.00 sec)

node1 (test) > select * from test.t1;

+----+------+

| id | a |

+----+------+

| 1 | 200 |

+----+------+

1 row in set (0.00 sec)

node2 (test) > select * from test.t1;

+----+------+

| id | a |

+----+------+

| 1 | 100 |

+----+------+

1 row in set (0.00 sec)

Is Replication Broken?

You may think replication is now broken and some error will alert you about the situation? Not at all!

node1 (test) > show replica statusG
*************************** 1. row ***************************
...
             Replica_IO_Running: Yes
            Replica_SQL_Running: Yes
...
        Seconds_Behind_Source: 0
Source_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
...
            Executed_Gtid_Set: 00023824-1111-1111-1111-111111111111:1-3,
00023825-2222-2222-2222-222222222222:1
...

node2 (test) > show replica statusG
*************************** 1. row ***************************
...
             Replica_IO_Running: Yes
            Replica_SQL_Running: Yes
...
        Seconds_Behind_Source: 0
Source_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
...
            Executed_Gtid_Set: 00023824-1111-1111-1111-111111111111:1-3,
00023825-2222-2222-2222-222222222222:1

node1 (test) > show replica statusG

*************************** 1. row ***************************

...

Replica_IO_Running: Yes

Replica_SQL_Running: Yes

...

Seconds_Behind_Source: 0

Source_SSL_Verify_Server_Cert: No

Last_IO_Errno: 0

Last_IO_Error:

Last_SQL_Errno: 0

Last_SQL_Error:

...

Executed_Gtid_Set: 00023824-1111-1111-1111-111111111111:1-3,

00023825-2222-2222-2222-222222222222:1

...

node2 (test) > show replica statusG

*************************** 1. row ***************************

...

Replica_IO_Running: Yes

Replica_SQL_Running: Yes

...

Seconds_Behind_Source: 0

Source_SSL_Verify_Server_Cert: No

Last_IO_Errno: 0

Last_IO_Error:

Last_SQL_Errno: 0

Last_SQL_Error:

...

Executed_Gtid_Set: 00023824-1111-1111-1111-111111111111:1-3,

00023825-2222-2222-2222-222222222222:1

Or, did you hope enabling the GTID feature would prevent inconsistency from happening? Well, see the Executed_Gtid_Set on both nodes – it is identical, yet the nodes have different data.

What happens next? Well, it depends, maybe replication will eventually fail someday with an error if the same row is modified again, but also it is possible the inconsistency will spread further without you even notice it!

Lesson learned?

I hope this simple example emphasized the need for extra care when dealing with multi-primary replication topologies. The remedy though is usually quite simple:

1 – Make sure to allow only one node at a time for writes (super_read_only is here to help you).

2 – Check the data consistency regularly with appropriate tools (MySQL replication primer with pt-table-checksum and pt-table-sync).

Percona Distribution for MySQL is the most complete, stable, scalable, and secure, open-source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!

Download Percona Distribution for MySQL Today

0 0 votes

Article Rating

7 Comments

Oldest

Newest Most Voted

Eero Teerikorpi

5 years ago

Przemyslaw, you may a bit young to remember that the first two synchronous, MySQL multi-primary solutions were m/cluster (2004, a predecessor of Galera) and uni/cluster (2007) by Continuent. We abandoned from those solutions due the limitations of the synchronous replication for the geo-distributed needs, both active/passive and active/active (multi-primary). The best available (complete, proven, fully-tested) MySQL clustering solution for these geo-distributed needs is Tungsten Cluster (https://www.continuent.com/products/tungsten-clustering/topologies). Please take a look!

Matthew Lang

5 years ago

You’ve correctly stated that asynchronous replication is necessary for any deployment at geo-scale or simply replication over a WAN. Marco has presented the physics of this several times in Percona Live! presentations. As for the remedies (Lesson learned), you CAN have active/active clusters at geo-scale. While using a tool to check data consistency is helpful, it’s not a remedy when using Active/Active deployments because it’s too late to identify data drift and decide which version of the data is correct. Data drift needs to be prevented by having intelligence built into the application about the active/active topology, such as having site affinity.

Also mixing of technologies (one tool for local clustering, another to connect sites via asynchronous replication, and yet another for MySQL proxying) adds complexity to the overall deployment without solving orchestration issues like site failover and failback. I’ve done a comparison between this kind of deployment and Tungsten Clustering (a complete geo-scale clustering solution for MySQL) here:

https://www.continuent.com/resources/blog/galera-high-noon-tungsten-clustering-mysql-mariadb-perconaxtradb

Eric Rasmussen

5 years ago

Another great danger of active-active replication is schema changes, which can either break replication or, worse, store data into the WRONG COLUMNS. See https://bugs.mysql.com/bug.php?id=88595. To the best of my knowledge, this affects all versions of MySQL using row-based replication, and would require a change to the protocol to fix it.

Author

Przemysław Malkowski

5 years ago

Reply to Eric Rasmussen

Good point, I agree that having different schemas between the nodes may result in pretty unexpected results, including also silently loosing data in special cases. That’s one of the the reasons why DDLs are executed in isolation and basically blocking the whole cluster in Galera/PXC.

Maciek Brzeziński

5 years ago

I was recently involved in a project, where there was a need for active-active setup and eventually we’ve used replication with binlog. With correct setup of auto increments on both machines the setup works reliably under heavy load (we performance tested the solution for write load exciding several times the load we observed on production before introducing the active-active architecture).

It’s by no means “internet-scale” architecture, but in our situation it reliably does what it’s supposed to and (according to our tests) will perform correctly for years to come (in our project however there is no chance that the load will increase a 1000 times).