Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

Galera 4 Streaming Replication in Percona XtraDB Cluster 8.0

May 13, 2020

Author

Tibor Korocz

MySQL

Percona Software

Share this Post:

Galera 4 Streaming Replication in Percona XtraDB Cluster I was testing the latest Percona XtraDB Cluster 8.0 (PXC) release which has the Galera 4 plugin, and I would like to share my experiences and thoughts on the Streaming Replication feature so far.

What Is Streaming Replication, in One Sentence?

In Galera 4, the large transaction could split into smaller fragments, and even before it got committed these fragments have been replicated to the other nodes and have already started the certification and apply process.

The manual describes all the pros and cons, but let’s see how it works. I have created a table with 10M rows and I am going to run some large updates on that.

First I have run the updates without Streaming Replication, and because it is disabled by default we do not have to do anything but simply run the update. On node1 I am recording the time before and after the update, and on node2 I am running a select in every second to see when this update gets actually committed on the other nodes.

On the writer:

node1-T1 mysql> select now();update sbtest1 set k=1 where id < 1000000;select now();
+---------------------+
| now() |
+---------------------+
| 2020-03-24 14:25:20 |
+---------------------+
1 row in set (0.00 sec)

Query OK, 999997 rows affected (34.53 sec)
Rows matched: 999997 Changed: 999997 Warnings: 0

+---------------------+
| now() |
+---------------------+
| 2020-03-24 14:25:54 |
+---------------------+
1 row in set (0.00 sec)

node1-T1 mysql> select now();update sbtest1 set k=1 where id < 1000000;select now();

+---------------------+

| now() |

+---------------------+

| 2020-03-24 14:25:20 |

+---------------------+

1 row in set (0.00 sec)

Query OK, 999997 rows affected (34.53 sec)

Rows matched: 999997 Changed: 999997 Warnings: 0

+---------------------+

| now() |

+---------------------+

| 2020-03-24 14:25:54 |

+---------------------+

1 row in set (0.00 sec)

It took around 34s to run this query.

Check the other node:

+---------------------+
| now() |
+---------------------+
| 2020-03-24 14:26:22 |
+---------------------+
+---+
| k |
+---+
| 1 |
+---+

+---------------------+

| now() |

+---------------------+

| 2020-03-24 14:26:22 |

+---------------------+

+---+

| k |

+---+

| 1 |

+---+

As I said, I was running a query in every second to see when the k changes to 1. It took 28s for node2 to certify, apply, and commit this update. It is important to understand that while the node is certifying this large update, all the other changes on any other table will be blocked, so basically the node will stall/freeze for almost 30s.

Will Streaming Replication Decrease This Delay?

In theory, the other nodes should catch up faster because while the update is running on the node1 the fragments have been already replicated and applied on the other nodes as well.

I am going to set the fragment size to 1MB.

set session wsrep_trx_fragment_size=1048576;

node1-T1 mysql> select now();update sbtest1 set k=2 where id < 1000000;select now();
+---------------------+
| now() |
+---------------------+
| 2020-03-24 14:44:08 |
+---------------------+
1 row in set (0.00 sec)

Query OK, 999997 rows affected (40.08 sec)
Rows matched: 999997 Changed: 999997 Warnings: 0

+---------------------+
| now() |
+---------------------+
| 2020-03-24 14:44:48 |
+---------------------+
1 row in set (0.00 sec)

set session wsrep_trx_fragment_size=1048576;

node1-T1 mysql> select now();update sbtest1 set k=2 where id < 1000000;select now();

+---------------------+

| now() |

+---------------------+

| 2020-03-24 14:44:08 |

+---------------------+

1 row in set (0.00 sec)

Query OK, 999997 rows affected (40.08 sec)

Rows matched: 999997 Changed: 999997 Warnings: 0

+---------------------+

| now() |

+---------------------+

| 2020-03-24 14:44:48 |

+---------------------+

1 row in set (0.00 sec)

Now it took 40s to run. Let’s see the other node.

+---------------------+
| now() |
+---------------------+
| 2020-03-24 14:44:48 |
+---------------------+
+---+
| k |
+---+
| 2 |
+---+

+---------------------+

| now() |

+---------------------+

| 2020-03-24 14:44:48 |

+---------------------+

+---+

| k |

+---+

| 2 |

+---+

In the same second, k was already 2 on node2 as well. We do not have the 28s delay anymore which is great, but the query itself got a bit slower. I ran these updates many times and it was always faster without Streaming Replication, but with large transactions, I would prefer Streaming Replication because the other nodes will not be drifted too far.

Why Could It Be Slower With Streaming Replication?

One of the reasons could be because, with Streaming Replication, Galera basically double-writes the write-sets. It records them in the mysql.wsrep_streaming_log table to ensure Streaming Replication updates in case of a crash, as well. This is why it is not a good idea to enable Streaming Replication for all the queries.

Does the Fragment Size Matter?

Let’s change the fragment size to 0.1MB:

node1-T1 mysql> set session wsrep_trx_fragment_size=104857;
Query OK, 0 rows affected (0.01 sec)

node1-T1 mysql> select now();update sbtest1 set k=5 where id < 1000000;select now();
+---------------------+
| now() |
+---------------------+
| 2020-03-24 15:28:30 |
+---------------------+
1 row in set (0.00 sec)

Query OK, 999997 rows affected (51.21 sec)
Rows matched: 999997 Changed: 999997 Warnings: 0

+---------------------+
| now() |
+---------------------+
| 2020-03-24 15:29:21 |
+---------------------+
1 row in set (0.00 sec)

node1-T1 mysql> set session wsrep_trx_fragment_size=104857;

Query OK, 0 rows affected (0.01 sec)

node1-T1 mysql> select now();update sbtest1 set k=5 where id < 1000000;select now();

+---------------------+

| now() |

+---------------------+

| 2020-03-24 15:28:30 |

+---------------------+

1 row in set (0.00 sec)

Query OK, 999997 rows affected (51.21 sec)

Rows matched: 999997 Changed: 999997 Warnings: 0

+---------------------+

| now() |

+---------------------+

| 2020-03-24 15:29:21 |

+---------------------+

1 row in set (0.00 sec)

Now it took more than 50s, so as we can see size matters. I also did a test when I increased the fragment size to 100MB, and in that case the query took around 40s again, but the other nodes needed more time to catch up, around 10s.

Finding the right fragment size is important, and in my tests 1MB was acceptable, but you could also define the number of rows, and create fragments after every 10000 rows.

Locking

Before Galera 4, locks were not propagated to other nodes. If you were running a query on node1, InnoDB locked the necessary rows on node1 only, but node2 and node3 did not know which rows were locked. If someone was writing node2 and node3, Galera had to handle these conflicts, basically the transaction which was committed first, won. If you are writing multiple nodes, you can have a lot of conflicts because of this.

But with Streaming Replication this has also changed. The fragments are replicated to the other nodes and they will hold the necessary locks there as well.

Let’s see how this works, first without Streaming Replication, step by step.

On node1 and node2 I start a transaction, on node1 I run a large update, on node2 I delete a single row, then I commit the transaction on node2. When I commit on node1 I get a deadlock as we except:

node1-T1 mysql> begin;
node2-T1 mysql> begin;

node1-T1 mysql> update sbtest1 set k=8 where id < 1000000;
Query OK, 999997 rows affected (27.87 sec)
Rows matched: 999997 Changed: 999997 Warnings: 0

node2-T1 mysql> delete from sbtest1 where id=12;
Query OK, 1 row affected (0.01 sec)
node2-T1 mysql> commit;
Query OK, 0 rows affected (0.00 sec)

node1-T1 mysql> commit;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

node1-T1 mysql> begin;

node2-T1 mysql> begin;

node1-T1 mysql> update sbtest1 set k=8 where id < 1000000;

Query OK, 999997 rows affected (27.87 sec)

Rows matched: 999997 Changed: 999997 Warnings: 0

node2-T1 mysql> delete from sbtest1 where id=12;

Query OK, 1 row affected (0.01 sec)

node2-T1 mysql> commit;

Query OK, 0 rows affected (0.00 sec)

node1-T1 mysql> commit;

ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

Because there is no cluster-wide locking, in this case, it will use optimistic locking and will get an error during the certification.

How Does This Work With Streaming Replication?

Repeat the same steps:

node1-T1 mysql> set session wsrep_trx_fragment_size=1048570;
Query OK, 0 rows affected (0.00 sec)
node1-T1 mysql> begin;
Query OK, 0 rows affected (0.00 sec)

node2-T1 mysql> begin;
Query OK, 0 rows affected (0.00 sec)

node2-T1 mysql> update sbtest1 set k=9 where id < 1000000;
Query OK, 999996 rows affected (38.15 sec)
Rows matched: 999996 Changed: 999996 Warnings: 0
node2-T1 mysql> delete from sbtest1 where id=13;
Waiting.....

node1-T1 mysql> commit;
Query OK, 0 rows affected (2.46 sec)
node2-T1 mysql> commit;
Query OK, 0 rows affected (0.00 sec)

node1-T1 mysql> set session wsrep_trx_fragment_size=1048570;

Query OK, 0 rows affected (0.00 sec)

node1-T1 mysql> begin;

Query OK, 0 rows affected (0.00 sec)

node2-T1 mysql> begin;

Query OK, 0 rows affected (0.00 sec)

node2-T1 mysql> update sbtest1 set k=9 where id < 1000000;

Query OK, 999996 rows affected (38.15 sec)

Rows matched: 999996 Changed: 999996 Warnings: 0

node2-T1 mysql> delete from sbtest1 where id=13;

Waiting.....

node1-T1 mysql> commit;

Query OK, 0 rows affected (2.46 sec)

node2-T1 mysql> commit;

Query OK, 0 rows affected (0.00 sec)

Same steps, but when we are trying to run the delete on node2 it is just hanging there, it is waiting for the locks which are actually coming from node1 because of the Streaming Replication. When I commit the transaction on node1, then node2 can run the delete query. This could save us from a lot of deadlocks and conflicts. Queries might have to wait for the locks but in the end they should still finish.

This sounds great, but I would still not recommend enabling Streaming Replication as default, as it has a performance impact and it double-writes everything into the mysql.wsrep_streaming_log table. Also if you would like to roll back a big transaction, it has to be done on all the nodes, not just one.

Metrics/Monitoring

There is a table called mysql.wsrep_streaming_log which contains information about the currently running Streams.

node1-T1 mysql> SELECT node_uuid, trx_id, seqno, flags FROM mysql.wsrep_streaming_log;
+--------------------------------------+--------+-------+-------+
| node_uuid                            | trx_id | seqno | flags |
+--------------------------------------+--------+-------+-------+
| 9d7f7e09-6d20-11ea-842b-26b6d24e606d |   8571 | 24712 |     1 |
+--------------------------------------+--------+-------+-------+

node1-T1 mysql> SELECT node_uuid, trx_id, seqno, flags FROM mysql.wsrep_streaming_log;

+--------------------------------------+--------+-------+-------+

+--------------------------------------+--------+-------+-------+

| 9d7f7e09-6d20-11ea-842b-26b6d24e606d | 8571 | 24712 | 1 |

+--------------------------------------+--------+-------+-------+

But be careful, because the column called frag contains the whole binary log replication event, which can be quite large. Before I realized that, I got the following errors:

node1-T1 mysql> select * from mysql.wsrep_streaming_log;
ERROR 2020 (HY000): Got packet bigger than 'max_allowed_packet' bytes

1 2	node1-T1 mysql> select * from mysql.wsrep_streaming_log; ERROR 2020 (HY000): Got packet bigger than 'max_allowed_packet' bytes

Unfortunately, as far as I know, there is no counter to monitor how many queries are replicated by Streaming Replication. I created a feature request to cover this.

Conclusion

So far, Streaming Replication works very well and I think it has great potential. Because it is possible to change it only in the sessions, yourself or the application can easily turn it on when it is necessary, but I am also waiting to see real-life experiences with it.