While we should aim for small, fast transactions with Galera, at some point you may still need to run a single large transaction. What is involved?
First, transaction size is supposed to be controlled by two settings: wsrep_max_ws_rows and wsrep_max_ws_size. The first variable is not yet enforced and has no effect – see here and here – so don't bother tuning this knob just yet. In my opinion, only one of the two is needed anyway: a row-based limit is hard to reason about as a DBA, since row sizes can differ wildly between workloads.
The second variable restricts the writeset size in bytes and gives better control over cluster performance. If your network and CPU can only process N bytes per second, this is the variable to enforce. The maximum allowed value for this setting is 2GB, with a default of 1GB. You can actually set it higher than 2GB, but in my tests only 2GB is enforced.
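As a rough pre-flight check, you can estimate whether a transaction will fit before running it. This is only a sketch, not a Galera API: the function name and the 400-byte average row size are assumptions you would replace with estimates from your own schema.

```python
# Rough pre-flight check: will an UPDATE touching `rows` rows fit in the
# writeset limit? avg_row_bytes must be estimated from your own schema; with
# binlog_row_image=FULL it is roughly the before- plus after-image of each
# modified row, plus per-row replication overhead.
def fits_in_writeset(rows, avg_row_bytes, wsrep_max_ws_size=2 * 1024**3):
    estimated = rows * avg_row_bytes
    return estimated <= wsrep_max_ws_size, estimated

ok, size = fits_in_writeset(rows=10_000_000, avg_row_bytes=400)
print(ok, size)  # → False 4000000000 (4 GB of row images will not fit in 2 GB)
```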
Let’s see what happens when we adjust this setting to large values. I have a sysbench table with 10M rows, and I will update all rows in a single transaction.
```
-rw-rw---- 1 mysql mysql 8.5K Oct 20 03:25 t.frm
-rw-rw---- 1 mysql mysql 2.4G Oct 20 03:46 t.ibd

node1 mysql> set global wsrep_max_ws_size=1024*1024*1024*2;
Query OK, 0 rows affected (0.00 sec)

node1 mysql> update t set k=k+1;
ERROR 1180 (HY000): Got error 5 during COMMIT
```
On the MySQL error log this translates to:
```
2015-10-20 04:27:14 10068 [Warning] WSREP: transaction size limit (2147483648) exceeded: 2147516416
2015-10-20 04:27:14 10068 [ERROR] WSREP: rbr write fail, data_len: 0, 2
```
Let’s see what happens if we set this to 4GB:
```
node1 mysql> set global wsrep_max_ws_size=1024*1024*1024*4;
Query OK, 0 rows affected, 1 warning (0.00 sec)

node1 mysql> show warnings;
+---------+------+-----------------------------------------------------------+
| Level   | Code | Message                                                   |
+---------+------+-----------------------------------------------------------+
| Warning | 1292 | Truncated incorrect wsrep_max_ws_size value: '4294967296' |
+---------+------+-----------------------------------------------------------+
1 row in set (0.00 sec)

node1 mysql> select @@wsrep_max_ws_size;
+---------------------+
| @@wsrep_max_ws_size |
+---------------------+
|          4294901759 |
+---------------------+
1 row in set (0.00 sec)
```
We got a warning that the value exceeds the allowed maximum – which turns out to be not 2GB, but a value slightly below the 32-bit unsigned maximum. So let’s try the transaction again:
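The truncated value is not arbitrary: 4294901759 is exactly 65537 bytes short of 4GiB, i.e. 2^32 - 65537, or 0xFFFEFFFF. Why that particular offset is used is an implementation detail I have not verified; the arithmetic itself is easy to check:

```python
# The value MySQL truncated wsrep_max_ws_size to, just below UINT32_MAX:
cap = 4294901759
assert cap == 2**32 - 65537   # 65537 bytes short of 4 GiB
assert cap == 0xFFFEFFFF      # i.e. 0xFFFFFFFF - 0x10000
print(hex(cap))  # → 0xfffeffff
```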
```
node1 mysql> update t set k=k+1;
ERROR 1180 (HY000): Got error 5 during COMMIT
```
We get the same error on the client, but the message in the MySQL error log is a bit different:
```
2015-10-20 04:54:14 10068 [ERROR] WSREP: Maximum writeset size exceeded by 1737995426: 90 (Message too long)
	 at galera/src/write_set_ng.hpp:check_size():662
2015-10-20 04:54:14 10068 [ERROR] WSREP: transaction size exceeded
```
If you really must process a large number of rows, one way to reduce the writeset size is to set binlog_row_image to minimal. Let’s test this again with a 2GB wsrep_max_ws_size and see how it goes:
```
node1 mysql> set global wsrep_max_ws_size=1024*1024*1024*2;
Query OK, 0 rows affected (0.00 sec)

node1 mysql> select @@wsrep_max_ws_size;
+---------------------+
| @@wsrep_max_ws_size |
+---------------------+
|          2147483648 |
+---------------------+
1 row in set (0.00 sec)

node1 mysql> SET GLOBAL binlog_row_image=minimal;
Query OK, 0 rows affected (0.00 sec)

node1 mysql> select @@binlog_row_image;
+--------------------+
| @@binlog_row_image |
+--------------------+
| MINIMAL            |
+--------------------+
1 row in set (0.00 sec)

node1 mysql> show global status like 'wsrep_replicated_bytes';
+------------------------+-------------+
| Variable_name          | Value       |
+------------------------+-------------+
| wsrep_replicated_bytes | 13211964556 |
+------------------------+-------------+
1 row in set (0.00 sec)

node1 mysql> update t set k=k+1;
Query OK, 10000000 rows affected (11 min 18.33 sec)
Rows matched: 10000000  Changed: 10000000  Warnings: 0

node1 mysql> show global status like 'wsrep_replicated_bytes';
+------------------------+-------------+
| Variable_name          | Value       |
+------------------------+-------------+
| wsrep_replicated_bytes | 13402597135 |
+------------------------+-------------+
1 row in set (0.00 sec)

node1 mysql> select (13402597135-13211964556)/1024/1024 as ws_size;
+--------------+
| ws_size      |
+--------------+
| 181.80139446 |
+--------------+
1 row in set (0.01 sec)
```
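A back-of-the-envelope estimate shows why MINIMAL helps so much here. Assuming the standard sysbench table layout (id INT primary key, k INT, c CHAR(120), pad CHAR(60)) – an assumption about the schema, and ignoring per-event overhead – a FULL row image replicates every column twice (before and after images), while MINIMAL replicates only the primary key plus the changed column:

```python
# Approximate per-row replication payload for `UPDATE t SET k=k+1` on an
# assumed sysbench table: id INT (4B) PK, k INT (4B), c CHAR(120), pad CHAR(60).
cols = {"id": 4, "k": 4, "c": 120, "pad": 60}

full_row = 2 * sum(cols.values())      # FULL: before + after image, all columns
minimal_row = cols["id"] + cols["k"]   # MINIMAL: before = PK only, after = changed col

# Naive ratio is ~47x; the measured saving is smaller (~10x+) because each
# replicated row also carries fixed event/header overhead not modeled here.
print(full_row, minimal_row, full_row // minimal_row)  # → 376 8 47
```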
In our tests, this reduced the writeset size by at least 10x. Thanks to my colleague Przemyslaw for pointing this out.
Now we know how big we can go in terms of writeset size, and how to go further with a minimal row image – but that does not mean you should allow it. Set a sane size limit based on your workload and realistic performance expectations. Large transactions will not only cause unexpected performance issues in your cluster, they will also hurt usability through increased deadlocks.
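If you control the application, the usual alternative to raising the limits is to split the work into primary-key ranges, so each transaction commits a small writeset. The sketch below shows only the batching logic; the commented-out execution calls are illustrative, not a specific driver's API.

```python
# Sketch: split a full-table UPDATE into bounded PK-range batches so each
# transaction stays well under wsrep_max_ws_size.
def pk_batches(min_id, max_id, batch_size):
    """Yield inclusive (lo, hi) primary-key ranges covering [min_id, max_id]."""
    lo = min_id
    while lo <= max_id:
        hi = min(lo + batch_size - 1, max_id)
        yield lo, hi
        lo = hi + 1

for lo, hi in pk_batches(1, 10_000_000, 1_000_000):
    sql = f"UPDATE t SET k = k + 1 WHERE id BETWEEN {lo} AND {hi}"
    # cursor.execute(sql); connection.commit()   # one small writeset per batch
```

Keep in mind the trade-off: the batches commit independently, so the table is not updated atomically, and you need a retry strategy for batches that deadlock.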