New to TokuDB® v7.5 is a feature we’re calling “Read Free Replication” (RFR). RFR allows TokuDB replication slaves to process insert, update, and delete statements with almost no read IO. As a result, the slave can easily keep up with the master (no lag) and retain all of its read IO capacity for read-scaling your workload.
The goal of this blog is two-fold: (1) to cover why RFR is important and how RFR works and (2) to run a simple before/after benchmark showing the impact of RFR on a well known workload. Later this week I’ll post another blog showing other interesting use-cases for RFR beyond this first benchmark.
Read Free Replication: The Why and How
In MySQL, a replication slave does less work than the master because it does not need to execute SELECT statements (only INSERT, UPDATE, and DELETE). However, a MySQL slave can struggle to keep up with the master because replication is processed in a single thread on the slave, even though the transactions executed concurrently on the master. This leads to a condition known as slave lag, where the slave falls farther and farther behind the master as time goes on.
Note: There are replication enhancements in MySQL 5.6 and MariaDB 10.x to improve replication performance on the slave, and Tokutek’s RFR functionality will take advantage of those improvements as well. I’ll elaborate on this in the next blog.
So the challenge of replication is that the slave needs to do the same amount of write work as the master, but cannot take advantage of the master’s concurrent transaction processing. The bottleneck is specifically read IO: updates and deletes on a MySQL slave follow read-modify-write behavior, and uniqueness checks on inserts and updates also require read IO.
There are many whitepapers, blogs, and presentations from Tokutek regarding the inner workings of Fractal Tree indexes. For those familiar with InnoDB, a one-sentence explanation of what Fractal Tree indexes are capable of is as follows. InnoDB supports a change buffer for inserts, updates, and deletes on secondary indexes. The change buffer is used when the leaf node containing the data for a given operation isn’t currently in the InnoDB buffer pool, and it allows the transaction to continue rather than wait for the read IO operation. At some later point in time, either the leaf node ends up in memory (and the buffered operation is applied) or the buffer grows too large and InnoDB goes looking for the needed leaf nodes to reduce its size.
OK, maybe I needed more than one sentence for the background information.
In contrast to InnoDB’s single change buffer per secondary index, TokuDB’s Fractal Tree indexes support a “change buffer” in each internal node of each index, including the primary key index. This allows us to process the replication stream on the slave server with no read IO whatsoever, so long as the binary log contains everything needed to apply the operations.
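To make the buffering idea concrete, here is a toy Python sketch (my own illustration, not TokuDB’s actual implementation) of an index whose root holds a message buffer: writes are appended blindly without touching leaf data, and buffered messages are only applied when a query actually reads a leaf. This is the property that lets a slave apply a replication stream with no read IO.

```python
# Toy sketch of a buffered-message ("Fractal Tree"-style) index.
# Writes append a message to the root buffer -- no leaf is read.
# Messages are applied lazily, only when a query touches a leaf.

class BufferedTree:
    def __init__(self):
        self.buffer = []      # (op, key, value) messages in the root node
        self.leaves = {}      # simulated on-disk leaf data
        self.leaf_reads = 0   # counts simulated "read IO" operations

    def insert(self, key, value):
        self.buffer.append(("insert", key, value))   # blind write, no read

    def delete(self, key):
        self.buffer.append(("delete", key, None))    # blind write, no read

    def _flush(self):
        # Apply all buffered messages to the leaves, oldest first.
        for op, key, value in self.buffer:
            if op == "insert":
                self.leaves[key] = value
            else:
                self.leaves.pop(key, None)
        self.buffer.clear()

    def get(self, key):
        self.leaf_reads += 1   # a query must read leaf data
        self._flush()
        return self.leaves.get(key)

t = BufferedTree()
for i in range(1000):
    t.insert(i, i * 2)     # 1000 writes...
t.delete(5)
print(t.leaf_reads)        # → 0: no leaf reads yet
print(t.get(10))           # → 20: first read applies the buffered messages
```

In the real structure each internal node has its own buffer and messages cascade down level by level, but the key point survives the simplification: insert/update/delete never wait on a leaf read.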
Read Free Replication: Sysbench Benchmark
The benchmark was performed on a pair of similarly spec’d Dell R710 servers, TokuDB was set to use zlib compression with a 4GB cache, and Sysbench 0.5 OLTP was run with 16 tables, 5 million rows per table, and 64 concurrent threads. The OLTP workload was run for just over 10 minutes before the RFR optimizations were enabled, which allowed the slave lag to build up.
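For reference, turning on the RFR optimizations on the slave mid-benchmark amounts to flipping a couple of TokuDB server variables. The sketch below shows the my.cnf form; the variable names are as documented for TokuDB 7.5, but treat the fragment as illustrative and confirm the names and prerequisites against your version’s documentation.

```ini
# my.cnf fragment for the slave -- a sketch of the TokuDB 7.5 RFR settings.
[mysqld]
read_only                = 1    # slave applies only replicated writes
tokudb_rpl_unique_checks = OFF  # skip read IO for uniqueness checks
tokudb_rpl_lookup_rows   = OFF  # skip read-before-write row lookups

# RFR relies on replicated events carrying full row images, so the
# master should be running with binlog_format = ROW.
```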
In the throughput graph (commits per second) it’s clear that the single-threaded nature of replication is prohibiting the slave from keeping up with the master. The master is performing around 140 transactions per second (tps) and the slave around 70 tps. Once RFR is enabled on the slave, throughput climbs to over 1400 tps while the accumulated lag is burned down, then stabilizes at the exact throughput of the master, eliminating lag entirely.
[image img_url=”/wp-content/uploads/2014/09/master-vs-slave-cps.png” img_title=”Master Vs Slave: Commits Per Second”]
I also measured IO utilization during the benchmark, and saw a dramatic reduction in IO once the optimization was enabled. Much of the remaining IO utilization is due to additional writes and fsync() calls that we plan to remove in an upcoming release.
[image img_url=”/wp-content/uploads/2014/09/master-vs-slave-io.png” img_title=”Master Vs Slave: IO Utilization”]
Unlike most of my TokuDB benchmarks, I’m comparing TokuDB 7.5 to itself, rather than to InnoDB. But for those interested: using the same 4GB cache size, InnoDB’s throughput on the master was ~100 tps and on the InnoDB slave ~72 tps.