
TokuDB Read Free Replication : Details and Use Cases

September 25, 2014 | Posted In: Tokutek, TokuView


The biggest innovation in TokuDB v7.5 is Read Free Replication (RFR). A few days ago I posted a benchmark showing how much additional throughput a replication slave can achieve while lowering its read IO operations to almost zero. The official documentation on the feature is available here.

In this second post I want to cover the requirements for RFR, as well as some interesting use cases for the technology.

RFR Requirements
The only requirement on the master is that replication logging must be row based (BINLOG_FORMAT=ROW). This is key, because the optimization requires the slave servers to know the before and after images of the SQL operations in order to avoid read IO.
There are a few requirements on the slave: (1) The server must be in “read only” mode (read_only=1) and (2) you must disable the default uniqueness checking (tokudb_rpl_unique_checks=0) and/or read-before-write behavior (tokudb_rpl_lookup_rows=0).
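As a rough sketch, the settings named above could be placed in each server's configuration file. The layout below is an assumption for illustration (two separate my.cnf files shown in one listing); only the variable names and values come from the requirements above.

```ini
# --- master my.cnf (assumed layout) ---
[mysqld]
# RFR needs the before and after row images in the binary log
binlog_format = ROW

# --- slave my.cnf (assumed layout) ---
[mysqld]
# The slave must be read only
read_only = 1
# Skip the uniqueness check on replicated writes
tokudb_rpl_unique_checks = 0
# Skip the read-before-write lookup on replicated updates and deletes
tokudb_rpl_lookup_rows = 0
```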
We decided to disable the RFR optimizations by default (both variables default to 1) because we want users to request this change in behavior explicitly. Even in read-only mode, MySQL slaves allow insert/update/delete operations from users with SUPER privileges, and without uniqueness checks and read-before-write lookups the slave has no way to know whether its data has drifted from the master. It is, however, perfectly fine for your slave to have more or fewer indexes than the master.
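To make the trade-off concrete, here is a toy model in Python. It is not TokuDB's implementation: `ToySlave`, its `lookup_rows` flag, and the dictionary-backed store are hypothetical names invented for this sketch. It only illustrates the idea that a row event carrying both before and after images can be applied without a read, and that skipping the read also skips the chance to notice drift.

```python
class ToySlave:
    """Toy key-value slave that counts reads. Hypothetical, not TokuDB internals."""

    def __init__(self, rows, lookup_rows=True):
        self.rows = dict(rows)          # primary key -> row value
        self.lookup_rows = lookup_rows  # loosely mirrors tokudb_rpl_lookup_rows
        self.reads = 0                  # number of row reads performed

    def apply_update(self, key, before, after):
        """Apply one row-based update event (before image, after image)."""
        if self.lookup_rows:
            # Read-before-write: fetch the row and compare it to the before image.
            self.reads += 1
            current = self.rows.get(key)
            if current != before:
                raise RuntimeError(
                    f"drift on key {key!r}: expected {before!r}, found {current!r}"
                )
        # The after image alone is enough to perform the write.
        self.rows[key] = after


def demo():
    event = (1, "old", "new")  # (primary key, before image, after image)

    checked = ToySlave({1: "old"}, lookup_rows=True)
    checked.apply_update(*event)

    blind = ToySlave({1: "old"}, lookup_rows=False)
    blind.apply_update(*event)

    # A SUPER user changed the row on this slave; the blind write never notices.
    drifted = ToySlave({1: "changed locally"}, lookup_rows=False)
    drifted.apply_update(*event)

    return checked.reads, blind.reads, drifted.rows[1]
```

In this sketch the checked slave performs one read per update and raises if the stored row does not match the before image, while the blind slave performs no reads and silently overwrites whatever is there. That is the reason the behavior is opt-in.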
Use Case 1 : Read Scaling
If you care about your data, you likely have one or two slaves online should your master fail. Another great use for slaves is read scaling, meaning you send queries to your slaves to put less load on your master. TokuDB’s read free replication means that ALL of your slaves’ read IO capacity is available for these queries. Plus, lag-free replication means you can send more of your queries to the slave without worrying about how current the data on the slave is.
Use Case 2 : Mixed Server Environments
Since MySQL replication is independent from the storage engine, it’s always been possible to add a TokuDB slave to an existing environment using InnoDB (or MyISAM). By adding a TokuDB slave to your environment you can measure the reduction in IO, increased replication throughput, and how much compression you can achieve with your own data. No benchmark compares to your own actual data and workload.
Use Case 3 : Big Masters, Small Slaves
Adding an additional TokuDB slave can be much more cost effective since it uses far less CPU, RAM, and IO. Keep in mind that if you’re using MySQL replication for availability you need to make sure that one of your slaves is powerful enough to be promoted to master and handle your workload.
What about MySQL 5.6 and MariaDB 10.x?
RFR is 100% compatible with the parallel replication features in MySQL 5.6 and MariaDB 10.0. There is effort for the implementation, but the features are compatible.
What’s next?
As evidenced by my benchmarks, TokuDB fundamentally changes the landscape of slave replication performance. But as we’ve seen before, removing one bottleneck only opens your eyes to the next one. Next up is reducing the number of fsync() operations on the slave, which will leave even more IO capability available for read scaling. Keep an eye on this blog; we’ll have updates as we make progress.
To learn more about TokuDB:
Download it, check out the documentation, or get active in our Jira.

8 Comments

  • Hi Tim,

    This sounds like a great feature. It has some similarities to Domas Mituzas’ work at Facebook to speed up replication using prefetch but with the obvious benefit that you don’t actually *do* reads. That’s a nice touch. 🙂

    Can external clients (…like Tungsten) take advantage of the performance boost?

    Cheers, Robert

    • In the feature’s current state the TokuDB slave performs the optimization while processing the binary logs. If Tungsten is replacing the binary logging process then we will need to add additional APIs for it to work.

    • TokuDB Read Free Replication (RFR) only works if the replication data in the binary log is row based. Mixed mode defaults to statement based binary logging, and reverts to row based under certain circumstances. So mixed mode is fine, but you’ll want to switch to row based for the parts of your workload where RFR is desired.

  • […] I find TokuDB to be a fascinating engine. I can tell I will need to re-watch our Dbhangops session where Tim Callaghan talked about the differences between B-Tree and Fractal Tree indexes. There’s also a session on how compression works in TokuDB and they continue to innovate with read-free replication. […]

  • Read free replication on TokuDB was inspired by the replication implementation on TokuMX. We want to avoid reads when all of the data necessary to do a write, delete, or update has already been supplied by the binary log on TokuDB (or the oplog on TokuMX).

  • RFR is great, thanks to the Tokutek team for all the effort. One thing I think is missing from the documentation: if your table lacks a PK, update and delete events will not propagate reliably and will eventually create a data inconsistency.
    The note in the documentation, “In MySQL 5.5 and MariaDB 5.5, only tables with a defined primary key are eligible for this optimization. The limitation does not apply to MySQL 5.6”, is slightly misleading. In my case I use 5.6.27-76.0-log Percona Server (GPL), Release 76.0.
    Replication works fine with RFR disabled, but after enabling it, if 1 row is updated on the master, I sometimes end up with 2 rows on the slave: the 1st row is the original, and the 2nd row has the new updated data.
    If I delete the row on the master, I see that only 1 of those rows is deleted on the slave. Odd.

    So either this is undocumented behaviour that needs to be mentioned, as people with legacy tables that have no PK might end up inconsistent with the master, or it is a bug that needs to be fixed.
