MySQL CDC, Streaming Binary Logs, and Asynchronous Triggers

In this post, we'll look at MySQL CDC, streaming binary logs, and asynchronous triggers.

What is Change Data Capture and why do we need it?

Change Data Capture (CDC) tracks data changes, usually in close to real time. In MySQL, the easiest and probably most efficient way to track data changes is to use binary logs. However, other approaches exist. For example:

  • General log or Audit Log Plugin (which logs all queries, not just the changes)
  • MySQL triggers (not recommended, as they can slow down the application; more on this below)

One of the first implementations of CDC for MySQL was the FlexCDC project by Justin Swanhart. Nowadays, there are a lot of CDC implementations (see mysql-cdc-projects wiki for a long list).

CDC can be implemented for various tasks such as auditing, copying data to another system or processing (and reacting to) events. In this blog post, I will demonstrate how to use a CDC approach to stream MySQL binary logs, process events and save them (stream them) to another MySQL instance or to MongoDB. In addition, I will show how to implement asynchronous triggers by streaming binary logs.

Streaming binary logs 

You can read binary logs using the mysqlbinlog utility by adding the "-vvv" (verbose) option. mysqlbinlog can also show a human-readable version of ROW-based replication events. For example:
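Here is a representative invocation and abbreviated output, assuming ROW-based binary logging and a table "a" in the "test" database with a single INT field; the file name, positions and values are illustrative:

```
$ mysqlbinlog -vvv /var/lib/mysql/master.000001
...
BINLOG '
  (base64-encoded event data)
'/*!*/;
### UPDATE `test`.`a`
### WHERE
###   @1=20 /* INT meta=0 nullable=1 is_null=0 */
### SET
###   @1=100 /* INT meta=0 nullable=1 is_null=0 */
...
```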

Starting with MySQL 5.6, mysqlbinlog can also read the binary log events from a remote master, acting as a "fake" replication slave.
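For example (the host, credentials and binlog file name below are placeholders):

```
$ mysqlbinlog --read-from-remote-server --host=master.example.com \
    --user=repl --password --raw --stop-never master.000001
```

With --raw and --stop-never, mysqlbinlog keeps writing the binary log events to local files as they arrive, much like a replication slave would.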

Reading binary logs is a great basis for CDC. However, there are still some challenges:

  1. ROW-based replication is probably the easiest way to get the raw changes; otherwise we would have to parse SQL. At the same time, ROW-based binary logs don't contain the table metadata: they record only field numbers, not field names (in the example above, "@1" is the first field in table "a").
  2. We will need to record the binary log positions somewhere, so that the tool can be restarted at any time and resume from the last position (like a MySQL replication slave does).

Maxwell’s daemon (Maxwell = MySQL + Kafka), an application recently released by Zendesk, reads MySQL binlogs and writes row updates as JSON. It can write to Kafka, which is its primary goal, but it can also write to stdout and can be extended for other purposes. Maxwell stores the metadata about MySQL tables and binary log events (and other metadata) inside MySQL, so it solves both of the potential issues listed above.

Here is a quick demo of Maxwell:

Session 1 (Insert into MySQL):
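For instance (the test database and table a are illustrative):

```
mysql> CREATE TABLE test.a (i INT);
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO test.a VALUES (1);
Query OK, 1 row affected (0.00 sec)
```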

Session 2 (starting Maxwell):
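A sketch of the startup command and output; the credentials are placeholders, and the JSON line is representative of Maxwell's output format (one document per row change):

```
$ bin/maxwell --user='maxwell' --password='XXXXXX' --host='127.0.0.1' --producer=stdout
{"database":"test","table":"a","type":"insert","ts":1472937475,"xid":211209,"commit":true,"data":{"i":1}}
```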

As we can see in this example, Maxwell gets the events from the MySQL replication stream and writes them to stdout (if we change the producer, it can send them to Apache Kafka instead).

Saving binlog events to MySQL document store or MongoDB

If we want to save the events somewhere else, we can use MongoDB or MySQL JSON fields and the document store (as Maxwell provides us with JSON documents). For a simple proof of concept, I've created Node.js scripts to implement a CDC "pipeline":
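Here is a minimal sketch of such a script: it reads Maxwell's JSON events from stdin and inserts each one into a MySQL table with a JSON column. The mysql npm package, the connection settings and the mysql_events.events table are assumptions for illustration:

```javascript
// pipeline.js: read Maxwell JSON events from stdin and store them in MySQL.
// Assumes: npm install mysql, and a table created as
//   CREATE TABLE mysql_events.events (event JSON);
const readline = require('readline');
const mysql = require('mysql');

const connection = mysql.createConnection({
  host: 'localhost',
  user: 'root',
  password: '',            // placeholder credentials
  database: 'mysql_events' // assumed target schema
});

const rl = readline.createInterface({ input: process.stdin });

// Each line from Maxwell is one JSON document describing a row change
rl.on('line', (line) => {
  connection.query('INSERT INTO events (event) VALUES (?)', [line], (err) => {
    if (err) throw err;
  });
});

rl.on('close', () => connection.end());
```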

To run it, we can pipe Maxwell's output into the script:
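Assuming the script above is saved as pipeline.js (again, credentials are placeholders):

```
$ bin/maxwell --user='maxwell' --password='XXXXXX' --host='127.0.0.1' \
    --producer=stdout | node pipeline.js
```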

The same approach can be used to save the CDC events to MongoDB with mongoimport:
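For example (mongoimport reads one JSON document per line from stdin by default; the database and collection names are illustrative):

```
$ bin/maxwell --user='maxwell' --password='XXXXXX' --host='127.0.0.1' \
    --producer=stdout | mongoimport --host localhost -d mysql_events -c events
```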