Mining MySQL's Binary Log with Apache Kafka and Kafka Connect
MySQL's binary log contains a treasure trove of event data, recording every change made to the database. However, normal database queries operate over, and mutate, only the current state of the database. This is a convenient interface for many applications, but a large amount of useful data that could be processed, analyzed, and used to make business decisions is lost if you never extract it from the binary log. In this talk, I'll describe how to combine MySQL, Apache Kafka, and Kafka's new import/export tool, Kafka Connect, to leverage this data. First I'll describe Kafka Connect and how it enables and simplifies the capture of raw event data directly into Kafka. Next, I'll show how this enables both real-time stream processing of the event data and its delivery to systems such as Hadoop for offline, batch analysis. Finally, I'll show how to generalize this pattern to other systems, allowing you to standardize and unify how you model, build, and maintain your entire data pipeline.
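The capture step the abstract describes can be sketched as a Kafka Connect source connector configuration. This is a minimal illustration only: the connector class and property names below follow the open source Debezium MySQL connector (one implementation of binlog capture for Kafka Connect), and the hostname, credentials, and topic prefix are placeholder assumptions, not details from the talk.

```properties
# Hypothetical Kafka Connect source configuration for streaming the MySQL
# binary log into Kafka. Property names follow the Debezium MySQL connector;
# all values here are placeholders.
name=mysql-binlog-source
connector.class=io.debezium.connector.mysql.MySqlConnector
tasks.max=1
database.hostname=mysql.example.com
database.port=3306
database.user=replicator
database.password=secret
database.server.id=184054
topic.prefix=inventory
# Each captured change event (insert/update/delete) is written to a Kafka
# topic named <topic.prefix>.<database>.<table>, where stream processors
# and sink connectors (e.g. an HDFS sink for Hadoop) can consume it.
```

Once the events land in Kafka topics, the same data feeds both real-time consumers and batch sinks, which is the unification the talk argues for.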
Ewen Cheslack-Postava is a Kafka committer and engineer at Confluent, where he builds a stream data platform based on Apache Kafka to help organizations reliably and robustly capture and leverage all their real-time data. He received his PhD from Stanford University, where he developed Sirikata, an open source system for massive virtual environments. His dissertation defined a novel type of spatial query that significantly improves visual fidelity, and described a system for efficiently processing these queries at scale.