Percona Live: Data Performance Conference 2016

April 18-21, 2016

Santa Clara, California

A Billion Messages a Day - Yelp's Real-time Data Pipeline

19 April 5:15 PM - 5:40 PM @ Ballroom A
25 minutes conference
New and Trending Topics
Tools and Techniques


Yelp moved quickly into building out a comprehensive service-oriented architecture (SOA), and before long had over 100 data-owning production services. Distributing data across an organization creates a number of issues, particularly around the cost of joining disparate data sources, which dramatically increases the complexity of bulk data applications. Straightforward solutions like bulk data APIs and sharing data snapshots have significant drawbacks. Yelp's Data Pipeline makes it easier for these services to communicate with each other, provides a framework for real-time data processing, and facilitates high-performance bulk data applications - making large SOAs easier to work with. The Data Pipeline provides a series of guarantees that make it easy to create universal data producers and consumers that can be mashed up into interesting real-time data flows. In one of Yelp's more interesting applications, we created a tool that connects to our MySQL binary replication logs and publishes row-level changes into the pipeline, where they then flow to a variety of targets, including Salesforce and Amazon Redshift, and are indexed for search.
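
As a rough illustration of the flow described above (row-level changes published once, then consumed independently by several targets), here is a minimal in-memory sketch in Python. It is not Yelp's implementation: the RowChange shape, the Pipeline class, the "business" topic, and the two dictionary targets are all hypothetical names chosen for the example, and a real system would read changes from the MySQL replication log rather than construct them by hand.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class RowChange:
    """One row-level change (hypothetical shape, not a real binlog format)."""
    table: str
    op: str                  # "insert", "update", or "delete"
    before: Optional[dict]   # row image before the change, if any
    after: Optional[dict]    # row image after the change, if any


class Pipeline:
    """Toy stand-in for a data pipeline: an ordered log per topic plus fan-out."""

    def __init__(self) -> None:
        self._logs: Dict[str, List[RowChange]] = defaultdict(list)
        self._consumers: Dict[str, List[Callable[[RowChange], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[RowChange], None]) -> None:
        self._consumers[topic].append(handler)

    def publish(self, topic: str, change: RowChange) -> None:
        # Append to the topic's log, then deliver to every consumer in order.
        self._logs[topic].append(change)
        for handler in self._consumers[topic]:
            handler(change)


def make_mirror(store: dict) -> Callable[[RowChange], None]:
    """Build a consumer that keeps `store` in sync with the source table by id."""
    def apply(change: RowChange) -> None:
        if change.op == "delete":
            store.pop(change.before["id"], None)
        else:
            store[change.after["id"]] = change.after
    return apply


# Two independent targets fed by the same stream (stand-ins for, say,
# a warehouse table and a search index).
warehouse: dict = {}
search_index: dict = {}

pipeline = Pipeline()
pipeline.subscribe("business", make_mirror(warehouse))
pipeline.subscribe("business", make_mirror(search_index))

# Changes a replication-log reader might emit; hand-written here.
pipeline.publish("business", RowChange("business", "insert", None, {"id": 1, "name": "Cafe"}))
pipeline.publish("business", RowChange("business", "update",
                                       {"id": 1, "name": "Cafe"}, {"id": 1, "name": "Cafe Uno"}))
pipeline.publish("business", RowChange("business", "insert", None, {"id": 2, "name": "Bar"}))
pipeline.publish("business", RowChange("business", "delete", {"id": 2, "name": "Bar"}, None))
```

The point of the sketch is that the producer knows nothing about its consumers: adding another target means adding another subscriber, not changing the code that publishes changes.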


Justin Cunningham

Software Engineer, Yelp


Justin Cunningham is the technical lead for the Business Analytics and Metrics team at Yelp, principally working on scaling Yelp's data infrastructure to support over 100 million monthly unique visitors. Before Yelp, Justin worked at several small startups that he founded.
