Percona Live: Data Performance Conference 2016

April 18-21, 2016

Santa Clara, California

A Billion Messages a Day - Yelp's Real-time Data Pipeline

19 April, 5:15 PM - 5:40 PM @ Ballroom A
Experience level: Intermediate
Duration: 25-minute conference session
Tracks: Architecture/Design, New and Trending Topics
Topics: MySQL, Tools and Techniques, Replication

Description

Yelp moved quickly to build out a comprehensive service-oriented architecture (SOA), and before long had over 100 data-owning production services. Distributing data across an organization creates a number of issues, particularly around the cost of joining disparate data sources, which dramatically increases the complexity of bulk data applications. Straightforward solutions like bulk data APIs and shared data snapshots have significant drawbacks. Yelp's Data Pipeline makes it easier for these services to communicate with each other, provides a framework for real-time data processing, and facilitates high-performance bulk data applications, making large SOAs easier to work with. The Data Pipeline provides a series of guarantees that make it easy to create universal data producers and consumers that can be mashed up into interesting real-time data flows. In one of Yelp's more interesting applications, we created a tool that connects to our MySQL binary replication logs and publishes row-level changes into the pipeline, where they then flow to a variety of targets, including Salesforce and Amazon Redshift, and are indexed for search.
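
The row-level change flow described above can be illustrated with a minimal in-memory sketch. This is not Yelp's implementation: the abstract does not describe the message bus, the event format, or the consumer APIs, so `RowChange`, `Pipeline`, and both consumer functions are hypothetical names, and plain Python lists and dicts stand in for the real targets. The point is only the shape of the idea: one producer publishes each change once, and independent consumers each receive the full stream.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class RowChange:
    """One row-level change (hypothetical shape, not a real binlog format)."""
    table: str
    op: str                  # "insert", "update" or "delete"
    before: Optional[dict]   # row image before the change, None for inserts
    after: Optional[dict]    # row image after the change, None for deletes


class Pipeline:
    """Toy in-memory stand-in for the pipeline; one topic per table."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[RowChange], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[RowChange], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, change: RowChange) -> None:
        # Every subscriber of the table's topic sees every change, in publish order.
        for handler in self._subscribers[change.table]:
            handler(change)


warehouse_rows: List[dict] = []      # stands in for a warehouse load
search_index: Dict[int, str] = {}    # stands in for a search index


def load_into_warehouse(change: RowChange) -> None:
    # Append-only history: keep every new row image.
    if change.after is not None:
        warehouse_rows.append(change.after)


def update_search_index(change: RowChange) -> None:
    # Current-state view: upsert on insert/update, drop on delete.
    if change.op == "delete":
        search_index.pop(change.before["id"], None)
    else:
        search_index[change.after["id"]] = change.after["name"]


pipeline = Pipeline()
pipeline.subscribe("business", load_into_warehouse)
pipeline.subscribe("business", update_search_index)

pipeline.publish(RowChange("business", "insert", None, {"id": 1, "name": "Cafe A"}))
pipeline.publish(RowChange("business", "update", {"id": 1, "name": "Cafe A"}, {"id": 1, "name": "Cafe B"}))
pipeline.publish(RowChange("business", "delete", {"id": 1, "name": "Cafe B"}, None))
```

After the three events, the warehouse stand-in holds both row images it was sent, while the search stand-in is empty again because the row was deleted. Each consumer derives its own view from the same stream without the producer knowing about either of them.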

Speakers

Justin Cunningham

Software Engineer, Yelp

Biography:

Justin Cunningham is the technical lead for the Business Analytics and Metrics team at Yelp, principally working on scaling Yelp's data infrastructure to support over 100 million monthly unique visitors. Before Yelp, Justin worked at several small startups that he founded.
