Percona Live Europe featured talk with Alexander Krasheninnikov: Processing 11 billion events a day with Spark in Badoo
Dave Avery
Welcome to a new Percona Live Europe featured talk with Percona Live Europe 2016: Amsterdam speakers! In this series of blogs, we’ll highlight some of the speakers that will be at this year’s conference. We’ll also discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live Europe registration bonus!
In this Percona Live Europe featured talk, we’ll meet Alexander Krasheninnikov, Head of Data Team at Badoo. His talk will be on Processing 11 billion events a day with Spark in Badoo. Badoo is one of the world’s largest and fastest-growing social networks for meeting new people. I had a chance to speak with Alexander and learn a bit more about the database environment at Badoo:
Percona: Give me a brief history of yourself: how you got into database development, where you work, what you love about it?
Alexander: Currently, I work at Badoo as Head of Data Team. Our team is responsible for providing internal APIs for collecting and processing statistics data.
I started as a developer at Badoo, but the project I am going to cover in my talk led to the creation of a separate department.
Percona: Your talk is called “Processing 11 billion events a day with Spark in Badoo.” What were the issues with your environment that led you to Spark? How did Spark solve these needs?
Alexander: When we designed the Unified Data Stream system at Badoo, we identified several requirements: scalability, fault tolerance and reliability. Together, these requirements moved us towards using Hadoop as our deep data storage and data processing framework. Our initial implementation was built on top of Scribe + WebHDFS + Hive. But we realized that its processing speed and the lag in data delivery were unacceptable (we need near-realtime data processing). A member of our BI team mentioned that Spark is significantly faster than Hive in some cases (especially ones similar to ours). When we investigated Spark’s API, we found the Streaming submodule, which was ideal for our needs. Additionally, this framework allowed us to use third-party libraries and write our own code. We’ve actually created an aggregation framework that follows the “divide and conquer” principle. Without Spark, we would definitely have ended up re-inventing a lot of what it provides.
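As a rough illustration of the “divide and conquer” idea mentioned above, the sketch below counts events per chunk and then merges the partial results. It is plain Python rather than Spark, and the event shape (a dict with a `name` key), the function names and the chunk size are all assumptions made for illustration. The interview does not describe how Badoo’s aggregation framework is actually built.

```python
from collections import Counter
from functools import reduce


def aggregate_chunk(events):
    """Count events by name within one chunk (the "divide" step)."""
    return Counter(event["name"] for event in events)


def merge_counts(left, right):
    """Combine two partial counts into one (the "conquer" step)."""
    return left + right


def aggregate(events, chunk_size=2):
    """Split events into chunks, aggregate each chunk, then merge the partials."""
    chunks = [events[i:i + chunk_size] for i in range(0, len(events), chunk_size)]
    partials = [aggregate_chunk(chunk) for chunk in chunks]
    return reduce(merge_counts, partials, Counter())


# Hypothetical sample events; real events would carry many more fields.
events = [
    {"name": "vote"},
    {"name": "message"},
    {"name": "vote"},
    {"name": "like"},
    {"name": "vote"},
]
totals = aggregate(events)
```

Because merging partial counts is associative, the chunks could in principle be aggregated on different machines and combined in any order, which is the property a distributed engine relies on.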
Percona: Why is tracking the event stream important for your business model? How are you using the data Spark is providing you to reach business goals?
Alexander: The event stream always represents some important business or technical metrics: votes, messages, likes and so on. All this, brought together, forms the “health” of our product. The primary goal of our Spark-based system is to process a heterogeneous event stream in a uniform way and draw charts automatically. We achieved this goal, and now we have hundreds of charts and dozens of developers, analysts and product team members using them. The system has also evolved, and now we perform automatic anomaly detection over the event stream. We report strange data behavior to everyone interested.
Percona: What is changing in data use in your business model that keeps you awake at night? What tools or features are you looking for to address these issues?
Alexander: As I’ve mentioned before, we have an anomaly detection process for our metrics. If any of our metrics falls outside its expected bounds, it is treated as an anomaly, and notifications are sent. We also have self-monitoring functionality for the whole system: a small stream of heartbeat events is generated and processed by two different systems. If those show a significant difference, that definitely keeps me awake at night! 🙂
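The two checks described in this answer can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Badoo’s implementation: the metric names, the fixed `(low, high)` bounds, the 5% tolerance and both function names are hypothetical, and the interview does not say how the expected bounds are derived.

```python
def find_anomalies(metrics, bounds):
    """Return the names of metrics whose value lies outside its (low, high) bounds.

    Assumes every metric in `metrics` has an entry in `bounds`.
    """
    anomalies = []
    for name, value in metrics.items():
        low, high = bounds[name]
        if not (low <= value <= high):
            anomalies.append(name)
    return anomalies


def heartbeats_diverge(count_a, count_b, tolerance=0.05):
    """Return True if two heartbeat counts differ by more than `tolerance`,
    measured relative to the larger of the two counts."""
    largest = max(count_a, count_b)
    if largest == 0:
        return False  # neither system saw any heartbeats, so nothing to compare
    return abs(count_a - count_b) / largest > tolerance


# Hypothetical metric values and bounds for one time window.
metrics = {"votes": 120, "messages": 15, "likes": 300}
bounds = {"votes": (100, 200), "messages": (50, 150), "likes": (250, 400)}
anomalous = find_anomalies(metrics, bounds)
```

In this example only `messages` falls outside its bounds, so it would be the one metric to trigger a notification. The heartbeat check is what tells you whether the pipeline itself, rather than the product, is misbehaving.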
Percona: What are you looking forward to the most at Percona Live Europe this year?
Alexander: My main interest is distributed open source databases. At Percona Live Europe, I expect to gain a lot of new information from the appropriate conference sections. Particularly, I want to get some knowledge about Yandex ClickHouse, as it looks very promising.
You can read more about how Alexander and Badoo use Spark here: techblog.badoo.com.
Want to find out more about Alexander, Spark and Badoo? Register for Percona Live Europe 2016, and come see his talk Processing 11 billion events a day with Spark in Badoo.
Use the code FeaturedTalk and receive €25 off the current registration price!
Percona Live Europe 2016: Amsterdam is the premier event for the diverse and active open source database community. The conference has a technical focus with an emphasis on the core topics of MySQL, MongoDB, and other open source databases. Percona Live tackles subjects such as analytics, architecture and design, security, operations, scalability and performance. It also provides in-depth discussions for your high-availability, IoT, cloud, big data and other changing business needs. This conference is an opportunity to network with peers and technology professionals by bringing together accomplished DBAs, system architects and developers from around the world to share their knowledge and experience. All of these people help you learn how to tackle your open source database challenges in a whole new way.
This conference has something for everyone!