Percona Live: Data Performance Conference 2016

April 18-21, 2016

Santa Clara, California

Using Apache Spark and MySQL for Data Analysis

 20 April 01:00 PM - 01:50 PM @ Ballroom F
50 minutes conference
Big Data


Apache Spark is a cluster computing framework, similar to Apache Hadoop. There are a number of tasks where MySQL, out of the box, does not perform well. For example, MySQL executes each query on a single CPU core, so even on a server with 48 CPU cores a single query cannot use the full computing power. Spark, on the other hand, can utilize all of your CPU cores. In addition, because Spark is a cluster framework, you can easily add compute nodes so that it uses more resources and runs even faster. In this talk I will demonstrate how to use Apache Spark together with MySQL for data analysis. I will show how Apache Spark aggregates data (Wikipedia pageview statistics) and stores the result set in MySQL. I will also show how to use Apache Spark with multiple sources and join virtual tables from MySQL, flat files and even MongoDB.
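The abstract itself contains no code. As an illustration only, the aggregate-and-store step it describes might look like the PySpark sketch below. This is not the speaker's demo. It assumes the Wikipedia pageview files use the space-separated layout `project page_title view_count bytes`, that the MySQL Connector/J jar is on Spark's classpath, and that the function names, the `driver` class, and all parameters are hypothetical choices made for this example.

```python
from typing import Optional, Tuple


def parse_pageview_line(line: str) -> Optional[Tuple[str, str, int]]:
    """Parse one pageview line into (project, title, views).

    Assumed layout: 'project page_title view_count bytes'.
    Returns None for lines that do not match, so they can be filtered out.
    """
    parts = line.split(" ")
    if len(parts) != 4:
        return None
    project, title, views, _size = parts
    try:
        return project, title, int(views)
    except ValueError:
        return None


def aggregate_to_mysql(input_path: str, jdbc_url: str, table: str,
                       user: str, password: str) -> None:
    """Sum views per project with Spark and write the totals to MySQL."""
    # Imported here so the parser above can be used without Spark installed.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("pageviews-to-mysql").getOrCreate()

    # Spark splits the input files across all available cores and nodes.
    rows = (spark.sparkContext.textFile(input_path)
            .map(parse_pageview_line)
            .filter(lambda r: r is not None))
    df = spark.createDataFrame(rows, ["project", "title", "views"])

    totals = df.groupBy("project").agg(F.sum("views").alias("total_views"))

    # Assumption: Connector/J 8.x driver class; older jars use a different name.
    totals.write.jdbc(
        url=jdbc_url,
        table=table,
        mode="overwrite",
        properties={"user": user, "password": password,
                    "driver": "com.mysql.cj.jdbc.Driver"},
    )
    spark.stop()
```

A call such as `aggregate_to_mysql("pagecounts/*", "jdbc:mysql://localhost:3306/wiki", "project_totals", "spark", "secret")` would run the job; the path, database, table, and credentials here are placeholders.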


Alexander Rubin

Principal Architect, Percona


Alexander has over 10 years of industry experience with the MySQL database and related technologies. His specialties are performance tuning, full-text search, high availability, database infrastructure architecture and data warehouses. He has helped many MySQL customers design extremely high-performance databases with optimized schemas and queries.
