
Percona Live 2017 Sneak Peek Schedule Up Now! See the Available Sessions!

December 16, 2016 | Posted In: Apache Spark, Big Data, Cloud and MySQL, Cloud and NoSQL, Database Monitoring, Docker, Events and Announcements, Group Replication, High-availability, InnoDB, MariaDB, MongoDB, MySQL, Orchestrator, Percona Live


We are excited to announce that the sneak peek schedule for the Percona Live 2017 Open Source Database Conference is up! The Percona Live Open Source Database Conference 2017 is April 24th – 27th, at the Hyatt Regency Santa Clara & The Santa Clara Convention Center. The Percona Live Open Source Database Conference 2017 is […]


Making Apache Spark Four Times Faster

January 15, 2016 | Posted In: Apache Spark, MySQL

This is a follow-up to my previous post, Apache Spark with Air ontime performance data. To recap an interesting point from that post: when using 48 cores on the server, the result was worse than with 12 cores. I wanted to understand why, so I started digging. My primary suspicion was that Java (I […]


Apache Spark with Air ontime performance data

January 7, 2016 | Posted In: Apache Spark, Benchmarks, MySQL

There is growing interest in Apache Spark, so I wanted to experiment with it (especially after Alexander Rubin’s “Using Apache Spark” post). For this experiment I used the recently released Apache Spark 1.6.0 and the “Airlines On-Time Performance” database from http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time. You can find the scripts I used here: https://github.com/Percona-Lab/ontime-airline-performance. The uncompressed dataset is about 70GB, which is […]


Using Apache Spark and MySQL for Data Analysis

October 7, 2015 | Posted In: Apache Spark, MySQL

What is Spark? Apache Spark is a cluster computing framework, similar to Apache Hadoop. Wikipedia has a great description of it: “Apache Spark is an open source cluster computing framework originally developed in the AMPLab at University of California, Berkeley but was later donated to the Apache Software Foundation where it remains today. In contrast […]
