Apache Spark

Sneak Peek of the Percona Live 2018 Open Source Database Conference Breakout Sessions!

Take a look at the sneak peek of the breakout sessions for the Percona Live 2018 Open Source Database Conference, taking place April 23-25, 2018 at the Santa Clara Convention Center in Santa Clara, California. Early Bird registration discounts are available until February 4, 2018, and sponsorship opportunities are still available.
Conference breakout sessions will feature a range […]

Column Store Database Benchmarks: MariaDB ColumnStore vs. ClickHouse vs. Apache Spark

This blog post shares some column store database benchmark results and compares the query performance of MariaDB ColumnStore v. 1.0.7 (based on InfiniDB), ClickHouse, and Apache Spark.
I’ve already written about ClickHouse (a column store database).
The purpose of the benchmark is to see how these three solutions work on a single big server, with many CPU cores and large amounts of RAM. […]

Percona Live 2017 Sneak Peek Schedule Up Now! See the Available Sessions!

We are excited to announce that the sneak peek schedule for the Percona Live 2017 Open Source Database Conference is up! The Percona Live Open Source Database Conference 2017 is April 24th – 27th, at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.
The Percona Live Open Source Database Conference 2017 is […]

How Apache Spark makes your slow MySQL queries 10x faster (or more)

In this blog post, we’ll discuss how to improve the performance of slow MySQL queries using Apache Spark.
Introduction
In my previous blog post, I wrote about using Apache Spark with MySQL for data analysis and showed how to transform and analyze a large volume of data (text files) with Apache Spark. Vadim also performed a benchmark […]
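
One way Spark can speed up reads from MySQL is a parallel JDBC read: given the `partitionColumn`, `lowerBound`, `upperBound`, and `numPartitions` options, Spark splits a single table scan into several range queries that run concurrently. The Python sketch below only generates those per-partition `WHERE` predicates, so the idea can be seen without a Spark cluster. It is a simplified approximation, not Spark's own code (Spark's stride arithmetic differs in edge cases), and the `jdbc_range_predicates` function and the `ontime` table are hypothetical. As in Spark, the bounds only set the stride and do not filter rows: the first range also picks up NULLs and the last range is open-ended.

```python
from typing import List, Optional


def jdbc_range_predicates(column: str, lower_bound: int, upper_bound: int,
                          num_partitions: int) -> List[Optional[str]]:
    """Build one WHERE predicate per partition for a parallel range read.

    Hypothetical helper: a simplified sketch of JDBC range partitioning,
    not Spark's source code. The bounds only decide the stride; the first
    predicate also matches NULLs and the last one is open-ended, so every
    row is read exactly once.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if lower_bound > upper_bound:
        raise ValueError("lower_bound must not exceed upper_bound")
    # Do not create more partitions than there are integer values in range.
    num_partitions = max(1, min(num_partitions, upper_bound - lower_bound))
    if num_partitions == 1:
        return [None]  # a single query with no WHERE clause

    stride = (upper_bound - lower_bound) // num_partitions
    predicates: List[Optional[str]] = []
    for i in range(num_partitions):
        start = lower_bound + i * stride
        end = start + stride
        if i == 0:
            # First range: everything below the first boundary, plus NULLs.
            predicates.append(f"{column} < {end} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # Last range is open-ended and absorbs any remainder.
            predicates.append(f"{column} >= {start}")
        else:
            predicates.append(f"{column} >= {start} AND {column} < {end}")
    return predicates


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    for pred in jdbc_range_predicates("id", 0, 1_000_000, 4):
        where = f" WHERE {pred}" if pred else ""
        print(f"SELECT * FROM ontime{where}")
```

In PySpark the same settings are passed as `.option("partitionColumn", ...)`, `.option("lowerBound", ...)`, `.option("upperBound", ...)`, and `.option("numPartitions", ...)` on `spark.read.format("jdbc")`. Choosing an indexed, evenly distributed numeric column keeps the partitions roughly the same size, so each concurrent MySQL query does a similar amount of work.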

Apache Spark with Air ontime performance data

There is a growing interest in Apache Spark, so I wanted to play with it (especially after Alexander Rubin’s “Using Apache Spark” post).
To start, I used the recently released Apache Spark 1.6.0 for this experiment, and I will play with the “Airlines On-Time Performance” database from http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time. You can find the scripts I used here: https://github.com/Percona-Lab/ontime-airline-performance. The uncompressed dataset is about 70GB, which is not […]

Using Apache Spark and MySQL for Data Analysis

What is Spark
Apache Spark is a cluster computing framework, similar to Apache Hadoop. Wikipedia has a great description of it:
Apache Spark is an open source cluster computing framework originally developed in the AMPLab at University of California, Berkeley but was later donated to the Apache Software Foundation where it remains today. In contrast to […]