November 28, 2014

Parallel Query for MySQL with Shard-Query

While Shard-Query can work over multiple nodes, this blog post focuses on using Shard-Query with a single node.  Shard-Query can add parallelism to queries which use partitioned tables.  Very large tables can often be partitioned fairly easily. Shard-Query can leverage partitioning to add paralellism, because each partition can be queried independently. Because MySQL 5.6 supports the […]

Using Apache Hadoop and Impala together with MySQL for data analysis

Apache Hadoop is commonly used for data analysis. It is fast for data loads and scalable. In a previous post I showed how to integrate MySQL with Hadoop. In this post I will show how to export a table from  MySQL to Hadoop, load the data to Cloudera Impala (columnar format) and run a reporting […]

A conversation with 5 Facebook MySQL gurus

Facebook, the undisputed king of online social networks, has 1.23 billion monthly active users collectively contributing to an ocean of data-intensive tasks – making the company one of the world’s top MySQL users. A small army of Facebook MySQL experts will be converging on Santa Clara, Calif. next week where several of them are leading […]

Tools and tips for analysis of MySQL’s Slow Query Log

MySQL has a nice feature, slow query log, which allows you to log all queries that exceed a predefined about of time to execute. Peter Zaitsev first wrote about this back in 2006 – there have been a few other posts here on the MySQL Performance Blog since then (check this and this, too) but […]

Q&A: Common (but deadly) MySQL Development Mistakes

On Wednesday I gave a presentation on “How to Avoid Common (but Deadly) MySQL Development Mistakes” for Percona MySQL Webinars. If you missed it, you can still register to view the recording and my slides. Thanks to everyone who attended, and especially to folks who asked the great questions. I answered as many as we had time […]

Multiple column index vs multiple indexes with MySQL 5.6

A question often comes when talking about indexing: should we use multiple column indexes or multiple indexes on single columns? Peter Zaitsev wrote about it back in 2008 and the conclusion then was that a multiple column index is most often the best solution. But with all the recent optimizer improvements, is there anything different with […]

Using the new spatial functions in MySQL 5.6 for geo-enabled applications

Geo-enabled (or location enabled) applications are very common nowadays and many of them use MySQL. The common tasks for such applications are: Find all points of interests (i.e. coffee shops) around (i.e. a 10 mile radius) the given location (latitude and longitude). For example we want to show this to a user of the mobile […]

Innotop: A real-time, advanced investigation tool for MySQL

GUI monitoring tools for MySQL are not always suitable for all our needs or situations. Most of them are designed to provide historical views into what happens to our database over time rather then real-time insight into current MySQL server status. Excellent free tools for this include Cacti, Zabbix, Ganglia, Nagios, etc. But each of […]

Handling long-running queries in MySQL with Percona XtraBackup

I recently had a case where replication lag on a slave was caused by a backup script. First reaction was to incriminate the additional pressure on the disks, but it turned out to be more subtle: Percona XtraBackup was not able to execute FLUSH TABLES WITH READ LOCK due to a long-running query, and the […]

MySQL and Hadoop integration

Dolphin and Elephant: an Introduction This post is intended for MySQL DBAs or Sysadmins who need to start using Apache Hadoop and want to integrate those 2 solutions. In this post I will cover some basic information about the Hadoop, focusing on Hive as well as MySQL and Hadoop/Hive integration. First of all, if you […]