October 1, 2014

Q&A: Putting MySQL Fabric to use

Martin Arrieta and I gave an online presentation last week on “Putting MySQL Fabric To Use.” If you missed it, you can find a recording and the slides here, and the vagrant environment we used plus a transcript of the commands we ran here (be sure to check out the ‘sharding’ branch, as that’s what we used […]

Using InfiniDB MySQL server with Hadoop cluster for data analytics

In my previous post about Hadoop and Impala I benchmarked performance of analytical queries in Impala. This time I’ve tried InfiniDB for Hadoop (open-source version) on the modern hardware with an 8-node Hadoop cluster. One of the main advantages (at least for me) of InifiniDB for Hadoop is that it stores the data inside the Hadoop cluster but uses the […]

Using Apache Hadoop and Impala together with MySQL for data analysis

Apache Hadoop is commonly used for data analysis. It is fast for data loads and scalable. In a previous post I showed how to integrate MySQL with Hadoop. In this post I will show how to export a table from  MySQL to Hadoop, load the data to Cloudera Impala (columnar format) and run a reporting […]

Tools and Techniques for Index Design Webinar Questions Followup

I presented a webinar this week to give an overview of Tools and Techniques for Index Design. Even if you missed the webinar, you can register for it, and you’ll be emailed a link to the recording. I’d like to invite folks who are interested in tools for query optimization to attend the new Percona […]

Wow. My 6 year old MySQL Bug is finally fixed in MySQL 5.6

I got the message in the morning today about the bug being fixed in MySQL 5.6.6…. which I reported in Early 2006 (while still being with MySQL) and running MySQL 4.1 I honestly thought this issue was fixed long ago as it was indeed pretty annoying. I must say I’m very impressed with Oracle team […]

A case for MariaDB’s Hash Joins

MariaDB 5.3/5.5 has introduced a new join type “Hash Joins” which is an implementation of a Classic Block-based Hash Join Algorithm. In this post we will see what the Hash Join is, how it works and for what types of queries would it be the right choice. I will show the results of executing benchmarks […]

Join Optimizations in MySQL 5.6 and MariaDB 5.5

This is the third blog post in the series of blog posts leading up to the talk comparing the optimizer enhancements in MySQL 5.6 and MariaDB 5.5. This blog post is targeted at the join related optimizations introduced in the optimizer. These optimizations are available in both MySQL 5.6 and MariaDB 5.5, and MariaDB 5.5 […]

How to convert MySQL’s SHOW PROFILES into a real profile

SHOW PROFILES shows how much time MySQL spends in various phases of query execution, but it isn’t a full-featured profile. By that, I mean that it doesn’t show similar phases aggregated together, doesn’t sort them by worst-first, and doesn’t show the relative amount of time consumed. I’ll profile the “nicer_but_slower_film_list” included with the Sakila sample […]

Using any general purpose computer as a special purpose SIMD computer

Often times, from a computing perspective, one must run a function on a large amount of input. Often times, the same function must be run on many pieces of input, and this is a very expensive process unless the work can be done in parallel. Shard-Query introduces set based processing, which on the surface appears […]

Distributed Set Processing with Shard-Query

Can Shard-Query scale to 20 nodes? Peter asked this question in comments to to my previous Shard-Query benchmark. Actually he asked if it could scale to 50, but testing 20 was all I could due to to EC2 and time limits. I think the results at 20 nodes are very useful to understand the performance: […]