November 26, 2014

Using Apache Hadoop and Impala together with MySQL for data analysis

Apache Hadoop is commonly used for data analysis. It is fast for data loads and scalable. In a previous post I showed how to integrate MySQL with Hadoop. In this post I will show how to export a table from  MySQL to Hadoop, load the data to Cloudera Impala (columnar format) and run a reporting […]

Ultimate MySQL variable and status reference list

I am constantly referring to the amazing MySQL manual, especially the option and variable reference table. But just as frequently, I want to look up blog posts on variables, or look for content in the Percona documentation or forums. So I present to you what is now my newest Firefox toolbar bookmark: an option and […]

Implementing efficient counters with MySQL

On many web sites you would see a counter how many time given object – blog post, forum thread, image, movie etc was viewed. This is sometimes handy feature but it can be rather expensive from performance point of view. The nasty thing with counters as they are implemented the most trivial way – they […]

MySQL MyISAM Active Active Clustering – looking for trouble ?

Reading last few days worth of planet MySQL and commenting on some entries as you can see. The post by Oli catches my attention. There is also PDF with more details available Oli is saying you can use MySQL with Active Active Clustering and MyISAM tables if you follow certain rules like enabling external locking, […]

MySQL Server Variables – SQL layer or Storage Engine specific.

MySQL Server has tons of variables which may be adjusted to change behavior or for performance purposes. They are documented in the manual as well as on new page Jay has created. Still I see constant confusion out where which of variables apply to storage engines only and which are used on SQL layer and […]

Distributed Set Processing with Shard-Query

Can Shard-Query scale to 20 nodes? Peter asked this question in comments to to my previous Shard-Query benchmark. Actually he asked if it could scale to 50, but testing 20 was all I could due to to EC2 and time limits. I think the results at 20 nodes are very useful to understand the performance: […]

Tokyo Tyrant – The Extras Part I : Is it Durable?

You know how in addition to the main movie you have extras on the DVD.  Extra commentary, bloopers, extra scenes, etc? Well welcome the Tyrant extras.  With my previous blog posts I was trying to set-up a case for looking at NOSQL tools, and not meant to be a decision making tool.  Each solution has […]