September 23, 2014

Using MySQL 5.6 Global Transaction IDs (GTIDs) in production: Q&A

Thank you to all of you who attended my webinar last week about Global Transaction IDs (GTIDs), which were introduced in MySQL 5.6 to make reconfiguring replication straightforward. If you missed the webinar, you can still listen to the recording and download the slides (free). We had a lot of questions during the […]

When (and how) to move an InnoDB table outside the shared tablespace

In my last post, “A closer look at the MySQL ibdata1 disk space issue and big tables,” I looked at the growing ibdata1 problem from the perspective of big tables residing inside the so-called shared tablespace. In the particular case that motivated that post, a customer was running out of disk space in his […]

Using InfiniDB MySQL server with Hadoop cluster for data analytics

In my previous post about Hadoop and Impala, I benchmarked the performance of analytical queries in Impala. This time I tried InfiniDB for Hadoop (open-source version) on modern hardware with an 8-node Hadoop cluster. One of the main advantages (at least for me) of InfiniDB for Hadoop is that it stores the data inside the Hadoop cluster but uses the […]

Beware of MySQL 5.6 server UUID when cloning slaves

The other day I was working on an issue where one of the slaves was showing unexpected lag. Interestingly, with only the IO thread running, the slave was doing significantly more IO than the rate at which the IO thread was fetching binary log events from the master would explain. I found this out […]

Identifying the load with the help of pt-query-digest and Percona Server

Profiling, analyzing and then fixing queries is likely the most oft-repeated part of a DBA's job, and one that keeps evolving: as new features are added to the application, new queries pop up that need to be analyzed and fixed. And there are not too many tools out there that can make […]

Impact of the number of idle connections in MySQL

Be careful with my findings: I appear to have compiled in debug mode, and I am redoing the benchmarks. An updated version is available here. I recently had to work with many customers who had a large number of connections open in MySQL, and although I told them this was not optimal, I had no solid arguments to present. More than […]

Star Schema Benchmark: InfoBright, InfiniDB and LucidDB

In my previous rounds with data-warehouse-oriented engines, I used a single table without joins and a small data size (by DW standards). To address these issues, I took the Star Schema Benchmark, which is a modification of TPC-H, and tried to run queries against InfoBright, InfiniDB, LucidDB and MonetDB. I did not get results for MonetDB, and will […]

How to generate per-database traffic statistics using mk-query-digest

We often encounter customers who have partitioned their applications among a number of databases within the same instance of MySQL (think application service providers who have a separate database per customer organization … or wordpress-mu type of apps). For example, take the following single MySQL instance with multiple (identical) databases:

Missing Data – rows used to generate result set

As Baron writes, it is not the number of rows returned by a query but the number of rows it accesses that will most likely define query performance. Of course, not all row accesses are created equal (for example, row accesses in a full table scan may be much faster than row accesses through random index lookups […]

Stored Function to generate Sequences

Today a customer asked me to help them convert their sequence generation process to a stored procedure. Even though I had already seen this done somewhere, I could not find it with two minutes of googling, so I wrote a simple one myself and am posting it here for the public benefit, or for my own later use.