September 2, 2014

How number of columns affects performance ?

It is pretty understood the tables which have long rows tend to be slower than tables with short rows. I was interested to check if the row length is the only thing what matters or if number of columns we have to work with also have an important role. I was interested in peak row […]

Multi Column indexes vs Index Merge

The mistake I commonly see among MySQL users is how indexes are created. Quite commonly people just index individual columns as they are referenced in where clause thinking this is the optimal indexing strategy. For example if I would have something like AGE=18 AND STATE=’CA’ they would create 2 separate indexes on AGE and STATE […]

Finding out largest tables on MySQL Server

Finding largest tables on MySQL instance is no brainier in MySQL 5.0+ thanks to Information Schema but I still wanted to post little query I use for the purpose so I can easily find it later, plus it is quite handy in a way it presents information:


We all know disk based temporary tables are bad and you should try to have implicit temporary tables created in memory where possible, do to it you should increase tmp_table_size to appropriate value and avoid using blob/text columns which force table creation on the disk because MEMORY storage engine does not support them Right ? […]

Long PRIMARY KEY for Innodb tables

I’ve written and spoke a lot about using short PRIMARY KEYs with Innodb tables due to the fact all other key will refer to the rows by primary key. I also recommended to use sequential primary keys so you do not end up having random primary key BTREE updates which can be very expensive. Today […]

Trawling the binlog with FlexCDC and new FlexCDC plugins for MySQL

Swanhart-Tools includes FlexCDC, a change data capture tool for MySQL. FlexCDC follows a server’s binary log and usually writes “changelogs” that track the changes to tables in the database. I say usually because the latest version of Swanhart-Tools (only in github for now) supports FlexCDC plugins, which allow you to send the updates to a remote […]

Check for MySQL slave lag with Percona Toolkit plugin for Tungsten Replicator

A while back, I made some changes to the plugin interface for pt-online-schema-change which allows custom replication checks to be written. As I was adding this functionality, I also added the –plugin option to pt-table-checksum. This was released in Percona Toolkit 2.2.8. With these additions, I spent some time writing a plugin that allows Percona […]

Getting to know TokuDB for MySQL

During last April’s Percona Live MySQL Conference and Expo, TokuDB celebrated it’s first full-year as an open source storage engine. I still remember reading the official announcement and the expectations it created one year ago. The premises were very interesting as it had the potential of helping MySQL manage “big data” in a way InnoDB just […]

Using InfiniDB MySQL server with Hadoop cluster for data analytics

In my previous post about Hadoop and Impala I benchmarked performance of analytical queries in Impala. This time I’ve tried InfiniDB for Hadoop (open-source version) on the modern hardware with an 8-node Hadoop cluster. One of the main advantages (at least for me) of InifiniDB for Hadoop is that it stores the data inside the Hadoop cluster but uses the […]

Parallel Query for MySQL with Shard-Query

While Shard-Query can work over multiple nodes, this blog post focuses on using Shard-Query with a single node.  Shard-Query can add parallelism to queries which use partitioned tables.  Very large tables can often be partitioned fairly easily. Shard-Query can leverage partitioning to add paralellism, because each partition can be queried independently. Because MySQL 5.6 supports the […]