October 23, 2014

Looking out for max values in integer-based columns in MySQL

Yay! My first blog post! As long as at least 1 person finds it useful, I’ve done my job. Recently, one of my long-term clients was noticing that while their INSERTs were succeeding, a particular column counter was not incrementing. A quick investigation determined the column was of type int(11) and they had reached the […]

Multi Column indexes vs Index Merge

The mistake I commonly see among MySQL users is how indexes are created. Quite commonly people just index individual columns as they are referenced in where clause thinking this is the optimal indexing strategy. For example if I would have something like AGE=18 AND STATE=’CA’ they would create 2 separate indexes on AGE and STATE […]

When (and how) to move an InnoDB table outside the shared tablespace

In my last post, “A closer look at the MySQL ibdata1 disk space issue and big tables,” I looked at the growing ibdata1 problem under the perspective of having big tables residing inside the so-called shared tablespace. In the particular case that motivated that post, we had a customer running out of disk space in his […]

Q&A: Even More Deadly Mistakes of MySQL Development

On Wednesday I gave a presentation on “How to Avoid Even More Common (but Deadly) MySQL Development Mistakes” for Percona MySQL Webinars.  If you missed it, you can still register to view the recording and my slides. Thanks to everyone who attended, and especially to folks who asked the great questions.  I answered as many as we had […]

Check for MySQL slave lag with Percona Toolkit plugin for Tungsten Replicator

A while back, I made some changes to the plugin interface for pt-online-schema-change which allows custom replication checks to be written. As I was adding this functionality, I also added the –plugin option to pt-table-checksum. This was released in Percona Toolkit 2.2.8. With these additions, I spent some time writing a plugin that allows Percona […]

Using InfiniDB MySQL server with Hadoop cluster for data analytics

In my previous post about Hadoop and Impala I benchmarked performance of analytical queries in Impala. This time I’ve tried InfiniDB for Hadoop (open-source version) on the modern hardware with an 8-node Hadoop cluster. One of the main advantages (at least for me) of InifiniDB for Hadoop is that it stores the data inside the Hadoop cluster but uses the […]

Parallel Query for MySQL with Shard-Query

While Shard-Query can work over multiple nodes, this blog post focuses on using Shard-Query with a single node.  Shard-Query can add parallelism to queries which use partitioned tables.  Very large tables can often be partitioned fairly easily. Shard-Query can leverage partitioning to add paralellism, because each partition can be queried independently. Because MySQL 5.6 supports the […]

Using Apache Hadoop and Impala together with MySQL for data analysis

Apache Hadoop is commonly used for data analysis. It is fast for data loads and scalable. In a previous post I showed how to integrate MySQL with Hadoop. In this post I will show how to export a table from  MySQL to Hadoop, load the data to Cloudera Impala (columnar format) and run a reporting […]

Creating GEO-enabled applications with MySQL 5.6

In my previous post I’ve showed some new MySQL 5.6 features which can be very helpful when creating geo-enabled applications. In this post I will show how we can obtain open-source GIS data, convert it to MySQL and use it in our GEO-enabled applications. I will also present at the upcoming Percona Live conference on this […]

The MySQL ARCHIVE storage engine – Alternatives

In my previous post I pointed out that the existing ARCHIVE storage engine in MySQL may not be the one that will satisfy your needs when it comes to effectively storing large and/or old data. But are there any good alternatives? As the primary purpose of this engine is to store rarely accessed data in disk […]