October 30, 2014

How to avoid hash collisions when using MySQL’s CRC32 function

Percona Toolkit’s  pt-table-checksum performs an online replication consistency check by executing checksum queries on the master, which produces different results on replicas that are inconsistent with the master – and the tool pt-table-sync synchronizes data efficiently between MySQL tables. The tools by default use the CRC32. Other good choices include MD5 and SHA1. If you have installed the FNV_64 user-defined function, pt-table-sync will […]

Using Apache Hadoop and Impala together with MySQL for data analysis

Apache Hadoop is commonly used for data analysis. It is fast for data loads and scalable. In a previous post I showed how to integrate MySQL with Hadoop. In this post I will show how to export a table from  MySQL to Hadoop, load the data to Cloudera Impala (columnar format) and run a reporting […]

Percona Toolkit collection: pt-visual-explain

This is the first in a series of posts highlighting a few of the seldom-used but still handy Percona Toolkit tools. Have you ever had a problem understanding the EXPLAIN statement output? And are you the type of person who would rather use the command line than a GUI application? Then I would recommend that […]

Using the new spatial functions in MySQL 5.6 for geo-enabled applications

Geo-enabled (or location enabled) applications are very common nowadays and many of them use MySQL. The common tasks for such applications are: Find all points of interests (i.e. coffee shops) around (i.e. a 10 mile radius) the given location (latitude and longitude). For example we want to show this to a user of the mobile […]

When EXPLAIN estimates can go wrong!

I have been working with a few customer cases and one interesting case popped up. The customer was facing a peculiar problem where the rows column in the EXPLAIN output of the query was totally off. The actual number of rows was 18 times more than the number of rows reported by MySQL in the […]

Explaining Indexes with a Library Metaphor

My favorite metaphor for explaining indexes is comparing them to index cards in an old library. In an old library, you used to (or still do) have index cards at the front desk which have some brief description of the books in the library. They also used to be categorized alphabetically. (image taken from http://www.flickr.com/photos/reedinglessons/2239767394/) […]

Using any general purpose computer as a special purpose SIMD computer

Often times, from a computing perspective, one must run a function on a large amount of input. Often times, the same function must be run on many pieces of input, and this is a very expensive process unless the work can be done in parallel. Shard-Query introduces set based processing, which on the surface appears […]

Using Flexviews – part two, change data capture

In my previous post I introduced materialized view concepts. This post begins with an introduction to change data capture technology and describes some of the ways in which it can be leveraged for your benefit. This is followed by a description of FlexCDC, the change data capture tool included with Flexviews. It continues with an […]

EXPLAIN EXTENDED can tell you all kinds of interesting things

While many people are familiar with the MySQL EXPLAIN command, fewer people are familiar with “extended explain” which was added in MySQL 4.1 EXPLAIN EXTENDED can show you what the MySQL optimizer does to your query. You might not know this, but MySQL can dramatically change your query before it actually executes it. This process […]

Pacemaker, please meet NDB Cluster or using Pacemaker/Heartbeat to start a NDB Cluster

Customers have always asked me to make NDB Cluster starts automatically upon startup of the servers. For the ones who know NDB Cluster, it is tricky to make it starts automatically. I know at least 2 sets of scripts to manage NDB startup, ndb-initializer and from Johan configurator www.severalnines.com. If all the nodes come up […]