Query Planner Gotchas

Query Planner Gotchas


Indexes can reduce the amount of data your query touches by orders of magnitude. This results in a proportional query speedup. So what happens when you define a nice set of indexes and you don’t get the performance pop you were expecting? Consider the following example:

Now we’d like to perform the following query:

Great! We have index a, which cover this query. Using a should be really fast. You’d expect to use the index to jump to the beginning of the ‘this is a test’ values for a and from there continue using the index to the 8000000 section of the b‘s. From there it’s just a range query.

We added some test data (see this script) to see what would happen. We added 10M lines and made the value of a be ‘this is a test’ in every field. We got:

So the question is, is this any good? Checking the explain, we get:

Yow! Index a is used, but it’s not being used as expected. The ref type in the explain means that MySQL is performing a scan on all rows where a is ‘this is a test’ and not filtering on the values of b. This translates into looking at 10M rows instead of 100K, so this query plan is looking at 100x too much data.

This is a MySQL query planner bug, and it has been reported here. However, there is a workaround (in this case):

By adding a use index(a) — which shouldn’t do anything because we’re already using index a! — we get an 88x speedup. This is in line with looking at 1% of the data. Consulting the explain, we see:

We’re still using index a, but the type of the query plan has changed to range (which is what we would have expected all along). Now we are filtering on a and b.

Take home messages

  • Query planners are complicated things, and the MySQL query planner has some bugs.
  • It’s possible, in many cases, to deal with those bugs by checking the explain of a query that doesn’t behave as expected.
  • Although adding a use index isn’t the best software engineering practice, if you need to get fast query speeds and you’re getting a faulty query plan, it’s a handy tool to know about.

Of course, Tokutek customers care a lot about indexes. TokuDB for MySQL and MariaDB lets you define lots of indexes, because indexing is fast. You can define indexes with random keys, indexes that won’t fit in memory, clustering indexes that cover all queries, …. This insertion speed translates directly into query speed, so TokuDB can address many performance complaints, ranging from write- to read-intensive workloads.

Remember that if you take care of your indexes, they’ll take care of you!


Share this post

Leave a Reply