Advanced index analysis with mk-index-usage

The new release of Maatkit has a useful feature in mk-index-usage to help you determine how indexes are used in more flexible ways. The default report just prints out ALTER statements for removing unused indexes, which is nice, but it’s often helpful to ask more sophisticated questions about index usage. I’ll use this blog’s queries and indexes as an example.

The new feature lets you store the index usage into tables in a database, so you can query them with SQL. I logged all queries to the blog for a little while, and then secure-copied the query log to my laptop, which is across the continent. I used a variation on MySQL Sandbox to set up a little throw-away MySQL instance — that took ten seconds. And then I set up an SSH tunnel from my laptop to the MySQL Performance Blog server — another 30 seconds.

Now I’m set up: I can crunch the log locally, make it connect over the SSH tunnel to query EXPLAIN and find out the index usage on the remote server, and store the results in my MySQL sandbox instance. The new option is –save-results-database. The full mk-index-usage command looks like this:

After that finishes, the ‘index_usage’ database contains several tables:

Now let’s run some queries! From the documentation, you can copy-paste the examples. Let’s start with this one: which queries sometimes use different indexes, and what fraction of the time is each index chosen?

Here’s how to read this — the first row says that query 1262633894049058504 has two variations. One variation uses the mpb_wordpress.wp_posts.post_status index, 70% of the time in fact. The next variation is in row 2, the same query, which uses an index on post_status the other 30% of the time. If you look a few rows down, you can see that query 7675136724153707161 has a much more skewed index usage: 2.4% of the time it uses one index, and the rest of the time it uses another. These kinds of variations in query execution plans are important. You don’t want a small fraction of queries to have very different performance, in general — you want consistent performance. Maybe the performance IS consistent, we don’t know that from looking here, but we know that there is something worth looking into.

What is query 7675136724153707161, anyway?

How about another question: which indexes have lots of alternatives, i.e. are chosen instead of other indexes, and for what queries? This time I’ll add LIMIT 2 to the query, because there are lots of them:

So the primary key on a table in the forum database is favored over a couple of other indexes, for queries 6095451542512376951 and 11680437198542055892.

I think you can see how this goes — you can query these tables any way you want. The documentation includes a number of other canned queries you can run. I found a few things that I want to improve about the canned queries while writing this blog post. If you have suggestions, please post in the comments or file issues at the Maatkit bug tracker. And enjoy learning how your indexes are used!

Share this post

Comments (2)

  • peter Reply


    I think this is very cool. I wish tool also create some views for most common use cases in addition to tables it creates this would make it a lot easier to use for a lot of people.

    November 11, 2010 at 6:59 pm
  • Baron Schwartz Reply

    That’s an excellent idea. I’ll file a feature request.

    November 11, 2010 at 7:11 pm

Leave a Reply