Basic Housekeeping for MySQL Indexes

MySQL IndexesIn this blog post, we’ll look at some of the basic housekeeping steps for MySQL indexes.

We all know that indexes can be the difference between a high-performance database and a bad/slow/painful query ride. It’s a critical part that needs deserves some housekeeping once in a while. So, what should you check? In no particular order, here are some things to look at:

1. Unused indexes

With sys schema, is pretty easy to find unused indexes: use the schema_unused_indexes view.

This view is based on the performance_schema.table_io_waits_summary_by_index_usage table, which will require enabling the Performance Schema, the events_waits_current consumer and the wait/io/table/sql/handler instrument. PRIMARY (key) indexes are ignored.

If you don’t have them enabled, just execute these queries:

Quoting the documentation:

“To trust whether the data from this view is representative of your workload, you should ensure that the server has been up for a representative amount of time before using it.”

And by representative amount, I mean representative: 

  • Do you have a weekly job? Wait at least one week
  • Do you have monthly reports? Wait at least one month
  • Don’t rush!

Once you’ve found unused indexes, remove them.

2. Duplicated indexes

You have two options here:

  • pt-duplicate-key-checker
  • the schema_redundant_indexes view from sys_schema

The pt-duplicate-key-checker is part of Percona Toolkit. The basic usage is pretty straightforward:

Now, the schema_redundant_indexes view is also easy to use once you have sys schema installed. The difference is that it is based on the information_schema.statistics table:

Again, once you find the redundant index, remove it.

3. Potentially missing indexes

The statements summary tables from the performance schema have several interesting fields. For our case, two of them are pretty important: NO_INDEX_USED (means that the statement performed a table scan without using an index) and NO_GOOD_INDEX_USED (“1” if the server found no good index to use for the statement, “0” otherwise).

Sys schema has one view that is based on the performance_schema.events_statements_summary_by_digest table, and is useful for this purpose: statements_with_full_table_scans, which lists all normalized statements that have done a table scan.

For example:

The above query doesn’t use an index because there was no good index to use, and thus was reported. See the explain output:

Note that the “query” field reports the query digest (more like a fingerprint) instead of the actual query.

In this case, the CountryLanguage table is missing an index over the “isOfficial” field. It is your job to decide whether it is worth it to add the index or not.

4. Multiple column indexes order

It was explained before that Multiple Column index beats Index Merge in all cases when such index can be used, even when sometimes you might have to use index hints to make it work.

But when using them, don’t forget that the order matters. MySQL will only use a multi-column index if at least one value is specified for the first column in the index.

For example, consider this table:

A query against the field “Language” won’t use an index:

Simply because it is not the leftmost prefix for the Primary Key. If we add the “CountryCode” field, now the index will be used:

Now, you’ll have to also consider the selectivity of the fields involved. Which is the preferred order?

In this case, the “Language” field has a higher selectivity than “CountryCode”:

So in this case, if we create a multi-column index, the preferred order will be (Language, CountryCode).

Placing the most selective columns first is a good idea when there is no sorting or grouping to consider, and thus the purpose of the index is only to optimize where lookups. You might need to choose the column order, so that it’s as selective as possible for the queries that you’ll run most.

Now, is this good enough? Not really. What about special cases where the table doesn’t have an even distribution? When a single value is present way more times than all the others? In that case, no index will be good enough. Be careful not to assume that average-case performance is representative of special-case performance. Special cases can wreck performance for the whole application.

In conclusion, we depend heavily on proper indexes. Give them some love and care once in a while, and the database will be very grateful.

All the examples were done with the following MySQL and Sys Schema version:

Share this post

Comments (2)

  • Morgan Tocker

    Hi Daniel,

    Good balanced discussion on how often to do house keeping 🙂

    I just wanted to mention one thing – be careful when dropping seemingly redundant indexes. If a hint is used on a query, you will get an error rather than an alternative index selected.

    As mentioned in Manyi’s interview last week – in MySQL 8.0 we will support invisible indexes for this purpose (hide first, then delete after some time). This will also help for case #1 that you mention.

    – Morgan

    September 9, 2016 at 3:45 pm
  • doublemarket

    Hello and thanks for the nice post. The ways written here are good especially for DBAs who don’t know well about the app using a database.

    I translated this post into Japanese

    Please let me know if it’s a problem.

    Thanks again!

    P.S. I also translated Morgan’s comment as an additional note 🙂

    October 3, 2016 at 4:06 am

Comments are closed.

Use Percona's Technical Forum to ask any follow-up questions on this blog topic.