Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

Unused indexes by single query

September 13, 2008

Author

Vadim Tkachenko

Percona Software

Share this Post:

Usually unused indexes are devil, they waste diskspace, cache, they make INSERT / DELETE / UPDATE operations slower and what makes them worse – it is hard to find them.

But now ( with userstatsV2.patch) you can find all unused indexes (since last restart of mysqld) by single query

SELECT DISTINCT s.TABLE_SCHEMA, s.TABLE_NAME, s.INDEX_NAME
FROM information_schema.statistics `s` LEFT JOIN information_schema.index_statistics INDXS
ON (s.TABLE_SCHEMA = INDXS.TABLE_SCHEMA AND
s.TABLE_NAME=INDXS.TABLE_NAME AND
s.INDEX_NAME=INDXS.INDEX_NAME)
WHERE INDXS.TABLE_SCHEMA IS NULL;

SELECT DISTINCT s.TABLE_SCHEMA, s.TABLE_NAME, s.INDEX_NAME

FROM information_schema.statistics `s` LEFT JOIN information_schema.index_statistics INDXS

ON (s.TABLE_SCHEMA = INDXS.TABLE_SCHEMA AND

s.TABLE_NAME=INDXS.TABLE_NAME AND

s.INDEX_NAME=INDXS.INDEX_NAME)

WHERE INDXS.TABLE_SCHEMA IS NULL;

+--------------+---------------------------+-----------------+
| TABLE_SCHEMA | TABLE_NAME                | INDEX_NAME      |
+--------------+---------------------------+-----------------+
| art100       | article100                | ext_key         | 
| art100       | article100                | site_id         | 
| art100       | article100                | hash            | 
| art100       | article100                | forum_id_2      | 
| art100       | article100                | published       | 
| art100       | article100                | inserted        | 
| art100       | article100                | site_id_2       | 
| art100       | author100                 | PRIMARY         | 
| art100       | author100                 | site_id         | 
...
+--------------+---------------------------+-----------------+
1150 rows in set (1 min 44.23 sec)

+--------------+---------------------------+-----------------+

| TABLE_SCHEMA | TABLE_NAME | INDEX_NAME |

+--------------+---------------------------+-----------------+

| art100 | article100 | ext_key |

| art100 | article100 | site_id |

| art100 | article100 | hash |

| art100 | article100 | forum_id_2 |

| art100 | article100 | published |

| art100 | article100 | inserted |

| art100 | article100 | site_id_2 |

| art100 | author100 | PRIMARY |

| art100 | author100 | site_id |

...

+--------------+---------------------------+-----------------+

1150 rows in set (1 min 44.23 sec)

As you see query is not fast, mainly because information_schema.statistics is slow by itself, but in any case very helpful.

For some versions of the patch, where the table has only the INDEX_NAME and the ROWS_READ columns, you can use this query:

SELECT DISTINCT s.TABLE_SCHEMA, s.TABLE_NAME, s.INDEX_NAME
FROM information_schema.statistics `s`
   LEFT JOIN information_schema.index_statistics IST
      ON CONCAT_WS('.', s.TABLE_SCHEMA, s.TABLE_NAME, s.INDEX_NAME) = IST.INDEX_NAME
WHERE IST.INDEX_NAME IS NULL;

SELECT DISTINCT s.TABLE_SCHEMA, s.TABLE_NAME, s.INDEX_NAME

FROM information_schema.statistics `s`

LEFT JOIN information_schema.index_statistics IST

ON CONCAT_WS('.', s.TABLE_SCHEMA, s.TABLE_NAME, s.INDEX_NAME) = IST.INDEX_NAME

WHERE IST.INDEX_NAME IS NULL;

0 0 votes

Article Rating

14 Comments

Oldest

Newest Most Voted

Mark Robson

17 years ago

Of course you shouldn’t simply drop the unused indexes.

This patch, like any patch, should probably only be used on a non-production server. A non-production server, no matter how much you try, will NOT have the same queries and data as a production server (you could dump the SQL general query log – which will be very large – and replay it on a non-production server- that is probably the closest which is possible). Therefore, any data you get by this method needs to be used with extreme caution.

On a production server, there is probably someone somewhere running reports off your database using a cron job on a server you didn’t even know existed, which is exporting its data into something business critical. This process may only run once per week or month, but if you remove the index it relies on to work, its queries will take too long and their job will fail inexplicably. This process probably isn’t monitored properly so it will be several runs before someone in Finance notices that their data are out of date.

It is always safer to leave an index than drop it. If an index is not DEFINITELY causing a problem, leave it. If you have enough disc space, leave it.

If you want to drop an index, do some performance testing to find out how much benefit (disc space, insert speeds etc) dropping it would provide. It probably isn’t as much as you imagine.

If after all that you decide it’s still a good idea to drop it, fine. But be sure that all users of the database are aware that it’s going to be dropped.

Mark

Author

Vadim

17 years ago

Mark,

I think you are underestimate impact of additional indexes. In our book we have benchmark for InnoDB and MyISAM:

To insert 1 million records in InnoDB with 1 index took 80 sec, with 2 indexes – 136 seconds.
For MyISAM the difference is bigger: 72 sec vs 470 sec.
And this is case when data and indexes fit into memory. When load gets IO bound – difference will be even more significant.

As for usage in production – it is surely each own decision. We use it in production, Google uses it in production, we ported this patch for BlueHost, who also uses patched MySQL in real environment and we usually recommend our release to our customer when we are working on performance problems.

Pat

17 years ago

Depends on the nature of your workload as well. If your load is very read heavy (which is common for most web apps), then the overhead of rarely used indexes is probably worth it. Plus doing an index rebuild involves locking the table in question for a block of time anyway, which means you’re taking a production hit. My advice echoes Mark’s: don’t go dropping indexes unless you are solving a known performance bottleneck by doing so.

I’ve seen more systems get in trouble with too few indexes than too many.

Baron Schwartz

17 years ago

Pat,

“probably” worth it is likely to be an overstatement. “May be worth it” might be more accurate. As Vadim said, the impact of extra indexes can be very large indeed.

Of course all the comments on this post are correct, and we really all agree: you have to think carefully about it. Nobody is advocating to just drop all indexes without further thought.

Admin

Peter Zaitsev

17 years ago

I would add couple of points.

1) It is right you should START using this patch on development as with any unproven code. However in general this patch is low impact and in particular previous versions of it are used on many high traffic web site. So this is not development only toy.

2) It is important to run patch long enough so you catch all activities. Sometimes there are activities which are done on weekly basics only in others (ie monthly billing processing) it can be happening once per month and you need to keep it covered.

3) If you have a choice (like Master-Master, setup or Multiple Slaves) drop indexes in a way you can switch back to version with indexes quickly. If you do mistake and drop needed indexes the system may effectively go down and adding index back may just take too much time.

Vladimir

17 years ago

First of all I agree that it’s safer to keep indexes than to drop them in most cases. Next I’m not sure when it’s really ok to use this stats to correct your indexes – after a day of work, or a week, or maybe a month? In other words its highly application dependent question – when your data reaches some “steady” state in regard to how it’s treated by optimizer. Now let’s say you think for your dataflow 1 week makes sense. This means that you have to wait 1 week before you can make a decision. Too much for many applications I’d say. What I’d propose is to go for static analysis, and if possible – do that on the design stage.