EmergencyEMERGENCY? Get 24/7 Help Now!

Unused indexes by single query

 | September 12, 2008 |  Posted In: Percona Software

PREVIOUS POST
NEXT POST

Usually unused indexes are devil, they waste diskspace, cache, they make INSERT / DELETE / UPDATE operations slower and what makes them worse – it is hard to find them.

But now ( with userstatsV2.patch) you can find all unused indexes (since last restart of mysqld) by single query

As you see query is not fast, mainly because information_schema.statistics is slow by itself, but in any case very helpful.

For some versions of the patch, where the table has only the INDEX_NAME and the ROWS_READ columns, you can use this query:

PREVIOUS POST
NEXT POST
Vadim Tkachenko

Vadim Tkachenko co-founded Percona in 2006 and serves as its Chief Technology Officer. Vadim leads Percona Labs, which focuses on technology research and performance evaluations of Percona’s and third-party products. Percona Labs designs no-gimmick tests of hardware, filesystems, storage engines, and databases that surpass the standard performance and functionality scenario benchmarks. Vadim’s expertise in LAMP performance and multi-threaded programming help optimize MySQL and InnoDB internals to take full advantage of modern hardware. Oracle Corporation and its predecessors have incorporated Vadim’s source code patches into the mainstream MySQL and InnoDB products. He also co-authored the book High Performance MySQL: Optimization, Backups, and Replication 3rd Edition.

14 Comments

  • Of course you shouldn’t simply drop the unused indexes.

    This patch, like any patch, should probably only be used on a non-production server. A non-production server, no matter how much you try, will NOT have the same queries and data as a production server (you could dump the SQL general query log – which will be very large – and replay it on a non-production server- that is probably the closest which is possible). Therefore, any data you get by this method needs to be used with extreme caution.

    On a production server, there is probably someone somewhere running reports off your database using a cron job on a server you didn’t even know existed, which is exporting its data into something business critical. This process may only run once per week or month, but if you remove the index it relies on to work, its queries will take too long and their job will fail inexplicably. This process probably isn’t monitored properly so it will be several runs before someone in Finance notices that their data are out of date.

    It is always safer to leave an index than drop it. If an index is not DEFINITELY causing a problem, leave it. If you have enough disc space, leave it.

    If you want to drop an index, do some performance testing to find out how much benefit (disc space, insert speeds etc) dropping it would provide. It probably isn’t as much as you imagine.

    If after all that you decide it’s still a good idea to drop it, fine. But be sure that all users of the database are aware that it’s going to be dropped.

    Mark

  • Mark,

    I think you are underestimate impact of additional indexes. In our book we have benchmark for InnoDB and MyISAM:

    To insert 1 million records in InnoDB with 1 index took 80 sec, with 2 indexes – 136 seconds.
    For MyISAM the difference is bigger: 72 sec vs 470 sec.
    And this is case when data and indexes fit into memory. When load gets IO bound – difference will be even more significant.

    As for usage in production – it is surely each own decision. We use it in production, Google uses it in production, we ported this patch for BlueHost, who also uses patched MySQL in real environment and we usually recommend our release to our customer when we are working on performance problems.

  • Depends on the nature of your workload as well. If your load is very read heavy (which is common for most web apps), then the overhead of rarely used indexes is probably worth it. Plus doing an index rebuild involves locking the table in question for a block of time anyway, which means you’re taking a production hit. My advice echoes Mark’s: don’t go dropping indexes unless you are solving a known performance bottleneck by doing so.

    I’ve seen more systems get in trouble with too few indexes than too many.

  • Pat,

    “probably” worth it is likely to be an overstatement. “May be worth it” might be more accurate. As Vadim said, the impact of extra indexes can be very large indeed.

    Of course all the comments on this post are correct, and we really all agree: you have to think carefully about it. Nobody is advocating to just drop all indexes without further thought.

  • I would add couple of points.

    1) It is right you should START using this patch on development as with any unproven code. However in general this patch is low impact and in particular previous versions of it are used on many high traffic web site. So this is not development only toy.

    2) It is important to run patch long enough so you catch all activities. Sometimes there are activities which are done on weekly basics only in others (ie monthly billing processing) it can be happening once per month and you need to keep it covered.

    3) If you have a choice (like Master-Master, setup or Multiple Slaves) drop indexes in a way you can switch back to version with indexes quickly. If you do mistake and drop needed indexes the system may effectively go down and adding index back may just take too much time.

  • First of all I agree that it’s safer to keep indexes than to drop them in most cases. Next I’m not sure when it’s really ok to use this stats to correct your indexes – after a day of work, or a week, or maybe a month? In other words its highly application dependent question – when your data reaches some “steady” state in regard to how it’s treated by optimizer. Now let’s say you think for your dataflow 1 week makes sense. This means that you have to wait 1 week before you can make a decision. Too much for many applications I’d say. What I’d propose is to go for static analysis, and if possible – do that on the design stage.

Leave a Reply

 
 

Percona’s widely read Percona Data Performance blog highlights our expertise in enterprise-class software, support, consulting and managed services solutions for both MySQL® and MongoDB® across traditional and cloud-based platforms. The decades of experience represented by our consultants is found daily in numerous and relevant blog posts.

Besides specific database help, the blog also provides notices on upcoming events and webinars.
Want to get weekly updates listing the latest blog posts? Subscribe to our blog now! Submit your email address below and we’ll send you an update every Friday at 1pm ET.

No, thank you. Please do not ask me again.