Where the open source database community meets: Use code PERCONA75 and secure your spot for Percona Live.  Register

Tools and Techniques for Index Design Webinar Questions Followup

October 5, 2012
Author
Bill Karwin
Share this Post:

I presented a webinar this week to give an overview of Tools and Techniques for Index Design. Even if you missed the webinar, you can register for it, and you’ll be emailed a link to the recording.

I’d like to invite folks who are interested in tools for query optimization to attend the new Percona online virtual training Analyzing SQL Queries with Percona Toolkit, October 29 – November 2, 2012. See http://store.percona.com/catalog.php?category=17 for details on this training class and how to register to attend.

During my webinar, I got a number of good questions. Here are the questions (paraphrased), with my answers.

Q: Can you link to the book on index optimization you mentioned in the webinar?

Sure!  The book is Relational Database Index Design and the Optimizers by Tapio Lahdenmäki and Michael Leach.

Also Baron Schwartz has written a good review of this book.

Q: Given that the book was written in 2005, before MySQL 5.1 and 5.5 were released, is it still relevant?

That book is designed to be RDBMS-neutral, and the concepts it covers apply to other products besides MySQL. The logical principles still hold true: using an index really means leveraging a pre-sorted data structure to reduce the work a query needs to do.  The basic data structure of an index hasn’t changed in recent releases.  So yes, the principles of this book are still relevant to current versions of MySQL.

Q: How do the upcoming changes in MySQL 5.6 change the best practices for index design?

Fundamentally, index design best practices will not change.  MySQL 5.6 is still in beta as of this writing, but the new version promises some exciting new ways that MySQL can use indexes, such as:

Q: Is it always the case that no second-star index exists for a query with both a GROUP BY and an ORDER BY clause?

Right; you’re stuck, unless your query groups by the same column as its sort column. You can optimize by adding the grouping column to your index, so all rows in a given group are read together. But doing so fixed the order in which rows are accessed, so if you need to sort by another column, it must collect the interim result set and sort it as a separate step. This is when we see “Using temporary; Using filesort” in the EXPLAIN report.

Hopefully, the size of this interim result set is relatively small, after the GROUP BY reduces it to one row per distinct value in the grouping column. Then sorting it may be more likely to happen in memory.

Q: Can you comment on the QUBE method for index estimation?

QUBE stands for Quick Upper-Bound Estimate. This is a method for estimating the theoretical cost of a query, by counting how many lookups to index entries or table rows will be necessary for a given query in the worst case. This method tries to quantify the cost of a query, but ultimately it arrives at similar conclusions as the star system: define an index with all columns in the predicate, add columns to avoid sorting per query, and add further columns to cover the select-list. The difference is that QUBE helps you to predict quantitatively how much a new index is likely to improve the query.

Q: Could you provide some best practices for partitioning and its effect on index design?

The most important consideration is that “every unique key on the table must use every column in the table’s partitioning expression.”  It might even be impossible to partition a table while using UNIQUE KEY constraints the way that makes most sense for your table.

Logically, your index design will be the same with a partitioned table as with a non-partitioned table. But since each partition is physically stored as a separate table, each index contains only values of its respective subset of the data. The index is more shallow and quicker to search. If your query has predicates that allow partition pruning to reduce the scope of the search, the index search will be even more efficient than if you were searching for the same rows in an non-partitioned table.

Thanks to folks who attended my webinar, and thanks for the questions!

0 0 votes
Article Rating
Subscribe
Notify of
guest

5 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Elaine
Elaine
13 years ago

This was a very interesting webinar; thank you very much for making it available. I feel just a little bit smarter.

What is the best way to handle a situation where we have dozens of different kinds of queries against a table? How do I work out the tradeoff between the more versatile single column indexes versus having multiple multi-column indexes that are tailored for each query type? At what point do I need to say, there are too many indexes that are too large and they’re eating into my memory pool?

Elaine
Elaine
13 years ago

Thank you so much for that answer. It was very helpful.

As I am playing with some of these strategies, it is obvious to me that the order of the columns in the index matters. When I am not certain, testing with my real queries helps to make my decisions. In general, I am starting with the strategy to index WHERE columns, then ORDER BY columns, and then SELECT columns in left-to-right order when trying to get an index that will hit all three sweet spots.

Elaine
Elaine
13 years ago

My application has quite a few join tables where we use a compound primary key. At this point, is there a performance difference between a table with a (mostly unused) single id field as a primary key and then a unique compound index on the columns we use for queries? IE, does PRIMARY confer any speed or performance advantages over any other index that is unique and NOT NULL?

Far
Enough.

Said no pioneer ever.
MySQL, PostgreSQL, InnoDB, MariaDB, MongoDB and Kubernetes are trademarks for their respective owners.
© 2026 Percona All Rights Reserved