Clustering indexes vs. Covering indexes

Posted on:



Share Button

Yesterday, I (Zardosht) posted an entry introducing clustering indexes. Here, I elaborate on three differences between a clustering index and a covering index:

  1. Clustering indexes can create indexes that would otherwise bounce up against the limits on the maximum length and maximum number of columns in a MySQL index.
  2. Clustering indexes simplify syntax making them easier and more intuitive to use.
  3. Clustering indexes have smaller key sizes leading to better performance.

Expanding MySQL’s Limits

MySQL allows at most 16 columns in an index and at most 3072 bytes per index. For tables that have more than 16 columns or a row size of greater than 3072 bytes, one cannot create a covering index that includes all of the fields. Suppose one had the following table with 26 columns:

In standard MySQL, one cannot create a covering index for the query:

However, one can create the following clustering index to speed up the query as:

Syntactic simplification

The intent of a clustering index definition is easier to understand because it only includes columns necessary to achieve the desired sort order. Columns needed to satisfy the select clause can be left out. As users add more indexes to speed up more queries, simplifying schema definitions becomes increasingly important.

Consider the table:

Now suppose you want to speed up this query:

To create a covering index for this query you do:

To create a clustering index for this you do:

Another example of the simplification shows up when want to add a column. With clustering indexes you can write

and you don’t need to change the index definition. With a covering index, you need to:

If you don’t drop and re-add the covering index, it won’t cover the new column. Covering indexes have similar problems if you drop a column.

Smaller key sizes

The key length of a clustering index is much smaller than that of a covering index. Smaller key lengths have performance benefits. Inserting into a clustering index is often faster than inserting into an equivalent covering index. Some other (non-MySQL) databases provide syntax to include nonkey columns in an index. One can think of clustering indexes as an extreme case, in which all the columns are added as nonkey columns. As Kristian Nielsen phrased it the comments of the earlier posting, the extra fields are stored only in the leaves of a B-Tree, not in the internal nodes. That can be useful if the row size is much larger than the key size. (Although Fractal Tree® indexes aren’t exactly organized like B-trees, this insight is essentially correct.)

Tradeoffs for TokuDB vs. other storage engines

Maintaining clustering indexes has a similar cost to maintaining covering indexes. Specifically, data gets inserted into multiple places. This is the tradeoff for many storage engines: to use clustering indexes to get faster queries, one must use more I/O operations and extra disk storage.

For TokuDB these tradeoffs are a lot easier to make. TokuDB can insert data an order of magnitude faster than other storage engines, while maintaining indexes. Percona measured our insertion rate at 10.6x faster than InnoDB’s, and measured our database size at 5x to 6x smaller than InnoDB and MyISAM. Thus, the tradeoff is more manageable for TokuDB than it is for other storage engines. This was the motivation for implementing the clustering index feature.


Clustering indexes provide the benefits of covering indexes in an easy-to-use manner. Clustering indexes offer immediate performance gains that are difficult or impossible to achieve with covering indexes. TokuDB can maintain indexes while inserting data an order of magnitude faster than other storage engines, and compresses the data, making clustering keys really valuable.

Share Button

Tokutek, TokuView

  • Oops, adding z to cvr_idx made the index infeasible (17 columns). I guess that confirms the argument about simplicity…

  • Why are clustering indexes much smaller than covering indexes? Unless I’m misunderstanding your meaning, this is really not true of InnoDB, for example. Or are you talking about your indexes, and not indexes in general?

  • I may be misunderstanding your question, but I think the point we were trying to make is that the *key* of a clustering index is typically smaller than the *key* of a covering index. Because in MySQL standard syntax, there is no way to add a column to an index without adding it to the key. Bigger keys can suffer performance problems. (For example, they may reduce the fanout of B-tree-like data structures).

    The clustering key syntax allows us to store all the extra columns without throwing them into the key.

  • Right. The irritating “key” vs. “index” mis-nomenclature in MySQL strikes again. I understand you now!

  • Once a clustered is created on a table which has other non-clustered indexes, mysql’s ‘show indexes’ will say “Type of Index” is BTREE for both. Which is true in a sense that both are built using a BTREE data-structure. Does Mysql provide another way to find out the real “type” of index? Or does TokuDB provide any utilities that enhance MySQL’s Information Schema to include this information?

  • zardosht Post author

    If you run “show create table table_name;”, that should provide the table’s schema and tell you if the index is clustering or not.

Leave a Reply