Introducing Multiple Clustering Indexes

May 27, 2009
Author
Zardosht.Kasheff
Share this Post:

In this posting I’ll describe TokuDB’s multiple clustering index feature.

In general (not just for TokuDB) a clustered index or a clustering index is an index that stores the all of the data for the rows. Quoting the MySQL 5.1 reference manual:

Accessing a row through the clustered index is fast because the row data is on the same page where the index search leads. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record.

Most storage engines allow at most one clustered index for each table. For example, MyISAM does not support clustered indexes at all, whereas InnoDB allows only the primary key be a clustered index.

In TokuDB, we have added functionality and grammar syntax to define multiple clustered indexes. As a result, users can get the performance advantages of clustered indexes for multiple indexes.

How to define multiple clustered indexes

To define a clustered index in TokuDB, simply add the key word “clustering” in front of any index that is to be created. Here are some examples:

How clustering keys work

Like InnoDB, TokuDB clusters data with its primary key (if no primary key is defined, then TokuDB auto-generates a hidden one). The primary index maintains a copy of the entire row. Each secondary index stores, for each row, the row’s secondary key and primary key.

A clustering index maintains a copy of the entire row, not just the primary key. As a result, when querying on a clustering key lookups in the primary index are always avoided. A clustering index is a covering index for any query. Another way to think of a clustering index is that it is a materialization of the table sorted in another order.

An example using clustering keys

Suppose we had 40M rows in an iiBench table with this schema:

Now suppose we wanted to perform the following queries in sequence:

  • Query 1:

    which scans more than 1200 rows.
  • Query 2:

    which returns 406 rows.
  • Query 3:

    which returns 820 rows.

Ideally, to perform the first query, we would want a covering index of (customerid, price) on the table. To perform the third query, we would want a covering index of (customerid, productid, dateandtime). For the second query, a covering index would need to include all the fields.

With other storage engines, the common solution is to define a key on the field (customerid). Here are the results with TokuDB when using a key on customerid:

Here are the results with MyISAM when using a key on customerid:

Here are the results with TokuDB when using a clustering key on customerid:

As we can see, clustering indexes provide the same order of magnitude of performance as covering indexes, but for a broader range of queries.

A limitation of clustering keys

A clustering key cannot be defined to be unique.

Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments

Far
Enough.

Said no pioneer ever.
MySQL, PostgreSQL, InnoDB, MariaDB, MongoDB and Kubernetes are trademarks for their respective owners.
© 2026 Percona All Rights Reserved