EmergencyEMERGENCY? Get 24/7 Help Now!

Better Primary Keys, a Benefit to TokuDB’s Auto Increment Semantics

 | August 3, 2009 |  Posted In: Tokutek, TokuView


In our last post, Bradley described how auto increment works in TokuDB. In this post, I explain one of our implementation’s big benefits, the ability to combine better primary keys with clustered primary keys.

In working with customers, the following scenario has come up frequently. The user has data that is streamed into the table, in order of time. The table will have a primary key that is an auto increment field, ‘id’, and then have an index on the field ‘time’. The queries the user does are all on some range of time (e.g. select sum(clicks) from foo where time > date ‘2008-12-19’ and time < date '2008-14-20';).

For storage engines with clustered primary keys (such as TokuDB and InnoDB), having such a schema hurts query performance. Queries do a range query on a secondary index (time), and then perform point queries into the primary index. A better solution would be to start the primary key with the ‘time’ field. Then, all queries will be a range query on a clustered primary index. The problem users run into is that the ‘time’ field is not a unique field, and therefore cannot be defined as a primary key.

In TokuDB, a good solution is to take advantage of TokuDB’s auto increment semantics. Define the primary key to be (time, id), where ‘id’ is a an auto increment field. The combination (time, id) is guaranteed to be unique, therefore allowing it to be defined as a primary key. To demonstrate the benefits, let us take an example that is very similar to one used to demonstrate the benefits of partitioning in action. Suppose we had the following table:

And inserted 20M elements using the following commands:

Running the following query,

takes 7.25s. It takes this long because a table scan is required to answer the query.

If we were to add an index on the ‘c3’ field, then the query takes 2.36s. Because MyISAM does not cluster its primary index, defining a primary key of (c3, id) where ‘id’ is an auto-increment field does not help: the query still takes 2.36s in this case.

Now suppose we had a TokuDB table, with the following schema:

We took the elements in no_part_tab, and inserted them into toku_tab, allowing the auto increment field to be automatically generated. Running the query on this table takes 0.30s.

Because we had the flexibility to create a primary key that better fits the desired use of the table and could combine with this with clustering, we are able to achieve better performance. This is one of the nice benefits to our auto increment semantics.



  • My standard trick to do this in InnoDB (or other databases) is this:

    c1 int,
    c2 varchar(30),
    c3 date,
    idx smallint,
    PRIMARY KEY (c3, idx)

    Then you insert rows like this:

    INSERT INTO t (c1,c2,c3,idx) SELECT 42, “my c2 value”, “2007-01-10”, IFNULL(MAX(idx)+1, 0) FROM t WHERE c3 = “2007-01-10”

    (Don’t get fooled by the INSERT … SELECT; this inserts a single row with constant values except for the idx column, which gets the next number in sequence).

    This gets the same clustered index storage without the need for an autoincrement key.

    An advantage is that values of idx counts from 0 for each distinct date value, so data looks nicer and may be able to use a smaller integer size to save space and index btree size. A disadvantage is the need to compute the MAX() for inserts, so bulk insertion is more complicated.

    • That loads faster because the insertions are sequential, and all go to the right-most part of the tree. With fractal trees, random insertions are quite fast, but not as fast as sequential insertions.

Leave a Reply