Autoincrement Semantics

In this post I’m going to talk about how TokuDB’s implementation of auto increment works, and contrast it to the behavior of MyISAM and InnoDB. We feel that the TokuDB behavior is easier to understand, more standard-compliant and offers higher performance (especially when implemented with Fractal Tree indexes).

In TokuDB, each table can have an auto-increment column. That column can be used as any part of a key, but it doesn’t have to be part of any key. The value produced by auto incrementing is always greater than the previous maximum value for that column. There are some cases where auto-incremented values are skipped, such as when a transaction aborts, which “uses up” auto-incremented values.

This behavior is close to that required for SQL:2003 (see SQL:2003 at wikipedia), which specifies that each table provides one unnamed sequence which behaves essentially in the way we implemented auto increment. The SQL standard permits but doesn’t require auto-incremented values to be used up when a transaction aborts.

The semantics provided by TokuDB is different from MyISAM and InnoDB. The value you get in an auto-increment field depends on the exact type and order of the secondary indexes (which is described in the MySQL manual). Those relatively complex semantics seem to be viewed as a feature by MyISAM users. Also, for MyISAM, there are no transactions, and so aborting a transaction doesn’t use up auto-increment values.

InnoDB provides yet another semantics: An auto-increment field must be the first field in some key.

For many users, the difference in semantics does not seem important.

Part of the reason we implemented a different behavior than the other storage engines is that the other engines’ behaviors don’t play well with fractal tree indexing. If rows can be inserted into a table or index without fetching an existing row, then fractal trees are two orders of magnitude faster than B-trees. If we had to fetch an existing row (to find out what the previous maximum auto-increment field was), the performance advantage would be less in many cases. By using the moral equivalent of a SQL:2003 sequence, we can generate auto-increment values without fetching rows, and so indexes can be maintained at high data-arrival rates.

Thus TokuDB’s auto-increment fields have several advantages over those of MyISAM and InnoDB. TokuDB’s are simpler to use, they act more like SQL:2003 standard sequences, and (perhaps most importantly to us) they admit higher performance.

Share this post

Comments (4)

Leave a Reply