Buy Percona ServicesBuy Now!

Disk Seeks are Evil, so Let’s Avoid Them, Part 1

 | May 20, 2010 |  Posted In: Tokutek, TokuView


Disk seeks are expensive. Typically, a disk can perform no more than a few hundred seeks per second. So, any database operation that induces a disk seek is going to be slow, perhaps unacceptably slow. Adding disks can sometimes help performance, but that approach is expensive, adds complexity, and anyhow minimizing the disk seeks helps more.

TokuDB fractal tree data structures deliver insertion performance benefits over traditional B-trees by performing fewer disk seeks on random insertions (in effect, turning random I/O into sequential I/O). This is why TokuDB typically outperforms InnoDB on insertion workloads, because TokuDB’s random insertions into secondary indexes is much faster than InnoDB’s insertions — up to two orders of magnitude faster.

So let’s consider the first place where TokuDB avoids a disk seek as opposed to a B-tree. On an insertion, a B-tree seeks need to have the appropriate leaf node in memory. For large tables, this requires a disk seek. A detailed view of what Fractal Tree indexes do is available in
this video. Here is an intuitive way to understand why fractal tree indexes are fast at insertions (an sample fractal tree is in the figure below). Seven out of eight (87.5%) insertions will be in the top three nodes, which will always be in memory. So, at least 87.5% of insertions will be strictly in-memory, doing no seeks, and in practice this percent is even higher. Now, when the fractal tree does write to disk, it will write nodes of greater depth. These nodes are written at disk bandwidth, and so the performance is not limited by disk seek time. This is why fractal trees indexed insertions are so fast.

Simple Fractal Tree

Insertions, therefore, are an operation that do not require a disk seek. There are some data structures that perform a disk seek (B-trees) and others that don’t (Fractal Tree indexes). Other operations require a disk seek, no matter what data structure you use, e.g. point queries, uniqueness checks, etc. In a previous post, I talked about how to avoid disk seeks by replacing point queries from secondary indexes to the primary table by using a clustering index.

So now that TokuDB gets rid of insertion disk seeks, it’s time to get rid of as many as possible. In the coming weeks, I’ll be posting a series of blogs about other cases where disk seeks can be avoided.