Sorting a Terabyte in 197 seconds
I just returned from The 21st ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), held in Calgary, where I gave a talk about my entry to the sorting contest. I sorted 1TB in 197s on a 400-node machine at MIT Lincoln Laboratory, a record which still stands today. (And it will likely remain standing, since terabyte sorting is now deprecated because it’s too fast. Now the challenge is to sort 100TB.)
For many years Jim Gray ran a sorting contest to see how fast anyone could sort a terabtye worth of 100-byte records, how much data could be sorted in one minute, and how much data could be sorted for a penny. After Jim’s disappearance at sea in January 2007, a committee formed to continue the contest.
I entered in 2007, and this week I finally got around to talking about it at a conference. My sorting algorithm is a variant of the SampleSort Algorithm by Blelloch, Plaxton, Leiseron et al. To learn more about TokuSampleSort, take a look at the long version of the paper or the slides from the SPAA talk.
TokuSampleSort is not directly related to the Tokutek’s TokuDB storage engine for MySQL. One is a sort, and the other maintains indexes. The difference between sorting and indexing is that an index is a dynamically sorted collection of data. Sorting, however, operates on a static set of data. Sorting and indexing both require attention to how to achieve high performance from processors and disks. It can be a little bit surprising that sorting is easier than indexing. Sorting is easier because all the data is available at the beginning of the calculation. In contrast, when indexing, data may arrive a little bit at a time, and at any time a user might query a range of the data. So indexing requires maintaining data in sorted order as the data arrives (and thus produces many sorted data sets, if you count each intermediate state), whereas a sort produces a single sorted answer.
Many storage engines (including, for example, MyISAM) use a sorting algorithm when creating an index (e.g., with CREATE INDEX or ALTER TABLE, and then use B-trees to maintain the index after it’s been created. It’s difficult to maintain a B-tree index if records arrive at high speed, however. TokuDB uses Fractal Tree indexes to maintain indexes orders of magnitude faster than B-trees.
How good would a fractal tree index be for sorting? TokuSampleSort sorts at a rate of about 127,000 records per processor per second. TokuDB can index random data at about 30,000 records per processor per second. Fractal Tree indexes are not as fast as a sort, but they are surprisingly close.