The Trouble with Point QueriesTokutek
Insertion and Queries
Databases are complicated beasts, but I’d like to focus on the storage engine, just the part that talks to the storage system, and doesn’t have to worry about SQL, etc.: just transactions, concurrency, compression, updates and queries. In the next couple of blog entry, I’d like to just focus on updates (insertions and deletions) and queries (point and range). (This delineation between the
front end and the storage engine is clearly architected in MySQL.) And in particular, I’d like to explore which features of a disk limit performance for which operations.
The question is how fast can these operations go? Point queries are the slow ones, so let’s start with them first. Suppose you have data on a disk — say a 1TB Hitachi Deskstar 7K1000. It has a disk seek time 14ms and transfer rate of around 69MB/s [See tomshardware.com] Now imagine filling the disk with a table, where each row has two 8 byte integers. So that’s 62.5 billion rows (=1TB/16 bytes).
Now let’s access those data in random order. I still haven’t said how the data are stored, but the point is that it doesn’t matter. If you are doing a random point query, you need to do a disk seek. So to access all the data points, it takes almost 28 years (= 62.5 billion rows * 14 ms/row / 32×10^9 ms/year). That’s assuming no OS overhead on the accesses.
Let’s just repeat: If you fill up a big disk with small records, it takes years to read the data off the disk. Maybe the file system doesn’t let you fill up the disk completely. Maybe 16 bytes per row is too small. Maybe you have a 1TB disk with a slightly lower seek time. We started at 28 years, so there’s a lot of wiggle room here, and we’d still be talking about years. Also recall that this analysis is independent of how the data are stored. No data-structural or software fixes will solve this problem, or even mitigate it. You could distribute your data on lots of small disks, and then if you could look into the future of your point-query stream, you could pipeline your queries.
If you can keep your pipeline up to depth 10, you could have 10 simultaneous disk seeks going on 10 separate disks (and you’d have to go through some rigamarole to make sure that you don’t end up with multiple of these 10 seeks going to the same disk). So even if you go to all this trouble, and you get lucky, you still have to wait 2.8 years for your point queries to finish. If you start with something that’s this slow, you need too much parallelization to get any kind of reasonable performance.
Thus, the only way to implement a largish database that has a workload that’s heavy on point queries is with hardware, by which I mean, without hard disks.
Next time we’ll talk about range queries.