Partitioning is a commonly touted method for achieving performance in MySQL and other databases. (See here, here, here and many other examples.) I started wondering where the performance from partitions comes from, and I’ve summarized some of my thoughts here.
But first, what is partitioning? (I’ve taken the examples from Giuseppe Maxia’s Partitions in Practice intro.)
CREATE TABLE by_year (
    d DATE
)
PARTITION BY RANGE (YEAR(d)) (
    PARTITION P1 VALUES LESS THAN (2001),
    PARTITION P2 VALUES LESS THAN (2002),
    PARTITION P3 VALUES LESS THAN (2003),
    PARTITION P4 VALUES LESS THAN (MAXVALUE)
);
In this simple example, you get four partitions, depending on the year component of the date field. This is a simple example, so it’s easy to see what’s going on without getting bogged down in the increased complexity. But increased complexity is a fact of life with partitioning — What happens when you want a per-week partition for the last two years and need to manage 104 partitions? Yuck! — so there needs to be a compelling reason to partition. Which brings us to the question: What are partitions good for?
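As an aside, that weekly scheme is not a one-time setup cost. A RANGE scheme with a MAXVALUE catch-all cannot simply have new partitions appended, so the rotation typically needs an ALTER every week. A sketch, with hypothetical table and partition names:

```sql
-- Split the catch-all partition to open the new week...
ALTER TABLE by_week REORGANIZE PARTITION pmax INTO (
    PARTITION w2012_01 VALUES LESS THAN (TO_DAYS('2012-01-09')),
    PARTITION pmax VALUES LESS THAN (MAXVALUE)
);
-- ...and drop the oldest week to stay at 104 partitions.
ALTER TABLE by_week DROP PARTITION w2010_01;
```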
If you’ve partitioned your table, then it is possible to drop a partition, and thus delete a chunk of data very quickly. For example:
ALTER TABLE by_year DROP PARTITION P1;
B-tree-based storage engines, like InnoDB, are slow to delete data, and deletions fragment the table, with consequences to performance. Dropping a partition makes deletions fast and does not fragment the table. On the other hand, for Fractal Tree storage engines, like TokuDB, deletions are fast — though not as fast as dropping a partition — and Fractal Tree indexes don’t fragment.
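For comparison, removing the same year-2000 rows without partitions is a row-by-row operation. A sketch against the by_year example:

```sql
-- Every matching row is individually removed from the clustered index
-- and from each secondary index, and the freed space fragments the table.
DELETE FROM by_year WHERE d < '2001-01-01';
```

With the partitioned table, the DROP PARTITION above discards the same rows as, essentially, a metadata operation.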
So partition dropping gives a clear benefit if you’re going to use InnoDB. In the case of Fractal Tree indexes, instant partition dropping is still nice, but the case for partitions is not as strong, since it’s not clear that the added complexity is worth it.
Leaving deletions aside, the most common reason I see mentioned for using partitions is the claim that partitioning breaks big tables into smaller tables. Smaller tables are faster, right? Well, no. A collection of small tables that you have to swap in and out is not necessarily faster than a single table, parts of which you have to swap in and out. Contrary to popular belief, partitioning is not an example of “Divide and Conquer”. [For Divide and Conquer to work, you split up a problem and then the solutions to each piece are independent of each other. Virtual memory means that the “solutions” to managing the various indexes of the partitioned pieces are not independent.]
I think of it this way: you are just swapping one data structure on the table for another. In one case, you have a single table managed by a B-tree, Fractal Tree index, or whatever, and you have virtual memory moving parts of the structure in and out of memory. In the other case, you have a hybrid data structure, with trees at the bottom, and whatever MySQL uses to manage partitions at the top. And virtual memory still moves parts of this hybrid structure in and out of memory.
More concretely: suppose you partition a table into pieces, each of which fits in memory. That should be good, right? Well, let’s consider two extreme cases. First, suppose that your insertion/query use case hammers on one partition at a time. You swap in that partition, and you run fast because everything’s in memory. But if you hadn’t partitioned, you’d still swap in the active part of the table into memory and go fast.
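Either way, only the active data needs to be resident in memory. One thing partitions do give you is visibility: assuming the by_year example, EXPLAIN PARTITIONS (in older MySQL versions; newer versions show this in plain EXPLAIN) reports which partitions a query is pruned to:

```sql
EXPLAIN PARTITIONS
SELECT * FROM by_year
WHERE d >= '2001-01-01' AND d < '2002-01-01';
-- the "partitions" column should list only P2
```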
At the other extreme, suppose you jump from partition to partition. Then you have to keep swapping in bits of different partitions. Without partitioning, you keep swapping in bits of the index tree.
In short, there’s no such thing as a free lunch. Partitioning just swaps one data structure on a table for another, and there’s no reason to believe that the new data structure is better.
This is what the next installment will be about. Stay tuned.