In part one, I presented a very brief and particular view of partitioning. I covered what partitioning is, with hardly a mention of why one would use partitioning. In this post, I’ll talk about a few use cases often cited as justification for using partitions.
One use case for justifying partitions is that each partition can be placed on a separate disk to avoid spindle contention. I have to say that on this one, I agree with Kevin Burton, who makes the point that if you want to distribute I/O load across several disks, you can use a RAID configuration on the disks. In this case, he says that partitioning is not worth the trouble. [NB. He makes the point that this is a problem with MySQL’s implementation of partitions and its interaction with MySQL Cluster. He proposes some fixes. If those fixes ever come to pass, we’ll have to see what the new performance characteristics look like.]
I spent some time trying to find comparisons between partitioning and RAIDing for load balancing in MySQL. If I missed something, I’d love to know about it, but the results aren’t jumping out.
In summary, if you’re going to claim that a partitioning scheme helps load balance across disks, it makes sense to compare this with a well-designed RAID system. Partitioning is high maintenance. RAID is low maintenance by comparison. I may be lazy, but I’d rather the system did the load balancing and not me.
The most commonly claimed performance improvement for partitioning, from what I can tell, is their use in avoiding full table scans during queries. Even though the details vary from case to case, I’ve been able to distill one scenario that illustrates what’s going on most of the time:
The table didn’t get any smaller. We were just able to restrict a table scan to a smaller portion of the table. For me, this helps place partitions into a context of well-known DB techniques. Partitions are often a replacement for covering indexes!
After all, that’s what covering indexes are, right? They are arrangements of data that allow you to scan part of a table to answer a query, without resorting to point queries into a primary table.
Once again, I looked for comparisons of proposed partitioning schemes with indexing-based choices. In particular, I looked for comparisons of partitioning with covering schemes. No luck. And as before, if I missed something I’d love to hear about it.
If partitions are designed to avoid table scans, then they are a high-effort way to do so. I’m not suggesting that they should never be used for this, but unless the low maintenance option of covering and clustering indexing is considered, partitioning just feels like a lot of work, potentially for no benefit.
The common reasons for using partitioning have low-maintenance, high-performance alternatives. In the case of load balancing across disks, I’d rather use RAID than partition by hand. In the case of avoiding table scans, I’d rather design indexes that avoid table scans than partition by hand. If I can’t get the insertion performance I need once I’ve designed a sensible set of indexes for my query load, I’d rather use a storage engine that does fast indexing (like TokuDB) so I can maintain the indexes I need, rather than the ones I’m forced to by the storage engine.
And in any case, it’s important to compare partitioning to the most effective alternative solutions for the performance problem encountered.