EmergencyEMERGENCY? Get 24/7 Help Now!

MySQL Partitioning: A Flow Chart

 | March 11, 2011 |  Posted In: Tokutek, TokuView


In Part 1, and Part 2 of this series, I presented some thoughts on partitioning. I heard some great feedback on why people use partitioning. Here, I present a flow chart that summarizes what I’ve learned. In summary: with TokuDB in the picture there’s almost no reason to use partitioning. Or I should say, there are almost always better (higher performing, more robust, lower maintenance) alternatives to partitioning.

Here goes:

Partitioning Flowchart (Tokutek)

  1. Spindle contention? In other words, are you partitioning in order to spread your query work load across many disks? I’ve yet to see a compelling technical case that RAIDing your disks doesn’t do this as well, with much less setup and maintenance.
  2. Are deletions a problem? Are you partitioning because of issues arising from deletions (i.e., poor performance or index fragmentation)?
  3. Is fragmentation a problem? In InnoDB, deletions cause fragmentation, whereas dropping a partition doesn’t. However, TokuDB doesn’t fragment, and it’s a lot easier to switch the storage engine once than it is to maintain partitions forever.
  4. Do you really need instant block deletes? Dropping a partition is really fast. And InnoDB deletes can be really slow. But TokuDB deletions are very fast (not as fast as dropping a partition, but fast). Is the ability to drop a partition what matters, or is it good enough to do deletions super fast (though not instantaneously)? I’m pretty sure very few people rely on instant partition drop, but this type of use case (if it exists) would be a credible reason to use partitions.
  5. Large-table performance problems? When tables (and especially indexes) get larger than memory, InnoDB fails in a variety of ways (e.g., in the way its insertion rate drops precipitously). But that doesn’t mean all storage engines do. TokuDB, for example, is designed specifically to perform superbly with out-of-core databases.
  6. Slow queries? One failure mode for large tables is for queries to become slow. This can be addressed by adding covering indexes (or in the case of TokuDB either covering or clustering indexes).
  7. Slow indexing? Adding more indexes can cause indexing performance for InnoDB to deteriorate badly. But TokuDB is fast at indexing.
  8. You probably don’t need partitioning. I don’t know of any other use cases that argue for partitioning, but I’d love to hear about them if they exist.


  • > Slow queries? – This can be addressed by adding covering indexes

    ^^ This one is failure. Partition pruning allows you to use index just for small part of table data (say, 1%). So no need to improve indexes if partition lets you significantly reduce amount of data to examine.
    If you don’t mention this, the whole article becomes too biased.

    • Show me a use case for “So no need to improve indexes if partition lets you significantly reduce amount of data to examine”. I would like to either (*) discover a new use for partitioning, or (*) show you how to make a non-partitioned table work just as well, probably by using a different index.

  • @YetAnotherBlogReader,

    Thanks for the comment. This post was meant to summarize what I had learned from feedback on my previous two posts on partitioning. Please see this post for my point about partitioning and indexes. The upshot of that analysis is that partitions only improve non-covering indexes. A well-designed covering index will examine even less data than a table scan on a partition.

    In addition to what I wrote there, I’d like to make a further point here. Partitioning is an inflexible aid to indexing, in the following sense. Suppose, as an example, that you partition on DATE, and you define some indexes, on CUSTOMER_ID, on SKU, and on PRICE. This is, in a way, like defining indexes (DATE, CUSTOMER_ID), (DATE, SKU) and (DATE, PRICE).

    These indexes probably don’t cover your queries. If they did, you wouldn’t be partitioning. Your queries speeds will be ok, as long as your partition on DATE is fine enough that you only have to do potentially wasteful table scans on small partitions.

    Now suppose your business changes, and you need to start doing queries on SKU and PRICE, independent of DATE. You’d like to be able to define an index on (SKU, PRICE), but you’re stuck with a partition on DATE. These new queries will have to be performed on all partitions, and the more partitions there are, the worse your performance.

    So now you’re forced into a tradeoff between one class of queries — the ones that want a fine partition — and another class of a queries — the ones that want as little partitioning by DATE as possible. And repartitioning means tons of downtime. Partitioning reduces your future options. It is inflexible.

    The alternative is to define a good set of covering indexes from the get go. Then if a new query class comes up, you define a good index for it. There’s no tradeoff between the quality of this index and the quality of previous indexes… as long as you have a storage engine that can keep up with the increased indexing load.

    That’s where TokuDB comes in. It allows you to define the indexes you really need and to update those indexes under heavy insertion loads, even when you run out of RAM. By comparison, partitioning is cumbersome and restrictive.

  • What about partitioning for the purpose of parallelization? As it stands, MySQL will not parallelize any queries — even ones that are obvious candidates, but you can do it yourself. Say you want to find all products where color=1. A hash function divides the data four ways.

    You could, in theory, pull in the data via FEDERATED pre-sorted from the data replicated and partitioned to a pool of N servers, then sort them together, which is linear complexity. You could also do the same thing locally across 4 threads. I think you’d have to use a TYPE=MEMORY table as it stands currently to do so. A native solution might be able to avoid that or use a cheaper data structure.

    Are you saying a covering index will suffice if you want to do this? I don’t think so. There are many environments where you could end up creating N! indices to cover queries for queries against lots of dimensions. In that case, you have to parallelize and/or pray that a bitmap-index-merge will allow you to use a few indices.

    Partitioning is more related to parallel computation than indexing to me.

    Am I wrong?

  • @Jaime,

    There is a nomenclature issue going on here. The MySQL partition infrastructure is basically used for single-server breaking up of the table. There are performance reasons for doing this on InnoDB, but not on TokuDB.

    What you refer to — breaking up the table as part of scale-out — I usually call sharding, rather than partitioning, though of course there is no uniformity of the term’s usage. And I agree with you that for scale-out purposes, sharding can help in some cases.

    As for the N! comment, please see my post.

  • http://mysql.rjweb.org/doc.php/partitionmaint#use_cases_for_partitioning lists the (only) 4 cases I have found for PARTITIONing. I have been searching for several years, without success, for another use case than cannot be replaced by InnoDB with suitable indexing.

Leave a Reply


Percona’s widely read Percona Data Performance blog highlights our expertise in enterprise-class software, support, consulting and managed services solutions for both MySQL® and MongoDB® across traditional and cloud-based platforms. The decades of experience represented by our consultants is found daily in numerous and relevant blog posts.

Besides specific database help, the blog also provides notices on upcoming events and webinars.
Want to get weekly updates listing the latest blog posts? Subscribe to our blog now! Submit your email address below and we’ll send you an update every Friday at 1pm ET.

No, thank you. Please do not ask me again.