EmergencyEMERGENCY? Get 24/7 Help Now!

What is MySQL Partitioning?

 | July 27, 2017 |  Posted In: Insight for DBAs, MySQL, MySQL 101

PREVIOUS POST
NEXT POST

MySQL PartitioningIn this blog, we’ll quickly look at MySQL partitioning.

Partitioning is a way in which a database (MySQL in this case) splits its actual data down into separate tables, but still get treated as a single table by the SQL layer.

When partitioning, it’s a good idea to find a natural partition key. You want to ensure that table lookups go to the correct partition or group of partitions. This means that all SELECT, UPDATE, DELETE should include that column in the WHERE clause. Otherwise, the storage engine does a scatter-gather, and queries ALL partitions in a UNION that is not concurrent.

Generally, you must add the partition key into the primary key along with the auto increment, i.e., PRIMARY KEY (part_id,id). If you don’t have well-designed and small columns for this composite primary key, it could enlarge all of your secondary indexes.

You can partition by range or hash. Range is great because you have groups of known IDs in each table, and it helps when querying across partition IDs. This still can create hotspots in the newest partition, as all new inserts go there. Partitioning by hash “load balances” the table, and allows you to write to partitions more concurrently. This makes range queries on the partition key a bad idea.

In MySQL 5.7, partitioning became native to the store engine and deprecated the old method where MySQL itself had to handle the partitions. This means InnoDB partitions (and a larger amount of partitions) are a better choice than in the past.

As with all features and recommendations, this only makes sense if it helps your data and workload!

PREVIOUS POST
NEXT POST
Manjot Singh

Manjot Singh is an Architect with Percona in California. He loves to learn about new technologies and apply them to real world problems.
Manjot is a veteran of startup and Fortune 50 enterprise companies alike with a few years spent in government, education, and hospital IT.

9 Comments

  • Hi Manjot, I think PRIMARY KEY (part_id,id), where id is auto increment column, wont work. Because the auto increment column should be in first position while creating the composite primary key.

    • Mayank, the solution is simple: Also have INDEX(id). The only thing that AUTO_INCREMENT requires is that the id be first in some index. It does not require PK or UNIQUE. (However, without a uniqueness constraint it is possible to insert a duplicate id — but who would do that!)

      I have used this technique several times. It provides the performance benefit of “clustering” on part_id, while still having a UNIQUE primary key. That is, it is beneficial for a SELECT … WHERE part_id = 123 with multiple rows of 123. And for SELECT … WHERE part_id BETWEEN … .

      Note that PARTITIONing add no benefit to either of those SELECTs.

    • It all depends on your workload. I recommend partitioning in a lot of cases. There are many people out there with 2B row tables and only access the last 5% of it 99% of the time.

  • Can you provide a 5th use case where Partitioning improves performance? See http://mysql.rjweb.org/doc.php/partitionmaint for the 4 cases I know of.

    (I can’t imagine how your example of Hash “load balancing” provides any performance benefit. Do you have ‘proof’?)

  • if I´m right, partitioning means that the table will be split in many files, giving the advantage to just read in one file with the small size instead of read in one big file, off course the queries should be designed for that.

  • Thank you guys for the interesting discussion..

    We are working on a project which if successful may have Billion of rows in a single table. Currently, we are working on to build a MVP and will do proof of value.

    We are bit worried on the number of potential rows a single table may have. We were looking for guidance on partitioning vs index and/or both. We will have an variable (say customer ID in product table) which can be used for creating almost separate tables for each customer.

    Would you suggest that we create a reference table which direct to a product table (which gets created) for a customer? Or we do partitioning only..

Leave a Reply