What is MySQL Partitioning?

What is MySQL Partitioning?

PREVIOUS POST
NEXT POST

MySQL PartitioningIn this blog, we’ll quickly look at MySQL partitioning.

Partitioning is a way in which a database (MySQL in this case) splits its actual data down into separate tables, but still get treated as a single table by the SQL layer.

When partitioning, it’s a good idea to find a natural partition key. You want to ensure that table lookups go to the correct partition or group of partitions. This means that all SELECT, UPDATE, DELETE should include that column in the WHERE clause. Otherwise, the storage engine does a scatter-gather, and queries ALL partitions in a UNION that is not concurrent.

Generally, you must add the partition key into the primary key along with the auto increment, i.e., PRIMARY KEY (part_id,id). If you don’t have well-designed and small columns for this composite primary key, it could enlarge all of your secondary indexes.

You can partition by range or hash. Range is great because you have groups of known IDs in each table, and it helps when querying across partition IDs. This still can create hotspots in the newest partition, as all new inserts go there. Partitioning by hash “load balances” the table, and allows you to write to partitions more concurrently. This makes range queries on the partition key a bad idea.

In MySQL 5.7, partitioning became native to the store engine and deprecated the old method where MySQL itself had to handle the partitions. This means InnoDB partitions (and a larger amount of partitions) are a better choice than in the past.

As with all features and recommendations, this only makes sense if it helps your data and workload!

PREVIOUS POST
NEXT POST

Share this post

Comments (9)

  • Mayank Tyagi Reply

    Hi Manjot, I think PRIMARY KEY (part_id,id), where id is auto increment column, wont work. Because the auto increment column should be in first position while creating the composite primary key.

    July 28, 2017 at 1:58 am
    • Rick James Reply

      Mayank, the solution is simple: Also have INDEX(id). The only thing that AUTO_INCREMENT requires is that the id be first in some index. It does not require PK or UNIQUE. (However, without a uniqueness constraint it is possible to insert a duplicate id — but who would do that!)

      I have used this technique several times. It provides the performance benefit of “clustering” on part_id, while still having a UNIQUE primary key. That is, it is beneficial for a SELECT … WHERE part_id = 123 with multiple rows of 123. And for SELECT … WHERE part_id BETWEEN … .

      Note that PARTITIONing add no benefit to either of those SELECTs.

      September 22, 2017 at 12:01 am
  • Rick James Reply

    And, yet, you would be surprised at how few use cases get any performance benefit with PARTITIONing versus a better index on a non-partitioned table.

    September 21, 2017 at 11:56 pm
    • Manjot Singh Reply

      It all depends on your workload. I recommend partitioning in a lot of cases. There are many people out there with 2B row tables and only access the last 5% of it 99% of the time.

      October 11, 2017 at 1:02 pm
  • Rick James Reply

    Can you provide a 5th use case where Partitioning improves performance? See http://mysql.rjweb.org/doc.php/partitionmaint for the 4 cases I know of.

    (I can’t imagine how your example of Hash “load balancing” provides any performance benefit. Do you have ‘proof’?)

    September 22, 2017 at 12:04 am
  • Abe Reply

    if I´m right, partitioning means that the table will be split in many files, giving the advantage to just read in one file with the small size instead of read in one big file, off course the queries should be designed for that.

    October 11, 2017 at 12:08 pm
    • Manjot Singh Reply

      exactly!

      October 11, 2017 at 1:02 pm
    • Rick James Reply

      Indexes are also designed for cutting down on how much you need to read.

      October 11, 2017 at 1:58 pm
  • Ram Reply

    Thank you guys for the interesting discussion..

    We are working on a project which if successful may have Billion of rows in a single table. Currently, we are working on to build a MVP and will do proof of value.

    We are bit worried on the number of potential rows a single table may have. We were looking for guidance on partitioning vs index and/or both. We will have an variable (say customer ID in product table) which can be used for creating almost separate tables for each customer.

    Would you suggest that we create a reference table which direct to a product table (which gets created) for a customer? Or we do partitioning only..

    December 6, 2017 at 6:16 am

Leave a Reply