I had an interesting case recently. A customer running a large MySQL data warehouse had a table into which data was merged with INSERT ON DUPLICATE KEY UPDATE statements. Performance was extremely slow. It turned out to be caused by the hundreds of daily partitions created for this table. What is most interesting (and surprising) is that not every statement is affected equally, as you can see from the benchmarks below:

I created the following test table:
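The actual DDL did not survive in this copy of the post; here is a minimal sketch consistent with the description (an integer ID primary key, a C column with a secondary index, RANGE partitioning on ID so each sequential batch lands in at most two partitions — table and partition names are assumed):

```sql
-- Sketch of the test table; names and exact partition layout are assumed.
CREATE TABLE t1 (
  id INT NOT NULL,
  c  INT NOT NULL,
  PRIMARY KEY (id),
  KEY (c)
) ENGINE=InnoDB
PARTITION BY RANGE (id) (
  PARTITION p0 VALUES LESS THAN (1000),
  PARTITION p1 VALUES LESS THAN (2000),
  -- ... one partition per range, up to the partition count being tested ...
  PARTITION pmax VALUES LESS THAN MAXVALUE
);
```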

I used MySQL 5.1.41 for my tests, increasing the buffer pool to 1GB and the log file size to 128MB so all data fit comfortably in memory.

I varied the number of partitions from 1 to 1,000 and loaded the table with 1,000,000 sequential values from 1 to 1 million (the C column was set to the same value as the ID column) using bulk insert statements – 1,000 rows in each.
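The bulk inserts would look something like this (a sketch, assuming the table name t1 from the DDL above):

```sql
-- One bulk insert statement per 1,000 sequential values; c mirrors id.
INSERT INTO t1 (id, c) VALUES
  (1, 1),
  (2, 2),
  -- ... 1,000 (id, c) pairs per statement ...
  (1000, 1000);
```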

Loading the data took 9 seconds for 1 partition, 10 seconds for 10 partitions, 16 seconds for 100 partitions and 23 seconds for 1,000 partitions, which means it slows down about 2.5 times as the number of partitions increases to 1,000.

This regression is somewhat surprising, as in reality only 1 (at most 2) partitions get data inserted into them per insert statement.

When I tested the UPDATE path of INSERT ON DUPLICATE KEY UPDATE (adding ON DUPLICATE KEY UPDATE c=c+1 to my bulk inserts), the regression became even larger.
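The update-path variant adds the clause to the same bulk statements (a sketch; since all 1,000,000 keys already exist in the table, every row takes the update path):

```sql
INSERT INTO t1 (id, c) VALUES
  (1, 1), (2, 2), /* ... 1,000 rows ... */ (1000, 1000)
ON DUPLICATE KEY UPDATE c = c + 1;
```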

The insert took 50 seconds to complete for 1 partition, 52 seconds for 10 partitions, 72 seconds for 100 partitions, and 290 seconds for 1,000 partitions. The performance loss with a large number of partitions is about 6 times in this case, and the pattern is a bit different – the slowdown becomes drastic as we go from 100 to 1,000 partitions – about 4 times, compared to only a 1.5 times slowdown for the same case with plain inserts.

The difference grows even larger if we remove the index on column C – the “UPDATE” part of INSERT ON DUPLICATE KEY UPDATE completes in 22 seconds for 1 partition and 250 seconds for 1,000 partitions – over a 10x difference.

Both MyISAM and InnoDB are affected by this issue. The update path for MyISAM without indexes took 10 seconds for 1 partition vs. 52 seconds for 1,000 partitions (the table_cache was warm for this test).

I had a couple of suspects in this case – is this the “setup” overhead of the statement opening all partitions for execution, or is this per-row overhead? To check, I ran the statements with different numbers of rows per batch. Performance with 100 rows per batch and with 10,000 rows per batch was not significantly different from 1,000 rows per batch, so I conclude this is per-row overhead.

This test gives another rather interesting data point – it looks like updates in InnoDB for CPU-bound workloads can be as much as 5 times slower than inserts, which is quite a difference.

CONCLUSION: This is an interesting topic and I should do more tests on it. In the meanwhile, the takeaway is simple – be careful with the number of partitions you use. Creating unused partitions for future use may cost you. From these benchmarks, performance remains reasonable with about 100 partitions (a reasonable number for most applications), though I would expect the numbers to be significantly workload-dependent.

Wayne Robinson

That is interesting and is completely the opposite of what I would’ve expected.

I always thought the insert decision would be trivial for this type of partition configuration.

Patrick Casey

Wonder if it’s doing some form of linear search against the whole partition list to figure out which partition a given row should get inserted into?


If you don’t mind me dropping a slightly related question, I’m planning on having Apache log to MySQL for a shared web platform that will have several thousand sites on it. It’s about the only safe and consistent way to achieve logging given quirks of the platform. I’ve been intending to partition based on domain name (which will also be an indexed field), given that at our best guess 80% of the sites probably only account for 20% of the traffic. I’m hoping to get some time in the next week to do some tests, but it worries me that you’re seeing slower inserts with a large number of partitions. Do you think I’m liable to hit problems?


It’d be my preference to use separate tables, but I’m told it’s not an option with the method they’re planning on using to log to the DB (I’m not the architect, not really even a DBA, but about the closest we’ve got currently). I suppose I could figure out a way of splitting the data up using MySQLproxy to re-write the queries off to tables. I can’t imagine anyone wanting to combine data from different domains for any purpose, apart from maybe once in a blue moon reports that are hardly likely to be time critical.

Istvan Podor

Hey Peter,

In my experience it’s not just about writes. If you have 1k tables, each with lots of data (50k rows), and you query against an indexed column, MySQL will go through each index, and if the index size is around a few megabytes per table, this can dramatically slow down the usually fast unique/primary key lookups (since every partition (table) gets its own indexes, right?). It was the same for me with MyISAM MERGE tables (which we can think of as an earlier form of partitioned tables). And of course, we can’t forget that if you have a lot of queries that have to go through all of the indexes for each partition, that will thrash your index caches, may eat up a lot of memory, and decrease your cache hits.

But, benchmarks speak 🙂

Harrison Fisk

Have you seen http://bugs.mysql.com/bug.php?id=37252 ? It implies it is the locking that causes the slowdown (even for InnoDB/Falcon). Can you try your test by locking the table first, doing the INSERT ON DUPLICATE KEY UPDATE statements, and then unlocking?
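The suggested experiment would look something like this (a sketch, assuming the test table is named t1):

```sql
LOCK TABLES t1 WRITE;
-- run the INSERT ... ON DUPLICATE KEY UPDATE batches here
UNLOCK TABLES;
-- if per-statement partition locking is the culprit, the batches
-- should run noticeably faster inside the explicit lock
```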


Say I have a table with 100 partitions, partitioned by some column id. If I look up rows from this partitioned table using an IN clause with 200 ids, it seems like it first figures out which partitions it needs to check, then looks in each partition for all 200 ids, causing a lot of index reads and slowness. When I split the large select query manually to do one query per partition, I saw a big speedup. Maybe something similar is happening with your insert…update?
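The manual split described could look like this (a sketch with hypothetical ids and table name; the idea is to group the ids by the partition they fall into and issue one query per group):

```sql
-- One query spanning many partitions:
SELECT * FROM t WHERE id IN (5, 1042, 2077 /* , ... 200 ids ... */);

-- Split manually, one query per partition's subset of ids:
SELECT * FROM t WHERE id IN (5);      -- ids falling in partition p0
SELECT * FROM t WHERE id IN (1042);   -- ids falling in partition p1
SELECT * FROM t WHERE id IN (2077);   -- ids falling in partition p2
```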


See BUG#48846 for some interesting development in this area


Do not jump to wrong conclusions. On a production system, one of the benefits of partitioning is being able to point each partition at a separate volume, possibly with independent disk controllers. That level of parallel disk access can drastically benefit performance. If you have multiple partitions on the same disk/disk controller, you get high I/O waits.


What do you mean by having multiple files when partitioning? Even with InnoDB? How?



I have a very important question. Does this partitioning test apply to the whole DB or to a single table? How many partitions can I have per DB?

Because I need to partition many tables. For example, I can partition one table into up to 100 partitions, but if I have 20 tables that I need to partition, then I am about to have 2,000 partitions.

As I understand it, partitioning becomes slow because of the number of files when you create too many of them. So does the number 100 apply to the whole DB or only to a single table?


I have 100k rows for every single day, and searching is mainly by date. So is it reasonable to have a partition for every single day? That would be more than 3k partitions for 10 years of data.
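For reference, a by-day scheme like the one described usually looks something like this (a sketch; the table and column names are hypothetical):

```sql
CREATE TABLE access_log (
  log_date DATE NOT NULL,
  url      VARCHAR(255) NOT NULL,
  KEY (log_date)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(log_date)) (
  PARTITION p20100101 VALUES LESS THAN (TO_DAYS('2010-01-02')),
  PARTITION p20100102 VALUES LESS THAN (TO_DAYS('2010-01-03'))
  -- ... one partition per day ...
);
```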

Is there a simple answer to this?