MySQL Partitioning for Performance Optimization

I have wanted for a while to write about using MySQL partitioning for performance optimization, and I just got a relevant customer case to illustrate it. First, you need to understand how partitions work internally. At a low level, partitions are separate tables. This means that when you do a lookup by the partitioning key you will look at only one (or some) of the partitions; however, lookups by other keys need to perform the lookup in all partitions and hence can be a lot slower. The gain for updates typically comes from having a smaller BTREE on the active partition(s), which allows for a much better fit in memory. Having potentially fewer levels in the BTREE is not a significant factor.

So let's look at an example:

The access pattern for this table is lookups by "uu", which holds UUID values, plus a number of deletes by "id" and a bunch of inserts. The deletes are mainly clustered around the most recent id values.
The table (and index) is much larger than the buffer pool size.

The first problem was replication lag, which was mainly due to modifying the uu index. This is because UUID() spreads value prefixes very well, effectively giving almost uniform access to the whole BTREE. To solve this problem partitioning was a good choice: PARTITION BY HASH (id div 10000000) PARTITIONS 32. This partitions the data into 32 partitions, placing sequential ranges of 10M id values in the same partition, which is very handy if you have very active access to values which have been added to the table recently.
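The original post does not show the table definition, but as a sketch (the table and column names here are my assumptions, not the customer's actual schema), it could look like this:

```sql
-- Hypothetical sketch of the table described in the post.
-- HASH(id DIV 10000000) puts each sequential range of 10M id values
-- into one partition, so recently inserted (hot) rows cluster
-- in just a few partitions.
CREATE TABLE events (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  uu CHAR(36) NOT NULL,        -- UUID value
  payload VARCHAR(255),
  PRIMARY KEY (id),
  KEY (uu)
) ENGINE=InnoDB
PARTITION BY HASH (id DIV 10000000) PARTITIONS 32;
```

With 32 partitions and 10M ids per range, id 1,234,567,890 maps to 123 DIV-wise, and 123 MOD 32 = 27, so it lands in partition p27.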

Using this trick, replication could be sped up about 10 times, as the couple of partitions which were actively used could fit in the buffer pool completely, so replication became CPU bound (single thread) instead of IO bound.

You could celebrate, but hey... you need to check the impact on the master too. The master, in its turn, was getting a lot of lookups by the uu value, which is not part of the partitioning key, and hence we're looking at 32 logical lookups, one per partition. True, only one of the partitions will actually contain the value, but many of them will require physical IO, descending to the leaf level to verify the value does not exist there. This reduced performance for random selects by UUID from 400 to 20 per second (from a single thread).
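The asymmetry is easy to see with EXPLAIN PARTITIONS (a sketch against the same kind of hypothetical schema: a table partitioned by HASH on id, with a secondary index on uu):

```sql
-- Lookup by the partitioning key: pruned to a single partition.
EXPLAIN PARTITIONS
SELECT * FROM events WHERE id = 1234567890;
-- the "partitions" column shows one partition, e.g. p27

-- Lookup by a non-partitioning key: every partition must be probed.
EXPLAIN PARTITIONS
SELECT * FROM events WHERE uu = '550e8400-e29b-41d4-a716-446655440000';
-- the "partitions" column lists all of p0,p1,...,p31
```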

Decreasing the number of partitions made replication less efficient, but increased the number of selects the table could deliver; there seems to be a reasonable middle ground which would allow replication to perform better than it does now, while selects are still served at the rate the system needs.

What is the takeaway? When you're creating partitions, think clearly about what you're trying to achieve. Partitioning is not some magic feature which just makes everything a lot faster. I've seen some people apply partitioning to basically all of their tables without much of a thought, and believe me, the results were not pretty.


Comments (54)

  • Peter Boros


    In a case like this, I usually leave the master unpartitioned and partition only the slaves. You get the performance you want on the master and the slaves, the tradeoff is that slave cloning or rebuilding is a more time-consuming task. Is this a good practice? Why did you decide to lower the number of partitions on the master and the slaves too?


    December 11, 2010 at 8:42 am
  • Peter Zaitsev


    Right, this is an option. In the given case the purpose of the slave was basically failover and backups, so having a different schema there would not help.

    December 11, 2010 at 10:20 am
  • Ken

    Peter, as I read your writing I get the impression that built-in partitioning is for specific, niche-type applications or cases, if useful at all.

    Sharding at the application level seems much wiser for getting high performance. You can basically get the same functionality, it seems…

    December 11, 2010 at 3:29 pm
  • Peter Zaitsev


    Partitioning and sharding are technologies with a different focus. Partitioning is a database-side technology which, in some cases, makes query execution more efficient. It is transparent to the application, which is great. Sharding is an application-level technology where you change how the application does queries to aggregate data from different tables and servers. If you use sharding, in most cases you do not use partitioning, but sometimes it is still helpful: you may shard on "user" and then use partitioning by date/time to separate new data from old.
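    A sketch of the combination Peter mentions (all names here are hypothetical): each shard holds one slice of the users, and within a shard the table is range-partitioned by time so old data can be dropped cheaply:

    ```sql
    -- One table inside a single shard; other shards hold other users.
    CREATE TABLE user_events (
      user_id INT UNSIGNED NOT NULL,
      created DATETIME NOT NULL,
      data VARCHAR(255),
      KEY (user_id, created)
    ) ENGINE=InnoDB
    PARTITION BY RANGE (TO_DAYS(created)) (
      PARTITION p201011 VALUES LESS THAN (TO_DAYS('2010-12-01')),
      PARTITION p201012 VALUES LESS THAN (TO_DAYS('2011-01-01')),
      PARTITION pmax    VALUES LESS THAN MAXVALUE
    );

    -- Retiring an old month is a cheap metadata operation:
    ALTER TABLE user_events DROP PARTITION p201011;
    ```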

    December 11, 2010 at 6:10 pm
  • Andy

    So when the table is partitioned by its primary key (id), the secondary index BTREE (uu) is also broken into 32 pieces?

    I was under the impression that when a table is partitioned by PK, secondary index BTREE would not be broken up. I guess that was a misconception?

    December 12, 2010 at 10:48 am
  • Peter Zaitsev


    You touch on a very interesting topic. In MySQL, partitioning is kind of lame because it is not really designed from scratch but rather added on top, essentially as a more advanced form of "merge tables", with multiple tables underneath. This means MySQL can only deal with local indexes (each partition has its own index); there is no support for global indexes, and you also can't partition indexes differently than you partition data (which can make sense in some cases). To support these, however, you would need an interface to indexes as separate objects, which is not the case inside MySQL: indexes are something which belongs to the table and are managed as part of the table by the storage engine.

    There is some good news though: for some workloads you can get an effect similar to global indexes by maintaining them as a separate index-organized table.

    December 12, 2010 at 12:17 pm
  • Andy


    “for some workloads you can get an effect similar to global indexes by doing them as a separate index organized table.”

    Could you give an example of that? I’m interested in learning more details of it.

    December 12, 2010 at 12:47 pm
  • Peter Zaitsev

    Well, look at the table in the post. It is partitioned by PRIMARY KEY. Let's say you are also planning to do some lookups by timestamp, and you do not want that timestamp lookup to hit all partitions. Then you can create a table timestamp_index with PRIMARY KEY (ts, id) and the same columns, and when you need to do a lookup by timestamp you look it up in that table and join it to the original table.

    Does it make sense?
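    A minimal sketch of that lookup-table idea (the partitioned table `events` here is hypothetical, standing in for the one in the post):

    ```sql
    -- Side table acting as a "global" index on timestamp.
    -- PRIMARY KEY (ts, id) makes it an index-organized lookup path;
    -- the application must keep it in sync with the main table.
    CREATE TABLE timestamp_index (
      ts DATETIME NOT NULL,
      id BIGINT UNSIGNED NOT NULL,
      PRIMARY KEY (ts, id)
    ) ENGINE=InnoDB;

    -- Find ids by time range in the side table, then fetch the rows
    -- from the partitioned table by its partitioning key (id):
    SELECT e.*
    FROM timestamp_index ti
    JOIN events e ON e.id = ti.id
    WHERE ti.ts BETWEEN '2010-12-01 00:00:00' AND '2010-12-02 00:00:00';
    ```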

    December 12, 2010 at 1:29 pm
  • Rick James

    A UUID field should be
    for the following reasons…
    * Not VAR because the length is constant
    * Not (255) because some (which?) queries will unnecessarily use 255 in temp tables.
    * BINARY, not CHAR because utf8 (you are Internationalized, aren’t you?!) takes 3 bytes. Alternatively, you could use CHAR(36) CHARACTER SET ASCII, if you really need case folding on the hex characters.

    December 12, 2010 at 4:09 pm
  • Peter Zaitsev


    You're right, of course, for UUIDs in general. In this case the data type is designed to store a "unique name", which this given installation happens to fill with UUIDs. In general, BINARY(36) is good if you do not mix case for your UUIDs, or CHAR(36) latin1 if you do. If you can store it UNHEXed, BINARY(16) would be best.
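    A sketch of the UNHEXed BINARY(16) variant (table and column names are hypothetical): strip the dashes and pack on the way in, HEX on the way out:

    ```sql
    CREATE TABLE uuid_demo (
      uu BINARY(16) NOT NULL,   -- 16 bytes instead of 36
      PRIMARY KEY (uu)
    ) ENGINE=InnoDB;

    -- Store: remove dashes, then pack 32 hex digits into 16 bytes.
    INSERT INTO uuid_demo (uu)
    VALUES (UNHEX(REPLACE('550e8400-e29b-41d4-a716-446655440000', '-', '')));

    -- Read back in the familiar text form.
    SELECT LOWER(HEX(uu)) FROM uuid_demo;
    ```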

    December 12, 2010 at 8:38 pm
  • Stephen Brooks

    Hi, I'm currently looking into an issue not too dissimilar to this, but I have not found specific answers to my question and am hoping this thread is a good place to put it.

    I have a database with one table, which is MyISAM and has 30 partitions. The table has 3 columns, all VARBINARY, of lengths (8, 20, 65500). The first two, (A, B), make up the primary key and are used as the partitioning key, as KEY(A, B). The table contains 60M rows, expected to reach 100M+; the key buffer is 8G and 56% used.

    You mention in this article that reducing the number of partitions improved select speed (replication is of no concern in this case), but would you expect the second column in the partitioning key to increase performance? I believe that is why it was added, as opposed to just ensuring uniqueness.

    Not all queries use both A and B, all use A.

    January 31, 2011 at 7:28 am
  • Todd McDaniel

    To whom it may concern,

    I have been a software developer and database administrator/developer for decades, focusing mostly on VB and SQL Server. I have worked with many companies, both large and small, but my experience has been mostly with low transaction volumes.

    I have a client that has a need for a low data volume (transaction size less than 2k) yet very high transaction volume (300k per second, although if broken down by location this could be as low as 10k transactions per second) application, and they want to go cheap.

    By “going cheap” I mean they want to start by going with a hosting service using commodity hardware and MySQL with the client side PHP.

    My client has a great business concept (one of those “wish I would have thought of that!” things) and plan to start small but once “live” they will grow exponentially very quickly.

    I know SQL Server. I need to learn MySQL, both administration and development. What I need from you is some direction on what training to take. Please be detailed about the training you think I need. Any advice would be appreciated.



    September 18, 2011 at 5:42 pm
  • ankit

    Hi, I have a 10 GB MyISAM table which is growing rapidly. Can anyone assist me with the following?
    I am trying to implement partitioning by range (based on the primary key). I tried some benchmarking, but the results were more or less the same.
    I am unable to tell whether this is because SELECTs on the primary key are already very fast, so partitioning won't be of much help, or whether I missed something while benchmarking.
    I will be really thankful if somebody can suggest whether to go with partitioning or not.
    My table has a high number of selects compared to inserts/updates, and replication is in place and working fine.

    November 14, 2011 at 4:04 am
  • Peter Colclough

    Hi Peter,
    I have a couple of tables on a high-transaction site (4000 queries per second) that are in excess of 50M rows. I am trying to get a suitable partitioning scheme together.
    The idea was/is to partition on date (TO_DAYS(date_field)), so we can access the latest data quickly and easily archive off earlier records by copying and dropping a partition. Unfortunately the tables have a PK of char, which doesn't fall in line with the date syntax. Adding the date field to the primary key would negate the usefulness of the current PK column.

    Although with 5.5 we can use multiple columns, I don't think that helps.

    Any ideas greatly appreciated. I saw your idea of a lookup table, which could work, but it would require more space and changes in a lot of code.

    November 17, 2011 at 3:21 pm
  • Rachel

    Hi Peter & everyone,
    I'm working on a multi-tenant SaaS project using MySQL as the DBMS, but it's really hard to decide on the database structure. At first, I thought a shared database with multiple schemas (or views in MySQL) would be a great solution, because it requires minimal handling in code without commingling tenants' data.
    For those who might be interested, this article explains the approach with example:

    However, after reading Peter's article on the performance issues of MySQL views, I ran a little test to compare query performance between multi-schema (querying from a view that shows only the data of one tenant) and single schema (using the same base table from which the view is generated, but with WHERE tenant_id = XXX added to retrieve the tenant's data), and my finding is that single-schema is approximately 2x faster than multi-schema. Is it because MySQL has to generate the view every time before processing a query?

    But still, I prefer not to handle tenant data separation in code if possible, although some say that with careful design this is still pretty safe. So now I'm wondering: how does MySQL partitioning differ from a view? And is MySQL partition pruning a decent implementation of the multi-schema approach?

    May 31, 2012 at 8:21 pm
  • Peter Colclough

    Hi Rachel,

    Firstly, 'partitioning' and 'views' are completely different things, and not related. A 'view' is simply a stored piece of SQL that produces the same columns from a table or group of tables. 'Partitioning' is a way in which the DB (MySQL in this case) splits its actual data down into separate tables, which are treated from the SQL point of view as a single table.

    It would be relatively simple to partition your tables on 'tenant', so that MySQL would only ever have to look at a given partition. This would make queries quicker, as it wouldn't have to trawl through all the other tenants' records. You would still need to store the 'tenant' id in your code, or as a MySQL variable.
    If you went this route, you could still use views to simplify your code, and ensure that the tenant id is used by the view and stored for each session.
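    A sketch of partitioning on tenant (all names hypothetical): as long as every query includes the tenant id, partition pruning confines it to one partition:

    ```sql
    CREATE TABLE invoices (
      tenant_id  INT UNSIGNED NOT NULL,
      invoice_id BIGINT UNSIGNED NOT NULL,
      amount     DECIMAL(10,2) NOT NULL,
      PRIMARY KEY (tenant_id, invoice_id)
    ) ENGINE=InnoDB
    PARTITION BY KEY (tenant_id) PARTITIONS 16;

    -- Pruned to a single partition, because tenant_id is the partitioning key:
    SELECT * FROM invoices WHERE tenant_id = 42 AND amount > 100;
    ```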

    The difference in time between querying through a view and querying directly is that the view will have its own execution plan, which then executes the same single plan that you did directly (if you see what I mean). Basically, a view generates two queries: one for itself, and one for the query it contains.

    Hope this helps and doesn’t confuse

    Peter C

    June 1, 2012 at 12:59 am
  • Ashish Behl

    Hi Peter,
    Nice article.

    I am having a similar problem: I have around 1 billion rows in a MyISAM table, with a primary key consisting of 3 columns (ObjInstance (10000 types), ObjectClass (100 types) and timestamp of addition). All 3 columns are also indexed, as these (mostly separately, or in combinations of 2) are used for querying.

    Inserts are @ 800 inserts per second. Deletes are for records older than 60 days..
    We tried both
    a) partitions on ObjectClass –> deletes were really slow, to the extent of 50 deletes per second in batch (NOT good if it does not match the insert rate)
    b) MERGE tables based on dates (sharding the main table) –> queries and inserts became really slow
    I understand that this might be because all the tables below the merge table are queried. Also, there might be locking when long-running queries and inserts happen at the same time..

    I know MySQL should have no problem handling this amount of data, but am not sure what I am doing wrong..
    So, I am wondering

    June 27, 2012 at 11:24 pm
  • Rick James

    @Ashish — Based on what you said, this is the best thing to do…
    objectClass TINYINT UNSIGNED
    Normalize, if necessary, to shrink the data sizes.

    RANGE partition on dates — 1 PARTITION per day. Then do a nightly DROP PARTITION and REORGANIZE PARTITION to add a new day. This will replace the DELETEs with “instantaneous” PARTITION operations.
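    The nightly rotation could be sketched like this (the table name follows Ashish's schema; the exact partition layout and statements are my assumption):

    ```sql
    -- One partition per day; pmax catches anything beyond the last defined day.
    ALTER TABLE ObjectInstances_T
    PARTITION BY RANGE (TO_DAYS(UTCTime)) (
      PARTITION p20120626 VALUES LESS THAN (TO_DAYS('2012-06-27')),
      PARTITION p20120627 VALUES LESS THAN (TO_DAYS('2012-06-28')),
      PARTITION pmax      VALUES LESS THAN MAXVALUE
    );

    -- Nightly: drop the oldest day (replaces millions of DELETEs)...
    ALTER TABLE ObjectInstances_T DROP PARTITION p20120626;

    -- ...and split pmax so tomorrow has its own partition.
    ALTER TABLE ObjectInstances_T REORGANIZE PARTITION pmax INTO (
      PARTITION p20120628 VALUES LESS THAN (TO_DAYS('2012-06-29')),
      PARTITION pmax      VALUES LESS THAN MAXVALUE
    );
    ```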

    Multiple indexes + 800 rows/sec == you are lucky to make it work. How much RAM? RAID? SSD? What is key_buffer_size?

    Let’s see the actual CREATE TABLE, SHOW TABLE STATUS, and the SELECTs.

    > (mostly seperately or in a combination of 2)
    Note that INDEX(a) and INDEX(a,b) are redundant; the former is virtually useless.
    Note that the optimizer will rarely use two indexes in a single SELECT. Even if it would, usually a “compound” index would be better.
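    As a concrete sketch of the redundancy point (table and index names hypothetical): the compound index already serves any lookup on its leftmost column, so the single-column index only costs INSERT time:

    ```sql
    ALTER TABLE t ADD INDEX a_b (a, b);
    ALTER TABLE t DROP INDEX a_only;  -- redundant with a_b; dropping it speeds up INSERTs

    -- Both of these can use index a_b:
    --   SELECT ... WHERE a = 5;
    --   SELECT ... WHERE a = 5 AND b = 7;
    ```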

    June 28, 2012 at 10:14 am
  • Peter Colclough


    I agree with you about partitions. You may need something bigger in the way of ints, like MEDIUMINT or BIGINT, but essentially you are right.

    800 rows per second is not uncommon, especially if you bulk-feed them on a prepared cursor. I have achieved 2 to 3 times that figure over a 6-month period on a development system. Our live system regularly processes 4000+ queries per second over 4 servers.

    June 28, 2012 at 11:37 am
  • Ashish Behl

    Thanks for your suggestions Rick..
    Shall try your ideas first thing morning.. (But I have already started evaluating InnoDb for this…)

    I had tried partitioning based on ObjectClass but dropped the idea as deletes were awfully slow. Am testing the below-mentioned schema now. Am still in the testing phase, and am evaluating InnoDB as an alternative because deletes are awfully slow and my partitioned table with 800 million rows crashed..

    Our servers have 8GB RAM, 8 cores @ 3 GHz, and high-speed RAID level 1 disks.

    my.conf parameters –>
    table_open_cache = 512
    sort_buffer_size = 32M
    read_buffer_size = 32M
    read_rnd_buffer_size = 64M
    myisam_sort_buffer_size = 64M
    thread_cache_size = 20
    query_cache_size =0

    Here is my current schema..

    CREATE TABLE ObjectInstances_MRG_T (
    ObjInstance VARCHAR(20) NOT NULL,
    Version SMALLINT(2) NOT NULL,
    Configuration INT(4) NOT NULL,
    Detail1 INT(4) NOT NULL,
    Detail2 INT(4) NOT NULL,
    Detail3 INT(4) NOT NULL,
    Detail4 TINYINT(1) NOT NULL,
    Detail5 TINYINT(1) NOT NULL,
    ObjectClass SMALLINT(2) NOT NULL,
    INDEX OOU (ObjectClass, ObjInstance, UTCTime),
    INDEX UTCTime (UTCTime),
    INDEX MSISDN (ObjInstance),
    INDEX GroupId (ObjectClass)
    ENGINE=MERGE UNION (ObjectInstances_Week1_T,

    (SEVEN more tables like this..)
    CREATE TABLE ObjectInstances_Week1_T (
    ObjInstance VARCHAR(20) NOT NULL,
    Version SMALLINT(2) NOT NULL,
    Configuration INT(4) NOT NULL,
    Detail1 INT(4) NOT NULL,
    Detail2 INT(4) NOT NULL,
    Detail3 INT(4) NOT NULL,
    Detail4 TINYINT(1) NOT NULL,
    Detail5 TINYINT(1) NOT NULL,
    ObjectClass SMALLINT(2) NOT NULL,
    PRIMARY KEY (ObjectClass, ObjInstance, UTCTime),
    INDEX UTCTime (UTCTime),
    INDEX MSISDN (ObjInstance),
    INDEX GroupId (ObjectClass)

    3 main SELECTS are ( replaced actual fields with “*” ):
    SELECT o1.* FROM ObjectInstances_T o1
    ,(SELECT ObjInstance, MAX(UTCTime) AS UTCTime FROM ObjectInstances_T where ObjectClass in (…) GROUP BY ObjInstance) o2
    where (o1.ObjInstance = o2.ObjInstance and o1.utctime=o2.utctime);

    SELECT o1.* FROM ObjectInstances_T o1
    ,(SELECT ObjInstance, MAX(UTCTime) AS UTCTime FROM ObjectInstances_T where ObjectClass in (…) and ObjInstance in (…) GROUP BY ObjInstance) o2
    where (o1.ObjInstance = o2.ObjInstance and o1.utctime=o2.utctime);

    “SELECT * FROM ObjectInstances_T
    where ObjectClass in (…) and ObjInstance in(…) and
    UTCTime between ‘2010-06-10 03:13:27’ and ‘2014-06-12 08:13:27’ order by UTCTime desc limit 50000;”


    June 28, 2012 at 12:14 pm
  • Rick James

    (This discussion with Ashish should be on )

    > 8GB RAM
    > key_buffer_size=512M
    I would recommend (if you stick with MyISAM) 1500M. See

    For InnoDB, innodb_buffer_pool_size = 5500M might be optimal. (Plus key_buffer_size = 20M)

    If you switch to InnoDB, keep in mind the issues raised in , starting with
    “Disk space for InnoDB is likely to be 2-3 times as much as for MyISAM.” (I highlight this because of your “billion” rows.)

    > INT(4) — the 4 says nothing. The INT will be 4 bytes, range of +/-2 billion. Possibly overkill.

    > ObjInstance VARCHAR(20) NOT NULL,

    > INDEX OOU (ObjectClass, ObjInstance, UTCTime),
    > INDEX GroupId (ObjectClass)
    DROP the latter (as I mentioned earlier)

    MERGE is an antique version of PARTITION. Were you rebuilding the MERGE UNION every week? That should have been a _very_ cheap way to drop a whole _week_ at a time.
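    The weekly MERGE rotation would be sketched like this (the statements are my assumption, based on Ashish's Week1..Week8 table naming):

    ```sql
    -- Point the MERGE table at the current set of weekly tables.
    -- Dropping a week is then just a metadata change plus one DROP TABLE.
    ALTER TABLE ObjectInstances_MRG_T
      UNION = (ObjectInstances_Week2_T, ObjectInstances_Week3_T,
               ObjectInstances_Week4_T, ObjectInstances_Week5_T,
               ObjectInstances_Week6_T, ObjectInstances_Week7_T,
               ObjectInstances_Week8_T, ObjectInstances_Week9_T);

    DROP TABLE ObjectInstances_Week1_T;  -- the expired week, dropped instantly
    ```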

    (reformatting for readability):
    SELECT o1.*
    FROM ObjectInstances_T o1 ,

    ( SELECT ObjInstance, MAX(UTCTime) AS UTCTime
    FROM ObjectInstances_T
    where ObjectClass in (…)
    GROUP BY ObjInstance) o2
    where (o1.ObjInstance = o2.ObjInstance
    and o1.utctime=o2.utctime);

    SELECT o1.*
    FROM ObjectInstances_T o1 ,

    ( SELECT ObjInstance, MAX(UTCTime) AS UTCTime
    FROM ObjectInstances_T
    where ObjectClass in (…)
    and ObjInstance in (…)
    GROUP BY ObjInstance) o2
    where (o1.ObjInstance = o2.ObjInstance
    and o1.utctime=o2.utctime);

    SELECT *
    FROM ObjectInstances_T
    where ObjectClass in (…)
    and ObjInstance in(…)
    and UTCTime between ‘2010-06-10 03:13:27’ and ‘2014-06-12 08:13:27’
    order by UTCTime desc
    limit 50000;

    I don’t see any obvious optimizations to do. If you go with PARTITIONs, and partition on UTCTime, “partition pruning” will be a slight benefit with “and UTCTime between…”.
    I doubt if the first two SELECTs would benefit from _any_ form of PARTITIONing (or MERGE).

    Your first two SELECTs may benefit from designing “summary table(s)”, populating them as the data arrives, then SELECTing against them.

    The 3rd SELECT may work better in newer versions of MySQL; it seems to do a better job of IN in compound indexes. What version are you running? What does the EXPLAIN say?

    If those are the _only_ SELECTs you need, then DROP all your secondary keys; they are possibly useless. This will greatly speed up INSERTs.

    > PRIMARY KEY (ObjectClass, ObjInstance, UTCTime),
    Including a time in a PK is usually a mistake, unless the application _guarantees_ that it will not generate two rows in the same second!

    CHARACTER SET? I ask because you have a VARCHAR. Often people can get away with latin1, which has less overhead than utf8.

    Let me guess… SHOW TABLE STATUS says 50GB of data + 30GB of index (MyISAM) or 120GB + 100GB ? (The large PK will adversely impact InnoDB secondary keys.) The numbers will be roughly unchanged with MERGE or PARTITION.

    June 28, 2012 at 2:05 pm
  • Rick James

    @Peter …
    Sure, 4K INSERTs/sec is possible in some applications. XtraDB shines over MyISAM and InnoDB. TokuDB may shine even with lots of indexes. INDEXes can be a killer of INSERT rates on huge tables. His table is much larger than his RAM, so we need to “count the disk hits”.

    Dissecting his case…
    Inserting row(s) into the data — appending to the .MYD file: essentially 0 disk hits per row inserted
    Inserting into 4 indexes — still cannot tell how “random” they are: 0-4 disk hits.
    “RAID 1” — Let’s say 200 disk hits/second.
    Conclusion: The indexes have hot spots, and are not very “random” if he is getting 800 INSERTs/sec.
    Dropping the secondary indexes — can insert faster.

    If he switches to InnoDB, it will depend, again, on the randomness of the data.

    With either ENGINE, there is a potential benefit of PARTITION by day, even if the keys are random…
    IF all the data and indexes for one partition fits in RAM (key_buffer / buffer_pool), and IF data is inserted in (roughly) chronological order — THEN there will be a lot of activity in the cache, but no need to flush cache until the next day gets started. And, then, the flush will (mostly) be a background task. (XtraDB has a significant fix for smoothing out this operation; InnoDB periodically “stalls”.)

    In either engine, even without dropping the unused indexes, a 1-day partition will easily fit in his RAM.

    So, I stick with my suggestion of partitioning by day. But now it is a “win” for two reasons. (1) DELETE via DROP PARTITION, and (2) no I/O thrashing during the inserts. He may well be able to sustain your 4K/sec insert rate.

    June 28, 2012 at 2:22 pm
  • Ashish Behl

    Thanks again for the analysis..

    Have created a test table with the partitions as described. Also modified the indexes: one index (ObjectClass, Timestamp); PK (ObjectClass, ObjectInstance, Timestamp), unique.
    Using the InnoDB engine, as the client is very finicky about his records and can't lose them due to a crash. Though there are nightly backups of the disk, it is still better to use InnoDB (though disk requirements increase 2.5 times, but that's a separate issue)..

    Shall fill records into the same and then test..

    I missed mentioning few main points…
    1. Had to use multiple indexes because, when I tested with the above-mentioned index and PK, MySQL did not select any index (seen using EXPLAIN) when my queries contained only one of the columns (strange…??)
    2. We are using mysql ndb cluster version Distrib 5.1.56-ndb-7.1.19 (to maintain consistency across the other projects in the same group ). This base mysql version (5.1.56) supports partitions, but does not support partition pruning.
    3. The records in the production (as you rightly mentioned) should be in chronological order of timestamp.
    4. The underlying disk partition is DRBD, with replication to a second standby server for failover.. (This is not yet active in my setup but I think that this might impact performance..)
    5. (Did not mention this before as this is not the right forum for the point). Tried using concurrent_inserts=2 and the inserts fired from same machine still got blocked for long running queries fired before them from the same machine.. (Do not understand the issue.. This was another reason to switch to InnoDb)

    Lastly, its been decided today by senior architects that the servers shall also run the application server along with the Database and So, RAM would be doubled and processor cores increased (new server)..

    Another point worth mentioning.. This is my first ever encounter with a database, so, request you please ignore any stupidity and guide…

    June 28, 2012 at 10:10 pm
  • Peter Colclough

    @ashish A quick one on indexes. Don't always trust EXPLAIN. It may not always give you the right answer. In this case it did. The optimiser will rarely, if ever, choose a multi-column index. However, if you tell it to, then it will:

    Select * from mytable use index (mymulticolumnindex)….
    Select * from mytable force index (…)

    These are called hints that tell the optimiser what to do.

    With the throughput on your servers I find it strange they have decided to put the application on there too. That will only increase bandwidth and disk io. I assume there will not be many users involved

    June 28, 2012 at 11:27 pm
  • Ashish Behl

    You are right..
    The maximum number of users is 50.

    June 29, 2012 at 3:17 am
  • Rick James

    INDEX(a,b) will probably not be used for a SELECT that does not mention a.

    I see many cases where INDEX(a,b) is picked by the optimizer. Sometimes it will use only a from the index; the only clue of this is Key_len in the EXPLAIN.

    EXPLAIN _usually_ reflects the query plan for the query, but it _may_ not.

    > This base mysql version (5.1.56) supports partitions, but does not support partition pruning.
    Eh? Is that a NDB deficiency? Pruning “always” works with InnoDB or MyISAM. (I quote “always” because there are cases where it is not as smart as you would like, but there are usually workarounds.)

    > This is my first ever encounter with a database, so, request you please ignore any stupidity and guide
    I pegged you as being relatively knowledgeable.

    USE/FORCE INDEX — Do these only as a last resort. As the data changes, and the query plan needs to change, these can bite you.

    DB and App on same machine — This _may_ not be a problem, especially if, say, the db is disk-intensive and the app is CPU-intensive. Do not allow swapping; that will kill MySQL performance.

    “50 users” — a useless metric without knowing what they are doing. 1000 light users might not stress the machine, or a single user might bring it to its knees.

    Is NDB involved or not? You have mentioned MyISAM and InnoDB; they have nothing to do with NDB.

    June 29, 2012 at 9:54 am
  • Ashish Behl

    I have to modify all my queries to include at least (time, class) or (time, instance, class).
    Shall keep in mind your suggestions about explain and force index use.

    The mysql 5.1 Manual Section 18.4 says –>
    This section discusses an optimization known as partition pruning, which was implemented for partitioned tables in MySQL 5.1.6
    So, to test, I tried selects with the WHERE clause using the partitioning column, and without it. The results were the same..

    Coming to the application, (I cannot describe the exact nature and app due to various reasons, which I am sure all of you will appreciate)
    The application in itself is much more complex than what I might have provided a picture of.
    It uses
    — 2 application servers, both running on 2 nodes (1 redundant). So, 4 app servers.
    — 2 Databases (Separate mysqld instances)
    a) Clustered NDB, on the two nodes mentioned above
    b) InnoDb/MyISAM on the replicated DRBD partition on these two nodes. (This is what we have been discussing — do not have this in my test setup though)

    The first application server does the job of only providing some static data from a bunch of saved files on disk.
    This does not interact with the DB at all.

    The second App server has to interact with both the DBs, storing records @ 800 rps. deletion from NDB database has to happen very frequently. Deletion from MyISAM/InnoDb would happen everyday for records older than last 60 days.

    50 users would simultaneously access the database, with the 3 queries that I mentioned before, almost all the time. So, I guess that there would be a fair amount of load on the database (1 query per user per 1 min – worst case).

    Have inserted large number of records into the new InnoDb table as mentioned before, using load infile.
    Query results (during inserts at 100 rows per sec): query 3 in 30 sec; queries 1 and 2 don't return for a long time. Shall have a summary table for this.
    Query result (no inserts): 4 secs for query 1.
    Total Disk space used: 800 million records – 95 GBs.

    July 2, 2012 at 10:16 pm
  • Ashish Behl

    rephrase “do not have this in my test setup though”
    to “do not have DRBD in my test setup though”

    July 2, 2012 at 11:35 pm
  • Peter Colclough

    When you say the results 'were the same', are you convinced you weren't reading from the cache on the second query? I suspect you were.

    However, judging from your description of the application, the database, and the fact you have been told to run this on one machine, I have to say that I think it is time to revisit the whole architecture and setup of what you are doing.
    Personally, with more than 30 years of experience, I can see no logical reason for using 3 different flavours of MySQL in the same application. I also suspect that your deletion methodology should be altered.
    Sorry if this sounds harsh, but if this is really your first move into database usage at this level, I would strongly suggest you get an experienced DBA on board to give you a hand.

    Good luck with your development

    July 3, 2012 at 2:18 pm
  • Michael


    Just wanting to ask you: I have a MySQL table with about 1,200,000 rows. I want to partition it, but I don't know the best way. My table schema is: rowId (int, PK), content (longtext), content_URL (text). I'm searching the table by content_URL; as far as I understand I should set content_URL as a "key", but I don't understand what to partition by, because this is text. I know how to partition by an int (it's simple), but by a text field? I want to speed up my queries…

    Any ideas?

    July 28, 2012 at 1:30 am
  • Rick James

    Michael, why do you think you want to partition?

    Let’s see SHOW CREATE TABLE (without partitions), and the various queries (SELECTs, DELETEs) that you are afraid won’t perform well. If your only search is by content_URL, PARTITIONing will buy you nothing.

    July 30, 2012 at 9:52 am
  • Michael

    Here is the show create table:

    CREATE TABLE tags (
    id int(255) NOT NULL AUTO_INCREMENT,
    word text NOT NULL,
    url text NOT NULL,
    PRIMARY KEY (id)

    I have 1,263,009 total results

    Some queries (I have only selects now), plus some random ones:

    SELECT word FROM tags WHERE url = ‘sample-url’
    SELECT word,url FROM tags WHERE id >= FLOOR(RAND()*( SELECT MAX(id) FROM tags))

    July 30, 2012 at 11:55 pm
  • Michael

    Forgot to ask about the random tags, if I try this query:


    The server dies for about 2-3 seconds

    SELECT * FROM tags WHERE id >= FLOOR(RAND()*( SELECT MAX(id) FROM tags)) LIMIT 0,30

    This works pretty fast, but it's not really random; I get 50% duplicates on each query.

    Any other ideas what I can do about the random part?

    Will partitioning save me?

    July 31, 2012 at 3:47 am
  • Rick James

    Suggestions on fetching a RANDOM row:
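    One common pattern (a sketch against Michael's tags schema, not necessarily the approach Rick had in mind) is to make each fetch an independent single-row draw instead of grabbing 30 consecutive rows from one random starting point. Note this still slightly favours ids that sit just after gaps in the sequence.

    ```sql
    -- Hypothetical single-row random pick; run it N times
    -- (or UNION N copies) to collect N independent rows.
    SELECT t.word, t.url
    FROM tags AS t
    JOIN (SELECT FLOOR(1 + RAND() * (SELECT MAX(id) FROM tags)) AS rid) AS r
      ON t.id >= r.rid
    ORDER BY t.id
    LIMIT 1;
    ```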

    SELECT … WHERE url =
    needs an index; else it will do a full table scan.
    Even INDEX(url(50)) would be better than nothing.

    PARTITION would help neither of your issues.

    INT(255) is meaningless — INT is a 32-bit number regardless of the value you put after it.
    TEXT is probably the wrong thing to use for word; consider VARCHAR(40).

    July 31, 2012 at 8:58 am
  • Michael

    I don’t understand this part:

    SELECT … WHERE url =
    needs an index; else it will do a full table scan.
    Even INDEX(url(50)) would be better than nothing.

    What do you mean? An example query? I need to set URL as an index?

    July 31, 2012 at 9:12 am
  • Peter Colclough

    What Rick means is that without an index on url (the query you quoted was:
    SELECT word FROM tags WHERE url = 'sample-url')
    the DB will start at row 0 (zero) and read, row by row, down the table until it finds a match. If the match is the last row in the table, it will read all 1.2 million rows. This is called a 'full table scan'… and is absolutely something you should never do, unless your table has 10 rows.

    If you do something like:
    create index xurl_indx on tags(url(50))
    this will create an index on the first 50 characters of your url, which would then be used in the query and would make life a lot easier for your DB.
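    Putting Peter's suggestion together against Michael's table (names taken from his earlier comment; the 50-character prefix length is Rick's suggestion and may need tuning for your data):

    ```sql
    -- Prefix index on the first 50 characters of url:
    CREATE INDEX xurl_indx ON tags (url(50));

    -- EXPLAIN should now show xurl_indx used instead of a full table scan:
    EXPLAIN SELECT word FROM tags WHERE url = 'sample-url';
    ```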


    July 31, 2012 at 9:18 am
  • Michael

    Thanks guys, because of you my query time is now 0.0002 sec, down from 0.2334 sec before. I added this blog to my bookmarks, thanks!

    July 31, 2012 at 9:51 am
  • Raghu Sastry

    I had a similar situation, where lookups on the master countered the partitioning benefit for writes. I took an alternative approach: the partitioning on the master was different from the slave. The slave's partitioning was chosen to satisfy the read queries, for which partition pruning kicked in. On the master, however, without partitioning, the rate of inserts fell as the size of the table increased. So by making sure that the size of a single partition was less than the InnoDB buffer pool size, we got a constant throughput for inserts on the master. Yes, we invited maintenance overhead, as the partition structures were different, and we had to be careful with upgrades and patches. Is this a good approach? Any suggestions?


    September 14, 2012 at 12:47 am
  • Rick James

    Michael — Are you sure the Query cache did not kick in? Try SELECT SQL_NO_CACHE (etc).

    Raghu — I’m surprised you don’t have trouble on the Slaves. Remember that every INSERT on the Master must also be done on the Slave(s). Are you using SBR or RBR? Would you share your two CREATE TABLEs so we can get a better feel for what you are saying?

    September 14, 2012 at 8:33 am
  • John

    Hello Peter,

    If I have a table with 20,000,000 rows, around 50GB, and I make just simple queries like:

    select * from table where id = '1549090'

    will creating a partition by KEY(id) (30 partitions) increase my performance for selects and inserts?


    April 25, 2013 at 7:39 am
  • Peter Colclough


    In theory you will see a significant increase in your selects, as you will be querying a smaller dataset. You should have faster inserts as well, as you will be adding into a smaller dataset/index. Partitions are effectively a 'chunked' table, which requires the DB to find out which 'chunk' to work on before inserting or reading your data.

    The only area you need to be careful of is if you are issuing queries against this table but NOT using the id column. This could cause a longer-running query, as the engine needs to read through all the 'chunks'.
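    One way to see this pruning (or the lack of it) in action is EXPLAIN PARTITIONS. This is a sketch: the table and the second column name are assumed, and EXPLAIN PARTITIONS is the pre-5.7 syntax (later versions show partitions in plain EXPLAIN).

    ```sql
    ALTER TABLE t PARTITION BY KEY (id) PARTITIONS 30;

    -- Filter on the partitioning column: the partitions column of the
    -- EXPLAIN output lists a single partition.
    EXPLAIN PARTITIONS SELECT * FROM t WHERE id = 1549090;

    -- Filter on some other column: all 30 partitions are listed,
    -- i.e. every 'chunk' must be read.
    EXPLAIN PARTITIONS SELECT * FROM t WHERE created_at > '2013-01-01';
    ```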


    Peter C

    April 25, 2013 at 8:38 am
  • Rick James

    "select * from table where id = '1549090'" — needs only PRIMARY KEY(id). PARTITIONing would not speed up the SELECT. (Yes, I disagree with Peter C.)

    April 29, 2013 at 5:53 pm
  • Peter Colclough

    John, Rick… I feel a discussion coming on 🙂
    I actually gave the short answer; Rick has a point, but there are generally other considerations to take into account. He also didn't answer the 'Insert' part.

    1. With a standard numeric id, set as a Primary Key, it will probably be quicker, or give the same response, on a normal table as against a partitioned table.
    2. The Insert should be quicker on a partitioned table, as it will be adding to a smaller index and a smaller 'chunk' of table. Depending on how often you write to the table, you may not see the difference.


    1. If you have uneven reads across the DB (most hitting one section, e.g. the latest writes), it will probably be quicker in the long term to read from a partitioned table because, depending on the size of the partitions, a whole partition can be cached. On a single table, probably only recently read records will be cached, resulting in more disc reads.

    2. If you need to delete records from this table, those will be quicker, as they will come out of the partition, not the whole table.

    3. INSERT ON DUPLICATE KEY (I don't do REPLACE… ever), if across partitions, will be slower; if in the same partition, it will be faster.

    4. If you do queries on any other column, these could easily be quicker on a partitioned table, as MySQL can fire off one thread per partition (at least I have seen this happen in a MyISAM environment on similar-sized tables).

    What I am really saying is that it's not only about the selects and inserts; it's also about what else you do with the table. Removing partitions is easier and quicker than deleting a few hundred thousand rows, for example, if you need to delete.

    However, thank you Rick for pointing out my error in the original post…. it was misleading, but wasn’t meant to be…

    April 30, 2013 at 4:53 am
  • Rick James

    Yeah, a discussion is brewing.

    You pointed out an important use case for PARTITIONing (“uneven reads”, Re: #1); let me rephrase the use case —
    * The entire table is too big to be cached, but
    * One partition is small enough to be cached, and
    * Your accesses are to that one partition.

    In one instance, I take advantage of that — I have 10 years of data, but usually queries hit the last week or two. I have only 20 partitions; they are PARTITION BY RANGE (to_days(…)). However, the ranges are uneven — older partitions cover larger timespans. The last few partitions are one week each. So, even if the user’s ad hoc query leads to scanning the entire last two partitions, performance is not bad. (The last 2-3 partitions tend to stay in the buffer_pool.)
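    Rick's uneven layout might look something like this (partition names and boundary dates are invented for illustration):

    ```sql
    CREATE TABLE history (
      dt DATE NOT NULL,
      payload VARCHAR(255),
      KEY (dt)
    )
    PARTITION BY RANGE (TO_DAYS(dt)) (
      PARTITION p_old    VALUES LESS THAN (TO_DAYS('2012-01-01')),  -- years of data
      PARTITION p_2013q1 VALUES LESS THAN (TO_DAYS('2013-04-01')),  -- one quarter
      PARTITION p_w17    VALUES LESS THAN (TO_DAYS('2013-04-29')),  -- one week
      PARTITION p_w18    VALUES LESS THAN (TO_DAYS('2013-05-06')),  -- one week
      PARTITION p_max    VALUES LESS THAN MAXVALUE
    );
    ```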

    If I recall correctly, any INSERT/DELETE/etc always opens all partitions, even before thinking about pruning. Seems like this is being fixed in 5.6; not sure about Percona or MariaDB.

    (Re #4) There are _no_ parallel queries (within a single connection) anywhere in any ‘free’ variant of MySQL. PARTITIONs are scanned one at a time. (Please provide specific references if you find otherwise.)

    I would argue that single-row queries (SELECT, INSERT, DELETE, UPDATE) are similar in speed between PARTITIONed or non-PARTITIONed. When PARTITIONed, first it has to find and open the required partition, then it drills down a BTree that might be one level shallower than the non-partition equivalent.

    Even single-row writes are likely to be slower in PARTITIONed tables unless you include something to facilitate pruning.

    A billion rows in a BTree is only (about) 5 levels deep. If you split that table into 100 partitions, the BTree in each partition would be about 4 levels. A “point query” in a billion-row (non-partitioned) table will, at worst, hit the disk only 5 times. (Usually 4 of the 5 are cached from previous queries.)
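    Rick's depth arithmetic can be checked directly in MySQL. The fanout of roughly 100 keys per BTree node is the assumption here; real fanout depends on key size and page fill.

    ```sql
    SELECT CEIL(LOG(100, 1000000000)) AS levels_full_table,     -- ~5
           CEIL(LOG(100, 10000000))   AS levels_per_partition;  -- ~4
    ```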

    (Re #2) I have rambled on about things as a preface to rebutting “delete records from this table, those will be quicker, as they will come out of the partition, not the whole table”:
    * If the DELETE is a single row based on a UNIQUE/PRIMARY key and it can prune, the delete involves (1) find partition (if partitioned), (2) drill down the BTree, (3) remove one record. PARTITIONing makes very little difference.
    * If it cannot prune, then partitioned would be slower, since it would attempt the DELETE in all partitions.
    * A ‘range’ delete would mostly follow above two points.
    * Secondary indexes are also BTrees, so they need updating (although delayed). Yes, their BTrees may be one level shallower when partitioned, hence slightly faster.
    Conclusion: Not much difference.

    Yes, DROPping an ‘old’ partition is very efficient, much more efficient than DELETEing the equivalent rows. This is, in my experience, the main use case for PARTITIONing. I go into more details (and code) here:

    The third use case for PARTITIONing involves 2D-like indexing. I have not found a 4th use case.
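    The DROP-vs-DELETE contrast Rick describes can be sketched like this (table and partition names assumed):

    ```sql
    -- Partitioned table: dropping a whole range partition is a
    -- file-level operation, near-instant regardless of row count.
    ALTER TABLE history DROP PARTITION p_old;

    -- Unpartitioned equivalent: a row-by-row DELETE that must also
    -- update every secondary index entry.
    DELETE FROM history WHERE dt < '2012-01-01';
    ```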

    April 30, 2013 at 2:03 pm
  • Ian Baldwin


    Great article and had fun reading all the responses. After implementing what was suggested I have also managed to reduce my query time greatly.

    I will bookmark this blog for future reference.

    Best regards


    June 3, 2013 at 4:29 am
  • Colin MacKenzie

    Hi Peter,

    When you use partitions, are the columns of the partition key, if using the HASH|KEY method, stored redundantly in each partition? It seems to me that since the values of the KEY would be uniform over the partition, this would be unnecessary, and these column values could be restored during queries based on the partition they come from. Not so for RANGE partitions, etc., of course.


    June 22, 2013 at 11:01 pm
  • Rick James

    Think of a partition as being a table in its own right. This implies that it would have the HASH/KEY value in it. (The redundancy does not waste much space, anyway.)

    June 23, 2013 at 4:06 pm
  • Colin MacKenzie

    Thanks Rick, I knew they were implemented as basically separate tables; I thought perhaps it still might be optimized out. I am not sure why it would *not* waste much space; I have some tables with a 24-byte row length and a 12-byte PK. I could save about 6 bytes, or 25%… nothing to sneeze at! 🙂

    June 23, 2013 at 4:12 pm
  • jayaram

    Hi Peter Zaitsev,

    I have a requirement to maintain daily transaction information from our application in tables for logging purposes, and we also have tools for analyzing the logs based on those tables.

    The problem is, we have one table, let us say 'x-table', whose one-day size is 2GB. We merge the data every day using 'INSERT INTO x-total-table SELECT * FROM yesterday.x-table'. The size of x-total-table becomes very large, and the merging also takes a long time.

    Hence, I'm thinking about a solution to get rid of these two problems: 1) data size, 2) merging time. For that I would like to create day-wise tables and, based on those, create a MERGE table.

    Could you please suggest the right way?

    July 17, 2013 at 8:32 am
  • Tom Diederich

    Hi jayaram,

    I’m Percona’s community manager. Thanks very much for your question. Our discussion forums are a more appropriate place for questions like the one you have. The great thing about the forums is that you can get advice from the community in addition to just Percona experts.

    July 17, 2013 at 9:11 am
  • mathias

    jayaram's question is interesting, and I would love to know what would be wise.

    When I think about restoring data, I guess many tables become more handy. Fixing a SQL dump (because maybe of upgrades) on the console. grep a 50GB file! Goodbye, my RAM 🙂

    May 15, 2014 at 7:21 pm
  • Andrei


    I have the table below, which is partitioned, and I need to extend the number of partitions beyond 2016-07-26. Let's say I want to create partitions until 2020-01-01. In order to achieve this, must I delete the old set of partitions and recreate a new set? I don't really understand how this stuff works, and I'm afraid of messing things up somehow. When these partitions were created, was the original table event.Event affected? I mean, do the partitions and the original table coexist, or was the original table altered and the data spread across these partitions?
    Can someone please help me to do this?

    > show create table event.Contact \G
    *************************** 1. row ***************************
    Table: Contact
    Create Table: CREATE TABLE Contact (
    PK_Contact int(11) NOT NULL AUTO_INCREMENT,
    FK_Notification int(11) DEFAULT NULL,
    EK_User int(11) DEFAULT NULL,
    EK_Account int(11) DEFAULT NULL,
    EK_Oem int(11) DEFAULT NULL,
    Type int(11) DEFAULT NULL,
    DateAdded datetime DEFAULT NULL,
    Destination varchar(255) DEFAULT NULL,
    EK_Provider int(11) DEFAULT NULL,
    NumAttempts int(11) DEFAULT '0',
    FK_Attempt int(11) DEFAULT NULL,
    Status int(11) DEFAULT NULL,
    Subject text,
    Contents text,
    TemplateS varchar(120) DEFAULT NULL,
    TemplateC varchar(120) DEFAULT NULL,
    ContactTimestamp int(11) NOT NULL COMMENT 'unix time stamp from the server clock',
    PRIMARY KEY (PK_Contact,ContactTimestamp),
    KEY FK_Notification (FK_Notification),
    KEY EK_User (EK_User),
    KEY EK_Account (EK_Account),
    KEY Destination (Destination)
    /*!50100 PARTITION BY RANGE (ContactTimestamp)
    (PARTITION p2014_07_28 VALUES LESS THAN (1406505600) ENGINE = InnoDB,
    PARTITION p2014_07_29 VALUES LESS THAN (1406592000) ENGINE = InnoDB,
    PARTITION p2014_07_30 VALUES LESS THAN (1406678400) ENGINE = InnoDB,
    PARTITION p2014_07_31 VALUES LESS THAN (1406764800) ENGINE = InnoDB,


    PARTITION p2016_07_25 VALUES LESS THAN (1469404800) ENGINE = InnoDB,
    PARTITION p2016_07_26 VALUES LESS THAN (1469491200) ENGINE = InnoDB,
    1 row in set (0.03 sec)

    March 6, 2017 at 9:59 am
  • Gregg

    There needs to be a mechanism for time-series partitioning: a small write partition which is merged into a larger archive partition, after which the archive is moved forward and a new daily (or hourly, or whatever you need for write speed) partition is created.

    Imagine starting with the following partitioning.

    alter table timeseriesData
    PARTITION BY RANGE (to_days(nodedatetime)) (
    PARTITION from20173108 VALUES LESS THAN (to_days('2017-09-01')) ENGINE = InnoDB,
    PARTITION from20173009 VALUES LESS THAN (to_days('2017-10-01')) ENGINE = InnoDB,
    PARTITION from20170110 VALUES LESS THAN (to_days('2017-10-02')) ENGINE = InnoDB
    );

    Now, each day after the first of October, you'd like to have a small partition to insert into each day (our experience is that inserts are roughly an order of magnitude faster when the innodb_buffer_pool acts as a write-through cache).

    create table tempTable like timeseriesData;
    alter table tempTable remove partitioning;
    alter table timeseriesData
    exchange partition from20170110 with table tempTable;
    insert into tempTable
    select * from timeseriesData where nodedatetime >= '2017-10-02' and nodedatetime < '2017-10-03';
    alter table timeseriesData
    exchange partition from20170210 with table tempTable;
    drop table tempTable;
    alter table timeseriesData drop partition from20170110;

    Given any particular day moving forward, you would have a partition for each major interval you wish to archive on (a month here), a partition for this month holding everything from yesterday back to the beginning of the month, today's partition, and the future partition.

    The exchanges above, which provide the ability to move yesterday's data into the archive for the month, hide the month's archive data from queries while they run. Realistically, we just need to be able to copy the records from yesterday into the month archive, exchange the two partitions to bring the archive forward to yesterday, and remove the small partition for yesterday's data.

    All of this should be possible without locks and without blocking writes, as the affected partitions are not going to be written to (only the future partition is written to until we create today's partition).

    Reorganizing partitions is painful in write-heavy applications, and time-series data in particular has nice windows of "write" vs "read only" access. If a partition could be marked as "read only", reorganization could happen without worries about mutable content corrupting the data traversal and movement.

    The copy of data which violates the key on the partition (copying yesterday's data into the archive partition) is only an issue in that it is not visible in the archive partition until the keying on the partition is changed. The exchange of partition and table is painful because the key analysis blocks write access to the partition while the comparison of keys is being done. The table should be locked, but not the partition that will be swapped to. If it is deleted before the swap occurs, then fail at the time of the actual swap.

    Somehow, this kind of "small write partition" with "large archive partitions" needs to be readily accounted for in MySQL partitioning.

    October 17, 2017 at 9:26 am
    • Gregg

      An additional detail: if I am partitioning on a to_days(date) key, the swap of a partition and a table takes a full table scan to evaluate whether the table's rows fit into the partition. There is ample opportunity for optimization here. Each of these partitioning functions needs an internal helper that the partition swap can use to verify that the keyset fits. In the case of to_days, all we need are the minimum and maximum date values from the primary key to discover that the to_days(date) function returns values that fit within the partition's keyset. Currently the swap takes a very long time; judging by that, a full table scan appears to be happening. In the implementation above, the trailing archive table grows larger and larger, which means that over time the delay before write access returns to the timeseriesData table creates a huge backlog of data. Applications using the data will then have a delayed view of reality, which is a huge problem for many applications using time-series data.

      October 19, 2017 at 9:24 am

Comments are closed.
