
Why “insert … on duplicate key update” May Be Slow, by Incurring Disk Seeks

July 14, 2010 | Posted In: Tokutek, TokuView


In my post on June 18th, I explained why normal ad-hoc insertions with a primary key are expensive on large data sets, because they require disk seeks. I also explained previously why it is better to use “replace into” or “insert ignore” instead of normal inserts. In this post, I explain why another alternative to normal inserts, “insert … on duplicate key update”, is no better in MySQL: the command still incurs disk seeks.

The reason “insert ignore” and “replace into” can be made fast with TokuDB’s fractal trees is that the semantics of what to do when a duplicate key is found are simple. In one case, you ignore the new row, and in the other, you overwrite the old one. With specific tombstone messages defined for these simple semantics, we can defer the uniqueness check to a more opportune time.
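As a rough illustration of why deferring the check helps, here is a toy Python model of a buffered tree node. Everything here (the class, the way messages are stored, the flush logic) is an invented sketch, not TokuDB’s actual implementation; the point is only that inserting appends a message without looking up the key, and the duplicate-key decision is made later, at flush time.

```python
# Toy model of deferred-message insertion. All names and structure are
# illustrative, not TokuDB's real code.

class MessageBuffer:
    """A leaf dict plus a buffer of pending messages.

    Inserting only appends a message -- no key lookup (no "disk seek")
    happens at insert time. The uniqueness check is deferred until the
    buffer is flushed down to the leaf.
    """

    def __init__(self):
        self.leaf = {}        # stands in for the on-disk leaf node
        self.pending = []     # buffered (message_type, key, value) tuples

    def replace_into(self, key, value):
        self.pending.append(("i", key, value))   # overwrite on duplicate

    def insert_ignore(self, key, value):
        self.pending.append(("ii", key, value))  # ignore on duplicate

    def flush(self):
        # Apply buffered messages in order; only now is the key examined.
        for msg, key, value in self.pending:
            if msg == "i" or key not in self.leaf:
                self.leaf[key] = value
        self.pending.clear()

buf = MessageBuffer()
buf.replace_into(1, "a")
buf.insert_ignore(1, "b")   # duplicate: will be ignored at flush time
buf.flush()
print(buf.leaf)             # {1: 'a'}
```

Because both message types carry their resolution rule with them, no read of the leaf is needed until the flush, which can be batched with many other messages.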

The semantics of “insert … on duplicate key update” are not simple:

  • if the primary (or unique) key does not exist, insert the new row
  • if the primary key does exist, perform some update as defined in the SQL statement
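These two cases can be illustrated with a minimal, hedged example. SQLite’s `ON CONFLICT … DO UPDATE` clause (available since SQLite 3.24) is used here purely as a stand-in for MySQL’s `INSERT … ON DUPLICATE KEY UPDATE` syntax, and the table and column names are invented:

```python
import sqlite3

# SQLite's upsert clause as a stand-in for MySQL's
# INSERT ... ON DUPLICATE KEY UPDATE; table "t" is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, hits INTEGER)")

upsert = """INSERT INTO t (id, hits) VALUES (?, 1)
            ON CONFLICT(id) DO UPDATE SET hits = hits + 1"""

conn.execute(upsert, (42,))   # key absent: inserts the row (42, 1)
conn.execute(upsert, (42,))   # key present: runs the UPDATE, hits -> 2
print(conn.execute("SELECT hits FROM t WHERE id = 42").fetchone()[0])  # 2
```

Note that the update half of the statement is an arbitrary SQL expression, which is exactly what makes it hard to encode as a simple message.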

The problem is we do not have a way of encoding the SQL update function into a message, the way we are able to encode “replace into” as an ‘i’ and “insert ignore” as an ‘ii’. If we did, we could similarly make “insert … on duplicate key update” fast.

I am not claiming that this is not theoretically possible, just that the storage engine API in MySQL does not allow for the encoding of updates as messages. Instead, what MySQL does is the following:

  • call handler::write_row to attempt an insertion; if it succeeds, we are done
  • if handler::write_row returns an error indicating a duplicate key, compute the updated row outside of the handler
  • call handler::update_row to apply the update

The storage engine API does not have any access to the function that applies an update to the existing row. This is why the storage engine has no way of encoding any SQL update function (even some simple ones, such as “increment column a”).
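The three steps above can be sketched in Python. The function names mirror the handler API, but the dict-backed “table”, the exception, and `update_fn` are illustrative inventions, not real MySQL code. The point of the sketch is that `update_fn` lives entirely outside the storage engine:

```python
# Toy sketch of the server-side flow for INSERT ... ON DUPLICATE KEY
# UPDATE. Names echo handler::write_row / handler::update_row, but the
# dict "table" and DuplicateKey exception are illustrative only.

table = {}  # stands in for the storage engine's table

class DuplicateKey(Exception):
    pass

def write_row(key, row):
    if key in table:                 # the engine must check existence ...
        raise DuplicateKey(key)      # ... which costs a seek on big tables
    table[key] = row

def update_row(key, row):
    table[key] = row

def insert_on_dup_key_update(key, row, update_fn):
    try:
        write_row(key, row)          # step 1: attempt the insertion
    except DuplicateKey:
        # step 2: the update is computed OUTSIDE the handler -- the
        # engine never sees update_fn, so it cannot encode it as a message
        new_row = update_fn(table[key])
        update_row(key, new_row)     # step 3: write the updated row

inc_a = lambda r: {"a": r["a"] + 1}
insert_on_dup_key_update(1, {"a": 1}, inc_a)
insert_on_dup_key_update(1, {"a": 1}, inc_a)
print(table[1])   # {'a': 2}
```

Since the engine only ever sees fully materialized rows, it has no opportunity to buffer the update as a deferred message.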

So, in the meantime, to implement these semantics, B-trees and Fractal Tree data structures both:

  • look up the primary (or unique) key to verify existence
  • take the appropriate action based on whether the primary (or unique) key exists

The first step incurs a disk seek on large data sets with an ad-hoc primary (or unique) key. And that is why it is slow.

So, the moral of the story is this: in MySQL, “insert … on duplicate key update” is slower than “replace into”. The semantics differ slightly in the case where the primary key is found (the former is defined as an update, whereas the latter is defined as a delete followed by an insert), but if your application can live with the simpler semantics of “replace into”, that simplicity allows it to be faster than “insert … on duplicate key update”.
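The semantic difference is easy to see with a small, hedged example. Here SQLite’s `INSERT OR REPLACE` and `ON CONFLICT … DO UPDATE` stand in for MySQL’s `REPLACE INTO` and `INSERT … ON DUPLICATE KEY UPDATE`, and the subscribers table is invented for the illustration. Because REPLACE deletes the old row first, columns you did not mention fall back to their defaults:

```python
import sqlite3

# SQLite stand-ins: INSERT OR REPLACE for MySQL's REPLACE INTO, and
# ON CONFLICT ... DO UPDATE for ON DUPLICATE KEY UPDATE (SQLite 3.24+).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE subscribers (
    email TEXT PRIMARY KEY,
    is_bounce INTEGER DEFAULT 0,
    is_unsubscribe INTEGER DEFAULT 0)""")
conn.execute("INSERT INTO subscribers VALUES ('a@b.com', 0, 1)")

# REPLACE semantics: the old row is deleted, so is_unsubscribe falls
# back to its default (0) even though we never mentioned it.
conn.execute("INSERT OR REPLACE INTO subscribers (email, is_bounce) "
             "VALUES ('a@b.com', 1)")
print(conn.execute("SELECT is_bounce, is_unsubscribe FROM subscribers "
                   "WHERE email = 'a@b.com'").fetchone())   # (1, 0)

# Upsert semantics: only the listed column changes; the rest survive.
conn.execute("UPDATE subscribers SET is_unsubscribe = 1 "
             "WHERE email = 'a@b.com'")
conn.execute("""INSERT INTO subscribers (email, is_bounce)
                VALUES ('a@b.com', 0)
                ON CONFLICT(email) DO UPDATE
                SET is_bounce = excluded.is_bounce""")
print(conn.execute("SELECT is_bounce, is_unsubscribe FROM subscribers "
                   "WHERE email = 'a@b.com'").fetchone())   # (0, 1)
```

If your statement always supplies every column, the two behave alike for the data, and the simpler delete-then-insert semantics can be buffered as a message; if it does not, the difference in surviving columns matters.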



  • I understand your reasoning, but in the past we have run into trouble with REPLACE on InnoDB. As I’m sure you are aware, REPLACE always deletes the row, then inserts it with the new values. If you do this often enough, you can end up fragmenting the InnoDB tablespace. With a fragmented tablespace, things just get slower on rotational disks. Plus, you run the risk of running out of tablespace much faster than you would normally.

  • I don’t see the point of this post, because as you say, “replace into” and “insert … on duplicate key update” have different semantics. So why are you trying to compare them as though they are apples to apples, when in fact it’s an apples-to-oranges comparison?

    “insert … on duplicate key update” is the fastest solution, because the semantically comparable alternative is to do a “select” in application code and then, via an “if” statement, perform an “update” or “insert” afterwards. This is still a poor way to handle it, as you would have to lock the table to avoid primary key collisions in a high-concurrency environment if done in application code. It could also be done in a stored procedure, but that is not much better.

  • Too bad we can’t see the code to verify what you’re saying.

    Plus, I agree with Paul that you’re mixing semantic actions here. REPLACE and INSERT … ON DUPLICATE KEY UPDATE are very different semantic actions.


  • William, unlike InnoDB or MyISAM, TokuDB doesn’t fragment as it ages. It doesn’t suffer from the problem you outline. You might like to try TokuDB. It costs nothing for development and evaluation, and it costs nothing for datasets which are less than 50GB. (The space used counts the uncompressed size of tables, and doesn’t count the sizes of any indexes.)

  • Paul (and Jay), it’s true that if your application actually requires the semantics of “Insert … on duplicate key update”, it’s probably about as fast as it can get. Many applications can use “replace into” or “insert ignore”, however.

    In many storage engines, there’s little difference between the performance of these alternatives. In TokuDB there is a huge difference. Even in InnoDB there’s a difference, but it’s typically not as dramatic. I’ve found that it’s important to know what’s fast and what’s slow.

    Zardosht isn’t saying “look, these two things which are the same have very different performance”. He’s saying “these two things which have different semantics have very different performance”. The implied question is “do you really need the slow version, or can you reorganize your application to use the faster one?” Here “faster” can be two orders of magnitude, so it’s worth thinking about.

  • Jay, I don’t know what you mean by “too bad we cannot see your code to verify what you’re saying”. I’ll take a stab at responding, but if somehow I’ve missed the point of your comment, please feel free to set me straight.

    Zardosht is making a claim about the performance difference between a couple of different ways to perform similar insert operations. The fact that they are not quite the same thing is part of what makes it an interesting topic. You seem to be saying that you’d like a way to verify that TokuDB’s performance actually is significantly different for these different operations, and that you’d like to see the source code to help with that verification.

    I think if you’re interested, you can verify what Zardosht is saying by measuring the performance of the system, without looking at any source code. (I find it’s tough to predict performance from looking at code, anyway.) But if you really want to see the code, you can look at our handlerton (published under GPL), where most of the implications play themselves out. You wouldn’t have to look inside the closed-source component of TokuDB to verify Zardosht’s performance claims.

    The closed-source fractal tree has primitives such as “put-with-overwrite”, which is fast, and “get” which is typically much slower. If you want to know how “put with overwrite” can be made much faster than “get”, you can take a look at the various presentations that I’ve given (e.g., at MySQL conference or OpenSQL Camp) on how Fractal Trees actually work. If you want to see how we implement “insert … on duplicate key update” in terms of the fractal tree primitives you can look at our GPL’d handlerton code.

    Hope this makes sense.

  • In my case:
    with a small table:
    email ( primary )
    is_bounce ( 0/1 )
    is_unsubscribe ( 0/1 )

    REPLACE INTO executed a lot faster than INSERT … ON DUPLICATE KEY UPDATE for ~70k rows.

    I had to parse a CSV file, and after every 100 rows parsed I did an INSERT … ON DUPLICATE KEY UPDATE.
    Each row was an email address, and I had to update the flags accordingly; a simple task.

    This took about 10 min on my machine.

    After using REPLACE INTO it took less than 30s.
    Huge difference. Thank you!

    • Time for you to get a new machine, or to check your query for where the bottleneck is. ON DUPLICATE UPDATE only takes 20 min on just over 2 million rows and 40 fields on my machine. However, for processing such as that, I do not use PHP; I use C++ with the libmysql library. I don’t recommend PHP for updating a table with that many rows, as you will most certainly time out. Setting your server, PHP ini, or my.ini to not time out, or increasing the timeout to some obscene amount as I have seen many do, is just bad coding, because that will lead to all sorts of problems if some unknown problem does occur, and may eventually lead to your MySQL server crashing under the load.

  • This information needs to be written in bold in the official docs, as it is critical to getting good TokuDB performance and it is not obvious – the behaviour is very different from InnoDB.

    I recently converted a geographic point database with millions of rows from InnoDB to TokuDB. The primary key is a composite key consisting of Latitude and Longitude.

    The database dramatically reduced in size on disk and read queries became much, much faster.

    However, inserts became slow, often reducing to 1 per second and what’s worse, they would quickly saturate multiple CPUs and slow down the execution of all other database queries on the same server, causing request timeouts in our webapp.

    The only idea I had was to reduce compression from LZMA to ZLIB. This reduced the severity of the problem, but did not eliminate it.

    I stumbled on this post at 3am after many hours of debugging and testing. As soon as I changed the insert code from using INSERT ON DUPLICATE UPDATE to REPLACE INTO, good performance was restored and the problems disappeared.

    This is what the TokuDB docs need to explain succinctly:

    • TokuDB’s use of tombstones and how they affect the performance of different types of SQL queries

    • Choosing the best insert method based on whether your table has an AUTO_INCREMENT primary key, where data is always inserted at the end, or whether your table is clustered by a primary key whose inserts can take on random values and land at any position within the table

    I am certain that others have experienced these difficulties and have abandoned TokuDB out of ignorance, which is a shame because it’s otherwise fantastic.

  • Just to add on here based on my experience.

    I work at a software company in London and we do a bulk insert/update for 4M IDs everyday.

    When I first put together the process I was using ON DUPLICATE, which was taking north of 40 minutes. After using REPLACE INTO I was able to perform the same operation in under 3 minutes, which in a production environment makes ALL the difference.


  • REPLACE INTO is more destructive, in my opinion, due to the row delete and the index reconstruction after it. So in theory, updating data should take less disk I/O, shouldn’t it?
