In my post on June 18th, I explained why the semantics of normal ad-hoc insertions with a primary key are expensive because they require disk seeks on large data sets. I previously explained why it would be better to use “replace into” or to use “insert ignore” over normal inserts. In this post, I explain why another alternative to normal inserts, “insert … on duplicate key update” is no better in MySQL, because the command incurs disk seeks.
The reason “insert ignore” and “replace into” can be made fast with TokuDB’s fractal trees is that the semantics of what to do in case a duplicate key is found is simple. In one case, you ignore, and in the other, you overwrite. With specific tombstone messages defined for these simple semantics, we defer the uniqueness check to a more opportune time.
The semantics of “insert … on duplicate key update” are not simple:
The problem is we do not have a way of encoding the SQL update function into a message, the way we are able to encode “replace into” as an ‘i’ and “insert ignore” as an ‘ii’. If we did, we could similarly make “insert … on duplicate key update” fast.
I am not claiming that this is not theoretically possible, just that the storage engine API in MySQL does not allow for the encoding of updates as messages. Instead, what MySQL does is the following:
The storage engine API does not have any access to the function that applies an update to the existing row. This is why the storage engine has no way of encoding any SQL update function (even some simple ones, such as “increment column a”).
So, in the meantime, to implement these semantics, B-trees and Fractal Tree data structures both:
The first step incurs a disk seek on large data sets with an ad-hoc primary (or unique key). And that is why it is slow.
So, the moral of the story is this. In MySQL, “insert … on duplicate key update” is slower than “replace into”. Although the sematics are slightly different in the case where the primary key is found (the former is defined as an update, whereas the latter is defined as a delete followed by an insert), if possible, the simpler semantics of “replace into” allow it to be faster than “insert … on duplicate key update”.
Percona’s widely read Percona Data Performance blog highlights our expertise in enterprise-class software, support, consulting and managed services solutions for both MySQL® and MongoDB® across traditional and cloud-based platforms. The decades of experience represented by our consultants is found daily in numerous and relevant blog posts.
Besides specific database help, the blog also provides notices on upcoming events and webinars.
Want to get weekly updates listing the latest blog posts? Subscribe to our blog now! Submit your email address below and we’ll send you an update every Friday at 1pm ET.