EmergencyEMERGENCY? Get 24/7 Help Now!

Making “Insert Ignore” Fast, by Avoiding Disk Seeks


Posted on:

|

By:


PREVIOUS POST
NEXT POST
Share Button

In my post from three weeks ago, I explained why the semantics of normal ad-hoc insertions with a primary key are expensive because they require disk seeks on large data sets. Towards the end of the post, I claimed that it would be better to use “replace into” or “insert ignore” over normal inserts, because the semantics of these statements do NOT require disk seeks. In my post last week, I explained how the command “replace into” can be fast with TokuDB’s fractal trees. Today, I explain how “insert ignore” can be fast, using a strategy that is very similar to what we do with “replace into”.

The semantics of “insert ignore” are similar to that of “replace into”:


  • if the primary (or unique) key does not exist: insert the new row
  • if the primary (or unique) key does exist: do nothing

B-trees have the same problem with “insert ignore” that they have with “replace into”. They perform a lookup of the primary key, incurring a disk seek. We have already shown how fractal trees do not incur this disk seek for “replace into”, so let’s see how we can avoid disk seeks with “insert ignore”.

The only difference with “replace into” is when the primary (or unique) key exists, instead of overwriting the old row with the new row, we disregard the new row. So, all we need to do is tweak our tombstone messaging scheme (that we use for deletes and “replace into”) so that when “insert ignore” commands do not overwrite old rows with new rows. Similar to deletes and replace into, with this scheme, “insert ignore” can be two orders of magnitude faster than insertions into a B-tree.

Here is what we do. We insert a message into the fractal tree, with a new message “ii”, to signify that we are doing an “insert ignore”. The only difference between this message and the normal “i” message for insertions is what we do on queries and merges. On queries, if the message is an “ii”, then the value in the LOWER node is read, and not the higher node. On merges, if the higher node has a message of “ii”, the value in the LOWER node takes precedence over the value in the higher node.

Let’s look at an example that is similar to what we looked at for “replace into”:

Suppose the fractal tree for this table looks as follows:

The ‘i’ stands for insertion message. Now suppose we do:

With fractal trees, we insert (ii (1000,1001)) into the top node. The tree then looks as such:

So upon querying the key ’1000′, a cursor notices that (1000,1001) has a message of “ii”. If it finds another value for the key 1000 in a lower node, it reads that value, otherwise, it reads (1000,1001). Because (1000,1000) is located in a lower node, the cursor returns (1000,1000) to the user. On merges, the message in the lower node, (1000,1000) overwrites the message in the higher node, (1000,1001).

While “insert ignore” can be fast, there are caveats (indexes, triggers, replication), just as there are with “replace into”. In a future posting, I will get into some of them.

Share Button
PREVIOUS POST
NEXT POST


Tags:

, , , , , ,

Categories:
Tokutek, TokuView


Comments
  • hi zardosht,

    can i assume that with row-based replication, we are to completely avoid using “replace into” owing to the bug you filed: http://bugs.mysql.com/bug.php?id=53561 ?!

    or has this been fixed? the bugs log doesn’t seem to suggest it’s been fixed tho……

    many thanks,

    moses.

    Reply

    • There is no need to avoid “replace into”. If row based replication does not work with your “replace into” statement and TokuDB, then an error will be reported. You can choose to set tokudb_pk_insert_mode to 2, and get reduced performance, or you can switch to statement based replication.

      Reply

Leave a Reply

Percona’s widely read Percona Data Performance blog highlights our expertise in enterprise-class software, support, consulting and managed services solutions for both MySQL® and MongoDB® across traditional and cloud-based platforms. The decades of experience represented by our consultants is found daily in numerous and relevant blog posts.

Besides specific database help, the blog also provides notices on upcoming events and webinars.

Want to get weekly updates listing the latest blog posts? Subscribe to our blog now! Submit your email address below.

No, thank you. Please do not ask me again.