September 16, 2014

INSERT ON DUPLICATE KEY UPDATE and REPLACE INTO

Jonathan Haddad writes about REPLACE INTO and INSERT ON DUPLICATE KEY UPDATE. Really, Why MySQL has both of these, especially both are non ANSI SQL extensions ?

The story here seems to be the following – REPLACE INTO existed forever, at least since MySQL 3.22 and was a way to do replace faster and what is also important atomically, remember at that time MySQL only had ISAM tables with table locks and no transactions support. Of course you could use LOCK TABLES but it is not efficient.

The reason REPLACE could be efficient for ISAM and MyISAM, especially for fixed length rows is – it could perform row replacement without reading old data first, and of course because you could set it to replace multiple values at the same time just as you have multiple value INSERT.

As a side note: the fact REPLACE does not have to do read before write is a bit overrated from efficiency standpoint. As most rows are less than 4K-8K in size and are not aligned to OS cache page OS still would need to perform read from hard drive before it can perform an update, if data is not in OS cache, and if it is read would not be large overhead ether.

The problem with REPLACE was – many people tried to use it like update accessing previous column value, for example doing something like REPLACE INTO tbl (col1) VALUES (col1+10);. Instead of acting as update this really will insert NULL value, as col1 is undefined at this stage.

INSERT ON DUPLICATE KEY UPDATE is newer feature which came in MySQL 4.1 by advice one of MySQL big users. The question was efficiently maintaining counters in MySQL. There are also number of similar cases when you want ether to insert the new row or update stats for existing row.

I think both features are really great and I use them both. They are implemented really in MySQL style of being simple powerful and easy to use.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. Thanks for the clarification and history. Using INNODB, are there any performance advantages to either?

  2. peter says:

    Sure,

    I do not have benchmarks with Innodb but I’d expect them to be close. I’d mostly look at what makes more sense for your application from development standpoint.

  3. Jerry Zheng says:

    The two statements have different impact for INNODb table.
    The REPLACE INTO acts as DELETE/INSERT for duplicates.
    The INSERT ON DUPLIACTE UPDATE is true update.

    If you have a child table defined with “on delete CASCADE”, the REPLACE INTO will delete the child record too.

    I will use INSERT ON DUPLICATE rather than the REPLACE INTO for the above reason.

  4. Jerry Zheng says:

    Here is a test case:

    mysql> select version();
    +———————+
    | version() |
    +———————+
    | 4.1.18-standard-log |
    +———————+

    mysql> create table p(id int primary key, notid int) type=innodb;
    Query OK, 0 rows affected, 1 warning (0.07 sec)

    mysql> create table c (id int primary key, fk int , foreign key (fk) references p(id) on delete CASCADE ) type=innodb;
    Query OK, 0 rows affected, 1 warning (0.06 sec)

    mysql> insert into p values (1,1), (2,2);
    Query OK, 2 rows affected (0.00 sec)
    Records: 2 Duplicates: 0 Warnings: 0

    mysql> insert into c values (1,1), (2,2);
    Query OK, 2 rows affected (0.00 sec)
    Records: 2 Duplicates: 0 Warnings: 0

    mysql> select * from p;
    +—-+——-+
    | id | notid |
    +—-+——-+
    | 1 | 1 |
    | 2 | 2 |
    +—-+——-+
    2 rows in set (0.00 sec)

    mysql> select * from c;
    +—-+——+
    | id | fk |
    +—-+——+
    | 1 | 1 |
    | 2 | 2 |
    +—-+——+
    2 rows in set (0.00 sec)

    mysql> replace into p values (1,10);
    Query OK, 2 rows affected (0.00 sec)

    mysql> select * from c;
    +—-+——+
    | id | fk |
    +—-+——+
    | 2 | 2 |
    +—-+——+
    1 row in set (0.00 sec)

    – We lost the child #1.

    mysql> insert into p (id, notid) values (2,20) on duplicate key update notid=20;
    Query OK, 2 rows affected (0.01 sec)

    mysql> select * from c;
    +—-+——+
    | id | fk |
    +—-+——+
    | 2 | 2 |
    +—-+——+
    1 row in set (0.00 sec)

    – Child #2 is safe.

    mysql> select * from p;
    +—-+——-+
    | id | notid |
    +—-+——-+
    | 1 | 10 |
    | 2 | 20 |
    +—-+——-+
    2 rows in set (0.00 sec)

  5. peter says:

    Thanks Jerry,

    Good catch.

  6. Also related to Jerrys comments, REPLACE will use the next available # on auto-increment fields, which may not always be desired.

  7. Jerry:
    Thanks for the tip – I’ll be sure to keep that in mind. For the most part, in what I do, INSERT ON DUPLICATE UPDATE is what I need to use, especially knowing the cascading keys issue.

  8. P.ANBALAGAN says:

    I WANT QUERY FOR TO INSERT MORE 100000 RECORDS IN A TABLE.

  9. G Barnes says:

    I have two identical tables named Table1 and Table2 on two computers respectivaly. There are two fields in each, “ITEM” and “SOLD”. In both tables, “ITEM” is the primary key. Is there a way to merge Table1 into Table2 with something similar to INSERT ON DUPLICATE UPDATE sold =sold +(the first table’s sold value)? Thus if Table1 contains 10 (in the sold field) and Table2 contains 20. I want the total in Table 2 to be 30. Is this possible?

  10. Jeff says:

    One benefit of REPLACE over UPDATE is the DELAYED feature, which comes in handy.

  11. Ayman says:

    Thanks for all who share their knowledge here.
    I guess I need to use the REPLACE statement, but there is a question in my head:
    If the auto-increment primary key is 7 now (for example), after doing REPLACE for the 7th row or less than 7, what will be the next auto-increment value? will it be 8 or 9?

  12. s.mateev says:

    Consider the following:

    DROP TABLE IF EXISTS `person`;
    CREATE TABLE `person` (
    `person_id` int(11) NOT NULL auto_increment,
    `name` varchar(60) character set utf8 NOT NULL,
    `social_security_number` bigint(20) NOT NULL,
    PRIMARY KEY (`person_id`),
    UNIQUE KEY `social_security_number` (`social_security_number`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

    TRUNCATE TABLE person;
    INSERT INTO person (name, social_security_number) VALUES (‘Alice’, 123456);
    INSERT INTO person (name, social_security_number) VALUES (‘Bob’, 787878);
    REPLACE INTO person (name, social_security_number) VALUES (‘Bob’, 787878);

    Both on InnoDB and on MyISAM, the results are:

    +———–+——-+————————+
    | person_id | name | social_security_number |
    +———–+——-+————————+
    | 1 | Alice | 123456 |
    | 3 | Bob | 787878 |
    +———–+——-+————————+

    If the primary key value changes, I think the slot can’t be reused.

    With regard to having more than one row removed – nice! I agree that it must be used appropriately.

  13. Juan says:

    I’m using a trigger on an innodb table that “dumps” any updates to the table to a myisam table holding a log of updates.
    The update table has the same primary key than the primary table.
    The idea is to keep the last change for every primary key (well, I don’t dump the whole rows, but only some selected fields that I need in the log).
    The trigger uses INSERT ON DUPLICATE KEY UPDATE to achieve that.
    The problem is that I find many locked queries in the db as soon as there’s a bit of update activity over the primary table. It even looks like there might be some deadlock (or I don’t find an explanation for such long locks in so many queries), but I can’t find any reason in the code for that.
    I was wondering if INSERT … UPDATE might lock for read access (or lock only the index, or something simmilar) while checking for key value existence, as inserts in myisam are concurrent with reads, and then try to lock for write access when performing the update. This would lead to a deadlock when trying to upgrade two read locks to write locks or any equivalent situation.
    My version is 5.1.49-3-log.
    Is it possible that a problem like the one I described is really happenning?
    Might a REPLACE solve the issue? Would it work right where INSERT … UPDATE locks?

  14. vitamin-R says:

    Juan, just make your log table InnoDb too, to get rid of the table locks.

    Use something like “ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8″ if you need it to use less disk space.

  15. Jed says:

    Peter, thank you for this article. It would be nice if you could update it with a section that had something on the lines of “Use REPLACE for these situations….” and “Use ON DUPLICATE for these situations” (discalimer : ymmv)
    I have a piece of code that is an insert with rather long query, but I need to make it replace or on duplicate. I was going to use on duplicate but then thought I’d rather use replace to make the code easier to read. Since this is a memory table I don’t think it’ll matter a lot which I use and I think I’d opt for readability.

Speak Your Mind

*