MySQL INSERT ON DUPLICATE KEY UPDATE and counters

PREVIOUS POST
NEXT POST

MySQL INSERT ON DUPLICATE KEY UPDATE is very powerful but often forgotten MySQL feature. It was introduced in MySQL 4.1 but I still constantly see people unaware of it.

Myself I like this feature big deal because it is designed in truly MySQL style – very efficient solution for frequent task while keeping it beautiful and easy to use.

So what is the MySQL INSERT … ON DUPLICATE KEY UPDATE great for? Well any kind of maintaining counters. If you’re writing traffic accounting it could be traffic and number of packet passed for given port or IP address. For Web Applications it could be counting number of visits per page or IP address, number of times particular keyword was searched etc.

This functionality also makes it very easy to do incremental single pass log file processing and building summary tables.

Here is example:

This example actually shows one more neat feature of MySQL – inet_aton and inet_ntoa functions which can convert IP address strings to integers and back. This allows to save on field length significantly by using 4 bytes instead of 15

The third feature this example takes advantage of is TIMESTAMP field. By default first TIMESTAMP column will have its value automatically updated to current timestamp on insert and update. We actually could have omitted now() in insert clause but this would require to specify list of columns which we skipped for sake of example.

So how would this example work ? Well just you would expect it. If there is no such IP address in the table it will be added with hits=1 if it is already where (note ip is PRIMARY KEY) it would be just incremented and last visit timestamp updated.

The benefit of using MySQL INSERT … ON DUPLICATE KEY UPDATE instead of INSERT + UPDATE could be different, depending on number of new rows and data set size. 30% speedup should be typical. Performance increase is not the only benefit – what is even more important the application code becomes simplier – less error prone and easy to read.

PREVIOUS POST
NEXT POST

Share this post

Comments (50)

  • pimsainnum Reply

    If I had tableA field(Id, timestamp, day, value)

    Insert or Update where field (day) not same.

    How to do??

    May 29, 2006 at 12:00 am
  • Photo Matt » MySQL Counters Reply

    […] Check out this cool INSERT ON DUPLICATE KEY UPDATE feature in MySQL 4.1. This will make some logging code I have much easier and cleaner than it currently is. « Yahoo Search Broken (again?) […]

    May 29, 2006 at 3:52 pm
  • sunxiunan Reply

    MySQL is a complete DataBase Server now. But so many people only think it as simple as before.
    Such as it didn’t have transaction process, or don’t support many features.
    I think it is important that study the release notes about it to know how we can use it effectively.

    May 29, 2006 at 6:01 pm
  • peter Reply

    Yes. There are many features being added which might be hard to track. I also would not always encourage all of them just for sake of using performance. Many of new features are complex – for example Stored Procedures. If you start using stored procedures in your code you might need to ensure other members of the team are also fluent with them. INSERT ON DUPLICATE KEY UPDATE is of the different sort – there is not much to learn about it – you can use it safely in your code. If someone does not know about it it takes to read single page to learn.

    May 30, 2006 at 4:09 am
  • Blog of Leonid Mamchenkov » Daily del.icio.us bookmarks Reply

    […] MySQL Performance Blog » INSERT ON DUPLICATE KEY UPDATE and summary counters. Tagged as: code databases mysql optimization performance programming sql tips tricks […]

    May 31, 2006 at 12:19 pm
  • cyber-junky.co.uk Reply

    […] “INSERT ON DUPLICATE KEY UPDATE” function ( Trackback URL for this post / RSS comment feed for this post) […]

    June 3, 2006 at 8:14 am
  • Creating easy summary counters in MySQL at Symlinked Reply

    […] Awesome: […]

    June 9, 2006 at 1:05 pm
  • Paul Wilson Reply

    I’ve done a lot of work with traffic databases, and others with IP addresses and one thing appeared very quickly – if possible store the IP address both as an unsigned int as as a char(15). The conversion functions aren’t fast iterated over millions of rows.

    June 12, 2006 at 12:31 am
  • peter Reply

    Paul,

    I’m actually surprised with your finding. Unless you’re doing indexing and going to do something like where IP like “192.168.%” there should not be serious benefit to store both versions. Yes you save on function call but typically it is not serious response time contributor. You however waste space and especially if you do inexing on this field you end up with larger and slower indexes.

    Of course it depends a lot on workload. For example if you do full web request logging and store page, referer, user agent, cookie information you might not save a lot of space by using integer ip

    June 14, 2006 at 7:45 am
  • Apachez Reply

    But in the case of “IP like ‘192.168.%'” you could use “IP BETWEEN 3232235520 AND 3232301055” if you have stored ips as unsigned int which also should be faster I guess to process I guess…

    June 22, 2006 at 10:37 am
  • peter Reply

    Apachez,

    Yes you’re right of course. I used wrong example to ilustrate an idea. like “%192.168.%.255” would be better example. And yes you still can use inet_ntoa and apply extra filtering on result. It is just matter of convenience 🙂

    Especially in log tables where you might want to run some adhoc queries manually ease of use might be more important than couple of percent performance gain.

    Thanks for correction again.

    June 23, 2006 at 12:10 am
  • Michael’s Tech Blog » Blog Archive » Mysql useful tips Reply

    […] use INSERT ON DUPLICATE KEY UPDATE (via MySQL Performance Blog): my $ret = $dbh->do(“INSERT INTO ipstat VALUES(inet_aton(‘192.168.0.1’),1,now()) ON duplicate KEY UPDATE hits=hits+1”); Note: ip should be a primary key in this case […]

    June 30, 2006 at 12:49 am
  • MySQL Performance Blog » INSERT ON DUPLICATE KEY UPDATE and REPLACE INTO Reply

    […] INSERT ON DUPLICATE KEY UPDATE is newer feature which came in MySQL 4.1 by advice one of MySQL big users. The question was efficiently maintaining counters in MySQL. There are also number of similar cases when you want ether to insert the new row or update stats for existing row. […]

    January 18, 2007 at 10:59 am
  • Erick Reply

    ON DUPLICATE KEY is the same as an UPDATE command. It is merely a simpler (and confusing) way of doing things. In a typical MySQL way, when this does get invoked, it returns “2 Affected Rows”. Nice. How it is a really effective feature is lost on me —

    OLD WAY:
    1. Insert
    2. If error, then UPDATE.

    NEW WAY:
    1. Insert
    2. Engine discovers if it already exists (through internal error), so UPDATE

    No different.

    As for inet ip address conversion functions, it is still much more recommended to use the functionality provided by your programming language. In the case of PHP for instance, ip2long function will do this for you: http://php.net/ip2long – you can do a simple loop microtime test to see some STARK and REAL difference in performance when you get to medium sized data.

    June 5, 2007 at 10:57 pm
  • peter Reply

    Erick,

    The fact it returns 2 rows affected if only one really was updated is probably a bug. Please report it.

    I can’t say much about being confused by this feature. I think it is rather straightforward – insert data and if it is where update it.

    Now from performance standpoint the difference is in number of statements – for in memory data sets going over network and parsing query is what is taking most of time, this is why having it in single statement helps to improve performance.

    June 6, 2007 at 3:00 am
  • Bill’s Dev Blog » Blog Archive » INSERT … ON DUPLICATE KEY UPDATE Reply

    […] References: forums.devshed.com dev.mysql.com dev.mysql.com http://www.mysqlperformanceblog.com […]

    June 6, 2007 at 12:34 pm
  • CJ Reply

    SICK! This should save me some sql calls.

    June 26, 2007 at 7:27 pm
  • Marki Reply

    Peter: It is not a bug, it is a feature (documented one).
    When row is inserted, report it as 1 affected row. When it is updated, report it as 2 affected rows.

    August 2, 2007 at 12:48 pm
  • peter Reply

    Thanks Maki,

    Even though I find it counter intuitive. May be similary to update statement using rows_matched to see the rows which were already in the table would be better idea.

    August 3, 2007 at 1:15 am
  • Peter Romianowski Reply

    A short note on performance: If you use this feature in order to aggregate reports or something like that (i.e. you have a low insert but high update rate) a strategie like:

    try {
    UPDATE
    } catch (exception) {
    INSERT INTO
    }

    might be better. I notices a speed up of up to 15% using this strategie instead of ON DUPLICATE KEY UPDATE. The higher your actual insert rate becomes the lower is the benefit until it performs worse. As always: Test your application 😉

    August 23, 2007 at 2:04 am
  • Lars Strojny Reply

    The problem with inet_aton and inet_ntoa is, that IPv6 is just not supported. So you will end up implementing inet_aton6 by hand (which is tricky) or using varchars again.

    December 14, 2007 at 9:02 am
  • Eraser Reply

    Hey,
    thanks for this post! I searched for a function like this!

    I love MySQL! Very powerfull!

    Regards,
    E

    January 1, 2008 at 9:49 am
  • Raven Riley Nude Reply

    Raven Riley

    Raven Riley

    January 22, 2008 at 9:17 pm
  • foo Reply

    also useful in combination with unique keys 🙂
    see http://textsnippets.com/posts/show/975

    February 15, 2008 at 3:30 am
  • TM Reply

    Is there any way i can use a query similar to this but have some kind of where clause at the end? say if i wanted to do a whole bunch of inserts but update them if they exist already and update the primary key also? so for example
    insert into tmmember values (‘STRES17N’,’SK’,’1725′,’ESTRES17′,’N’,’6′,’255′,’-1′)
    on duplicate key update memb_memberid = ‘STRES17N’, memb_teamid = ‘SK’ only the original memb_member id might be ‘SKRES17N’ so as you can see id like to be able to say where memb_memberid = ‘SKRES17N’ for the cases when its true.

    February 28, 2008 at 3:57 pm
  • dreamluverz Reply

    I’m trying to use this on INSERT SELECT statement and im getting an error. my sql is something like this

    INSERT INTO table SELECT * FROM table WHERE id=’2′ ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID() + 1
    I use LAST_INSERT_ID() coz im using auto increment.

    May 26, 2008 at 12:44 am
  • peter Reply

    Dreamluverz,

    The best place to ask the question is forums.mysqlperformanceblog.com ; and it is also best to provide exact statement and error message 🙂

    May 26, 2008 at 8:40 pm
  • Sergio Reply

    I’ve tried this but I didn’t notice any performance improvement.

    The UPDATE rate is higher than the INSERT rate.

    So.. peter, from your experience, what’s the FASTEST WAY from the following:

    1) This method I’ve been using until found out about this article. First, we check if there is a record.. then.. see there.

    mysql_query(“SELECT COUNT(*) as count FROM table WHERE user_id = ‘1’”);
    $cnt = mysql_fetch_array();
    $cnt = $cnt[‘count’];

    if ($cnt == 0) {
    mysql_query(“INSERT INTO table VALUES (‘1’, ‘1’)”);
    }
    else {
    mysql_query(“UPDATE table SET user_hits = user_hits+1 WHERE user_id = ‘1’”);
    }

    2) With inspiration from Peter Romianowski’s post. Haven’t tested, but I guess I’ve wrote the code ok.

    mysql_query(“UPDATE table SET user_hits = user_hits+1 WHERE user_id = ‘1’”);
    $affected_rows = mysql_affected_rows();

    if ($affected_rows == 0) mysql_query(“INSERT INTO table VALUES (‘1’, ‘1’)”);

    3) Using INSERT .. ON duplicate KEY UPDATE ..

    mysql_query(“INSERT INTO table VALUES (‘1’, ‘1’) ON duplicate KEY UPDATE user_hits = user_hits+1 “);

    So.. which one is recommended? I repeat.. my update rate is higher than the insert rate. I guess method 2. But I’m waiting other opinions.
    Thanks.

    July 15, 2008 at 2:43 am
  • Peter Romianowski Reply

    Sergio, I would recommend version 2 for high update rates. But I would also recommend to benchmark your application in high concurrency environment. 😉

    Cheers

    Peter

    July 15, 2008 at 5:19 am
  • Sergio Reply

    Thanks for reply Peter.

    However, version 2 has a small issue.
    If you try to update a record with the same data as in table, mysql ignore the query (affected rows => 0), and insert query is executed, even if the record exist in table.

    For example, if I refresh twice the page the following query will be the same (if I refresh the page in the same SECOND), so affected rows = 0, thus INSERT INTO query is executed and an error will occur (duplicate entry cos’ of the primary index)

    INSERT INTO forum_sessions (session_id, session_user_id, session_start, session_time, session_ip, session_page, session_logged_in, session_browser, session_mode) VALUES (‘a321587721fdb3a86a49d761fb4b1a2e’, ”, ‘1216230846’, ‘1216230846’, ‘127.0.0.2’, ‘main’, ‘0’, ‘Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9) Gecko/2008052906 Firefox/3.0’, ‘html’)

    Of course errors can be masked for users..

    Can you provide me some tips for benchmarking? The generation time of the pages is almost the same in all 3 situations (I’ve tried all versions on my public site too.. ). localhost, I just refresh the page multiple times & read the smallest generation time.

    Maybe I don’t feel any difference because the tables are small (4,870 rows, 1.4MB & 20-60 rows, < 100KB)

    July 16, 2008 at 11:11 am
  • peter Reply

    Sergio,

    This is why it is always mentioned in cases count=count+1 and similar – in this case you always change the data. Really MySQL returns both rows matched and rows updated and it is rows matched you need to wants.

    Generally Insert on dup key update is the method which was designed to make it most optimal though the question if it always is remains open 🙂

    Also note the difference in this cases comes from amount of queries you have to run mainly – run benchmarks over network to see the difference better.

    July 20, 2008 at 9:14 am
  • pravin Reply

    I have a query in which I’m checking whether LAN(my unique key) is duplicate or not,if it duplicate then update the record else insert that LAN No.

    I did it in the following way.

    insert into romappingfinal(LAN,romappingfinalid,branch_code,
    RO_CODE,main_ro_code,loan_end_date,ro_status,main_ro_status,
    ro_transfer_status,main_ro_transfer_status,ro_name,main_ro_name)
    select LAN,romappingstagingid as romappingfinalid,branch_code,
    RO_CODE,main_ro_code,loan_end_date,ro_status,main_ro_status,
    ro_transfer_status,main_ro_transfer_status,ro_name,main_ro_name
    from viewromappingfinal
    on duplicate key update LAN=values(LAN)

    it’s working fine if i won’t use romappingfinalid(table primary key).

    Suggest me how to use the INSERT ON UPDATE KEY using a table primary key but checking the condition on another primary key(LAN).

    August 4, 2008 at 1:45 am
  • zhilly Reply

    i sinc i add duplicate key the insert not work
    that not work now
    someone know my little erreur?

    //create and execute the query
    $sql = “INSERT INTO yah_unlimited (data_key,data_value,agent_id,data_group,access_time) VALUES (‘$key’, ‘$value’, ‘$group’, ‘$owner_id’, NOW()) ON DUPLICATE KEY UPDATE access_time = NOW()”;
    $result = mysql_query($sql) or die(mysql_error());
    echo ‘Store successful.’;

    November 20, 2008 at 8:15 pm
  • Jesse Reply

    Hello,

    I have a dilema with INSERT (fld1, fld2, fld3) VALUES(‘1′,’1a’,’1b’),(‘2′,’2a’,’2b’),(‘3′,’3a’,’3b’) ON DUPLICATE KEY UPDATE fld2 = ??? ( I want this to be the value that would be inserted ie ‘2a’) (fld1 is a unique index)

    What should go in the ??? spot?

    Thanks.

    June 5, 2009 at 4:43 am
  • Baron Schwartz Reply

    VALUES(fld2) should go there.

    June 5, 2009 at 7:04 am
  • Artis Reply

    If I am collecting banner display statistics for each day, then I have table ad_stat with fields:
    id, banner_id, date, views
    with (banner_id, date) as unique

    I use insert into ad_stat (banner_id, date, views) values ($banner_id, ‘$date’, 1) ON DUPLICATE KEY UPDATE views=views+1

    So this query firstly tries to insert and only then to Update – is there opposite way like:
    UPDATE … ON NO ROWS INSERT ???
    Because there are 5 inserts daily and 50 000 updates – wouldn’t it somehow increase performance?

    November 17, 2009 at 7:32 am
  • kn33ch41 Reply

    A note for those of you who may have problems updating a row that does in fact exist in the table, and so are always seeing a new row inserted:

    In order for this to work, the PRIMARY KEY defined for the table should only contain one column–or if it has multiple columns–the data you’re going to insert/update for the given columns should be identical to the columns defined in the PRIMARY KEY.

    To illustrate:

    PRIMARY_KEY(id, timestamp) == a single key (but with two values, thus more unique)

    The id is very likely always going to be unique. However, adding the timestamp to the primary key makes it even more unique; so, if the id exists in the table and you simply want to update the timestamp, the new value for the timestamp you’re providing will mean that id matches but timestamp does not, because you’ve defined the primary key to be id and timestamp, which means BOTH values you’re inserting/updating must equal the values currently in the database to make the primary key match and therefore update the row instead of insert a new row with the same id, but different timestamp.

    July 1, 2010 at 2:04 pm
  • Jayapal Chandran Reply

    I HAVE MISSED THIS FOR TWO YEARS. OH WHAT A S… . … WHAT EVER.
    and now… just now found it while searching thinking why not there be an option like this… and here i am…

    how to do like this

    insert into (field1, field2, field3) values (1,2,3),(1,2,3) and so including on duplicate update. because i want insert/update many records from am json source.
    This should have been posted in a forum but since it deals my problem exactly i thought of posting it here.

    July 13, 2010 at 7:19 am
  • Brian May Reply

    well, i definetly using INSERT ON DUPLICATE KEY UPDATE rather than REPLACE INTO
    it’s for performance and code reading

    August 23, 2010 at 7:57 pm
  • Dudeson Reply

    Beautiful function. I was on the verge of using a conditional update statement until i discovered this. Anyone knows if there is a mysql function that will shift all records key id down? by adding 1 when I add a new key, say 0 on the TOP of the table?

    October 16, 2010 at 11:06 pm
  • Scott Reply

    With ON DUPLICATE KEY UPDATE is there some sort of temp table involved with the old and new values?
    I need access to the old values on update, but only want to hit the database once.
    It somewhat works with CASE WHEN statements to catch high and low values by day.

    field1= CASE WHEN field1>’$field1′ THEN ‘$field1′ ELSE field1 // this gets a low value
    field2= CASE WHEN field2’$field1’ THEN ‘$field3′ ELSE field3 // this does not work off the existing (old) field1 value!

    This should update field3 when field1 changes. But it does not.

    field3= CASE WHEN field1=’$field1’ THEN ‘$field3’ ELSE field3 // if field3 is time you will only see the most recent value match of field1

    This does work but it looks to see if field1 is equal to the new value.
    The problem here is that I can only have the last match, I would like to have the first match in some cases.

    It seems that old and new values are only available within a given field. Is that so?

    Is there another way to have old/existing values available from one field in the case switch or value of another field.

    Thanks.

    October 19, 2010 at 8:05 am
  • Scott Reply

    The field2 line above is a mess… it should be

    field2= CASE WHEN field2’$field1′ THEN ‘$field3′ ELSE field3 // this does not work off the existing (old) field1 value!

    October 19, 2010 at 8:16 am
  • Steve Jackson Reply

    Given the table.

    CREATE TABLE test (
    id int(11) NOT NULL,
    md5 char(32) DEFAULT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY md5 (md5)
    ) ENGINE=InnoDB

    INSERT IGNORE INTO test (md5) values (‘A’)

    1 row affected

    INSERT IGNORE INTO test (md5) values (‘A’)

    Duplicate entry ‘A’ for key ‘md5’

    I think an incredibly useful feature here would be to return the innodb row id of the row which generated the duplicate key error. This would save us a select statement (either before or after the insert) to determine which row caused the dupe error. (I guess the unique row_id would not be of much help, so only return the value of PRIMARY if it is defined)

    Is there something existing like this already? I dumped the mysqli connection object in php, and I see nothing like that. Insert_id of course, but that can only be used when there is an auto_increment column.

    Someone will answer (why not just use on duplicate key update?) Well this is because i want “on duplicate key return unique_row_id”

    Cheers

    //Steve

    September 4, 2011 at 11:10 pm
  • Gleb Deykalo Reply

    What can you say about http://bugs.mysql.com/bug.php?id=52020

    Can you please explain how really locking in INSERT … ON DUPLICATE KEY UPDATE works.

    As I recall:

    Lock 1) Transaction 1 locks row for INSERT
    Lock 2) Transaction 2 tries to locks row for UPDATE (goes into queue)
    Lock 3) Transaction 1 tries to locks row for UPDATE (goes into queue)

    It is reasonable not to release Lock 1, but why MySQL tries to create Lock 3? Why can’t it use Lock 1 instead of Lock 3 for UPDATE operation?

    Or maybe I am wrong at all?

    March 23, 2012 at 2:27 am
  • Andrea Reply

    Hi guys, I’ve have a doubt: if I’m right the UPDATE statement implied a SELECT statement before the datas modification. So, is the primary key lookup in INSERT INTO…ON DUPLICATE KEY UPDATE faster than the SELECT statement implied in the UPDATE?

    e.g. UPDATE table SET value=newValue WHERE primary=primaryValue;

    INSERT INTO table VALUES primaryValue ON DUPLICATE KEY UPDATE value=newValue

    June 15, 2012 at 9:22 am
  • Rob Reply

    Just a note on the returned rows affected count:
    above somebody mentioned if it does the update it returns “2 rows affected,” if only does the insert it returns 1.

    Methinks intentional (coz still like that in v5.5) and a damn good idea: and here is what I think way.

    Imagine it returns 1 row affected in either case:
    how long would it be before someone asks: ‘how do I know if it inserted or updated?”
    – hence returning 1 or 2 they have the answer
    — (either way it’s > 0 so if using that as a test it’s fine), and for those using a == 1 test: back to programming school

    (would you prefer they returned “0,1 string” or 0 then 1, or introduced a new second value like @@ROWCOUNT?
    – imagine how much code that would break.)

    “2 row(s) affected” not a bug, it’s a damn good idea and I hope that never changes.

    December 1, 2012 at 12:13 am
  • Vladimir Reply

    peter, what do you say about the performance of this method?


    CREATE TABLE ipstat(
    ip INT UNSIGNED NOT NULL PRIMARY KEY,
    hits INT UNSIGNED NOT NULL UNIQUE AUTO_INCREMENT,
    last_hit TIMESTAMP
    );

    REPLACE ipstat (ip, last_hit) VALUES (INET_ATON('192.168.0.1'),NOW())

    July 1, 2014 at 8:55 am
  • Vladimir Reply

    oh, sorry. It work for my single counter

    July 1, 2014 at 9:19 am

Leave a Reply