A few weeks back, I was working on a customer issue wherein they were observing database performance that dropped through the floor (to the point of an outage) roughly every 4 weeks or so. Nothing special about the environment, the hardware, or the queries; really, the majority of the database was a single table with an auto-incrementing integer PK and a secondary UNIQUE KEY.
The queries being run against this table were almost exclusively INSERT … ON DUPLICATE KEY UPDATE (INSERT ODKU), with the columns from the INSERT part of the statement corresponding to the columns in the secondary index, and they were coming in at a rate of approximately 1500 to 2000 per second, sustained, 24h per day. The mathematically-astute among you may already be able to see where this is going.
For purposes of discussion, we can use the following table to illustrate the situation:
|
1 |
CREATE TABLE update_test (<br> id INT UNSIGNED NOT NULL AUTO_INCREMENT,<br> username VARCHAR(20) NOT NULL,<br> host_id TINYINT UNSIGNED NOT NULL,<br> last_modified TIMESTAMP NULL DEFAULT NULL,<br> PRIMARY KEY(id),<br> UNIQUE KEY(username)<br>) ENGINE=InnoDB; |
Now consider the following sequence of events.
|
1 |
(root@localhost) [test]> INSERT INTO update_test (username, host_id, last_modified) VALUES ('foo',3,NOW());<br>Query OK, 1 row affected (0.00 sec)<br><br>(root@localhost) [test]> select * from update_test;<br>+----+----------+---------+---------------------+<br>| id | username | host_id | last_modified |<br>+----+----------+---------+---------------------+<br>| 1 | foo | 3 | 2012-10-05 22:36:30 |<br>+----+----------+---------+---------------------+ |
Nothing crazy here, right? We’ve inserted one row into an empty table, and if we were to do a SHOW CREATE TABLE we’d see that the AUTO_INCREMENT counter is currently set to 2. If we do an INSERT … ODKU on this table, we see the following:
|
1 |
<br>(root@localhost) [test]> insert into update_test (username,host_id) values ('foo',1) on duplicate key update last_modified=NOW();<br>Query OK, 2 rows affected (0.00 sec)<br><br>(root@localhost) [test]> select * from update_test;<br>+----+----------+---------+---------------------+<br>| id | username | host_id | last_modified |<br>+----+----------+---------+---------------------+<br>| 1 | foo | 3 | 2012-10-05 22:58:28 |<br>+----+----------+---------+---------------------+<br>1 row in set (0.00 sec)<br> |
And now, even though we didn’t insert a new row, our auto-increment counter has increased to 3. This is actually expected behavior; InnoDB checks constraints in the order in which they were defined, and the PRIMARY KEY is always going to be considered as being defined first. So, MySQL checks our INSERT, sees that the next auto-inc value is available, and claims it, but then it checks the UNIQUE KEY and finds a violation, so instead it does an UPDATE. If we look at the handler status counters, we can see that there was 1 request to insert a row (which failed) and 1 request to update a row, which succeeded (this explains why, when there’s a row-update, that we have 2 rows affected and not 1.
|
1 |
<br>(root@localhost) [test]> show status like 'handler%';<br> *** some rows omitted ***<br>+----------------------------+-------+<br>| Variable_name | Value |<br>+----------------------------+-------+<br>| Handler_commit | 1 |<br>| Handler_rollback | 0 |<br>| Handler_update | 1 |<br>| Handler_write | 1 |<br>+----------------------------+-------+<br> |
At this point, you might be thinking, “Ok, so what?” Let’s go back to our customer. 1500 INSERT ODKUs per second, sustained for 24 hours per day. The PK on their table is the same as what I’ve used in my demonstration table – INT UNSIGNED. Do the math. The maximum value for an auto-increment INT UNSIGNED is 4294967295. Divide that by 1500 qps and then again by 86400, which is the number of seconds in a day, and we get 33.1 days, or a little over 4 weeks. Coincidence? I think not.
So what actually happens when we run out of auto-increment space? Some of the behavior might surprise you. Let’s go back to our demonstration table and insert a row at the end of the auto-increment range, and then try to insert another one:
|
1 |
<br>(root@localhost) [test]> insert into update_test (id,username,host_id) values (4294967295, 'bar', 10);<br>Query OK, 1 row affected (0.00 sec)<br>(root@localhost) [test]> flush status;<br>(root@localhost) [test]> insert into update_test (username,host_id) values ('baz', 10);<br>ERROR 1062 (23000): Duplicate entry '4294967295' for key 'PRIMARY'<br><br>(root@localhost) [test]> show status like 'handler%';<br> *** some rows omitted ***<br>+----------------------------+-------+<br>| Variable_name | Value |<br>+----------------------------+-------+<br>| Handler_rollback | 1 |<br>| Handler_write | 1 |<br>+----------------------------+-------+<br> |
Ok, so we tried to insert a row, and it failed because the counter for auto-increment was already at its maximum value, and the statement was rolled back. But what happens if we try an INSERT … ODKU? First, recall what’s in our table:
|
1 |
(root@localhost) [test]> select * from update_test;<br>+------------+----------+---------+---------------------+<br>| id | username | host_id | last_modified |<br>+------------+----------+---------+---------------------+<br>| 1 | foo | 3 | 2012-10-05 22:58:28 |<br>| 4294967295 | bar | 10 | NULL |<br>+------------+----------+---------+---------------------+<br>2 rows in set (0.00 sec)<br><br>(root@localhost) [test]> INSERT INTO update_test (username, host_id) VALUES ('foo', 7) ON DUPLICATE KEY UPDATE host_id=7, last_modified=NOW();<br>Query OK, 2 rows affected (0.00 sec)<br> |
Looks fine, right? 2 rows affected, so obviously, the row that we wanted, i.e., the one that has the username = ‘foo’, was updated with the proper host_id and the last_modified time, and we can happily go about our day. Unfortunately, this isn’t the case.
|
1 |
<br>(root@localhost) [test]> select * from update_test;<br>+------------+----------+---------+---------------------+<br>| id | username | host_id | last_modified |<br>+------------+----------+---------+---------------------+<br>| 1 | foo | 3 | 2012-10-05 22:58:28 |<br>| 4294967295 | bar | 7 | 2012-10-05 23:24:49 |<br>+------------+----------+---------+---------------------+<br>2 rows in set (0.00 sec) |
Yep, it’s actually THE LAST ROW, the one where the id is equal to the auto-increment max value, which is the one that got updated. The secondary UNIQUE on username is, for all intents and purposes, ignored.
For the customer whose database served as the inspiration for this post, we can see fairly easily what the problem turned out to be. 1500 queries per second all trying to lock and update the same row is not going to end well; deadlocks, rollbacks, contention, and all sorts of related unpleasantness. There is, of course, a trivial solution to this: just change the AUTO-INCREMENT column to use a BIGINT, and problem solved.
As it turns out, this is documented behavior; the manual states that our INSERT … ODKU on a table with multiple unique indexes such as this one would be equivalent to “UPDATE update_test SET host_id=7, last_modified=NOW() WHERE id=4294967295 OR username=’foo’ LIMIT 1”, and of course, a PK lookup is going to be more likely to be chosen by the optimizer than a UNIQUE on the secondary index.
So what do we learn here?
Resources
RELATED POSTS