September 30, 2014

How InnoDB promotes UNIQUE constraints

The other day I was running pt-duplicate-key-checker on behalf of a customer and noticed some peculiar recommendations on an InnoDB table with an odd structure (no PRIMARY key, but multiple UNIQUE constraints). This got me thinking about how InnoDB promotes UNIQUE constraints to the role of PRIMARY KEYs. The documentation is pretty clear:

[DOCS]
When you define a PRIMARY KEY on your table, InnoDB uses it as the clustered index. Define a primary key for each table that you create. If there is no logical unique and non-null column or set of columns, add a new auto-increment column, whose values are filled in automatically.

If you do not define a PRIMARY KEY for your table, MySQL locates the first UNIQUE index where all the key columns are NOT NULL and InnoDB uses it as the clustered index.

If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally generates a hidden clustered index on a synthetic column containing row ID values. The rows are ordered by the ID that InnoDB assigns to the rows in such a table. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in insertion order.
[/DOCS]

But it really leaves a number of outstanding questions, the most pressing of which (in my mind) is whether or not the column used in the clustered index can change over time or if it is set and remains at table creation time.

Let’s take the following example, using MySQL 5.6.11:


mysql> CREATE TABLE t1 (
-> c1 char(10),
-> c2 char(10),
-> KEY (c2)
-> ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.14 sec)

mysql> INSERT INTO t1 VALUES (‘abcdefghij’,’klmnopqrst’);
Query OK, 1 row affected (0.00 sec)

mysql> EXPLAIN SELECT c1, c2 FROM t1 WHERE c2='klmnopqrst'G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: ref
possible_keys: c2
key: c2
key_len: 11
ref: const
rows: 1
Extra: Using index condition
1 row in set (0.00 sec)

We can clearly confirm here that there is no candidate column to use as the clustered index. So, if we add a UNIQUE constraint, will it be used?


mysql> ALTER TABLE t1 ADD UNIQUE (c1);
Query OK, 0 rows affected (0.15 sec)
Records: 0 Duplicates: 0 Warnings: 0

 

mysql> EXPLAIN SELECT c1, c2 FROM t1 WHERE c2='klmnopqrst'G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: ref
possible_keys: c2
key: c2
key_len: 11
ref: const
rows: 1
Extra: Using index condition
1 row in set (0.00 sec)

No. Because InnoDB will only use it as the clustered index if all of the key columns are NOT NULL. So, let’s see if that will change:


mysql> ALTER TABLE t1 CHANGE COLUMN c1 c1 CHAR(10) NOT NULL;
Query OK, 0 rows affected, 1 warning (0.23 sec)
Records: 0 Duplicates: 0 Warnings: 1

 

mysql> EXPLAIN SELECT c1, c2 FROM t1 WHERE c2='klmnopqrst'G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: ref
possible_keys: c2
key: c2
key_len: 11
ref: const
rows: 1
Extra: Using where; Using index
1 row in set (0.00 sec)

Fantastic. InnoDB will seemingly re-evaluate candidates for clustered index as the result of each alter statement. What happens if we add another candidate? Will InnoDB keep the original choice or update again?


mysql> ALTER TABLE t1 ADD COLUMN c3 char(10) not null;
Query OK, 0 rows affected, 1 warning (0.21 sec)
Records: 0 Duplicates: 0 Warnings: 1

mysql> UPDATE t1 SET c3=’uvwxyz1234′;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0

mysql> ALTER TABLE t1 ADD UNIQUE (c3);
Query OK, 0 rows affected (0.24 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> EXPLAIN SELECT c1, c2 FROM t1 WHERE c2=’klmnopqrst’G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: ref
possible_keys: c2
key: c2
key_len: 11
ref: const
rows: 1
Extra: Using where; Using index
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT c3, c2 FROM t1 WHERE c2='klmnopqrst'G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: ref
possible_keys: c2
key: c2
key_len: 11
ref: const
rows: 1
Extra: Using index condition
1 row in set (0.00 sec)

It seems that InnoDB kept the original clustered index. This is expected because MySQL will choose the first index that fits it’s criteria (UNIQUE, NOT NULL) and c1 is the first column declared in the table definition. Let’s change that:


mysql> ALTER TABLE t1 DROP COLUMN c3;
Query OK, 0 rows affected (0.27 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> ALTER TABLE t1 ADD COLUMN c3 char(10) not null FIRST;
Query OK, 0 rows affected (0.26 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> UPDATE t1 SET c3=’uvwxyz1234′;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0

mysql> ALTER TABLE t1 ADD UNIQUE (c3);
Query OK, 0 rows affected (0.18 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> EXPLAIN SELECT c1, c2 FROM t1 WHERE c2=’klmnopqrst’G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: ref
possible_keys: c2
key: c2
key_len: 11
ref: const
rows: 1
Extra: Using where; Using index
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT c3, c2 FROM t1 WHERE c2=’klmnopqrst’G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: ref
possible_keys: c2
key: c2
key_len: 11
ref: const
rows: 1
Extra: Using index condition
1 row in set (0.00 sec)

mysql> SHOW CREATE TABLE t1G
*************************** 1. row ***************************
Table: t1
Create Table: CREATE TABLE t1 (
c3 char(10) NOT NULL,
c1 char(10) NOT NULL,
c2 char(10) DEFAULT NULL,
UNIQUE KEY c1 (c1),
UNIQUE KEY c3 (c3),
KEY c2 (c2)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

So in this case, InnoDB kept using c1 as the clustered index even though c3 comes first in the table definition. If you look at the order of UNIQUE KEYs, you see that c1 remains first. You can always tell the index that is used as the clustered index because it appears first in the list of UNIQUE KEYs. Here is an ALTER statement verifying, by changing c1 to default to NULL, it moves down in the UNIQUE index list:


mysql> show create table t1G
*************************** 1. row ***************************
Table: t1
Create Table: CREATE TABLE t1 (
c1 char(10) NOT NULL,
c2 char(10) DEFAULT NULL,
c3 char(10) NOT NULL,
UNIQUE KEY c1 (c1),
UNIQUE KEY c3 (c3),
KEY c2 (c2)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> ALTER TABLE t1 CHANGE COLUMN c1 c1 char(10);
Query OK, 0 rows affected (0.32 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> show create table t1G
*************************** 1. row ***************************
Table: t1
Create Table: CREATE TABLE t1 (
c1 char(10) DEFAULT NULL,
c2 char(10) DEFAULT NULL,
c3 char(10) NOT NULL,
UNIQUE KEY c3 (c3),
UNIQUE KEY c1 (c1),
KEY c2 (c2)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

So, the moral of the story is that InnoDB will choose the first created NOT NULL UNIQUE indexed column for clustered index (although this is not “exposed” anywhere aside from the SHOW CREATE TABLE order or EXPLAIN), and that is re-evaluated at each ALTER statement (indeed, the decision about promoting a UNIQUE index to the clustered index is not even stored anywhere on disk; it is re-evaluated each time InnoDB loads the table into the data dictionary cache, even if the table structure has not been changed by ALTER TABLE [via Alexey Kopytov]). Even though it behaves deterministically, it is still much clearer to declare a column as the PRIMARY KEY instead of relying on this behavior.

About Ryan Lowe

Ryan is a Principal Consultant and team manager at Percona. He has experience with many database technologies in industries such as health care, telecommunications, and social networking.

Comments

  1. dct says:

    this makes sense. i would think that innodb would retain the first clustered index since switching to another would make it necessary to “re-cluster” everything: it would have to rearrange the data so that the non-indexed columns are on the same page as the values in the columns on the new index, correct?

    this begs the question of whether/how the data is rearranged when you construct your clustered index in the first place – how did innodb switch from the ‘default’ clustered index on row ids to the index you made when you made c1 unique and non-null?

Speak Your Mind

*