In this blog post, I’ll provide an explanation of why you should avoid using the CREATE TABLE AS SELECT statement.
The SQL statement “create table <table_name> as select …” is used to create a normal or temporary table and materialize the result of the select. Some applications use this construct to create a copy of the table. This is one statement that will do all the work, so you do not need to create a table structure or use another statement to copy the structure.
At the same time there are a number of problems with this statement:
Let’s imagine we need to transfer money from one account to another (classic example). But in addition to just transferring funds, we need to calculate fees. The developers decide to create a table to perform a complex calculation.
Then the transaction looks like this:
|
1 2 3 4 5 6 |
begin; update accounts set amount = amount - 100000 where account_id=123; -- now we calculate fees create table as select ... join ... update accounts set amount = amount + 100000 where account_id=321; commit; |
The “create table as select … join … ” commits a transaction that is not safe. In case of an error, the second account obviously will not be credited by the second account debit that has been already committed!
Well, instead of “create table … “, we can use “create temporary table …” which fixes the issue, as temporary table creation is allowed.
If you try to use CREATE TABLE AS SELECT when GTID is enabled (and ENFORCE_GTID_CONSISTENCY = 1) you get this error:
|
1 |
General error: 1786 CREATE TABLE ... SELECT is forbidden when @@GLOBAL.ENFORCE_GTID_CONSISTENCY = 1. |
The application code may break.
Metadata lock issue
Metadata lock issue for CREATE TABLE AS SELECT is less known. (More information about the metadata locking in general). Please note: MySQL metadata lock is different from InnoDB deadlock, row-level locking and table-level locking.
This quick simulation demonstrates metadata lock:
session1:
|
1 |
mysql> create table test2 as select * from test1; |
session2:
|
1 |
mysql> select * from test2 limit 10; |
— blocked statement
This statement is waiting for the metadata lock:
session3:
|
1 2 3 4 5 6 7 8 |
mysql> show processlist; +----+------+-----------+------+---------+------+---------------------------------+------------------------------------------- | Id | User | Host | db | Command | Time | State | Info +----+------+-----------+------+---------+------+---------------------------------+------------------------------------------- | 2 | root | localhost | test | Query | 18 | Sending data | create table test2 as select * from test1 | 3 | root | localhost | test | Query | 7 | Waiting for table metadata lock | select * from test2 limit 10 | 4 | root | localhost | NULL | Query | 0 | NULL | show processlist +----+------+-----------+------+---------+------+---------------------------------+------------------------------------------- |
The same can happen another way: a slow select query can prevent some DDL operations (i.e., rename, drop, etc.):
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 |
mysql> show processlistG *************************** 1. row *************************** Id: 4 User: root Host: localhost db: reporting_stage Command: Query Time: 0 State: NULL Info: show processlist Rows_sent: 0 Rows_examined: 0 Rows_read: 0 *************************** 2. row *************************** Id: 5 User: root Host: localhost db: test Command: Query Time: 9 State: Copying to tmp table Info: select count(*), name from test2 group by name order by cid Rows_sent: 0 Rows_examined: 0 Rows_read: 0 *************************** 3. row *************************** Id: 6 User: root Host: localhost db: test Command: Query Time: 5 State: Waiting for table metadata lock Info: rename table test2 to test4 Rows_sent: 0 Rows_examined: 0 Rows_read: 0 3 rows in set (0.00 sec) |
As we can see, CREATE TABLE AS SELECT can affect other queries. However, the problem here is not the metadata lock itself (the metadata lock is needed to preserve consistency). The problem is that the metadata lock will not be released until the statement is finished.
The fix is simple: copy the table structure first by doing “create table new_table like old_table”, then do “insert into new_table select …”. The metadata lock is still held for the create table part (very short) but isn’t for the “insert … select” part (the total time to hold the lock is much shorter). To illustrate the difference, let’s look at two cases:
In some cases, however, the table structure is not known beforehand. For example, we may need to materialize the result set of a complex select statement, involving joins and/or group by. In this case, we can use this trick:
|
1 2 |
create table new_table as select ... join ... group by ... limit 0; insert into new_table as select ... join ... group by ... |
The first statement creates a table structure and doesn’t insert any rows (LIMIT 0). The first statement places a metadata lock. However, it is very quick. The second statement actually inserts rows into the table and doesn’t place a metadata lock.
More reading on metadata locks and how to troubleshoot them in MySQL 5.7:
https://www.percona.com/blog/2016/12/28/quickly-troubleshooting-metadata-locks-mysql-5-7/
You may also like:
Clickhouse is a fast, easy to pick up, open-source DBMS for online analytical processing. It can be integrated with your applications that already use MySQL. However, Clickhouse does not support Update/Delete yet. Don’t fret, as Percona’s Jervin Real has detailed a clever stratagem for updating and deleting rows by implementing incremental refresh on a Clickhouse table. Read his article for a high-level overview of the process.
MySQL GROUP BY could be responsible for 90% or more of your database’s query execution time. Percona Founder and CEO, Peter Zaitsev, breaks down the four ways MySQL executes GROUP BY statements and details when the query may not be the most optimal course of action.
Sorry, but I have to ask. Are you 100% sure about this:
“With “create table table_new as select … from table1“, other application connections can’t read from the source table (table1) for the duration of the statement (even “show fields from table1” will be blocked)”
My tests and common sense tell me this is incorrect. I have this running for a long time:
mysql> create table tslow as select * from tbig;
Query OK, 524288 rows affected (1 min 12.55 sec)
Records: 524288 Duplicates: 0 Warnings: 0
and soon after it started I tried these in other session:
mysql> show processlist;
+—-+——+—————–+——+———+——+————–+———-
——————————–+
| Id | User | Host | db | Command | Time | State | Info
|
+—-+——+—————–+——+———+——+————–+———-
——————————–+
| 1 | root | localhost:51058 | test | Query | 5 | Sending data | create ta
ble tslow as select * from tbig |
| 2 | root | localhost:51070 | test | Query | 0 | init | show proc
esslist |
+—-+——+—————–+——+———+——+————–+———-
——————————–+
2 rows in set (0.00 sec)
mysql> select count(*) from tbig;
+———-+
| count(*) |
+———-+
| 524288 |
+———-+
1 row in set (16.05 sec)
mysql> show fields from tbig;
+——-+———–+——+—–+———+—————-+
| Field | Type | Null | Key | Default | Extra |
+——-+———–+——+—–+———+—————-+
| id | int(11) | NO | PRI | NULL | auto_increment |
| c1 | char(100) | YES | | NULL | |
+——-+———–+——+—–+———+—————-+
2 rows in set (0.02 sec)
From the above I conclude that your statement quoted either is not specific enough or incorrect. Same with INSERT INTO target SELECT … FROM source. No metadata lock prevents other sessions from reading the source table. We can not ALTER it or apply any DD, but otherwise we are free to do whatever storage engine allows to do concurrently…
More details about my environment:
mysql> show create table tbigG
*************************** 1. row ***************************
Table: tbig
Create Table: CREATE TABLE
tbig(idint(11) NOT NULL AUTO_INCREMENT,c1char(100) DEFAULT NULL,PRIMARY KEY (
id)) ENGINE=InnoDB AUTO_INCREMENT=524289 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
mysql> select version();
+————+
| version() |
+————+
| 5.6.29-log |
+————+
1 row in set (0.00 sec)
Hi Valeriy,
Sorry, I have a typo in the blog text:
It should be ” destination table (table_new)” Thank you for pointing out.
Try select count(*) from tslow;
Alex
Second item still requires similar fixing:
“With “create table new_table like old_table” + “insert into new_table select …”, other application connections can’t read from the source table during the “insert into new_table select …” part.”
No metadata lock here prevents reading from SOURCE table that INSERT … SELECT also reads.
Still this is incorrect:
“other application connections can’t read from the destination table during the “insert into new_table select …” part.”
This is NOT true of destination table is InnoDB one. You can read from it while new data are inserted, you just will not see newly inserted data probably (this depends on transaction isolation level).
Metadata locks can cause serious problems: Metadata locks and dumping are generating problems.
You could add “ANALYZE TABLE” to your article as another dangerous statement with metadata locks.
It is also interesting to read this comment in the code:
https://github.com/mysql/mysql-server/blob/f5ed5d58ffae0ff4c668830376faa411987915a1/sql/handler.cc#L1309
The scary part is:
“This diversity makes it hard to say what will happen if
by chance a stored function is invoked during a DDL —
whether any modifications it makes will be committed or not
is not clear. Fortunately, SQL grammar of few DDLs allows
invocation of a stored function.”
I’m happy with a temporary table but when I tried “CREATE TEMPORARY TABLE xyz AS SELECT …” I discover no that is not allowed either. I get this error: ERROR 1105 (HY000): Percona-XtraDB-Cluster prohibits use of CREATE TABLE AS SELECT with pxc_strict_mode = ENFORCING or MASTER
However using “CREATE TEMPORARY TABLE xyz LIKE …” does work, and you can then use the “INSERT INTO xyz SELECT” later on. That’s kind of annoying but you can live with it.