In this blog post, I am going to show why we have to be careful with stored functions in
select list
, as a single query can cause thousands of queries in the background if we aren’t cautious.
For this example, I am only going to use the SLEEP
function to demonstrate the issue, but you could use any other stored functions.
Here is the test schema and data:
1 2 3 4 5 6 7 8 |
DROP TABLE IF EXISTS t1; CREATE TABLE t1 ( id INT NOT NULL PRIMARY KEY auto_increment, gcol INT NOT NULL, fcol INT NOT NULL, INDEX (gcol) ); INSERT INTO t1 VALUES (NULL,1,1),(NULL,2,1),(NULL,3,1),(NULL,4,2),(NULL,5,1); |
And the first query:
1 2 3 4 5 6 7 |
mysql [localhost] {msandbox} (test) > SELECT gcol, SLEEP(1) FROM t1 WHERE 1 GROUP BY gcol LIMIT 1; +------+----------+ | gcol | SLEEP(1) | +------+----------+ | 1 | 0 | +------+----------+ 1 row in set (1.00 sec) |
The query takes one second, which means the SLEEP
was executed after/during the LIMIT
step.
The second query creates a temp table:
1 2 3 4 5 6 7 |
mysql [localhost] {msandbox} (test) > SELECT fcol, SLEEP(1) FROM t1 WHERE 1 GROUP BY fcol LIMIT 1; +------+----------+ | fcol | SLEEP(1) | +------+----------+ | 1 | 0 | +------+----------+ 1 row in set (5.02 sec) |
It takes 5 seconds, meaning the SLEEP
was executed for every row before LIMIT
, and the only differences between these two queries is the second one uses temp tables. (Yes, the schema and query could be improved to avoid temp tables, but this is only a test case and you can’t avoid temp tables all the time.)
MySQL uses a different order of steps here and does the select list
evaluation before/during creating the temp tables.
As you can see, there are functions in the select list, and MySQL will create temp tables, and that could cause thousands of extra queries in the background. I recently had a case where running a single query with functions in the select list resulted in 333,906 queries in the background! In the query, there was a Limit 0,10
.
There is a very good blog post describing the order of the operations in SQL, which should be the following:
1 2 3 4 5 6 7 8 9 10 11 12 |
- From - Where - Group By - Aggregations - Having - Window - Select - Distinct - Uninon - Order by - Offset - Limit |
Based on this, if MySQL strictly follows the SQL order, the LIMIT
should be the last, and the first query should take 5s because the Select list
evaluation should happen before that for every row. But I guess this is a performance optimization in MySQL, to only run the evaluation for the limited number of rows.
When temp tables are involved, MySQL will use a different logic and do the evaluation before/during the group by
when it creates the temp tables.
When MySQL creates a temp table, it has to materialize every single row, and that includes the functions as well. If your query is going to create a huge temp table, MySQL will execute the function for all those rows.
MySQL does not have generated columns in internal temp tables, but what might do the trick here is to just point to a function that needs to be executed when MySQL reads the data out from the temp table.
I have created a ticket, but we are still debating if there is any room for improvement here.
Using DML queries in these functions
If you are using DML (insert/update/delete) queries in stored functions, and you are calling them in the select list, you have to be careful because if MySQL creates temp tables it will call those queries for all the rows in the temp table as well. So you might just want to insert only one row in another table, as you will end up with thousands if not.
How can we avoid this?
If you are facing this issue, you have to analyze your queries. You might only need an extra index to avoid temp tables, or you could rewrite the query.
Where can I see if this is happening?
Unfortunately, in this case, the slow query log cannot help us, because these function calls and the queries from the functions are not logged. This could also be a feature request because it would make debugging much easier. (I have already created a feature request in Percona Server to log queries from stored functions.)
However, in the slow query log for the original query, the Rows_examined
would be very high if you are using stored functions and that number is way higher than it should be and might cause problems. Also, the general log can be helpful for investigating the problem here, because in the general log we can see all these function calls as well.
Conclusion
If you are using stored functions in Select list, you have to make sure the query does not use temp tables, otherwise, it could cause many extra queries in the background.
This is a normal behavior, because ordering executed on temp table (by special order algorithm differs from memory qsort 🙂 ) and all rows required before sorting. Limit operation can be run after sorting of course.
So this is a special sose effect, thx for info.
PS: explain can show the temp file usage?
Szia Balazs,
Yeah, I know and understand why does this happening, but maybe somehow it could be improved. In this special cases when you have functions in the select list can be very expensive to run the functions for each row.
Here is the explain for the query which uses temp tables:
mysql [localhost] {msandbox} (test) > explain SELECT fcol, SLEEP(1) FROM t1 WHERE 1 GROUP BY fcol LIMIT 1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 100.00
Extra: Using temporary
1 row in set, 1 warning (0.00 sec)
And yes in Extra it says it is going to use a temp table.