How smart is the MySQL optimizer? If it sees an expression repeated many times, does it realize they’re all the same and not calculate the result for each of them?
I had a specific case where I needed to find out for sure, so I made a little benchmark. The query looks something like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
select sql_no_cache pow(sum(rental_id), 1), pow(sum(rental_id), 2), pow(sum(rental_id), 3), pow(sum(rental_id), 4), pow(sum(rental_id), 5), pow(sum(rental_id), 6), pow(sum(rental_id), 7), pow(sum(rental_id), 8), pow(sum(rental_id), 10), pow(sum(rental_id), 11), pow(sum(rental_id), 12), pow(sum(rental_id), 13), pow(sum(rental_id), 14), pow(sum(rental_id), 15), pow(sum(rental_id), 16), pow(sum(rental_id), 17), pow(sum(rental_id), 18) from sakila.rental; |
I wrote a bash script to execute the query 500 times and redirect the results to /dev/null. Results: 31s wall-clock time.
The following is an alternative query that returns the same result:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
select sql_no_cache pow(@sum := (select sum(rental_id) from sakila.rental), 1), pow(@sum, 2), pow(@sum, 3), pow(@sum, 4), pow(@sum, 5), pow(@sum, 6), pow(@sum, 7), pow(@sum, 8), pow(@sum, 10), pow(@sum, 11), pow(@sum, 12), pow(@sum, 13), pow(@sum, 14), pow(@sum, 15), pow(@sum, 16), pow(@sum, 17), pow(@sum, 18); |
This query runs 500 iterations in 12s. So it looks like the optimizer was computing the SUM() 18 times in the first query. (A response on the internals mailing list agreed with this finding. Thanks Konstantin.)
Along the same lines, I benchmarked POW() and found that squaring a number is more than twice as fast to do with ordinary multiplication as with POW(). Does this matter? Sometimes… we had a client who was doing vector math in a many-dimensional space, and the POW() seemed to be taking a pretty significant amount of time; changing to ordinary multiplication made the query faster. (It was not the ultimate suggestion though.)
Lukas
Its always worthwhile to look at expensive expressions like that if they are repeated. In some cases I have turned these into subqueries in the SELECT clause. Even though MySQL is very badly optimized for these kinds of subqueries, usually the result sets are very small for these cases and so it does not hurt that much and is radically offset by not having to do the same computation multiple times.
Sergei Golubchik
Just to say that your workaround isn’t guaranteed to work. The server is not required to evaluate expressions in the SELECT clause in the order they are listed. The fact that it does is an implementation detail that may change any time.
Lukas
@Sergei: So the subquery solution is safer?
Baron Schwartz Post author
Sergei, indeed: in cases where it matters, such as in mk-table-checksum, I run test queries to find out which place the “real” expression needs to go and where the user variables can be substituted. The result is often that the real expression goes LAST, not first! It looks odd but it’s true — you end up using @variable apparently before it’s assigned to. In this case it needed to be first.
Sergei Golubchik
sorry, bad timing we were commenting at the same time, apparently, at least, your comment wasn’t here when I was writing mine.
I mean that Baron’s workaround relies on the order in which SELECT expressions are evaluated. And this order is undefined, afaik.
Shlomi Noach
Hi,
To solve the issue of whether @sum is calculated first, the query can be rewritten as:
SELECT sql_no_cache
pow(@sum, 1),
pow(@sum, 2),
pow(@sum, 3),
pow(@sum, 4),
pow(@sum, 5),
pow(@sum, 6),
pow(@sum, 7),
pow(@sum, 8),
pow(@sum, 10),
pow(@sum, 11),
pow(@sum, 12),
pow(@sum, 13),
pow(@sum, 14),
pow(@sum, 15),
pow(@sum, 16),
pow(@sum, 17),
pow(@sum, 18)
FROM
(SELECT @sum := (SELECT sum(rental_id) FROM sakila.rental)) a
In this query @sum is guaranteed to be calculated before all the pow(…) take place.
Sergei Golubchik
You don’t need a user variable in your solution:
SELECT
pow(sum, 1),
pow(sum, 2),
pow(sum, 3),
pow(sum, 4),
pow(sum, 5),
pow(sum, 6),
pow(sum, 7),
pow(sum, 8),
pow(sum, 10),
pow(sum, 11),
pow(sum, 12),
pow(sum, 13),
pow(sum, 14),
pow(sum, 15),
pow(sum, 16),
pow(sum, 17),
pow(sum, 18)
FROM (SELECT sum(rental_id) as sum FROM sakila.rental) a
you can even write ”
…
SELECT sum(rental_id) as sum FROM sakila.rental GROUP BY customer_id
…
which isn’t possible with user variables.
Ms. Anthrope
Notably, Postgres does not have this particular deficiency. I often run into optimizer deficiencies in MySQL with my clients (this being one of the more minor sorts) and find myself longing for my younger days with the PGSQL.
Lukas
@Sergei: thats exactly the approach I meant in my original approach.
@Ms. Anthrope: right .. then again in some cases its nicer for the eyes to still do this subquery if the expression is very long.
peter
Sergei,
Do you know if this is on the roadmap to be fixed natively by the way ?
Sergei Golubchik
To recognize common subexpressions ? No, I don’t know.