Optimizing repeated subexpressions in MySQL

How smart is the MySQL optimizer? If it sees an expression repeated many times, does it realize they’re all the same and not calculate the result for each of them?

I had a specific case where I needed to find out for sure, so I made a little benchmark. The query looks something like this:

I wrote a bash script to execute the query 500 times and redirect the results to /dev/null. Results: 31s wall-clock time.

The following is an alternative query that returns the same result:

This query runs 500 iterations in 12s. So it looks like the optimizer was computing the SUM() 18 times in the first query. (A response on the internals mailing list agreed with this finding. Thanks Konstantin.)

Along the same lines, I benchmarked POW() and found that squaring a number is more than twice as fast to do with ordinary multiplication as with POW(). Does this matter? Sometimes… we had a client who was doing vector math in a many-dimensional space, and the POW() seemed to be taking a pretty significant amount of time; changing to ordinary multiplication made the query faster. (It was not the ultimate suggestion though.)

Share this post

Comments (11)

  • Lukas

    Its always worthwhile to look at expensive expressions like that if they are repeated. In some cases I have turned these into subqueries in the SELECT clause. Even though MySQL is very badly optimized for these kinds of subqueries, usually the result sets are very small for these cases and so it does not hurt that much and is radically offset by not having to do the same computation multiple times.

    January 24, 2009 at 3:18 am
  • Sergei Golubchik

    Just to say that your workaround isn’t guaranteed to work. The server is not required to evaluate expressions in the SELECT clause in the order they are listed. The fact that it does is an implementation detail that may change any time.

    January 24, 2009 at 3:20 am
  • Lukas

    @Sergei: So the subquery solution is safer?

    January 24, 2009 at 3:22 am
  • Baron Schwartz

    Sergei, indeed: in cases where it matters, such as in mk-table-checksum, I run test queries to find out which place the “real” expression needs to go and where the user variables can be substituted. The result is often that the real expression goes LAST, not first! It looks odd but it’s true — you end up using @variable apparently before it’s assigned to. In this case it needed to be first.

    January 24, 2009 at 4:33 am
  • Sergei Golubchik

    sorry, bad timing 🙂 we were commenting at the same time, apparently, at least, your comment wasn’t here when I was writing mine.

    I mean that Baron’s workaround relies on the order in which SELECT expressions are evaluated. And this order is undefined, afaik.

    January 24, 2009 at 4:40 am
  • Shlomi Noach


    To solve the issue of whether @sum is calculated first, the query can be rewritten as:
    SELECT sql_no_cache
    pow(@sum, 1),
    pow(@sum, 2),
    pow(@sum, 3),
    pow(@sum, 4),
    pow(@sum, 5),
    pow(@sum, 6),
    pow(@sum, 7),
    pow(@sum, 8),
    pow(@sum, 10),
    pow(@sum, 11),
    pow(@sum, 12),
    pow(@sum, 13),
    pow(@sum, 14),
    pow(@sum, 15),
    pow(@sum, 16),
    pow(@sum, 17),
    pow(@sum, 18)
    (SELECT @sum := (SELECT sum(rental_id) FROM sakila.rental)) a

    In this query @sum is guaranteed to be calculated before all the pow(…) take place.

    January 24, 2009 at 11:10 pm
  • Sergei Golubchik

    You don’t need a user variable in your solution:

    pow(sum, 1),
    pow(sum, 2),
    pow(sum, 3),
    pow(sum, 4),
    pow(sum, 5),
    pow(sum, 6),
    pow(sum, 7),
    pow(sum, 8),
    pow(sum, 10),
    pow(sum, 11),
    pow(sum, 12),
    pow(sum, 13),
    pow(sum, 14),
    pow(sum, 15),
    pow(sum, 16),
    pow(sum, 17),
    pow(sum, 18)
    FROM (SELECT sum(rental_id) as sum FROM sakila.rental) a

    you can even write ”

    SELECT sum(rental_id) as sum FROM sakila.rental GROUP BY customer_id

    which isn’t possible with user variables.

    January 25, 2009 at 3:21 am
  • Ms. Anthrope

    Notably, Postgres does not have this particular deficiency. I often run into optimizer deficiencies in MySQL with my clients (this being one of the more minor sorts) and find myself longing for my younger days with the PGSQL.

    January 25, 2009 at 4:11 am
  • Lukas

    @Sergei: thats exactly the approach I meant in my original approach.

    @Ms. Anthrope: right .. then again in some cases its nicer for the eyes to still do this subquery if the expression is very long.

    January 25, 2009 at 7:26 am
  • peter


    Do you know if this is on the roadmap to be fixed natively by the way ?

    January 25, 2009 at 12:18 pm
  • Sergei Golubchik

    To recognize common subexpressions ? No, I don’t know.

    January 25, 2009 at 1:23 pm

Comments are closed.