EmergencyEMERGENCY? Get 24/7 Help Now!

Optimizing repeated subexpressions in MySQL

 | January 23, 2009 |  Posted In: Insight for Developers

PREVIOUS POST
NEXT POST

How smart is the MySQL optimizer? If it sees an expression repeated many times, does it realize they’re all the same and not calculate the result for each of them?

I had a specific case where I needed to find out for sure, so I made a little benchmark. The query looks something like this:

I wrote a bash script to execute the query 500 times and redirect the results to /dev/null. Results: 31s wall-clock time.

The following is an alternative query that returns the same result:

This query runs 500 iterations in 12s. So it looks like the optimizer was computing the SUM() 18 times in the first query. (A response on the internals mailing list agreed with this finding. Thanks Konstantin.)

Along the same lines, I benchmarked POW() and found that squaring a number is more than twice as fast to do with ordinary multiplication as with POW(). Does this matter? Sometimes… we had a client who was doing vector math in a many-dimensional space, and the POW() seemed to be taking a pretty significant amount of time; changing to ordinary multiplication made the query faster. (It was not the ultimate suggestion though.)

PREVIOUS POST
NEXT POST
Baron Schwartz

Baron is the lead author of High Performance MySQL. He is a former Percona employee.

11 Comments

  • Its always worthwhile to look at expensive expressions like that if they are repeated. In some cases I have turned these into subqueries in the SELECT clause. Even though MySQL is very badly optimized for these kinds of subqueries, usually the result sets are very small for these cases and so it does not hurt that much and is radically offset by not having to do the same computation multiple times.

  • Just to say that your workaround isn’t guaranteed to work. The server is not required to evaluate expressions in the SELECT clause in the order they are listed. The fact that it does is an implementation detail that may change any time.

  • Sergei, indeed: in cases where it matters, such as in mk-table-checksum, I run test queries to find out which place the “real” expression needs to go and where the user variables can be substituted. The result is often that the real expression goes LAST, not first! It looks odd but it’s true — you end up using @variable apparently before it’s assigned to. In this case it needed to be first.

  • sorry, bad timing 🙂 we were commenting at the same time, apparently, at least, your comment wasn’t here when I was writing mine.

    I mean that Baron’s workaround relies on the order in which SELECT expressions are evaluated. And this order is undefined, afaik.

  • Hi,

    To solve the issue of whether @sum is calculated first, the query can be rewritten as:
    SELECT sql_no_cache
    pow(@sum, 1),
    pow(@sum, 2),
    pow(@sum, 3),
    pow(@sum, 4),
    pow(@sum, 5),
    pow(@sum, 6),
    pow(@sum, 7),
    pow(@sum, 8),
    pow(@sum, 10),
    pow(@sum, 11),
    pow(@sum, 12),
    pow(@sum, 13),
    pow(@sum, 14),
    pow(@sum, 15),
    pow(@sum, 16),
    pow(@sum, 17),
    pow(@sum, 18)
    FROM
    (SELECT @sum := (SELECT sum(rental_id) FROM sakila.rental)) a

    In this query @sum is guaranteed to be calculated before all the pow(…) take place.

  • You don’t need a user variable in your solution:

    SELECT
    pow(sum, 1),
    pow(sum, 2),
    pow(sum, 3),
    pow(sum, 4),
    pow(sum, 5),
    pow(sum, 6),
    pow(sum, 7),
    pow(sum, 8),
    pow(sum, 10),
    pow(sum, 11),
    pow(sum, 12),
    pow(sum, 13),
    pow(sum, 14),
    pow(sum, 15),
    pow(sum, 16),
    pow(sum, 17),
    pow(sum, 18)
    FROM (SELECT sum(rental_id) as sum FROM sakila.rental) a

    you can even write ”


    SELECT sum(rental_id) as sum FROM sakila.rental GROUP BY customer_id

    which isn’t possible with user variables.

  • Notably, Postgres does not have this particular deficiency. I often run into optimizer deficiencies in MySQL with my clients (this being one of the more minor sorts) and find myself longing for my younger days with the PGSQL.

  • @Sergei: thats exactly the approach I meant in my original approach.

    @Ms. Anthrope: right .. then again in some cases its nicer for the eyes to still do this subquery if the expression is very long.

Leave a Reply

 
 

Percona’s widely read Percona Data Performance blog highlights our expertise in enterprise-class software, support, consulting and managed services solutions for both MySQL® and MongoDB® across traditional and cloud-based platforms. The decades of experience represented by our consultants is found daily in numerous and relevant blog posts.

Besides specific database help, the blog also provides notices on upcoming events and webinars.
Want to get weekly updates listing the latest blog posts? Subscribe to our blog now! Submit your email address below and we’ll send you an update every Friday at 1pm ET.

No, thank you. Please do not ask me again.