EmergencyEMERGENCY? Get 24/7 Help Now!

When the subselect runs faster


Posted on:

|

By:


PREVIOUS POST
NEXT POST
Share Button

A few weeks ago, we had a query optimization request from one of our customer.

The query was very simple like:

This column in the table is looks like this:

The table have 549252 rows and of course, there is an index on the col1. MySQL estimated the cardinality of that index as 87, though what was of course misleading as index cardinality in this case can’t be over 9, as there is only 8(+ NULL) different possible values for this column.

This query took more than 5 minutes (the rows are large and table does not fit in cache well)

When you want to run this query mysql first will try to find each row where col1 is A or B using index. Then its going to order by the ID using file sort and then send first 20 rows ignoring the rest.

In this case MySQL has 2 indexes where one is usable to find rows, while other is usable to return them in the right order. MySQL can chose only one of them to execute the query – use index to find rows. This is sensible strategy if there is no LIMIT, however it is poor chose if there is one – it is often a lot faster to retrieve rows in order checking WHERE clause for them until required number of rows were returned. Especially in the cases when WHERE clause is not very selective.

So I tried this:

In this case we forcing MySQL to do retrieve rows in sorted order and checking if it matches our original WHERE clause with subselects. It looks scary if we look at EXPLAIN but in reality the dependent subquery is only executed enough times to produce 20 rows in result set.

The result is a lot better result time:

So by rewriting query using subqueries we actually improved it performance 100 times. So subqueries are
not always slowing things down.

Even though proving subqueries are not always slow this way is not the most optimal. We do not really need separate subselect to make MySQL check WHERE clause while scanning table in index order. We can just use FORCE INDEX hint to override MySQL index choice:

This approach works well if WHERE clause is not very selective, otherwise MySQL may need to scan very many rows to find enough matching rows. You can use another trick Peter
wrote. about couple of years ago.

Share Button
PREVIOUS POST
NEXT POST


Istvan Podor

Istvan is a former Percona employee. Istvan's expertise is high-traffic websites driven by MySQL. Prior to joining Percona in 2009, Istvan worked for several Alexa Top 500 websites on high availability solutions and high load performance tuning.



Tags:

, , , , ,

Categories:
Insight for Developers, MySQL


Comments
  • why it is differen number of rows in cases (606920 vs 765105)?
    and why it is PRIMARY as key for DEPENDENT SUBQUERY in second case?

    Reply

  • ah, about first question: seems like the optimizer get a projected number of possible affected index records aproximately due to it’s measured before real query was executed

    Reply

  • zerkms,

    Yes estimation may give different value. Regarding PRIMARY KEY this is because MySQL does primary key lookups for rows as it retrieves them in the order and after fetching row by primary key checks if it matches WHERE clause.

    Reply

  • Seems like this is really just a workaround for a weakness in the optimizer. Properly handling limit clauses in the optimizer is definitely something I’d like to see tackled.

    Reply

  • I think that the real problem is having an index in col1, it’s not optimal at all having and index in a field with just 9 different possible values, and index is as efficient as its cardinality. I would try removing the index in col1, moreover you would save memory and the index creation when inserting rows.

    Reply

  • I suggest to also run the queries in a different order, to make sure that the caching of blocks by the kernel does not happen in favor or the second query.
    The most useful would be the have a col1,id index so definitely the index would be used by default.

    Reply

  • Nice! :)

    Reply

  • Why not something with union / join?

    select * from table where id in (
    select id from table where col1=’a’
    union
    select id from table where col1=’b’
    )ORDER BY id DESC LIMIT 20

    or even:

    select t1.* from table t1, (
    select id from table where col1=’a’
    union
    select id from table where col1=’b’
    ) t2
    where t1.id = t2.id
    ORDER BY id DESC LIMIT 20

    Reply

  • Istvan Podor Post author

    Nikolay:

    If you would give me the explain I could give an exact answer. Now it’s looks like because the joins are slow/slower in the subquery. As I mentioned in the post, the subquery will be executed for each result of the main query. So in this case each query like this means 20 joins :)

    Reply

Leave a Reply

Percona’s widely read Percona Data Performance blog highlights our expertise in enterprise-class software, support, consulting and managed services solutions for both MySQL® and MongoDB® across traditional and cloud-based platforms. The decades of experience represented by our consultants is found daily in numerous and relevant blog posts.

Besides specific database help, the blog also provides notices on upcoming events and webinars.

Want to get weekly updates listing the latest blog posts? Subscribe to our blog now! Submit your email address below.

No, thank you. Please do not ask me again.