rows_examined_per_scan, rows_produced_per_join: EXPLAIN FORMAT=JSON answers the question “What does the number of filtered rows mean?”

At the end of my talk “Troubleshooting MySQL Performance” at the LinuxPiter conference, a user asked me a question: “What does the EXPLAIN ‘filtered’ field mean, and how do I use it?” I explained that this is the percentage of rows that were actually needed, out of the equal or larger number of rows examined to resolve the query. While the user was happy with the answer, I’d like to illustrate it better, and I can do so with the help of EXPLAIN FORMAT=JSON and its rows_examined_per_scan and rows_produced_per_join statistics.

Let’s take a simple query that searches for information about the Russian Federation in the Country table of the standard world database:

It returns a single row, but how many rows were actually used to resolve the query? EXPLAIN will show us:

You can see that 239 rows were examined and that filtered is 10%, meaning roughly 10% of the examined rows are expected to pass the condition. But what exactly was done? The EXPLAIN FORMAT=JSON output explains it:

We are interested in this part:

It clearly shows that 239 rows were examined (rows_examined_per_scan), but only 23 rows were used to produce the result (rows_produced_per_join). To make this query more efficient we need to add an index on the Name field:

Now the EXPLAIN plan is much better: we only examine the 1 required row, and the value of filtered is 100%:
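To make the relationship between the two counters and filtered concrete, here is a minimal Python sketch. It parses hand-trimmed fragments shaped like EXPLAIN FORMAT=JSON output and computes the share of examined rows that were produced. The fragments keep only the fields discussed in this post and use the counts quoted above (239 and 23 before the index, 1 and 1 after); they are illustrative assumptions, not verbatim server output, and the helper name produced_share_percent is hypothetical.

```python
import json

# Hand-trimmed fragments shaped like EXPLAIN FORMAT=JSON output.
# Assumption: only the fields discussed in this post are kept, and the
# counts are the ones quoted in the text, not captured from a live server.
BEFORE_INDEX = """
{"query_block": {"table": {"table_name": "Country",
                           "rows_examined_per_scan": 239,
                           "rows_produced_per_join": 23}}}
"""

AFTER_INDEX = """
{"query_block": {"table": {"table_name": "Country",
                           "rows_examined_per_scan": 1,
                           "rows_produced_per_join": 1}}}
"""


def produced_share_percent(explain_json: str) -> float:
    """Return rows_produced_per_join as a percentage of rows_examined_per_scan.

    Hypothetical helper for a single-table plan; real plans with joins or
    subqueries nest their tables differently.
    """
    table = json.loads(explain_json)["query_block"]["table"]
    examined = table["rows_examined_per_scan"]
    produced = table["rows_produced_per_join"]
    if examined == 0:
        # Nothing was examined, so there is no meaningful ratio.
        return 0.0
    return 100.0 * produced / examined


if __name__ == "__main__":
    print(round(produced_share_percent(BEFORE_INDEX), 1))  # full scan
    print(round(produced_share_percent(AFTER_INDEX), 1))   # with the index
```

For the full scan the ratio comes out just under the 10% that EXPLAIN reports; the small difference is presumably due to rounding in the displayed counters. With the index in place, every examined row is produced.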

 


Comments (3)

  • SuperQ

    Not exactly the same thing, but you can get similarly interesting results from Prometheus mysqld_exporter metrics.

    For example, this query:
    sum(rate(mysql_perf_schema_events_statements_rows_sent_total[5m])) by (digest_text) / (sum(rate(mysql_perf_schema_events_statements_rows_examined_total[5m])) by (digest_text) > 0)

    will give you the ratio of rows sent to rows examined. The closer you get to 1, the more efficient your index use is. You may get some funny results with COUNT() queries.

    December 10, 2015 at 7:17 pm
  • Sveta Smirnova

    I assume you use Performance Schema? This is a different thing: to use Performance Schema metrics you need to execute the query, while EXPLAIN shows estimates. Results in Performance Schema are more precise, but EXPLAIN can help you understand why the optimizer chooses one plan or another. For example, estimates in these fields can be wrong, which may reveal a bug in the optimizer or the fact that table statistics are out of date.

    In short: EXPLAIN tells you what the optimizer expects of the query, while Performance Schema shows what really happened.

    December 10, 2015 at 7:26 pm
  • SuperQ

    Yup, like I said, it’s not the same thing. Performance Schema plus Prometheus allows you to see changes in index performance over time, instead of having to look at it manually. EXPLAIN is nice for digging deeper once you’ve identified a problem with metrics. I’ve done this when coming into an existing system where I don’t have any understanding of the app or use case. I can examine metrics for all queries and pick the top candidates for EXPLAIN and more digging.

    December 10, 2015 at 7:32 pm
