At the end of my talk “Troubleshooting MySQL Performance” at the LinuxPiter conference, a user asked me a question: “What does the EXPLAIN ‘filtered’ field mean, and how do I use it?” I explained that this is the percentage of rows that were actually needed, against the equal or bigger number of resolved rows. While the user was happy with the answer, I’d like to better illustrate this. And I can do it with help of EXPLAIN FORMAT=JSON and its
rows_examined_per_scan, rows_produced_per_join statistics.
Let’s take a simple query that searches information about the Russian Federation in the table Country of the standard world database:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
mysql> select * from Country where Name='Russian Federation'G *************************** 1. row *************************** Code: RUS Name: Russian Federation Continent: Europe Region: Eastern Europe SurfaceArea: 17075400.00 IndepYear: 1991 Population: 146934000 LifeExpectancy: 67.2 GNP: 276608.00 GNPOld: 442989.00 LocalName: Rossija GovernmentForm: Federal Republic HeadOfState: Vladimir Putin Capital: 3580 Code2: RU 1 row in set (0.00 sec) |
It returns single row – but how many rows were actually used to resolve the query? EXPLAIN will show us:
|
1 2 3 4 5 6 7 8 9 |
mysql> 56-explain select * from Country where Name='Russian Federation'; +----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+ | 1 | SIMPLE | Country | NULL | ALL | NULL | NULL | NULL | NULL | 239 | 10.00 | Using where | +----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+ 1 row in set, 1 warning (0.00 sec) Note (Code 1003): /* select#1 */ select `world`.`Country`.`Code` AS `Code`,`world`.`Country`.`Name` AS `Name`,`world`.`Country`.`Continent` AS `Continent`,`world`.`Country`.`Region` AS `Region`,`world`.`Country`.`SurfaceArea` AS `SurfaceArea`,`world`.`Country`.`IndepYear` AS `IndepYear`,`world`.`Country`.`Population` AS `Population`,`world`.`Country`.`LifeExpectancy` AS `LifeExpectancy`,`world`.`Country`.`GNP` AS `GNP`,`world`.`Country`.`GNPOld` AS `GNPOld`,`world`.`Country`.`LocalName` AS `LocalName`,`world`.`Country`.`GovernmentForm` AS `GovernmentForm`,`world`.`Country`.`HeadOfState` AS `HeadOfState`,`world`.`Country`.`Capital` AS `Capital`,`world`.`Country`.`Code2` AS `Code2` from `world`.`Country` where (`world`.`Country`.`Name` = 'Russian Federation') |
You see that 239 rows were examined, and 10% of them filtered. But what exactly was done? An explanation exists in the EXPLAIN FORMAT=JSON output:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 |
mysql> explain format=json select * from Country where Name='Russian Federation'G *************************** 1. row *************************** EXPLAIN: { "query_block": { "select_id": 1, "cost_info": { "query_cost": "53.80" }, "table": { "table_name": "Country", "access_type": "ALL", "rows_examined_per_scan": 239, "rows_produced_per_join": 23, "filtered": "10.00", "cost_info": { "read_cost": "49.02", "eval_cost": "4.78", "prefix_cost": "53.80", "data_read_per_join": "6K" }, "used_columns": [ "Code", "Name", "Continent", "Region", "SurfaceArea", "IndepYear", "Population", "LifeExpectancy", "GNP", "GNPOld", "LocalName", "GovernmentForm", "HeadOfState", "Capital", "Code2" ], "attached_condition": "(`world`.`Country`.`Name` = 'Russian Federation')" } } } 1 row in set, 1 warning (0.00 sec) Note (Code 1003): /* select#1 */ select `world`.`Country`.`Code` AS `Code`,`world`.`Country`.`Name` AS `Name`,`world`.`Country`.`Continent` AS `Continent`,`world`.`Country`.`Region` AS `Region`,`world`.`Country`.`SurfaceArea` AS `SurfaceArea`,`world`.`Country`.`IndepYear` AS `IndepYear`,`world`.`Country`.`Population` AS `Population`,`world`.`Country`.`LifeExpectancy` AS `LifeExpectancy`,`world`.`Country`.`GNP` AS `GNP`,`world`.`Country`.`GNPOld` AS `GNPOld`,`world`.`Country`.`LocalName` AS `LocalName`,`world`.`Country`.`GovernmentForm` AS `GovernmentForm`,`world`.`Country`.`HeadOfState` AS `HeadOfState`,`world`.`Country`.`Capital` AS `Capital`,`world`.`Country`.`Code2` AS `Code2` from `world`.`Country` where (`world`.`Country`.`Name` = 'Russian Federation') |
We are interested in this part:
|
1 2 3 |
"rows_examined_per_scan": 239, "rows_produced_per_join": 23, "filtered": "10.00", |
It clearly shows that 239 rows were examined, but only 23 rows were used to produce the result. To make this query more effective we need to add an index on the Name field:
|
1 2 3 |
mysql> alter table Country add index(Name); Query OK, 0 rows affected (0.40 sec) Records: 0 Duplicates: 0 Warnings: 0 |
Now the EXPLAIN plan is much better: we only examine 1 required row, and the value of filtered is 100%:
|
1 2 3 4 5 6 7 |
mysql> pager egrep 'rows_examined_per_scan|rows_produced_per_join|filtered'; PAGER set to 'egrep 'rows_examined_per_scan|rows_produced_per_join|filtered'' mysql> explain format=json select * from Country where Name='Russian Federation'G "rows_examined_per_scan": 1, "rows_produced_per_join": 1, "filtered": "100.00", 1 row in set, 1 warning (0.00 sec) |
Not exactly the same thing, but you can get similar interesting results from Premetheus mysqld_exporter metrics.
For example, this query:
sum(rate(mysql_perf_schema_events_statements_rows_sent_total[5m])) by (digest_text) / (sum(rate(mysql_perf_schema_events_statements_rows_examined_total[5m])) by (digest_text) > 0)
will give you the ratio of rows sent to rows examined. The closer you get to 1, the more efficient your index use is. You may get some funny results with COUNT() queries.
I assume you use Performance Schema? This is different thing: to use Performance Schema metrics you need to execute query while EXPLAIN shows estimates. Results in Performance Schema are more precise, but EXPLAIN can help to understand why optimizer chooses one or another plan. For example, estimates in these fields can be wrong and show bug in optimizer or the fact what table statistics is out of date.
In short: EXPLAIN tells what optimizer suspects about the query while Performance Schema shows what really happened.
Yup, like I said, it’s not the same thing. The perf schema + prometheus allows you to see changes in index performance over time, instead of having to look at it manually. The explain is nice to help dig deeper once you’ve identified a problem with metrics. I’ve done this when coming in to an existing system where I don’t have any understanding of the app or use case. I can examine metrics for all queries and pick the top candidates for examine and more digging.