Time for another entry in the EXPLAIN FORMAT=JSON is cool! series of blog posts. This time we’ll discuss how using EXPLAIN FORMAT=JSON allows you to see that
cost_info knows why the optimizer prefers one index to another.
Tables often have more than one index. Any of these indexes can be used to resolve query. The optimizer has to make a choice in this case. One of the metrics that can be used to help make the choice is the potential cost of the query evaluation.
For example, let’s take the table titles from the standard employees database:
|
1 |
mysql> show create table titlesG<br>*************************** 1. row ***************************<br> Table: titles<br>Create Table: CREATE TABLE `titles` (<br> `emp_no` int(11) NOT NULL,<br> `title` varchar(50) NOT NULL,<br> `from_date` date NOT NULL,<br> `to_date` date DEFAULT NULL,<br> PRIMARY KEY (`emp_no`,`title`,`from_date`),<br> KEY `emp_no` (`emp_no`),<br> CONSTRAINT `titles_ibfk_1` FOREIGN KEY (`emp_no`) REFERENCES `employees` (`emp_no`) ON DELETE CASCADE<br>) ENGINE=InnoDB DEFAULT CHARSET=latin1<br>1 row in set (0.01 sec) |
It has two indexes: emp_no and PRIMARY, each of which could be used to resolve query:
|
1 |
select distinct title from titles where year(from_date) > '1990'; |
At first glance, emp_no doesn’t really fit for this query. PRIMARY does fit, because it contains both the title and from_date fields. Unfortunately, it cannot be used to resolve the query, because we don’t limit the search by emp_no and title . It can, however, be used to select rows from the index. When we use EXPLAIN , though, it shows us that the optimizer has chosen index emp_no (every secondary index in InnoDB contains a link to the clustered index anyway):
|
1 |
mysql> explain select distinct title from titles where year(from_date) > '1990'G<br>*************************** 1. row ***************************<br> id: 1<br> select_type: SIMPLE<br> table: titles<br> partitions: NULL<br> type: index<br>possible_keys: PRIMARY,emp_no<br> key: emp_no<br> key_len: 4<br> ref: NULL<br> rows: 442724<br> filtered: 100.00<br> Extra: Using where; Using index; Using temporary<br>1 row in set, 1 warning (0.00 sec) |
PRIMARY KEY exists in the field possible_keys, but was not chosen. EXPLAIN FORMAT=JSON can show us why.
First let’s run it on the original query:
|
1 |
mysql> explain format=json select distinct title from titles where year(from_date) > '1990'G<br>*************************** 1. row ***************************<br>EXPLAIN: {<br> "query_block": {<br> "select_id": 1,<br> "cost_info": {<br> "query_cost": "89796.80"<br> },<br> "duplicates_removal": {<br> "using_temporary_table": true,<br> "using_filesort": false,<br> "table": {<br> "table_name": "titles",<br> "access_type": "index",<br> "possible_keys": [<br> "PRIMARY",<br> "emp_no"<br> ],<br> "key": "emp_no",<br> "used_key_parts": [<br> "emp_no"<br> ],<br> "key_length": "4",<br> "rows_examined_per_scan": 442724,<br> "rows_produced_per_join": 442724,<br> "filtered": "100.00",<br> "using_index": true,<br> "cost_info": {<br> "read_cost": "1252.00",<br> "eval_cost": "88544.80",<br> "prefix_cost": "89796.80",<br> "data_read_per_join": "27M"<br> },<br> "used_columns": [<br> "emp_no",<br> "title",<br> "from_date"<br> ],<br> "attached_condition": "(year(`employees`.`titles`.`from_date`) > '1990')"<br> }<br> }<br> }<br>}<br>1 row in set, 1 warning (0.01 sec) |
The important part here is:
|
1 |
"cost_info": {<br> "query_cost": "89796.80"<br> }, |
Which shows that the overall query_cost is 89796.80. We don’t really know what the units are for this cost, or how it is actually measured. It isn’t important; the only thing that is important for now is that smaller is better. (Think of it like shopping for a product: it doesn’t matter which you buy it from, just that you buy it at the lowest price.)
Another important member of the index is cost_info, which belongs to the table itself:
|
1 |
"cost_info": {<br> "read_cost": "1252.00",<br> "eval_cost": "88544.80",<br> "prefix_cost": "89796.80",<br> "data_read_per_join": "27M"<br> }, |
Here we get even more details, such as cost of read operation and evaluation. prefix_cost is not useful for this example, because it contains the cost of joining to the next table in JOIN. Since we don’t join the table titles with any other value of prefix_cost, is equivalent to the cost of the full query. data_read_per_join contains the amount of data that should be read for each JOIN operation. In our case it is once again the same as how much data we should read to fully evaluate the query.
Now let’s force index PRIMARY and examine the EXPLAIN FORMAT=JSON output:
|
1 |
mysql> explain format=json select distinct title from titles force index(primary) where year(from_date) > '1990'G<br>*************************** 1. row ***************************<br>EXPLAIN: {<br> "query_block": {<br> "select_id": 1,<br> "cost_info": {<br> "query_cost": "531269.80"<br> },<br> "duplicates_removal": {<br> "using_temporary_table": true,<br> "using_filesort": false,<br> "table": {<br> "table_name": "titles",<br> "access_type": "index",<br> "possible_keys": [<br> "PRIMARY",<br> "emp_no"<br> ],<br> "key": "PRIMARY",<br> "used_key_parts": [<br> "emp_no",<br> "title",<br> "from_date"<br> ],<br> "key_length": "59",<br> "rows_examined_per_scan": 442724,<br> "rows_produced_per_join": 442724,<br> "filtered": "100.00",<br> "using_index": true,<br> "cost_info": {<br> "read_cost": "442725.00",<br> "eval_cost": "88544.80",<br> "prefix_cost": "531269.80",<br> "data_read_per_join": "27M"<br> },<br> "used_columns": [<br> "emp_no",<br> "title",<br> "from_date"<br> ],<br> "attached_condition": "(year(`employees`.`titles`.`from_date`) > '1990')"<br> }<br> }<br> }<br>}<br>1 row in set, 1 warning (0.01 sec) |
Notice the numbers are different this time. The total query cost is 531269.80, which is about 6 times greater than 89796.80:
|
1 |
"cost_info": {<br> "query_cost": "531269.80"<br> }, |
read_cost is 442725.00, which is 353 times greater than 1252.00. However, the eval_cost and data_read_per_join are the same as the query that uses index emp_no :
|
1 |
"cost_info": {<br> "read_cost": "442725.00",<br> "eval_cost": "88544.80",<br> "prefix_cost": "531269.80",<br> "data_read_per_join": "27M"<br> }, |
These numbers clearly explain why the optimizer prefers the index emp_no to PRIMARY KEY.
In our example above this behavior is correct. In a real life scenario, if the optimizer’s choice is wrong. these numbers can show either that there is a bug in the optimizer or that the table’s statistics are outdated and need to be updated.
Conclusion: EXPLAIN FORMAT=JSON can be used together with FORCE INDEX to find out why the optimizer prefers one index to another.
Resources
RELATED POSTS