EXPLAIN FORMAT=JSON: cost_info knows why optimizer prefers one index to another

EXPLAIN FORMAT=JSONTime for another entry in the EXPLAIN FORMAT=JSON is cool! series of blog posts. This time we’ll discuss how using EXPLAIN FORMAT=JSON allows you to see that cost_info  knows why the optimizer prefers one index to another.

Tables often have more than one index. Any of these indexes can be used to resolve query. The optimizer has to make a choice in this case. One of the metrics that can be used to help make the choice is the potential cost of the query evaluation.

For example, let’s take the table titles  from the standard employees database:

It has two indexes: emp_no  and PRIMARY, each of  which could be used to resolve query:

At first glance,  emp_no  doesn’t really fit for this query. PRIMARY  does fit, because it contains both the  title  and from_date  fields. Unfortunately, it cannot be used to resolve the query, because we don’t limit the search by emp_no  and title .  It can, however, be used to select rows from the index. When we use  EXPLAIN , though, it shows us that the optimizer has chosen index emp_no  (every secondary index in InnoDB contains a link to the clustered index anyway):

PRIMARY KEY  exists in the field possible_keys, but was not chosen. EXPLAIN FORMAT=JSON  can show us why.

First let’s run it on the original query:

The important part here is:

Which shows that the overall query_cost  is 89796.80. We don’t really know what the units are for this cost, or how it is actually measured. It isn’t important; the only thing that is important for now is that smaller is better. (Think of it like shopping for a product: it doesn’t matter which you buy it from, just that you buy it at the lowest price.)

Another important member of the index is cost_info, which belongs to the table itself:

Here we get even more details, such as cost of read operation and evaluation. prefix_cost  is not useful for this example, because it contains the cost of joining to the next table in JOIN. Since we don’t join the table titles  with any other value of prefix_cost, is equivalent to the cost of the full query. data_read_per_join  contains the amount of data that should be read for each JOIN  operation. In our case it is once again the same as how much data we should read to fully evaluate the query.

Now let’s force index PRIMARY  and examine the  EXPLAIN FORMAT=JSON  output:

Notice the numbers are different this time. The total query cost is 531269.80, which is about 6 times greater than 89796.80:

read_cost  is 442725.00, which is 353 times greater than 1252.00. However, the  eval_cost  and data_read_per_join  are the same as the query that uses index emp_no :

These numbers clearly explain why the optimizer prefers the index  emp_no  to PRIMARY KEY.

In our example above this behavior is correct. In a real life scenario, if the optimizer’s choice is wrong. these numbers can show either that there is a bug in the optimizer or  that the table’s statistics are outdated and need to be updated.

Conclusion: EXPLAIN FORMAT=JSON  can be used together with FORCE INDEX  to find out why the optimizer prefers one index to another.

Share this post

Comment (1)

Leave a Reply