Using Explain Analyze in MySQL 8

Explain Analyze in MySQLIn MySQL 8.0.18 there is a new feature called Explain Analyze when for many years we mostly had only the traditional Explain. I know there are different formats, but those based on the same information just show it in a different format with some extra details.

But Explain Analyze is a different concept. It is actually going to run the query and measure execution time by using the new iterator executor for each step. That topic itself deserves its own blog post on how the new iterator executor works, and I will write a post about that as well. But if you cannot wait and you would like to read up, here are some links to some additional information: Iterator executor analytics queries, Volcano iterator design, and Volcano iterator semijoin.

In this post, we are going to focus on what Explain Analyze can give us.

As always let’s start with some testing. I have my test server with a sbtest1 table:

There are almost 1 million rows in it:

Traditional Explain

Hopefully, most of you already know this output; we can see which table and index will be used and we can also see it is a range query and approximately it has to read 493204 Rows. With InnoDB we know that is just an estimation, it is not the real number. In the past, we had two options: either run the query and see the real number or run the query and check the handler statistics to get even more detailed information.

Now I am going to run the query and see the real row count:

So the query has to read 625262 rows and takes 0.10s.  Let’s have a look at Explain Analyze.

Explain Analyze

It always uses the tree format and it took me some time to actually understand what all this information means. Unfortunately, the manual page does not really explain it.

Let me try to fill the gaps:

Index range scan on sbtest1 using idx3 –  this part is quite trivial, it says it is going to use a range scan on sbtest1 table and will use the idx3 index.

cost=98896.53 rows=493204 –  this is the cost and the same row number that traditional Explain gives us.

actual time=0.021..96.488 rows=625262 loops=1:

  • 0.021 – The time to return first row (Init + first Read) in milliseconds.
  • 96.488 – The time to return all rows (Init + all Read calls) in milliseconds.
  • rows=625262 – The number of rows returned by this iterator. Finally, it is the exact number.
  • loops=1 – The number of loops (number of Init calls).

The source for this information is here: Implement EXPLAIN ANALYZE.

We can see all this information with each step, which gives us great help and insights for query optimization.

But it is important to notice in the last step that the time is 178ms and it also reports 1 row in set (0.18 sec). So it says this query takes 0.18s! But the original query took only 0.10s.

The manual says:

This has naturally some overhead, but it seems this is only around 5–6% for an instrumented query
(entirely free for a non-instrumented query).

But with this very simple query, the overhead is already 80%, so we cannot really trust those numbers.

Let’s have a look on another query which takes more time, and see how this difference behaves:

It reports that the query is 60% slower than in real life.

Just out of curiosity I tried Explain Analyze with a Stored Procedure, but it didn’t work. One of the most painful tasks is to analyze and optimize a Stored Procedure because MySQL does not log the individual queries from a Procedure and it’s hard to actually see what is going under the hood.  (In Percona Server there are log_slow_sp_statements which will allow you to log the individual queries into a slow query log.)

I would love to see if Explain Analyze would run the procedure and then display the actual execution plan in a tree format. But maybe I am believing in a perfect world and I am too naive.

I also noticed some interesting behavior:

If there isn’t a where condition, or group by, or anything else and just a simple count query, it does not give us the time for that step or how many rows were read or which index was used. I think here its a bit too minimalistic and some basic information would be good to see.

Conclusion

I think Explain Analyze is a great step in the right direction, and I am excited to see if in the future it gets even more features, but it should report more realistic times for execution time.

Share this post

Comments (9)

  • dbdemon Reply

    Looks like a useful feature. Your CREATE TABLE statement for the sbtest1 table is actually showing a different table (t1).

    October 28, 2019 at 4:12 pm
    • Tibor Korocz Reply

      Thanks, we fixed it.

      October 30, 2019 at 4:48 pm
  • SD Reply

    Just want to know that what is the meaning of cost value?

    October 29, 2019 at 1:58 am
  • Georgi Sotirov Reply

    Nice article! A few notes from me:

    > Unfortunately, the manual page does not really explain it.

    Yes, it does – see https://dev.mysql.com/doc/refman/8.0/en/explain.html#explain-analyze

    > … this is the cost and the same row number that traditional Explain gives us

    Traditional explain plan (if you mean EXPLAIN FORMAT=TRADITIONAL) doesn’t provide any information about costs (see https://dev.mysql.com/doc/refman/8.0/en/explain-output.html). Costs are visible only in TREE format (since 8.0.18) and JSON (used in MySQL Workbench for visual explain plans).

    > …just a simple count query, it does not give us the time for that step or how many rows were read or which index was used…

    That’s really an interesting observation that only confirms the current differences between the EXPLAIN formats, because TRADITIONAL and JSON would show rows and indexes for such a query if table is InnoDB. Also HASH joins are visible only in TREE format (and EXPLAIN ANALYZE of course). Currently some queries, which don’t use the iterator executor, won’t be explained and ran by EXPLAIN ANALYZE at all and you’ll get only “not executable by iterator executor” 🙂

    October 29, 2019 at 4:38 am
    • Tibor Korocz Reply

      Hi, thanks for the comments.

      Yes you can find some information on that page but for me it was still hard to understand at first which part of the explain analyse means what. I prefer the “High Level Architecture” description here: https://dev.mysql.com/worklog/task/?id=4168
      Which gives better informations.

      You can also use “explain format=JSON” which is part of the traditional explain and you can see the costs there as well.

      October 30, 2019 at 4:53 pm
      • Georgi Sotirov Reply

        Yes, understood, but work logs are more for architects/developers and the manual is for users, so you could nevertheless refer to the manual, which was already updated for EXPLAIN ANALYZE.

        Yes, as I wrote costs are printed in JSON and now TREE formats. For me “traditional” means the classic, tabular explain plan that is available since 3.2x versions from 20 or more years ago. JSON was introduced with 5.6.5 and costs were added into it with 5.7.2 several years ago. TREE format was introduced with 8.0.16 and row and cost information were added with 8.0.18 released earlier this month. JSON format provides more information than traditional, so I make the distinction between the two formats. Thus, please understand, that for me it was confusing reading that “traditional” provides cost information 🙂

        I suggest you also report your finding about SELECT count(*) queries in bugs.mysql.com, because this shouldn’t be the normal behavior and should be fixed. Although understandable, I do not think that the differences between EXPLAIN formats are user friendly (e.g. hash join is visible only in TREE format), because they’re confusing – you do not expect to get different information in different “formats” of the same command and I filed a bug for hash join already.

        October 31, 2019 at 3:01 am
  • Pavel Katiushyn Reply

    Thank you for the post. The overhead is really worrying. I hope it adds equally to each step it measures, otherwise, how can it be trusted?

    October 30, 2019 at 8:59 am
  • Ted Wennmark Reply

    Not sure about the “One of the most painful tasks is to analyze and optimize a Stored Procedure because MySQL does not log the individual queries from a Procedure and it’s hard to actually see what is going under the hood”. Since MySQL 5.7 you can trace SP, look at WL#5766 (https://dev.mysql.com/worklog/task/?id=5766), there is also a blog here on how to use this: http://mablomy.blogspot.com/2015/03/profiling-stored-procedures-in-mysql-57.html

    November 1, 2019 at 4:33 am

Leave a Reply