Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

Using Explain Analyze in MySQL 8

October 28, 2019

Author

Tibor Korocz

MySQL

Share this Post:

Explain Analyze in MySQL In MySQL 8.0.18 there is a new feature called Explain Analyze when for many years we mostly had only the traditional Explain. I know there are different formats, but those based on the same information just show it in a different format with some extra details.

But Explain Analyze is a different concept. It is actually going to run the query and measure execution time by using the new iterator executor for each step. That topic itself deserves its own blog post on how the new iterator executor works, and I will write a post about that as well. But if you cannot wait and you would like to read up, here are some links to some additional information: Iterator executor analytics queries, Volcano iterator design, and Volcano iterator semijoin.

In this post, we are going to focus on what Explain Analyze can give us.

As always let’s start with some testing. I have my test server with a sbtest1 table:

CREATE TABLE `sbtest1` (
`id` int(11) NOT NULL,
`k` int(11) NOT NULL DEFAULT '0',
`c` char(120) NOT NULL DEFAULT '',
`pad` char(60) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `idx3` (`k`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

CREATE TABLE `sbtest1` (

`id` int(11) NOT NULL,

`k` int(11) NOT NULL DEFAULT '0',

`c` char(120) NOT NULL DEFAULT '',

`pad` char(60) NOT NULL DEFAULT '',

PRIMARY KEY (`id`),

KEY `idx3` (`k`)

) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

There are almost 1 million rows in it:

mysql> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 999999 |
+----------+
1 row in set (0.32 sec)

mysql> select count(*) from sbtest1;

+----------+

| count(*) |

+----------+

| 999999 |

+----------+

1 row in set (0.32 sec)

Traditional Explain

mysql> explain select count(*) from sbtest1 where k > 500000G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: sbtest1
partitions: NULL
type: range
possible_keys: idx3
key: idx3
key_len: 4
ref: NULL
rows: 493204
filtered: 100.00
Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)

mysql> explain select count(*) from sbtest1 where k > 500000G

*************************** 1. row ***************************

id: 1

select_type: SIMPLE

table: sbtest1

partitions: NULL

type: range

possible_keys: idx3

key: idx3

key_len: 4

ref: NULL

rows: 493204

filtered: 100.00

Extra: Using where; Using index

1 row in set, 1 warning (0.00 sec)

Hopefully, most of you already know this output; we can see which table and index will be used and we can also see it is a range query and approximately it has to read 493204 Rows. With InnoDB we know that is just an estimation, it is not the real number. In the past, we had two options: either run the query and see the real number or run the query and check the handler statistics to get even more detailed information.

Now I am going to run the query and see the real row count:

mysql> select count(*) from sbtest1 where k > 500000G
*************************** 1. row ***************************
count(*): 625262
1 row in set (0.10 sec)

mysql> select count(*) from sbtest1 where k > 500000G

*************************** 1. row ***************************

count(*): 625262

1 row in set (0.10 sec)

So the query has to read 625262 rows and takes 0.10s. Let’s have a look at Explain Analyze.

Explain Analyze

mysql> explain analyze select count(*) from sbtest1 where k > 500000G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0) (actual time=178.225..178.225 rows=1 loops=1)
-> Filter: (sbtest1.k > 500000) (cost=98896.53 rows=493204) (actual time=0.022..147.502 rows=625262 loops=1)
-> Index range scan on sbtest1 using idx3 (cost=98896.53 rows=493204) (actual time=0.021..96.488 rows=625262 loops=1)

1 row in set (0.18 sec)

mysql> explain analyze select count(*) from sbtest1 where k > 500000G

*************************** 1. row ***************************

EXPLAIN: -> Aggregate: count(0) (actual time=178.225..178.225 rows=1 loops=1)

-> Filter: (sbtest1.k > 500000) (cost=98896.53 rows=493204) (actual time=0.022..147.502 rows=625262 loops=1)

-> Index range scan on sbtest1 using idx3 (cost=98896.53 rows=493204) (actual time=0.021..96.488 rows=625262 loops=1)

1 row in set (0.18 sec)

It always uses the tree format and it took me some time to actually understand what all this information means. Unfortunately, the manual page does not really explain it.

Let me try to fill the gaps:

Index range scan on sbtest1 using idx3 – this part is quite trivial, it says it is going to use a range scan on sbtest1 table and will use the idx3 index.

cost=98896.53 rows=493204 – this is the cost and the same row number that traditional Explain gives us.

actual time=0.021..96.488 rows=625262 loops=1:

- 0.021 – The time to return first row (Init + first Read) in milliseconds.

- 96.488 – The time to return all rows (Init + all Read calls) in milliseconds.

- rows=625262 – The number of rows returned by this iterator. Finally, it is the exact number.

- loops=1 – The number of loops (number of Init calls).

The source for this information is here: Implement EXPLAIN ANALYZE.

We can see all this information with each step, which gives us great help and insights for query optimization.

But it is important to notice in the last step that the time is 178ms and it also reports 1 row in set (0.18 sec). So it says this query takes 0.18s! But the original query took only 0.10s.

The manual says:

This has naturally some overhead, but it seems this is only around 5–6% for an instrumented query
(entirely free for a non-instrumented query).

But with this very simple query, the overhead is already 80%, so we cannot really trust those numbers.

Let’s have a look on another query which takes more time, and see how this difference behaves:

mysql> select count(*) from sbtest1 t1 left join sbtest1 t2 on t1.k=t2.k;
+----------+
| count(*) |
+----------+
| 77902613 |
+----------+
1 row in set (15.42 sec)

mysql> select count(*) from sbtest1 t1 left join sbtest1 t2 on t1.k=t2.k;

+----------+

| count(*) |

+----------+

| 77902613 |

+----------+

1 row in set (15.42 sec)

mysql> explain analyze select count(*) from sbtest1 t1 left join sbtest1 t2 on t1.k=t2.kG
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0) (actual time=26501.232..26501.233 rows=1 loops=1)
-> Nested loop left join (cost=810799.35 rows=4550475) (actual time=0.036..22572.530 rows=77902616 loops=1)
-> Index scan on t1 using idx3 (cost=108280.56 rows=986408) (actual time=0.026..176.645 rows=999999 loops=1)
-> Index lookup on t2 using idx3 (k=t1.k) (cost=0.25 rows=5) (actual time=0.001..0.017 rows=78 loops=999999)

1 row in set (26.50 sec)

mysql> explain analyze select count(*) from sbtest1 t1 left join sbtest1 t2 on t1.k=t2.kG

*************************** 1. row ***************************

EXPLAIN: -> Aggregate: count(0) (actual time=26501.232..26501.233 rows=1 loops=1)

-> Nested loop left join (cost=810799.35 rows=4550475) (actual time=0.036..22572.530 rows=77902616 loops=1)

-> Index scan on t1 using idx3 (cost=108280.56 rows=986408) (actual time=0.026..176.645 rows=999999 loops=1)

-> Index lookup on t2 using idx3 (k=t1.k) (cost=0.25 rows=5) (actual time=0.001..0.017 rows=78 loops=999999)

1 row in set (26.50 sec)

It reports that the query is 60% slower than in real life.

Just out of curiosity I tried Explain Analyze with a Stored Procedure, but it didn’t work. One of the most painful tasks is to analyze and optimize a Stored Procedure because MySQL does not log the individual queries from a Procedure and it’s hard to actually see what is going under the hood. (In Percona Server there are log_slow_sp_statements which will allow you to log the individual queries into a slow query log.)

I would love to see if Explain Analyze would run the procedure and then display the actual execution plan in a tree format. But maybe I am believing in a perfect world and I am too naive.

I also noticed some interesting behavior:

mysql> explain analyze select count(*) from sbtest1G
*************************** 1. row ***************************
EXPLAIN: -> Count rows in sbtest1

1 row in set (0.13 sec)

mysql> explain analyze select count(*) from sbtest1G

*************************** 1. row ***************************

EXPLAIN: -> Count rows in sbtest1

1 row in set (0.13 sec)

If there isn’t a where condition, or group by, or anything else and just a simple count query, it does not give us the time for that step or how many rows were read or which index was used. I think here its a bit too minimalistic and some basic information would be good to see.

Conclusion

I think Explain Analyze is a great step in the right direction, and I am excited to see if in the future it gets even more features, but it should report more realistic times for execution time.

Get Percona Support for PostgreSQL

Percona supports DBAs and developers seeking help with their PostgreSQL databases.

Percona Support for PostgreSQL

Additional Percona support and resources include:

Percona Distribution for PostgreSQL Software

Percona Platform

Percona Managed Services

Percona Professional Services

0 0 votes

Article Rating

11 Comments

Oldest

Newest Most Voted

dbdemon

6 years ago

Looks like a useful feature. Your CREATE TABLE statement for the sbtest1 table is actually showing a different table (t1).

Author

Tibor Korocz

6 years ago

Reply to dbdemon

Thanks, we fixed it.

6 years ago

Just want to know that what is the meaning of cost value?

Georgi Sotirov

6 years ago

Nice article! A few notes from me:

> Unfortunately, the manual page does not really explain it.

Yes, it does – see https://dev.mysql.com/doc/refman/8.0/en/explain.html#explain-analyze

> … this is the cost and the same row number that traditional Explain gives us

Traditional explain plan (if you mean EXPLAIN FORMAT=TRADITIONAL) doesn’t provide any information about costs (see https://dev.mysql.com/doc/refman/8.0/en/explain-output.html). Costs are visible only in TREE format (since 8.0.18) and JSON (used in MySQL Workbench for visual explain plans).

> …just a simple count query, it does not give us the time for that step or how many rows were read or which index was used…

That’s really an interesting observation that only confirms the current differences between the EXPLAIN formats, because TRADITIONAL and JSON would show rows and indexes for such a query if table is InnoDB. Also HASH joins are visible only in TREE format (and EXPLAIN ANALYZE of course). Currently some queries, which don’t use the iterator executor, won’t be explained and ran by EXPLAIN ANALYZE at all and you’ll get only “not executable by iterator executor” 🙂

Author

Tibor Korocz

6 years ago

Reply to Georgi Sotirov

Hi, thanks for the comments.

Yes you can find some information on that page but for me it was still hard to understand at first which part of the explain analyse means what. I prefer the “High Level Architecture” description here: https://dev.mysql.com/worklog/task/?id=4168
Which gives better informations.

You can also use “explain format=JSON” which is part of the traditional explain and you can see the costs there as well.

Pavel Katiushyn

6 years ago

Thank you for the post. The overhead is really worrying. I hope it adds equally to each step it measures, otherwise, how can it be trusted?

Ted Wennmark

6 years ago

Not sure about the “One of the most painful tasks is to analyze and optimize a Stored Procedure because MySQL does not log the individual queries from a Procedure and it’s hard to actually see what is going under the hood”. Since MySQL 5.7 you can trace SP, look at WL#5766 (https://dev.mysql.com/worklog/task/?id=5766), there is also a blog here on how to use this: http://mablomy.blogspot.com/2015/03/profiling-stored-procedures-in-mysql-57.html

lefred

6 years ago

Hi Tibi

Select count() is a special case which is done in parallel and there’s only one single handler call. What useful information would you expect ?

Also before stating that there is a 80% overhead in general using explain analyze, don’t forget that the simpler the query is, the more overhead there will be.

Additionally, some Linux platform are currently having larger overhead due to libstdc++. Maybe it’s your case, which Linux are you suing ?

Georgi Sotirov

6 years ago

Reply to lefred

The bug about SELECT count(*) not being included in EXPLAIN ANALYZE (see https://bugs.mysql.com/bug.php?id=97296) was already fixed in 8.0.19 that was released last month.