MySQL EXPLAIN limits and errors.Peter Zaitsev
Running EXPLAIN for problematic queries is very powerful tool for MySQL Performance optimization. If you’ve been using this tool a lot you probably noticed it is not always provide adequate information. Here is list of things you may wish to watch out.
EXPLAIN can be wrong – this does not happen very often but it does. EXPLAIN may not be telling you the truth, or full truth about query execution. The reason is – MySQL does not really have the special “plan generating” step, instead plan is generated as a part of query execution, sometimes being dynamic based on data. With EXPLAIN MySQL kind of simulates execution but it obviously does not access the data so does not have access to this dynamic component. If you suspect EXPLAIN is lieing you you can use SHOW STATUS “Handler” statistics to see if number of operations match.
EXPLAIN works for SELECT only This is in works to be fixed in some of next version but it is so as of MySQL 5.0 The workaround is to modify statement to become SELECT with appropriate where clause. This however would not tell you full story, especially now as MySQL has triggers which can make updates very different. Also there is no way to tell you how ALTER table would be executed – will in add index live as it can now do on Cluster ? Is it expected to rebuild index by sort of key_cache as it can for MyISAM.
EXPLAIN may take long time In MySQL 5.0 with addition of greedy join the problem on looking at too many table join combinations in joins with very large tables is elevated, however there are still cases when it may take too long. Most practically important one is if Derived tables (selects in FROM clause) are used. MySQL will execute these subselects on explain to provide plan for top most statement, so if you made an error in select in from clause EXPLAIN might never complete. Solution is to run EXPLAIN for statements in FROM clause separately before looking at main query.
Estimated number of rows may be very inaccurate It is well known this number is aproximate but I guess not everyone knows how much wrong it could be. If number of rows is estimated performing btree dive with certainly structured tree you still can get number of rows off 3-10 times. It gets even worse if cardinality is used. In this case if data distribution is skewed and you statement looks at portion of data with different distribution results can be off by 10-100 times and even more.
Computing rows to be examined is more complicated It is frequent approach to take number of rows estimated from each row in a join and multiply them to see how many rows will be examined in total. This can be rather wrong. One thing to consider is number of rows analyzed for given table might not be the same as passed to the next step – there could be some non-index based filtering applied. The other reason for difference is join_buffer which reduces number of passes MySQL had to perform and so number of row combinations really examined.
LIMIT is not taken into account while estimating number of rows Even if you have LIMIT which restricts how many rows will be examined MySQL will still print full number. Here is example:
mysql> explain select * from lt limit 10;
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | lt | index | NULL | PRIMARY | 4 | NULL | 1600 | Using index |
1 row in set (0.00 sec)
This statement obviously will not scan more than 10 rows but EXPLAIN shows it will scan over 1000. This is not the problem if you examine your queries manually but if you have automatic tool which catches possibly complex queries it can give you head ache.
filesort, using temporary are always in the first line So it is impossible to tell at which stage MySQL actually perform the sort or creates temporary table and so how much rows will be stored to temporary table. Sometimes it is enough to perform sorting or use temporary table with partial result set (partial rows) and than compose full rows by reading data from other tables.
No information about using disk for sort or temporary tables MySQL always calls it “filesort” even if it is done in memory. Also it would not tell you if temporary table will be required on disk. In many cases it is impossible to tell as we do not know how much data we’ll need to sort or accumulate. This information is not available even if it is well known – for example BLOB fields can’t work with in memory temporary table. The other thing – since MySQL 4.1 sorting can be done with storing row data in the sort file, not only key values and row pointers. If this is going to be used in particular sort is not seen in explain.
Limited cost information Yes, number of rows to be scanned is shown but it does not tell much about cost of the operation – full table scan of table in memory may be done at speed of millions of rows per second, while retrieval of large rows from uncached table by random index values (or even range) may be perfoming at some 100 rows/sec.
No info about functions cost Since MySQL 5.0 you may use user functions in your query which may also perform queries themselves. These actions are totally hidden in EXPLAIN which may seriously affect the outcome. The cost of functions is also not used in optimization which may produce suboptimal plans if stored functions are used in queries.
In general in my opinion EXPLAIN needs serious overhaul so it can be used with GUI tool to provide tree like structure of opertions applied to the data, methods used, costs etc. Simple table like output format is good but with growing MySQL Server complexity it is getting impossible to present all relevant information this way, so we need at least an alternative view.
If you want to find more information about understanding EXPLAIN output, MySQL manual EXPLAIN page is good place to start.