October 2, 2014

Feature Idea: Finding columns which query needs to access

In query examinations it is often interesting which columns query needs to access to provide result set as it gives you ideas if you can use covering indexes to speed things up or even cache some data by denormalizing tables. So far it has to be done manually – look at SELECT clause, WHERE clause, ORDER BY GROUP BY and HAVING to sum it up, not to mention more complex questions of subselects.

It would be great tool for MySQL Performance Optimization if MySQL Server could show this table and I know it already has this data as it is used by the optimizer.

It would be also good if such tool would show mapping of WHERE clauses to the tables as well as other clauses such as GROUP BY and ORDER BY. For complex queries especially if they do not refer to the columns with table prefix it may take a while to figure our where these columns come from and so how they can be optimized.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. Xaprb says:

    That’s a great idea! I vote yes.

  2. Peter, re the mapping of WHERE/GROUP/ORDER BY clauses to the tables, doesn’t EXPLAIN EXTENDED do that when it produces a warning with rewritten query text? Yes, it’s not readable but it is there. So what is needed is to just write a pretty-printing tool?

  3. Re the used columns, where do you think this info could be shown? One more column in the output of “EXPLAIN WITH USED_COLUMN_LISTS SELECT …”, with list of used columns? That’s the easiest way, but I’m afraid EXPLAIN output will become human-unreadable. Another option I can think of is to make EXPLAIN WITH USED_COLUMN_LISTS produce a temporary table with (table_name, field_name, is_used) columns. Is that better?

  4. peter says:

    Thank you Sergey,

    Yes I guess it can be used together with some parser to provide some readable output. Being readable is important here as you already can manually dig such info it just takes a lot of effort.

    It would be also nice to be able to see which where clauses are resolved doing index lookups and which are post-filtering.

  5. I would very much like a pseudo table in the information_schema like this:

    information_schema.parse_tree(
    query_text text — query text for which the parse tree is given
    , id int unsigned — id for this node
    , parent_id int unsigned — id of the parent node (implements adjacency list)
    , position int unsigned — position of this node within the sibling nodes
    , lft int unsigned — nested set implementation
    , rgt int unsigned — nested set implementation
    , node_type enum(‘identifier’,’constant’,…) — reference to syntax production rule
    , node_text text — substring of query_Text that is parsed
    )

    THe usage would be like this:

    select *
    from information_schema.parse_tree
    where query_text = ‘select col1,col2 from aTable where col3 = ”bla”’
    ;

    What would happen is that the parse_tree table would parse the query_text expression, and generate rows for all the nodes in the parse tree.

    The nice thing about this table is that you could use for many more things, such as
    -discover dependencies between views and tables
    -discover dependencies between procedures and other procedures

    and…and..

    Anyway, what do you think?

  6. Extra note:
    People I have told about this idea usually first say: “Nah this ugly, does not make sense, why pass the statement as a string, you should have a SHOW command rather than this, etc.”

    Well, first: it makes as much sense as EXPLAIN. There too, you feed an expression, and you get a resultset back.
    second: you cannot join against a SHOW command. Joining is good, you can use it then to parse the code from existing VIEW and PROCEDURE objects via the corresponding tables in he information schema.
    third, it makes sense to pass it as a string. That way you can join to VIEW_DEFINITION and ROUTINE_DEFINITION etc columns in the information_schema VIEWS and ROUTINES columns

    More purposes: refactoring code (renaming identifiers etc) but then the parse_tree table would also need the exact string position info.

  7. peter says:

    Roland,

    I think it would be good, however I’m not sure passing query text is good idea. Might be EXPLAIN can simply feel these tables ready to use or something similar.

    The problem with such solution is – it will be unlikely implemented any time soon as it not that trivial.

    The other problem it may be a bit complicated to use – some people hate SHOW statements but they are much easier to use compared to information schema :)

  8. peter says:

    Sergey,

    You can run EXPLAIN … \G in which case number of columns is not the problem any more.

    Adding two columns will be great, first listing columns which are used from the table and second listing where/on clauses applying to this table and how they are resolved, something like

    SELECT count(*) FROM A,B WHERE A.ID=B.ID and A.C>5 and B.D>6;

    A:
    Used_Columns: ID,C
    Checks: C>5(range)

    B:
    Used_Columns: ID,D
    Checks: ID=A.ID(ref), D>6(row)

    This would allow to very easy to see which of the checks are not using index which are great candidates for optimization.

  9. Apachez says:

    Does it have to be an extension to EXPLAIN?

    A standalone script (say written in perl or so) would be for me more interresting. This way you could give the script various queries and it will spit out suggestions for indexes needed. Of course in the longer run it would be interresting to have it also available like EXPLAIN EXTENDED or something…

  10. peter says:

    To write a script you need infrastructure on MySQL Server side still.

    You can’t really write full query parser or reliably check which clauses are resolved which way.

    I do not really care if it is part of explain or available some other way.

  11. Hello, the MySQL parser has been yanked out and is now available as a separate Perl CPAN module, DBIx::MyParse. The current downloadable version has considerable limitations, however I am working on a new one that should parse any and all SELECTS that MySQL accepts. Once I get there, I will try to provide code that dumps the list of columns.

  12. peter says:

    Thanks Philip,

    This would be cool. You can’t get parser to be equivalent to MySQL parser in all cases as there are different versions and even different MySQL Server settings may change things. But it should be good enough for most cases.

  13. Andrisi says:

    Good idea. Does not have to be human-readable. Would be useful in many client applications, that do complex user-defined queries, or ones doing editing with auto-build forms.

  14. Jason says:

    So yeah this is really old but I stumbled upon it looking for the same thing and ended up building myself a php script to tease the info out of mysql. Not sure if this will work in every scenario but if the column isn’t a possible key you probably don’t need to index it anyway. So here it is, pretty self explanatory but I added comments in anyway. This will only work on a single table, so no joins, and it’s not prettied up but you can modify it for your needs.

    Tada! Works for me anyway.

  15. Jason says:

    wtf? Didn’t go through. Code tags here? Trying again.

    So yeah this is really old but I stumbled upon it looking for the same thing and ended up building myself a php script to tease the info out of mysql. Not sure if this will work in every scenario but if the column isn’t a possible key you probably don’t need to index it anyway. So here it is, pretty self explanatory but I added comments in anyway. This will only work on a single table, so no joins, and it’s not prettied up but you can modify it for your needs.
    [code]
    [/code]

    Tada! Works for me anyway.

  16. Jason says:

    Hah! Just getting ugly now. Last try:

    $con = mysql_connect(‘ip’, ‘user’, ‘pass’);
    mysql_select_db(“db”, $con);

    // Fill in your table name and the query you want to check.
    $table = “yourtable”;
    $checkquery= “yourquery”;

    // We should create a tiny database since it’s much faster and we don’t really need the data.
    $sql=”create table test like $table”;
    $result = mysql_query($sql);
    $sql=”insert into test select * from $table limit 1″;
    $result = mysql_query($sql);

    // Here we get all of the columns from the table and create an index on each one.
    $sql=”show columns from test”;
    $result = mysql_query($sql);
    while($row = mysql_fetch_array($result))
    {
    $field=$row['Field'];
    $sql2=”create index $field on test($field);”;
    $result2 = mysql_query($sql2);
    }

    // Now we run explain to see what columns (indexes) MySQL could possibly use. Here’s the trick that should get you a column list from the query.
    $sql=”explain $checkquery”;
    $result= mysql_query($sql);
    while($row = mysql_fetch_array($result))
    {
    echo “Columns: ” . $row['possible_keys'] . ” are used for this query”;
    }

    // Clean up the mess.
    $sql=”drop table test;”;
    $result = mysql_query($sql);

Speak Your Mind

*