How to estimate query completion time in MySQL

Have you ever run a query in MySQL and wondered how long it’ll take to complete? Many people have had this experience. It’s not a big deal until the query has been running for an hour. Or a day and a half. Just when IS that query going to finish, anyway?

There are actually a few ways to estimate how long it’ll take for the query to complete, depending on what the query is. One of the simplest is to estimate how many rows the query needs to examine, measure how fast it’s working, and do the math.

As an example, I recently worked on a customer’s site where a typical data-warehousing query needed optimization. It was a fact table joined to two dimension tables — a classic star schema query. The fact table was very large, and after some tuning (I’ll write more about that later) I convinced MySQL to perform the query as a table scan of the fact table, then an index lookup in each dimension table in turn.
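The actual schema was never published; purely as an illustration, a star-schema query of that shape might look like the following (every table and column name here is hypothetical):

```sql
SELECT d1.product_name, d2.region, SUM(f.amount)
FROM fact_sales AS f                                      -- large fact table, scanned
  JOIN dim_product  AS d1 ON d1.product_id  = f.product_id   -- index lookup
  JOIN dim_customer AS d2 ON d2.customer_id = f.customer_id  -- index lookup
GROUP BY d1.product_name, d2.region;
```

The tuning goal described above is the join order: scan the fact table once, then probe each dimension table by index for every fact row.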

The table structures aren’t really important. All you need to know for this post is that the fact table has about 150 million rows, and that the query had already been running for over 10 minutes. In fact, according to the customer, it had never completed at all. I wanted to know whether I’d be waiting for another minute, hours, or days.

The answer was simple, because there was nothing else running on the server. That means that SHOW GLOBAL STATUS gave a rough idea of what the query was actually doing. (If there had been a lot of activity on the server, I wouldn’t have been able to say with confidence that SHOW GLOBAL STATUS showed what that one query was doing; activity from other queries would have been mixed in there too. It would be great to be able to choose another thread and watch only its status, but MySQL doesn’t currently let you do that.)

The solution was to measure how fast the query was scanning rows in the table scan of the fact table. This is shown by the Handler_read_rnd_next status variable. Here’s an easy way to watch it (innotop is another handy way):
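The original command output did not survive here; judging from the command quoted in a reader’s comment below, it was most likely along these lines:

```shell
# Print extended status every 10 seconds; -r shows the change since the
# previous sample rather than the absolute counter value.
mysqladmin extended -r -i 10 | grep Handler_read_rnd_next
```

Note that the first sample shows the absolute counter since server start; only the second and subsequent samples are per-10-second deltas.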

The counter was climbing by about 429K per 10-second interval, so the server was reading roughly 43K rows per second, and there were 150 million rows in the table. A little math later, and you get about 3488 seconds to completion, or a little less than an hour. And indeed the query completed in about 55 minutes.
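In shell arithmetic, the estimate works out as follows (429224 is the per-10-second counter delta discussed in the comments; the row count is from the post):

```shell
rows_per_interval=429224                  # Handler_read_rnd_next delta per 10-second sample
rate=$(( rows_per_interval / 10 ))        # ~42922 rows scanned per second
total_rows=150000000                      # rows in the fact table
eta=$(( total_rows / rate ))              # seconds until the table scan finishes
echo "$rate rows/s -> about $(( eta / 60 )) minutes remaining"
```

This prints roughly 58 minutes, matching the observed 55-minute runtime once you allow for rate fluctuations.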

This is the simplest case, and there are more complicated ones to consider, but hopefully this gives you an idea how you can tackle this problem in different situations.

Comments (17)

  • Manish Poddar

    Hi Baron,

    I was wondering: if you were to take the value of Handler_read_rnd_next and divide it by the total number of threads ever connected, would that give a reasonable rows-read-per-second figure? When I ran “mysqladmin extended -r -i 10 | grep Handler_read_rnd_next” I got a value of 1849842923.

    | Threads_connected | 9 |
    | Threads_created | 1042 |
    | Threads_running | 1 |
    | Uptime | 6180733 |

    So either I am getting unrealistically good row reads per second, or I am misreading the data.

    What should I read into it when the value of Handler_read_rnd_next is that high? We do have the slow query log enabled, and we sometimes see queries that are properly indexed show up in the slow logs.

    Any help in this matter will be greatly appreciated.

    April 22, 2008 at 10:14 pm
  • Baron Schwartz

    Hi Manish,

    You should ignore the first set of outputs from the command. The second and subsequent outputs will contain incremental numbers per ten seconds.

    You can’t see from the status variables how many connections have ever been created, but you can log in and execute SELECT CONNECTION_ID() for a rough idea. This value may wrap around to zero, though.

    Sometimes queries show up in the slow logs because of other things that are happening, such as backups or a lock by another query.

    April 23, 2008 at 4:22 am
  • Mekin

    Is it possible to estimate this for

    – inserts, updates & deletes
    – alter table queries


    April 23, 2008 at 6:25 am
  • Baron Schwartz

    Yes, those are the “different situations” I mentioned. If you know how many rows will be affected, and you watch the Handler_* counters, you can see how much work has been done.

    April 23, 2008 at 6:48 am
  • David Holoboff


    Thank you for sharing this tool – it is very useful… 🙂

    I tested this with inserts, updates, and deletes and it works very well. I am going to use this technique on our job servers to estimate finish times. My estimated time, after several tests on a 100-million-row table, was within 5 seconds of the actual time for selects, inserts, updates, and deletes (deletes, being the fastest, had a margin of error of no more than one second).

    David Holoboff

    April 23, 2008 at 7:25 am
  • peter


    Too bad this only works on an unloaded server. It would be great to get performance counters for individual threads; in that case it would be much easier to see the progress. Maybe we should see if it is possible to implement such a patch.

    April 23, 2008 at 8:38 pm
  • Baron Schwartz

    I had thought about this patch too. The KILL command is the only place I know where there might be sample code for identifying another thread. I haven’t looked at the code yet, but in theory it might not be that hard to do this.

    April 24, 2008 at 4:57 am
  • Mike

    Nice work man

    June 4, 2008 at 2:13 pm
  • Kailash Kumar P

    Great work Baron.

    August 4, 2008 at 6:59 am
  • Jason

    Very, very helpful, thanks for this great tip! -Jason

    March 17, 2010 at 11:15 am
  • MLBR

    Great tip!

    Any insight into what the “rnd_next” in “Handler_read_rnd_next” *translates* to?

    Most status variables are clear to me (e.g. prev/next/key), but this one escapes me.

    June 15, 2010 at 1:13 pm
  • Lucek


    Any news on implementation of per thread stats?

    August 12, 2010 at 10:07 am
  • Baron Schwartz

    No work on that yet. But you could discuss on our mailing list or file a feature request for percona-server on Launchpad. (I think there might already be such a feature request.) We actively look for people to sponsor features such as these, so if you are interested, contact our sales team to discuss.

    August 12, 2010 at 12:13 pm
  • chad ambrosius

    I know you said the customer reported the query taking over 10 minutes, but isn’t 429224 = 430K?? So a little math would put this calculation at closer to 6 minutes than 55.

    June 27, 2012 at 6:37 pm
  • Baron Schwartz

    The mysqladmin command prints out results every 10 seconds (-i 10). Your calculation would be correct if it were printing once per second.

    June 29, 2012 at 6:40 am
  • Gopal

    I have found another way to get the estimate (this uses the base data for a single thread only):

    1. Read the ‘undo log entries’ count for the given thread from SHOW ENGINE INNODB STATUS; call it A. One can use the following command:
    mysql -h$server -u$username -p$password -e "show engine innodb status\G" | grep -n2 "$thread_id" | grep "undo log entries"
    2. Look in the process list to find the query’s current execution time; call it B.
    3. Now you can compute the number of rows per second: NPS = A / B.
    4. Estimated Time = Total Number of Rows / NPS.

    July 21, 2015 at 9:23 am

Comments are closed.
