Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

How to estimate query completion time in MySQL

April 23, 2008

Author

Baron Schwartz

Insight for DBAs

Share this Post:

Have you ever run a query in MySQL and wondered how long it’ll take to complete? Many people have had this experience. It’s not a big deal until the query has been running for an hour. Or a day and a half. Just when IS that query going to finish, anyway?

There are actually a few ways to estimate how long it’ll take for the query to complete, depending on what the query is. One of the simplest is to estimate how many rows the query needs to examine, measure how fast it’s working, and do the math.

As an example, I recently worked on a customer’s site where a typical data-warehousing query needed optimization. It was a fact table joined to two dimension tables — a classic star schema query. The fact table was very large, and after some tuning (I’ll write more about that later) I convinced MySQL to perform the query as a table scan of the fact table, then an index lookup in each dimension table in turn.

The table structures aren’t really important. All you need to know for this post is that the fact table has about 150 million rows and the query was taking over 10 minutes to complete. Actually, it had never completed at all, according to the customer. I wanted to know whether I’d be waiting for another minute, hours, or days.

The answer was simple, because there was nothing else running on the server. That means that SHOW GLOBAL STATUS gave a rough idea of what the query was actually doing. (If there had been a lot of activity on the server, I wouldn’t have been able to say with confidence that SHOW GLOBAL STATUS showed what that one query was doing; activity from other queries would have been mixed in there too. It would be great to be able to choose another thread and watch only its status, but MySQL doesn’t currently let you do that.)

The solution was to measure how fast the query was scanning rows in the table scan of the fact table. This is shown by the Handler_read_rnd_next status variable. Here’s an easy way to watch it (innotop is another handy way):

mysqladmin extended -r -i 10 | grep Handler_read_rnd_next
-- ignore the first line of output...
| Handler_read_rnd_next             | 429224      |

mysqladmin extended -r -i 10 | grep Handler_read_rnd_next

-- ignore the first line of output...

| Handler_read_rnd_next | 429224 |

So the server was reading roughly 43K rows per second, and there were 150 million rows in the table. A little math later, and you get 3488 seconds to completion, or a little less than an hour. And indeed the query completed in about 55 minutes.

This is the simplest case, and there are more complicated ones to consider, but hopefully this gives you an idea how you can tackle this problem in different situations.

0 0 votes

Article Rating

17 Comments

Oldest

Newest Most Voted

Manish Poddar

18 years ago

Hi Baron,

I was wondering if you were to take the value of Handler_read_rnd_next and divide it by number of total threads connected (ever?) will that give us a reasonable row reads per second? When I ran “mysqladmin extended -r -i 10 | grep Handler_read_rnd_next” I got a value of 1849842923.

| Threads_connected | 9 |
| Threads_created | 1042 |
| Threads_running | 1 |
| Uptime | 6180733 |

so either I am getting too good row reads per second or I am misled by the data.

What should I read into when the value of Handler_read_rnd_next is at 184M. We do have the slow query log enabled and we sometimes see the queries (that are properly indexed) show up in the slow logs.

Any help in this matter will be greatly appreciated.

Author

Baron Schwartz

18 years ago

Hi Mannish,

You should ignore the first set of outputs from the command. The second and subsequent outputs will contain incremental numbers per ten seconds.

You can’t see from the status variables how many connections have ever been created, but you can log in and execute SELECT CONNECTION_ID() for a rough idea. This value may wrap around to zero, though.

Sometimes queries show up in the slow logs because of other things that are happening, such as backups or a lock by another query.

Mekin

18 years ago

Is it possible to estimate this for

– inserts, updates & deletes
– alter table queries

Thanks,
mekin

Author

Baron Schwartz

18 years ago

Yes, those are the “different situations” I mentioned. If you know how many rows will be affected, and you watch the Handler_* counters, you can see how much work has been done.

David Holoboff

18 years ago

Baron,

Thank you for sharing this tool – it is very useful… 🙂

I tested this with inserts, updates and deletes and it works very well. I am going to use this request on our job-related servers to estimate finish time. My estimated time after doing several tests on a 100 million row table was within 5 seconds of the actual time for selects, inserts, updates, and deletes (deletes, being the fastest, had a margin of error of no more than one second).

Thanks,
David Holoboff

Admin

Peter Zaitsev

18 years ago

Baron,

Too bad this only works on unloaded server. It would be great to get performance counters for different thread in this case it would be much easier to see the progress. May be we should see if it is possible to implement such patch.

Author

Baron Schwartz

18 years ago

I had thought about this patch too. The KILL command is the only place I know where there might be sample code for identifying another thread. I haven’t looked at the code yet, but in theory it might not be that hard to do this.

Mike

18 years ago

Nice work man

Kailash Kumar P

17 years ago

Great work Baron.

Jason

16 years ago

Very, very helpful, thanks for this great tip! -Jason

MLBR

16 years ago

Great tip!

Any insight into what the “rnd_next” in “Handler_read_rnd_next” *translates* to?

Most status variables are clear to me (e.g. prev/next/key), but this one escapes me.

Author

Baron Schwartz

16 years ago

See http://www.mysqlperformanceblog.com/2010/06/15/what-does-handler_read_rnd-mean/

Lucek

15 years ago

Baron,

Any news on implementation of per thread stats?

Author

Baron Schwartz

15 years ago

No work on that yet. But you could discuss on our mailing list or file a feature request for percona-server on Launchpad. (I think there might already be such a feature request.) We actively look for people to sponsor features such as these, so if you are interested, contact our sales team to discuss.

chad ambrosius

14 years ago

I know you said the customer reported the query taking over 10 minutes, but isn’t 429224 = 430K?? So a little math would put this calculation at closer to 6 minutes than 55.

Author

Baron Schwartz

14 years ago

The mysqladmin command prints out results every 10 seconds (-i 10). Your calculation would be correct if it were printing once per second.

Gopal

10 years ago

I have found another way to get the Estimate. (This is using the base data only for a single thread):

1. Read the ‘undo log entries’ corresponding to given thread’ (using show engine innodb status), lets say – A
One can use following command.
mysql -h$server -uusername -p$password -e “show engine innodb statusG” | grep -n2 “$thread_id” | grep “undo log entries”

2. Look into the Process List to find the ‘Current Execution’ time – B
3. Now you can get the know ‘Number of rows’ per second using NPS = A / B
4. Divide Estimated Time = Total Number of Rows /NPS