Have you ever run a query in MySQL and wondered how long it’ll take to complete? Many people have had this experience. It’s not a big deal until the query has been running for an hour. Or a day and a half. Just when IS that query going to finish, anyway?
There are actually a few ways to estimate how long it’ll take for the query to complete, depending on what the query is. One of the simplest is to estimate how many rows the query needs to examine, measure how fast it’s working, and do the math.
As an example, I recently worked on a customer’s site where a typical data-warehousing query needed optimization. It was a fact table joined to two dimension tables — a classic star schema query. The fact table was very large, and after some tuning (I’ll write more about that later) I convinced MySQL to perform the query as a table scan of the fact table, then an index lookup in each dimension table in turn.
The table structures aren’t really important. All you need to know for this post is that the fact table has about 150 million rows and the query was taking over 10 minutes to complete. Actually, it had never completed at all, according to the customer. I wanted to know whether I’d be waiting for another minute, hours, or days.
The answer was simple, because there was nothing else running on the server. That means that SHOW GLOBAL STATUS gave a rough idea of what the query was actually doing. (If there had been a lot of activity on the server, I wouldn’t have been able to say with confidence that SHOW GLOBAL STATUS showed what that one query was doing; activity from other queries would have been mixed in there too. It would be great to be able to choose another thread and watch only its status, but MySQL doesn’t currently let you do that.)
The solution was to measure how fast the query was scanning rows in the table scan of the fact table. This is shown by the Handler_read_rnd_next status variable. Here’s an easy way to watch it (innotop is another handy way):
mysqladmin extended -r -i 10 | grep Handler_read_rnd_next
-- ignore the first line of output...
| Handler_read_rnd_next | 429224 |
So the server was reading roughly 43K rows per second, and there were 150 million rows in the table. A little math later, and you get 3488 seconds to completion, or a little less than an hour. And indeed the query completed in about 55 minutes.
This is the simplest case, and there are more complicated ones to consider, but hopefully this gives you an idea how you can tackle this problem in different situations.