MySQL ORDER BY LIMIT Performance Optimization

Suboptimal MySQL ORDER BY implementation, especially together with LIMIT is often the cause of MySQL performance problems. Here is what you need to know about MySQL ORDER BY LIMIT optimization to avoid these problems.

MySQL ORDER BY with LIMIT is the most common use of ORDER BY in interactive applications with large data sets being sorted. On many web sites, you will find top tags, recently registered users, etc – which would often require ORDER BY with LIMIT in the back end. In general this type of ORDER BY looks like SELECT ….. WHERE [conditions] ORDER BY [sort] LIMIT N,M

Make sure it uses index It is very important to have ORDER BY with LIMIT executed without scanning and sorting full result set, so it is important for it to use index – in this case, index range scan will be started and query execution stopped as soon as required amount of rows generated.

For example, if I do SELECT * FROM sites ORDER BY date_created DESC LIMIT 10; I would use index on (date_created) to get a result set very fast.

Now what if I have something like SELECT * FROM sites WHERE category_id=5 ORDER BY date_created DESC LIMIT 10;

In this case index by date_created may also work but it might not be the most efficient – If it is a rare category large portion of table may be scanned to find 10 rows. So index on (category_id, date_created) will be a better idea.

Let’s take a look at a bit more complex case: SELECT * FROM sites WHERE category_id in (5,10,12) ORDER BY date_created DESC LIMIT 10;

Even though it looks quite similar to the previous one it is a lot different as there are multiple category_id values in the list now so index on (category_id, date_created) can’t be used directly. Index on date_created separately would still work. The good from performance standpoint (even though a bit ugly) will be UNION workaround I already wrote about.

So what if you have an application which can perform a search on many different columns, with worse then perfect selectivity. Various social networking and dating sites are perfect examples of such queries

SELECT FROM people where gender=’m’ and age between 18 and 28 and country_id=5 and city_id=345 order by last_online desc limit 10;

There could be many possible limiting factors with all of them being optional. This is a hard nut to crack and I know on high-end custom search solutions can be developed, but if we stick to simple MySQL using multiple indexes on most selective columns would be a good idea for the performance of such queries.

For example, you may put index on(gender,last_online) assuming most people will have gender specified, as well as (country_id,city_id,last_online) assuming in most cases these will be specified. It takes a good look at queries actually being run and data selectivity to come up with a good set of indexes for such case, it also may need to be adjusted in the future.

The main thing to watch for, if you do not have full where clause resolved by index is how many rows you need to scan to resolve order by (this can be found in slow query log or by examining Hander statistics). If only 50 rows are examined to provide
10 rows of result set you’re in decent shape but if it is 5000 you might need to rethink your indexing.

Also, note – number of records scanned to provide result set will be very dynamic based on particular constant and other factors.
For example for our dating example if we use only (last_online) index and look for people from the USA we likely will find 10 people pretty quickly, if the country is small or simply there are few members from the country, ie Slovenia – the same kind of search might need to scan 1000s times more rows to provide result set.

In the example above we did order by last column, in fact, the index can be used for ORDER BY if sorting is done by leading column(s). Note however columns following column used for order by can’t be used to restrict result set. For example:

key(a,b,c) SELECT * FROM tbl WHERE c=5 ORDER BY a,b limit 10 – In this case first two columns from the index can be used to satisfy order by, index, however, will not be helpful to check c=5 (unless it is index covered query). Index on
(c,a,b) would work better for the query above.

Do not sort by expressions I guess this one is obvious – expressions or functions will block index usage for order by.

Sort by column in leading table if you have JOIN with ORDER BY … LIMIT you should try hard to have sorting column(s) to be in the leading table. If ORDER BY is going by field from the table which is not first in the join order index can’t be used. Sometimes it means breaking normalization and duplicating column(s) you’re going to use in ORDER BY in other table.

Here is an example when ORDER BY is done by the second table which requires filesort:

However, if the first table has “const” or “system” access type it is effectively removed from join execution (replaced with constants) and so ORDER BY can be optimized even if it is done by the second table:

The difference between these cases is “i” is the primary key while “k” is simply indexed column.

Note: In some cases even if it is possible to use index to do ORDER BY with JOIN MySQL still will not be able to use it as Optimizer is not smart enough yet to detect such cases:

In this case, there is index (k,j) on the table so indexes could be used on each of the tables to optimize order by, or at least local sort could be used for each t.k=const value for the second table. Which is not done, however.

Sort in one direction. If you have ORDER BY col1, col2 it can be optimized using index, if you have
ORDER BY col1 DESC, col2 DESC same thing, however if you would have ORDER BY col1, col2 DESC MySQL will have to use filesort. Classic for solution for this would be to have index which is sorted appropriately (ascending by col1 and descending by col2) but MySQL can’t do it at this point. Workaround which can be currently used is separate column which holds reverse values, so you can do ORDER BY col1, col2_reverse instead.

Beware of large LIMIT Using index to sort is efficient if you need first few rows, even if some extra filtering takes place so you need to scan more rows by index then requested by LIMIT. However if you’re dealing with LIMIT query with large offset efficiency will suffer. LIMIT 1000,10 is likely to be way slower than LIMIT 0,10. It is true most users will not go further than 10 page in results, however Search Engine Bots may very well do so. I’ve seen bots looking at 200+ page in my projects. Also for many web sites failing to take care of this provides very easy task to launch a DOS attack – request page with some large number from few connections and it is enough. If you do not do anything else make sure you block requests with too large page numbers.

For some cases, for example if results are static it may make sense to precompute results so you can query them for positions.
So instead of query with LIMIT 1000,10 you will have WHERE position between 1000 and 1009 which has same efficiency for any position (as long as it is indexed)

Force index if needed In some cases MySQL Optimizer may prefer to use different index, which has better selectivity or just better estimates instead of which allows you to do the sort. For example if you would have indexes on (country_id,city_id) and index on (country_id,last_online) for query SELECT * FROM people WHERE country_id=5 and city_id=6 order by last_online desc limit 10; first index will be likely selected even if it leads to filesort.

The solution for this problem is either extending your indexes so MySQL Optimizer does not have to chose between better sort or better lookup or use FORCE INDEX to force it to use appropriate index.

Many of the tips I’ve mentioned here work for MySQL ORDER BY without LIMIT as well but there are some differences. I should write another article about ORDER BY without limit and large tables soon.

For queries like “SELECT … WHERE [conditions] ORDER BY [sort] LIMIT N“, the optimizer may choose the index to resolve ORDER BY instead of using an index on the column(s) in WHERE clause. There was a bug that was fixed in MySQL 5.7.

For queries that combine ORDER BY with LIMIT, the optimizer may switch to an index that applies to the ORDER BY. In some cases, the decision to switch was based on a heuristic rather than on cost. The optimizer now uniformly makes the decision whether to switch on a cost basis. This should result in better performance when switching would cause a query to read an entire index or a large part of it to find qualifying rows.
References: See also: Bug #78993, Bug #22108385, Bug #73837, Bug #19579507, Bug #16522053.

MySQL 8.0 introduced descending indexes which store the key values of an index in descending order. A descending index can be scanned in forward order, which is more efficient. If a query mixes ASC and DESC, the optimizer can use an index on the columns if the index also uses corresponding mixed ascending and descending columns:

SELECT * FROM test
ORDER BY k DESC, j ASC;

The optimizer can use an index on (k, j) if k is descending and j is ascending. It can also use an index on those columns (with a backward scan) if k is ascending and j is descending.

If multiple rows have the same value in the ORDER BY columns, the server is free to return those rows in any order. If it is important to ensure the same row order with and without LIMIT, include additional columns in the ORDER BY clause to make the order deterministic.

Some free resources that you might find useful

Webinars

Blog Posts

White Papers & eBooks

Learn more about Percona Server for MySQL

Share this post