A suboptimal MySQL ORDER BY implementation, especially together with LIMIT, is often the cause of MySQL performance problems. Here is what you need to know about MySQL ORDER BY ... LIMIT optimization to avoid these problems.
MySQL ORDER BY with LIMIT is the most common use of ORDER BY in interactive applications that sort large data sets. On many web sites you will find top tags, recently registered users, etc., which often require ORDER BY with LIMIT in the back end. In general, this type of query looks like SELECT ... WHERE [conditions] ORDER BY [sort] LIMIT N, M.
Make sure it uses an index. It is very important to have ORDER BY with LIMIT executed without scanning and sorting the full result set, so it needs to use an index. In that case an index range scan is started, and query execution stops as soon as the required number of rows has been generated.
For example, if I do SELECT * FROM sites ORDER BY date_created DESC LIMIT 10; I would use an index on (date_created) to get the result set very fast.
Now, what if I have something like this: SELECT * FROM sites WHERE category_id=5 ORDER BY date_created DESC LIMIT 10;
In this case the index on date_created may also work, but it might not be the most efficient: if it is a rare category, a large portion of the table may be scanned to find 10 rows. So an index on (category_id, date_created) is a better idea.
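A minimal sketch of that indexing (the index name is arbitrary):

ALTER TABLE sites ADD INDEX category_date (category_id, date_created);

-- Rows for category 5 are now read straight from the index in date_created
-- order, so the scan can stop after the first 10 matching rows:
SELECT * FROM sites WHERE category_id=5 ORDER BY date_created DESC LIMIT 10;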
Let's take a look at a slightly more complex case: SELECT * FROM sites WHERE category_id IN (5,10,12) ORDER BY date_created DESC LIMIT 10;
Even though it looks quite similar to the previous one, it is quite different: there are multiple category_id values in the list now, so the index on (category_id, date_created) can't be used directly. An index on date_created alone would still work. The good option from a performance standpoint (even though a bit ugly) is the UNION workaround I already wrote about, sketched below.
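A sketch of that workaround, assuming the (category_id, date_created) index from above; each branch can use the index and stop after 10 rows, so the outer sort handles at most 30 rows:

(SELECT * FROM sites WHERE category_id=5 ORDER BY date_created DESC LIMIT 10)
UNION ALL
(SELECT * FROM sites WHERE category_id=10 ORDER BY date_created DESC LIMIT 10)
UNION ALL
(SELECT * FROM sites WHERE category_id=12 ORDER BY date_created DESC LIMIT 10)
ORDER BY date_created DESC
LIMIT 10;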
So what if you have an application which can perform a search on many different columns, with worse than perfect selectivity? Various social networking and dating sites are perfect examples of such queries:
SELECT * FROM people where gender='m' and age between 18 and 28 and country_id=5 and city_id=345 order by last_online desc limit 10;
There could be many possible limiting factors, with all of them being optional. This is a hard nut to crack, and I know high-end custom search solutions can be developed, but if we stick to plain MySQL, using multiple indexes on the most selective columns is a good idea for the performance of such queries.
For example, you may put an index on (gender, last_online), assuming most people will have gender specified, as well as on (country_id, city_id, last_online), assuming in most cases these will be specified. It takes a good look at the queries actually being run and at data selectivity to come up with a good set of indexes for such a case; it also may need to be adjusted in the future.
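As a sketch, that indexing would be (index names are arbitrary):

ALTER TABLE people
  ADD INDEX gender_online (gender, last_online),
  ADD INDEX location_online (country_id, city_id, last_online);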
The main thing to watch for, if you do not have the full WHERE clause resolved by the index, is how many rows you need to scan to resolve the ORDER BY (this can be found in the slow query log or by examining Handler statistics). If only 50 rows are examined to provide 10 rows of result set, you're in decent shape; but if it is 5000, you might need to rethink your indexing.
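One way to check this for a single query, assuming an otherwise idle session, is to compare the Handler counters before and after running it:

FLUSH STATUS;
SELECT * FROM people WHERE gender='m' AND country_id=5
ORDER BY last_online DESC LIMIT 10;
-- Handler_read_next shows how many index entries were actually scanned:
SHOW SESSION STATUS LIKE 'Handler_read%';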
Also note that the number of records scanned to provide the result set will be very dynamic, depending on the particular constants and other factors.
For example, for our dating case: if we use only the (last_online) index and look for people from the USA, we will likely find 10 people pretty quickly. If the country is small or simply has few members, e.g. Slovenia, the same kind of search might need to scan thousands of times more rows to provide the result set.
In the example above we sorted by the last column in the index. In fact, the index can be used for ORDER BY whenever sorting is done by its leading column(s). Note, however, that columns following the column(s) used for ORDER BY can't be used to restrict the result set. For example:
key(a,b,c) SELECT * FROM tbl WHERE c=5 ORDER BY a,b LIMIT 10. In this case the first two columns from the index can be used to satisfy the ORDER BY; the index, however, will not be helpful in checking c=5 (unless it is an index-covered query). An index on (c,a,b) would work better for the query above.
Do not sort by expressions. I guess this one is obvious: expressions or functions in the ORDER BY will block index usage for the sort.
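For example, with the sites table from above:

-- Cannot use the (date_created) index for the sort; full scan plus filesort:
SELECT * FROM sites ORDER BY DATE(date_created) DESC LIMIT 10;

-- Can use the index: sort on the bare column instead.
SELECT * FROM sites ORDER BY date_created DESC LIMIT 10;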
Sort by a column in the leading table. If you have a JOIN with ORDER BY ... LIMIT, you should try hard to have the sorting column(s) in the leading table. If the ORDER BY goes by a field from a table which is not first in the join order, the index can't be used. Sometimes this means breaking normalization and duplicating the column(s) you're going to use in the ORDER BY in the other table.
Here is an example where the ORDER BY is done on the second table and requires a filesort:
mysql> explain select test.i from test, test t where test.k=5 and test.i=t.k order by t.k limit 5;
+----+-------------+-------+------+---------------+------+---------+-------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref         | rows | Extra                           |
+----+-------------+-------+------+---------------+------+---------+-------------+------+---------------------------------+
|  1 | SIMPLE      | test  | ref  | PRIMARY,k     | k    | 4       | const       |    1 | Using temporary; Using filesort |
|  1 | SIMPLE      | t     | ref  | k             | k    | 4       | test.test.i |    1 | Using where; Using index        |
+----+-------------+-------+------+---------------+------+---------+-------------+------+---------------------------------+
2 rows in set (0.00 sec)
However, if the first table has the “const” or “system” access type, it is effectively removed from join execution (replaced with constants), so the ORDER BY can be optimized even if it is done on the second table:
mysql> explain select test.i from test, test t where test.i=5 and test.k=t.k order by t.k limit 5;
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------------+
|  1 | SIMPLE      | test  | const | PRIMARY,k     | PRIMARY | 4       | const |    1 |             |
|  1 | SIMPLE      | t     | ref   | k             | k       | 4       | const |    1 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------------+
2 rows in set (0.01 sec)
The difference between these cases is that “i” is the primary key while “k” is simply an indexed column.
Note: in some cases, even if it is possible to use an index to do ORDER BY with a JOIN, MySQL still will not be able to use it, as the optimizer is not yet smart enough to detect such cases:
mysql> explain select test.i from test, test t where test.k=5 and test.i=t.k order by test.k,t.j limit 5;
+----+-------------+-------+------+---------------+------+---------+-------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref         | rows | Extra                           |
+----+-------------+-------+------+---------------+------+---------+-------------+------+---------------------------------+
|  1 | SIMPLE      | test  | ref  | PRIMARY,k     | k    | 4       | const       |    1 | Using temporary; Using filesort |
|  1 | SIMPLE      | t     | ref  | k             | k    | 4       | test.test.i |    1 | Using where; Using index        |
+----+-------------+-------+------+---------------+------+---------+-------------+------+---------------------------------+
2 rows in set (0.00 sec)
In this case there is an index (k,j) on the table, so indexes could be used on each of the tables to optimize the ORDER BY; at the very least, a local sort could be used for each t.k=const value of the second table. Neither is done, however.
Sort in one direction. If you have ORDER BY col1, col2, it can be optimized using an index; the same is true for ORDER BY col1 DESC, col2 DESC. However, if you have ORDER BY col1, col2 DESC, MySQL will have to use a filesort. The classic solution for this would be an index that is sorted appropriately (ascending by col1 and descending by col2), but MySQL can't do that at this point. The workaround that can currently be used is a separate column which holds reversed values, so you can do ORDER BY col1, col2_reverse instead, as sketched below.
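A sketch of that workaround for a numeric col2 (column and index names are illustrative); the application, or a trigger, keeps col2_reverse equal to the negated value:

ALTER TABLE tbl
  ADD COLUMN col2_reverse INT,
  ADD INDEX col1_col2rev (col1, col2_reverse);

UPDATE tbl SET col2_reverse = -col2;

-- Equivalent to ORDER BY col1, col2 DESC, but resolvable by the index:
SELECT * FROM tbl ORDER BY col1, col2_reverse LIMIT 10;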
Beware of large LIMIT. Using an index to sort is efficient if you need the first few rows, even if some extra filtering takes place and you need to scan more rows by index than are requested by the LIMIT. However, if you're dealing with a LIMIT query with a large offset, efficiency will suffer: LIMIT 1000,10 is likely to be way slower than LIMIT 0,10. It is true that most users will not go further than the 10th page of results, but search engine bots may very well do so; I've seen bots looking at the 200th+ page in my projects. Also, for many web sites, failing to take care of this makes it very easy to launch a DOS attack: request pages with some large offset from a few connections and it is enough. If you do nothing else, make sure you block requests with too-large page numbers.
In some cases, for example if the results are static, it may make sense to precompute the results so you can query them by position. So instead of a query with LIMIT 1000,10 you will have WHERE position BETWEEN 1000 AND 1009, which has the same efficiency for any position (as long as it is indexed); a sketch follows.
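A sketch of this approach, assuming positions are recomputed periodically (the position column and the user-variable trick are illustrative):

ALTER TABLE sites ADD COLUMN position INT, ADD INDEX (position);

SET @pos := -1;
UPDATE sites SET position = (@pos := @pos + 1) ORDER BY date_created DESC;

-- Same rows as ORDER BY date_created DESC LIMIT 1000,10, at constant cost:
SELECT * FROM sites WHERE position BETWEEN 1000 AND 1009 ORDER BY position;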
Force an index if needed. In some cases the MySQL optimizer may prefer to use a different index, which has better selectivity or just better estimates, instead of the one which allows you to avoid the sort. For example, if you had indexes on (country_id,city_id) and on (country_id,last_online), then for the query SELECT * FROM people WHERE country_id=5 and city_id=6 order by last_online desc limit 10; the first index will likely be selected, even though it leads to a filesort.
The solution for this problem is either extending your indexes so the MySQL optimizer does not have to choose between a better sort and a better lookup, or using FORCE INDEX to make it use the appropriate index.
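For the example above, that could look like this (sketches; the index name used in FORCE INDEX is an assumption):

-- Option 1: extend the index so lookup and sort no longer compete:
ALTER TABLE people ADD INDEX country_city_online (country_id, city_id, last_online);

-- Option 2: force the index that matches the sort:
SELECT * FROM people FORCE INDEX (country_online)
WHERE country_id=5 AND city_id=6
ORDER BY last_online DESC LIMIT 10;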
Many of the tips I've mentioned here work for MySQL ORDER BY without LIMIT as well, but there are some differences. I should write another article about ORDER BY without LIMIT and large tables soon.
For queries that combine ORDER BY with LIMIT, the optimizer may switch to an index that applies to the ORDER BY. In some cases, the decision to switch was based on a heuristic rather than on cost. The optimizer now uniformly makes the decision whether to switch on a cost basis. This should result in better performance when switching would cause a query to read an entire index, or a large part of it, to find qualifying rows.
See also: Bug #78993, Bug #22108385, Bug #73837, Bug #19579507, Bug #16522053.
SELECT * FROM test
ORDER BY k DESC, j ASC;
The optimizer can use an index on (k, j) if k is descending and j is ascending. It can also use an index on those columns (with a backward scan) if k is ascending and j is descending.
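In MySQL 8.0 and later, such mixed-direction sorts can also be handled directly, since the server supports true descending indexes (a sketch):

CREATE INDEX k_desc_j_asc ON test (k DESC, j ASC);

-- The index order matches the sort, so no filesort is needed:
SELECT * FROM test ORDER BY k DESC, j ASC;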
Comments
Hi Peter,
I need to know the syntax if I want to use an index hint [USE INDEX FOR ORDER BY (idx_name)] when using joins in the query.
For example:
Select a.test_id, b.test_data, c.test_details
From test a
left join test_data b on a.test_id = b.test_id
left join test_details c on b.test_id = c.test_id
Where a.test_flag = 'Y'
Order by c.test_sum Desc limit 50;
Thanks.
Hi Peter,
one of our application queries (a book search) takes more than 3 mins to execute. Below I have given the query & cnf details.
Select identifier, sku_id, vol_num, TITLE, TITLE_OPT, AUTHOR_TITLE, AUTHOR_FNAME, AUTHOR_MNAME, AUTHOR_LNAME, AUTHOR_ALIAS, AUTHORS, ORIG_PUBLISH_DATE,
match(TITLE, AUTHOR_TITLE, AUTHOR_FNAME, AUTHOR_MNAME, AUTHOR_LNAME, AUTHOR_ALIAS, AUTHORS, SUBTITLE, ORIG_PUBLISH_DATE, DESCRIPTION, OTHER_INFO, CATEGORY, REMARKS) against ('+ history' in boolean mode) as score
from bp_search as bp
where match(TITLE, AUTHOR_TITLE, AUTHOR_FNAME, AUTHOR_MNAME, AUTHOR_LNAME, AUTHOR_ALIAS, AUTHORS, SUBTITLE, ORIG_PUBLISH_DATE, DESCRIPTION, OTHER_INFO, CATEGORY, REMARKS) against ('+ history' in boolean mode)
group by group_id order by score desc limit 0,10;
TABLE DESC
——————–
CREATE TABLE bp_search (
  BOOK_ID int(11) NOT NULL default '0',
  GROUP_ID varchar(128) default NULL,
  IDENTIFIER varchar(64) default NULL,
  SKU_ID varchar(15) default NULL,
  TITLE varchar(1600) default NULL,
  AUTHOR_TITLE varchar(100) default NULL,
  AUTHOR_FNAME varchar(256) default NULL,
  AUTHOR_MNAME varchar(256) default NULL,
  AUTHOR_LNAME varchar(2048) default NULL,
  AUTHOR_ALIAS varchar(128) default NULL,
  AUTHORS varchar(512) default NULL,
  SUBTITLE varchar(1600) default NULL,
  ORIG_PUBLISH_DATE varchar(64) default NULL,
  DESCRIPTION varchar(1024) default NULL,
  OTHER_INFO varchar(1024) default NULL,
  CATEGORY varchar(3072) default NULL,
  REMARKS varchar(1024) default NULL,
  VOL_NUM varchar(40) default NULL,
  SOURCE varchar(20) NOT NULL,
  TITLE_OPT varchar(1600) default NULL,
  LAST_UPDATE timestamp NOT NULL default '0000-00-00 00:00:00',
  KEY GROUP_ID_idx (GROUP_ID),
  FULLTEXT KEY book_semantics (TITLE, AUTHOR_TITLE, AUTHOR_FNAME, AUTHOR_MNAME, AUTHOR_LNAME, AUTHOR_ALIAS, AUTHORS, SUBTITLE, ORIG_PUBLISH_DATE, DESCRIPTION, OTHER_INFO, CATEGORY, REMARKS)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
CNF
——-
[mysqld]
socket = /var/lib/mysql/mysql.sock
skip-locking
key_buffer = 384M
max_allowed_packet = 256M
table_cache = 512M
sort_buffer_size = 2M
read_buffer_size = 2M
read_rnd_buffer_size = 1M
myisam_sort_buffer_size = 64M
thread_cache_size = 80
query_cache_size = 32M
thread_concurrency = 8
query_cache_type = 1
query_cache_limit = 32M
thread_stack=256K
myisam_data_pointer_size=6
net_buffer_length = 8K
log-slow-queries=/var/lib/mysql/astdb01.as.osti.local-slow.log
log-warnings=2
max_connect_errors=1000
max_connections=1000
max_heap_table_size=1073741824
local_infile=0
ft_min_word_len=3
tmpdir = /opt/
key_buffer_size=256M
connect_timeout=180
wait_timeout=360
tmp_table_size=851443712
Can you please help me with this?
Hello Peter,
I am going through your article and found it pretty interesting. I need one piece of help: I am displaying my property query, which displays properly and orders the result by the following 4 columns:
p.is_move_up DESC, REPLACE(IFNULL(p.is_feature,""),"2015-12-31","") DESC,
p.listing_order DESC, p.propid DESC
On the site, properties are displayed in this order: Moveup paid, paid featured, paid, trial, then expired. However, we want to show the properties within each group in random order and make them rotate. Right now the properties do not rotate, so it is not fair that users always see the same properties in the same order. So we need the rotation to be within the same group, not mixing all of them.
Query :-
SELECT DISTINCT(p.propid), p.category_id, p.people_sleep, p.bedrooms, p.bathrooms, p.children,
p.airport, p.features, p.pets, p.smoke, p.wheelchair, p.prop_title, p.property_type, prop_date,
display_calender, other_url_calender, is_feature
FROM property p
LEFT JOIN category ON (p.category_id = category.id)
WHERE p.active IN ("yes","d") AND p.is_feature >= SYSDATE()
ORDER BY p.is_move_up DESC, REPLACE(IFNULL(p.is_feature,""),"2015-12-31","") DESC,
p.listing_order DESC, p.propid DESC
Can you please help me with this?
Thanks for the quick response. Very nice.
Hi! Great article (as always). Just wanted to point out one quick exception to this:
“Do not sort by expressions”
On a FULLTEXT index, if you have a MATCH … AGAINST … in the WHERE expression as well as in the ORDER BY expression (for instance, for the purpose of ordering by relevancy), then the FULLTEXT index will indeed be used and the ORDER BY will be quite efficient, as it is *not* re-executed.
Cheers, and keep up the fantastic articles, Peter and Vadim!
Jay
Thanks for another great blog post!
“Do not sort by expressions”
Picking random rows is another thing that social networking pages use a lot (random x users that live in the same city as you; random x users that share some attribute with you).
ORDER BY RAND() is evil but what alternatives are there? Is there a best practice to tackle this issue with PHP5/MySQL5, especially when I only need little amounts (
Something got stripped away:
Is there a best practice to tackle this issue with PHP5/MySQL5, especially when I only need little amounts (less than 20) of random rows?
Good point Maarten, I hope somebody knows an alternative / best practice?
One way is to have something like a unique rowid in that table. Then in the script choose a random value and try to select a row matching that number.
One algorithm would then be:
1) SELECT MAX(rowid) AS maxrowid FROM t1 LIMIT 1; (we know that the lowest rowid is 0 since it's an unsigned int).
2) Run the random() function in your script, which will return a numeric value between 0 and MAX(rowid) (don't forget to seed).
3) SELECT col FROM t1 WHERE rowid >= "numeric value from 2)" LIMIT 1;
4) If 0 rows were returned, check in the other direction: SELECT col FROM t1 WHERE rowid <= "numeric value from 2)" LIMIT 1;
Uhm for some reason half my post vanished !?
Anyway…
5) If still no rows were returned then tell the client that no rows were found.
In the case of “random x user from city y” you can have the select as:
SELECT col FROM t1 WHERE city = y AND rowid >= "numeric value from 2)" LIMIT 1;
along with an index such as INDEX (city, rowid)
Jay,
What do you mean exactly? First, I did not speak about full text search indexes here but about BTREE ones. If you do full text search using natural language search, you do not need any extra ORDER BY clause – it will be automatically sorted by “relevance”:
mysql> select title, match (title,content) against ("internet") as score from cont where match (title,content) against ("internet") limit 10;
+---------------------------------------+-----------------+
| title                                 | score           |
+---------------------------------------+-----------------+
| Internet dynamics                     | 9.0042238235474 |
| Internet Generation                   | 8.8267345428467 |
| Neutral New Zealand Internet Exchange | 8.596981048584  |
| Regional Internet Registries          | 8.3543605804443 |
| Internet organizations                | 8.2909164428711 |
| Internet exchange points              | 8.2456722259521 |
| Internet Explorer shells              | 8.2083368301392 |
| Internet celebrities                  | 8.0902004241943 |
| IPv4 internet                         | 7.9001092910767 |
| Botswana Internet Exchange            | 7.8775291442871 |
+---------------------------------------+-----------------+
10 rows in set (0.08 sec)
But you must NOT include ORDER BY score DESC if you want your query executed in decent time. It will require a filesort, which can take very long:
mysql> select title, match (title,content) against ("internet") as score from cont where match (title,content) against ("internet") order by score desc limit 10;
+---------------------------------------+-----------------+
| title                                 | score           |
+---------------------------------------+-----------------+
| Internet dynamics                     | 9.0042238235474 |
| Internet Generation                   | 8.8267345428467 |
| Neutral New Zealand Internet Exchange | 8.596981048584  |
| Regional Internet Registries          | 8.3543605804443 |
| Internet organizations                | 8.2909164428711 |
| Internet exchange points              | 8.2456722259521 |
| Internet Explorer shells              | 8.2083368301392 |
| Internet celebrities                  | 8.0902004241943 |
| IPv4 internet                         | 7.9001092910767 |
| Botswana Internet Exchange            | 7.8775291442871 |
+---------------------------------------+-----------------+
10 rows in set (1 min 46.73 sec)
In fact, it is one of the major weaknesses of the built-in MySQL full text search – the inability to efficiently combine it with sorting by anything other than relevance. For example, if I wanted to run a boolean full text search for our dating example and sort by date registered or last login time, it would crawl. This is where external solutions come into play.
I’ve now modified the post to include the Beware of large LIMIT section, which is something I forgot to write about during the initial revision.
ORDER BY RAND() is indeed a troublemaker you can meet in many applications. As the workarounds are not trivial, many people simply decide to leave it this way; it will not scale for large data sets, however.
Apachez gives one decent workaround: if you just need one random item, you have sequential row ids (or only a few holes), and you do not care about a slightly skewed distribution, use this approach – pick a random id in the range and find the row closest to it.
If you need multiple rows, or there are no sequential integer ids, you can use another approach:
– add an indexed column called “random” and populate it with random values
– do ORDER BY random instead of ORDER BY RAND()
– once you’ve selected a row, update its “random” value to a new random value
– periodically run a process to reset the random values for all rows; otherwise, if a row got a very large value on update, it would never be selected
This works well for a large data set but a small number of selects, as you need to do a lot of updates in this case; a sketch follows.
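A sketch of that approach (table and column names are illustrative):

ALTER TABLE profiles ADD COLUMN random_val FLOAT, ADD INDEX (random_val);

-- Populate initially (and periodically reset, as described above):
UPDATE profiles SET random_val = RAND();

-- Index-resolved "random" pick instead of ORDER BY RAND():
SELECT * FROM profiles ORDER BY random_val LIMIT 1;

-- After using a row, push it back with a new random value:
UPDATE profiles SET random_val = RAND() WHERE id = 12345;  -- id of the row just used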
Another approach would be to pre-create a sequence of random values and use it. For example, generate 1,000,000 random rows at once. Now keep a simple counter of how many of the random rows you've used, i.e. first take the first 10 rows, then the second 10 rows, etc. For many applications it is enough to run in circles; for others, you can generate more rows as soon as they are used up.
The good thing about this last technique is that you can get the same distribution as with ORDER BY RAND() with very limited overhead.
If you know/calculate the number of your rows, you could try this as a faster alternative to RAND() (example in PHP):
$var = row("SELECT COUNT(id) FROM mytable");
$var = rand(0, $var - 1);
$result = query("SELECT random_item FROM mytable WHERE ... ORDER BY ... LIMIT $var,1");
A dirty UNION query could produce a large result set.
Speeple,
This is a pretty bad idea. You did not write what you ORDER BY, but it may require a filesort. And even if you do it by index, on average 50% of the rows will have to be scanned because of the LIMIT. LIMIT 50000,1 is bad – meaning 50,000 rows will be generated and thrown away.
Apachez provided correct solution for the case if you have id column 🙂
Whoops, the ORDER BY is redundant in such a query.
Better 🙂 But even without ORDER BY, a LIMIT with a non-zero offset is expensive.
Try LIMIT 10000000,1 on some large table 🙂
On small tables everything may look fast – full table scans, joins without indexing and this is often the root cause of scalability problems later on.
I’ve just tried the query on a table of several tens of gigabytes with 12 million rows:
SELECT * FROM index_archive LIMIT 1000000,1
True, it is very slow, especially from a web application view point.
I can’t remember where I read that this was a good method… But it’s obviously a poor option, even if it is marginally faster than RAND().
I’ve implemented this in a social network project I’m working on for “random blog” & “random profile”… guess i’ll be giving Apachez’s method a try!
Yeah. Random blog and random profile are indeed easy, as they often have sequential integer ids. There could be holes due to removed accounts, but it is still good enough.
Yes, the only flaw I have found with Apachez’s method is when there’s an integer jump, e.g. 1 > 5, and id 3 is selected, resulting in an error.
Well, that is the whole point. If your random value was 3, you do WHERE id >= 3 ORDER BY id ASC LIMIT 1; which gives you the row with value 3, or the closest to it if it does not exist.
I just want to disagree with Jay’s point (1st post). I’m using MySQL 5 and am having some fun adapting to the new optimization engine. I have a table that I use a full text index on; when I run a query with no ORDER BY on it, it is incredibly fast. HOWEVER, when I use an ORDER BY MATCH … AGAINST … LIMIT, the query is painstakingly slow, as it uses filesort for some reason that I do not understand.
—
SELECT * FROM sites
WHERE MATCH(title, url, stripped) AGAINST('test' IN BOOLEAN MODE) AND stat = 'A'
ORDER BY match(title, url, stripped) AGAINST ('test')
DESC
LIMIT 0, 50
27 rows fetched (60.67 sec)
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE sites fulltext stat,match_url_title_stripped match_url_title_stripped 0 1 Using where; Using filesort
— Versus —
SELECT * FROM sites
WHERE MATCH(title, url, stripped) AGAINST('test' IN BOOLEAN MODE) AND stat = 'A'
LIMIT 0, 50
27 rows fetched (0.75 sec)
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE sites fulltext stat,match_url_title_stripped match_url_title_stripped 0 1 Using where
Panos,
Jay is wrong on this. See my comment above.
The behavior you’re getting is quite expected. Check my new presentation on full text search performance to see other things which will be painfully slow:
http://www.mysqlperformanceblog.com/files/presentations/EuroOSCON2006-High-Performance-FullText-Search.pdf
All the workarounds for ORDER BY RAND() usually assume you are looking for a single record, or some small number. What if you want to randomly select an arbitrary number of rows (say between 500 and 15000)? If someone has a faster alternative to ORDER BY RAND() for this situation, I would be interested in hearing it. Thanks.
Rob, are you asking about a random number of random rows?
I mentioned one workaround which may work for you – you can have a separate pre-created table which holds a sequence of random row IDs, something like:
id rnd
1 456
2 123
3 905
So you have a sequential row number and random values. Now when you need N random rows you can do: select rnd from tbl where id > [your current position] LIMIT N;
If N is random – so compute it on your application.
Now you just need to advance as you use data from this table. Also you need something when all random data is used – for some application looping is enough for others you need to make sure you generate enough random data.
Hi there….
Great blog…
I have a difficult (for me) MySQL optimisation problem. Basically, I want to optimise a query that uses a fulltext search and orders by a static field, i.e.:
SELECT * FROM WHERE MATCH() AGAINST(”) ORDER BY score DESC
Where score is a precomputed score field.
I have indexes built on the fulltext field and on score separately. I can’t combine them as they are different types. However, as MySQL only uses the fulltext index, it is forced to filesort for the ORDER BY. Any way round this?
Thanks!
Sorry, sql should read:
SELECT * FROM <table> WHERE MATCH(text_field) AGAINST('text') ORDER BY score DESC
Fulltext search is special. First, it automatically sorts by relevance unless you’re doing a BOOLEAN search, so you’d better just remove the ORDER BY from this query.
Second – are you sure you do not want to use LIMIT ? Getting all matches will be very slow for large data sets.
Hi.
At first I have to say this is the excellent site! 🙂
And now my troubles. I discovered that with both a range condition and an ORDER BY, the index can serve only one of them:
Suppose I have a table with a, b and c fields and SQL query
SELECT * FROM table WHERE a=1 AND b>10 ORDER BY c
key(a,b) works, but filesort is used.
key(a,c) works, but b range goes sequentially.
key(a,b,c) doesn’t work (only a and b are used).
Do you have any idea how to go around this? How to define index for a, b and c?
Thank you anyway,
Dan
Dan,
You’re right. This is the challenge and there is no solution which works for everything.
If b range is small you can use this trick:
http://www.mysqlperformanceblog.com/2006/08/14/mysql-followup-on-union-for-query-optimization-query-profiling/
Hi
Your articles posted here are great. I’m a beginner in MySQL optimization and I have a “maybe stupid” question.
Let’s say I have a table mytable(a,b,c,d,e …) and I want to perform ORDER BY with LIMIT on more than 60-70% of the columns (I want to have many sorts available).
example queries:
select a,b,c FROM mytable ORDER BY a;
select a,b,c FROM mytable ORDER BY b;
… etc…
is it ok to define indexes for every column used in ORDER BY? When are the indexes defined on a table considered to be too many (assuming they are not redundant and are used for different purposes…)?
Thank you.
Having sorts by many fields is challenging. You basically have only two choices: you either have a lot of indexes, which slows down insertions/updates and requires space on disk and in caches, or you use filesort, which does not scale well with large data sets.
You say “Sort by column in leading table”, but could it be a workaround to create a view by joining 2 tables and then sort by any column you want (from the view)? Or is it the same thing as a SELECT with a join and ORDER BY?
Not really,
A view is a virtual entity, so creating a view does not magically make it behave like a table from an indexing standpoint – the optimizer looks at the base tables instead.
use a subquery in FROM, like
select * from (select * from tab where … limit 0, 20) order by rand() limit 0,1
Steve,
What is your advice about? Your query would not be faster than the query in the FROM clause, and you’re selecting a random row out of 20 rows selected by the first query, which is not entirely the same as fully random 🙂
Peter,
It is semi- or fake random, to minimize sorting and table scans. I have 5 robot clients running at the same time; it is to make sure the 5 clients don’t pick up the same record out of 10,000 records.
It is a random order within the first 20, instead of a random order over all rows (10,000 in my case). Performance is much better.
For 5 clients, 5 would be OK instead of 20 in the SQL; even with 100 there the performance won’t be much different. I put 20 there in case I increase the robot number.
Yes, in this case that works. A similar approach would be to create 20 rows, store them in a separate table and do a select with ORDER BY RAND() from it. The benefit in such a case is that you can regenerate the table every few minutes and get a different set for rotation.
Good article. I didn’t realize the join order would affect the ORDER BY clause.
How to optimize in this kind:
update targets set status=’D’ where status=’C’ order by schedule limit 1;
with index of (
status
,schedule
), and it was really slow. AndSelect * from targets where status=’C’ order by schedule limit 1;
update targets set status=’D’ where id=’id’;
are very fast as expected.
how to solve it?
//Bow
Davies,
The problem in this case is that you want the UPDATE to use the same key as you’re updating, which can cause “loops” for certain updates. Consider for example:
update tbl set i=i+10 where i>5
Updating value 10 will make it 20, and it will again match the clause, etc. So MySQL can’t use such an index for query execution. It could use a temporary table the same way you do it, but I guess it does not do that right now.
Hey guys, here’s another good solution. I have a users table where I want to select five people at random who have a certain setting on their profile. I can’t use max ID because the IDs won’t necessarily be in sequence, so here’s what I do:
Select all the user IDs that match my params and put them into an array…
$query = "SELECT user_id FROM users WHERE profile_privacy='0' AND profile_display_name != ''";
… Then use PHP to shuffle the resulting array and take the first five items:
shuffle($random_profile_ids);
$random_profile_ids = array_slice($random_profile_ids,0,5);
… Then do an IN query on the user_id field for the five items left in your array:
$in_sql = "'" . implode($random_profile_ids, "','") . "'";
$query = "SELECT * FROM users WHERE user_id IN ($in_sql) LIMIT 5";
On my 30k row table the new solution ran in a combined .0147 secs, compared to .0770 secs for ORDER BY RAND(). Not a bad savings if you ask me. Enjoy,
Rob
Rob,
That won't work well on large datasets (it will eat all memory sooner or later as the data set grows). Actually, it seems to me that using RAND() in this case would be more acceptable, even on small data sets, because the performance gain is not that big.
Btw, you can improve your code by *not* using shuffle, but instead getting 5 values from rand(0, count($random_profile_ids) - 1) (make sure they do not overlap) and then just picking the appropriate values from the array.
I have a question regarding order by and limit..
I used to use this system to page the results 100 rows at a time, but found it to be very slow as the page numbers grew larger and larger.
What I thought of to help it scale better is to have something like this:
SELECT inverted_timestamp FROM whatever WHERE visible = 'yes' ORDER BY torrents.inverted_timestamp LIMIT 100,1
the inverted_timestamp field and the ORDER BY go together, as I'm sorting by a few different columns (it's like a spreadsheet table which you can page and sort by a few fields – I have added indexes for all the used cases). The 100 is just currentpage*perpage, which happens to be 100 since I'm on the 2nd page (page 1) and have 100 rows per page.
then i do the main query to fetch the actual results:
SELECT users.username, bla bla ... FROM whatever LEFT JOIN categories ON category = categories.id LEFT JOIN users ON torrents.owner = users.id WHERE visible = 'yes' AND inverted_timestamp >= '3111748733' ORDER BY inverted_timestamp LIMIT 100
which I have indexed as (visible, inverted_timestamp) (and similarly for the rest of the columns I order by).
The value >= '311...' comes from the 1st query, which I use essentially to find the first row of that page; I then use that value in the main query to prevent the main one (which fetches all the results) from having to do an ORDER BY over the entire data set. From my testing it seems to run very well up to around 1-2 million rows; from then on the second query is still very fast, but the first one starts to take the hit. Previously I had just the second query with no >= condition and used LIMIT offset,perpage, which was OK up until 100k rows and then it was really slow.
It seems this solution is an improvement. My question really is: is there a way to make it better?
to illustrate what i meant above these are the queries generated with a different ordering:
SELECT size FROM whatever WHERE visible = 'yes' ORDER BY size DESC LIMIT 3800,1
SELECT users.username,bla bla FROM whatever LEFT JOIN categories ON category = categories.id LEFT JOIN users ON owner = users.id WHERE visible = ‘yes’ AND size
it got cut off:
AND size= value_found) which worked well but only when i ordered by id (so it was limited – though i can still probably use this way when the ordering is done via an integer that has sequential properties)..
the problem really lies in how to speed up the other order by’s beyond what i have done already.
hmm it doesnt like my posts.. it cut it again :/
size
ffs i guess its the single quote that fucks up … i wont bother trying again .. i believe u have understood what i m trying to say
Here is my logic to do random.
$entries = array();
$count = 10;
while ($count > 0) {
  $DB->query("SELECT col1,col2 FROM entries AS e1 INNER JOIN members ON userid=id AND active=1 JOIN (SELECT ROUND(RAND() * (SELECT MAX(entryid) FROM entries)) AS entryid) AS e2 WHERE e1.entryid >= e2.entryid AND e1.photos!='a:0:{}' ORDER BY e1.entryid ASC LIMIT 1");
  if ($DB->get_num_rows() > 0) {
    $count--;
    $entries['entries'][] = $this->__processEntry($DB->fetch_array());
  }
}
It works pretty well for my standards
Great weblog; I have often visited it with good results, but now I'm totally stuck on the simplest thing.
EXPLAIN SELECT p.products_id
FROM products p
ORDER BY p.manufacturers_id ASC
LIMIT 3
Gives me a
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE p index NULL idx_manufacturers_id 5 NULL 3490
could anyone give me a hint why it steps through 3490 rows, even when using idx_manufacturers_id?
thanks for any idea (using mysql5) && sorry if this is too stupid post 🙂
Chris
That was extremely helpful. I was having a hard time locating good and clear resources on limit and order optimization. Thanks!
Mike's solution is by far the fastest: http://jan.kneschke.de/projects/mysql/order-by-rand
my choice was to combine the ORDER BY RAND() and the other one.
SELECT * FROM
(SELECT * FROM comments
WHERE count > $number
AND topic = $topic
LIMIT 100)
AS r1 ORDER BY RAND() LIMIT 5
The resulting speed is acceptable. The first LIMIT can be use to adapt the level of random in the query.
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 5 Using temporary; Using filesort
2 DERIVED comments ALL NULL NULL NULL NULL 140809 Using where
The query, as you can see, goes through 140,000 rows in 0.0792 seconds. Not as fast as the other one, but a lot more flexible and open to other WHERE statements.
Once again a post has been cut into pieces.
But it has one problem: what if you intend to look for multiple rows with other criteria?
Let's say you have 100,000 rows and want to output 5 randomized rows, and of the 100,000 rows, 200 qualify to be in the result. You cannot run the query 5 times because you might end up with the same row 5 times. If there is a large gap between two qualifying results, it can happen that you generate a random number that falls in the gap, thus giving you the same row over and over again.
Secondly, you may hit the end of the set, thus generating no output at all. Some people suggested running the query backwards and forwards: i.id <= a.id and i.id > a.id.
This brings trouble as well because the random number is generated inside the query resulting with two different random numbers. This can be overcome by making a pre-query to generate a random number in your application, still leaving you with the first problem.
Addendum:
I forgot to create an index over the columns concerned. Now the efficiency has increased a lot.
But I just found a problem: the results are not spread evenly. They always cluster around the beginning of the table.
Any way to alter this behaviour?
Great resource Peter, and also thanks to Jan (comment #48) for his “compromise” solution to the ORDER BY RAND() dilemma. I’ve now managed to get a query (testing with 500k rows, requiring < 10 random rows) down from 0.8s execution time to just 0.05s. 🙂
SELECT *, MATCH(title, url, stripped) AGAINST('test' IN BOOLEAN MODE) as score FROM sites
WHERE stat = 'A'
HAVING score > 0
ORDER BY score
Hello Peter,
in your presentation on FULLTEXT SEARCH you say:
– Be careful with the extra where clause if it filters out a lot of matches.
select title from cont where match (title,content) against ("global service company") and date_added > unix_timestamp("2006-07-18 18:00:00") limit 10;
– Takes 26 min 43.59 sec
I think maybe it was the unix_timestamp function that slowed things down? Because maybe the unix timestamp had to be calculated for each of the millions of records, and therefore no index could be used?
Tobias,
In this case unix_timestamp is called on a constant, so it is not executed millions of times. It also would not block index usage. The problem is that once you're using the full text search index, you can't use another index to do the sorting.
Peter, Thanks….
This is where I found very useful matters about optimization which I have been looking for so long.
Thanks again.
Mathu Mou
How do I optimize this query in PHP?
$sqls = "select id,state from states";
$rss = mysql_query($sqls) or die(mysql_error());
$num_rowss = mysql_num_rows($rss);
$count = 1;
while ($num_rowss && $datas = mysql_fetch_assoc($rss))
{
  $state_id = $datas['id'];
  $state = $datas['state'];
  $sqlc = "select * from companies,package where comp_status=1 and comp_account_site=0 and comp_balance>0 and comp_mem_pac=pac_id order by pac_amount DESC, rand()";
  $rsc = mysql_query($sqlc) or die(mysql_error());
  $num_rowsc = mysql_num_rows($rsc);
  $checkonlyonetime = 1; // header
  while ($num_rowsc && $datac = mysql_fetch_assoc($rsc)) // loop of company
  {
    $comp_id = $datac['comp_id'];
    $comp_name = $datac['comp_name'];
    $comp_mapped_city = $datac['comp_mapped_city'];
    @$comp_mapped_city = explode(",", trim($comp_mapped_city, ","));
  }
}
… LIMIT optimization 🙂
SELECT *
FROM forum_posts AS pa
LEFT JOIN forum_posts_text AS pb ON pa.post_id = pb.post_id
LEFT JOIN forum_users ON user_id = post_poster
WHERE post_topic_id = '450'
ORDER BY pa.post_id ASC
LIMIT 5475 , 15
versus
SELECT *
FROM forum_posts AS pa
LEFT JOIN forum_posts_text AS pb ON pa.post_id = pb.post_id
LEFT JOIN forum_users ON user_id = post_poster
WHERE post_topic_id = '224'
AND pa.post_id >= (
SELECT MAX( post_id )
FROM (
SELECT post_id
FROM forum_posts
WHERE post_topic_id = '224'
ORDER BY post_id ASC
LIMIT 6075 , 1
) AS tmp )
ORDER BY pa.post_id ASC
LIMIT 15
~ 0.09 sec versus ~ 0.004 sec
Tables with posts: 2×292,000+ rows (25MB + 89MB)
Topic: 6092 rows (rows that meet the condition WHERE post_topic_id = '224')
Source: http://www.easemarry.com/blog/mysql-limit-optimization/
If you found something better, I would be glad to hear.
Here the SELECT MAX(post_id) FROM (SELECT …) is not necessary; it's redundant (that's obvious because the subquery extracts only one post_id – see LIMIT x, 1):
SELECT *
FROM forum_posts AS pa
LEFT JOIN forum_posts_text AS pb ON pa.post_id = pb.post_id
LEFT JOIN forum_users ON user_id = post_poster
WHERE post_topic_id = '224'
AND pa.post_id >= (
SELECT post_id
FROM forum_posts
WHERE post_topic_id = '224'
ORDER BY post_id ASC
LIMIT 6075 , 1)
ORDER BY pa.post_id ASC
LIMIT 15
I have this query, but the ORDER BY and the last 2 LEFT JOINs hurt performance considerably.
Any idea how I can optimize this query?
Thanks.
SELECT *
FROM message
LEFT JOIN accounts a ON message_account = a.id
LEFT JOIN opportunities o ON message_project = o.id
LEFT JOIN leads l1 ON (UCASE(message_from) LIKE CONCAT('%', UCASE(l1.email1), '%')
  AND l1.email1 <> '') OR (UCASE(message_from) LIKE CONCAT('%',
  UCASE(l1.email2), '%') AND l1.email2 <> '')
LEFT JOIN leads l2 ON (UCASE(message_to) LIKE CONCAT('%', UCASE(l2.email1), '%')
  AND l2.email1 <> '') OR (UCASE(message_to) LIKE CONCAT('%',
  UCASE(l2.email2), '%') AND l2.email2 <> '')
WHERE message_visibility
AND message_mailbox = UCASE('user')
AND message_visibility <> 1
ORDER BY message_ocode DESC
One tweak that we do for large limit offsets on pagination is to first select columns that are part of the index; typically this would be the ids. After that we use an “IN” to select the other data columns from the same table. We find that there is a vast improvement on selects that use large offsets.
Example (category,id index)
select photo.id, photo.description from photo where category=1 limit 30000000,20
Query time: 4 seconds
To speed things up we split the call into 2 queries
select photo.id from photo where category=1 limit 30000000,20
Query time: 0.030 (this just uses the index so no traversing the data table)
Select photo.id, photo.description from photo where photo.id IN (x1,x2…x20)
The x1 – x20 are the results the first query returns
Query time: 0.020
So with a simple change like this we sped things up by a couple of orders of magnitude and allow our users to use large pagination.
I assume it has something to do with the fact that the first query only uses the index to get the ids and does not have to jump around getting a larger data set.
Cheers.
Radek, and what about if I want to order the resultset?
eg: select id …. order by id desc ..
Have you tried?
I’m Joining multiple tables, so the join must be done in the second query..
Unfortunately, with your solution I have to use 3 queries:
– SELECT COUNT(*) .. (for total rows that meets the criteria blabla)
– SELECT id … LIMIT x,y
– SELECT id, name .. IN (x1,x2…x20)
Anyway.. thank you for the tweak!
Yes Sergio, you don't need to do the count, as usually you have the page number or offset in the GET variable as you go from page to page. Typically we do a count first to create all the pagination stuff: current page, total pages, etc.
Example of it in action here. http://www.pinkbike.com/photo/list/?date=all&page=6
You can page to page 100,000 (24 per page, so the offset is 2.4 million) without any slowdown. If this was done in one call, it literally takes seconds to generate offsets of 2,000,000.
Yes, you can order by id in the first query, no problem. The id is indexed, so no penalty there. Your result set can also be ordered, but that is fast as you only have 24 items, or however many you show on the page.
Cheers
Hey Radek,
Yes, I was talking about this: “Typically we do a count first to create all the pagination stuff. Current page, total pages, etc.” – you have to create a counter for pagination information.
Thank you for your solution, maybe I will try it on other projects. Right now I’m fine with solution posted above by me.
Hi, I was wondering if there is a way to stop a SELECT COUNT(*) from continuing once it has reached a max number of rows. E.g., I want to do something like:
SELECT count(*) FROM table_a LIMIT 1000;
So if the table has more than 1k rows, it should just return and say “1000”. What I'm trying to avoid is having the entire table scanned, in particular if it has millions of rows. Basically the question I want to ask is: “Does table X have at least N rows?”
Hi Radek,
I am trying replay this from your post:
# select photo.id from photo where category=1 limit 30000000,20
# Query time: 0.030 (this just uses the index so no traversing the data table)
but withou success. Can you help me?
In MySQL I do:
> create table c (id integer auto_increment, a integer, b integer, primary key(id));
> insert into c (a,b) value (1,2),(5,6),(3,9),(122,22),(32,54);
> insert into c (a,b) value (1,2),(5,6),(3,9),(122,22),(32,54);
I ran this about 21 times to fill the table:
> insert into c (a,b) select a,b from c;
So I have:
> select count(*) from c;
+----------+
| count(*) |
+----------+
| 20971520 |
+----------+
1 row in set (0.00 sec)
> show index from c;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| c     | 0          | PRIMARY  | 1            | id          | A         | 20971520    | NULL     | NULL   |      | BTREE      |         |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
1 row in set (0.00 sec)
And finally:
> explain select id from c limit 20000000,20;
+----+-------------+-------+-------+---------------+---------+---------+------+----------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows     | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+----------+-------------+
|  1 | SIMPLE      | c     | index | NULL          | PRIMARY | 4       | NULL | 20971520 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+----------+-------------+
1 row in set (0.00 sec)
> select id from c limit 20000000,20;
+----------+
| id       |
+----------+
| 20000001 |
........
| 20000020 |
+----------+
20 rows in set (4.58 sec)
So even if I use indexes, pagination on a huge table is very slow 🙁 Any idea why?
My MySQL is 5.0.51a-18
PC: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz, bogomips: 4776.06, 8GB RAM, and system including mysql id on SSD hard drive
What I am doing wrong?
Thanks
bozek, is it possible that your key_buffer size is set too small, and therefore the whole index for your 30 million records does not fit and is not cached? On my application this is set larger than the total index sizes for ALL tables, not just this one.
If you have something like SELECT * FROM table LIMIT 10, it works the same as SELECT * FROM table LIMIT 0,10.
Is there any way to accomplish the same thing as this query while avoiding a temporary table?
select ID_MEMBER_STARTED, COUNT(*) as hits from smf_topics group by ID_MEMBER_STARTED order by hits desc limit 20;
There is an index on ID_MEMBER_STARTED and the table type is InnoDB.
Thanks dude that is rather helpful!
My query performance is very slow for a large database. Table A has 2050702 records in it. ID is the primary key, and there is an index on Time.
SELECT ID, Name, Time from TableA WHERE Time >= '2009-03-24 12:53:00' and ID > 2050702
Order By ID DESC LIMIT 100
I would like to know ways to optimize the query above, so as to retrieve the last 100 records from Table A matching the criteria in the WHERE clause.
Add a query on (ID, Time) and re-order the WHERE clause to “ID > 2050702 and Time >= 2009-03-24 12:53:00”.
MySQL can only use one index in a query (basically), so you need an index that contains all the columns you are filtering and ordering by, in the same order.
Sorry, that should be “Add an index on” not “Add a query on”.
I have a query that takes 14 seconds to execute:
SELECT posts.* FROM posts force index (idx_post_date) INNER JOIN follow ON follow.followuserid = posts.userid WHERE follow.userid='61585' ORDER BY date DESC LIMIT 0, 20;
The table posts has ~1000000 records and the number of records in the posts table belonging to userid 61585 is 0.
The table follow has 63537 records.
This is how my indexes are set:-
mysql> show index from follow;
+--------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| Table  | Non_unique | Key_name | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| follow | 0          | PRIMARY  | 1            | userid       | A         | NULL        | NULL     | NULL   |      | BTREE      |         |
| follow | 0          | PRIMARY  | 2            | followuserid | A         | 63537       | NULL     | NULL   |      | BTREE      |         |
+--------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
mysql> show index from posts;
mysql> show index from posts;
+-------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name         | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| posts | 0          | PRIMARY          | 1            | id          | A         | 1047467     | NULL     | NULL   |      | BTREE      |         |
| posts | 1          | idx_post_date    | 1            | date        | A         | 1047467     | NULL     | NULL   |      | BTREE      |         |
| posts | 1          | idx_posts_userid | 1            | userid      | A         | 22771       | NULL     | NULL   |      | BTREE      |         |
+-------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
mysql> explain SELECT posts.* FROM posts force index (idx_post_date) INNER JOIN follow ON follow.followuserid = posts.userid WHERE follow.userid='61585' ORDER BY date DESC LIMIT 0, 20;
+----+-------------+--------+--------+---------------+---------------+---------+-------------------------------+------+-------------+
| id | select_type | table  | type   | possible_keys | key           | key_len | ref                           | rows | Extra       |
+----+-------------+--------+--------+---------------+---------------+---------+-------------------------------+------+-------------+
|  1 | SIMPLE      | posts  | index  | NULL          | idx_post_date | 4       | NULL                          |   20 |             |
|  1 | SIMPLE      | follow | eq_ref | PRIMARY       | PRIMARY       | 8       | const,wwwproject.posts.userid |    1 | Using index |
+----+-------------+--------+--------+---------------+---------------+---------+-------------------------------+------+-------------+
Is there a way to optimize this query? I have been trying for quite some time now but could not find a solution; any help would be very much appreciated!
Thanks for the article, but I have an additional feature to throw at you.
Suppose I request to select, say, 10 rows, ordered some way. If none of the 10 fit any additional constraints I may have, I would like to fetch the second-best 10, and so on until I get the record that fits all my constraints.
How do I ensure the first 10 will not show up?
Let me just add that the additional constraints may eventually be associated with features that are outside the database.
Cheers,
Carlos
I have a query :
In this table category_id, date_created is indexed.
EXPLAIN select newsid from news where category_id=53 and approval='yes' order by date_created desc limit 0,10
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE news index PRIMARY,category_id date_created 9 NULL 216 Using where
the above query examines 216 rows to fetch only 10 records.
Whereas the below query examines 10 rows which it should, (here i am not searching by category_id):
EXPLAIN select newsid from news where approval='yes' order by date_created desc limit 0,10
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE news index NULL date_created 9 NULL 10 Using where
Why does the first query examine 216 rows? Is this fine, or does the query need to be optimised further?
Please advice, thanks in advance.
Reshma,
you seem to have a poor index setup to begin with. You have an index scan, while you would have ref for all these queries if you had an index on (category_id, approval, date_created).
If the index is scanned, then the more selective the WHERE clause is, the more rows have to be scanned to provide 10 results.
Hello Peter,
Thanks for the quick response.
I have removed the index on category_id. After removing it Explain for the below select query is as follows:
EXPLAIN select newsid from news where category_id=53 and approval='yes' order by date_created desc limit 0,10
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE news index NULL date_created 9 NULL 10 Using where
This is fine i guess.
Another question: if there are 56 records for category_id=53 and I want to display 100 records for all given categories in the news table, why does the below query examine 100 records for category_id=53 when there are only 56 of them?
EXPLAIN select newsid from news where category_id=53 and approval='yes' order by date_created desc limit 0,100
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE news index NULL date_created 9 NULL 100 Using where
Most of the rows are displayed based on category_id, so I want to know if I am doing it correctly.
If this is confusing I could send the table structures and queries.
In this post you have mentioned a DOS attack; do you mean paging through records can enable a DOS attack?
Any help would be highly appreciated.
Hi Peter
We have a tool which processes 30 million records; it is a migration project where we are using a Perl script.
We need to compare two tables, and we are doing it like this:
we loop using a counter with a chunk size of 50,000 records, and then we run
select column1, column2 from table order by table limit index_num(21 million), (500,000)
Initially, for smaller data sets, things work fine (26 secs to 50 secs), but as index_num grows, it takes almost 30 mins.
We want to optimize this.
As per your recent posts, we need to have an index; for column_1 we are creating indexes… still it is not optimal.
Hey Peter,
I’ve been reading your articles for some time now, especially since I’ve been running into issues with wrapping my head around this whole MySQL optimization stuff.
Anyway, I get the gist of it. However, I run a real estate website (it displays listings, etc., in a search results page), and there are just too many variables to try and cover an index for each scenario… I'd end up with 20 indexes, and I fear that any update/insert/delete queries to that table would blow up the internet.
I’m not looking for anybody to write me anything, I’m just looking to get a better understanding of this:
SELECT
  id, ... more *necessary* columns ...
FROM listings
WHERE category = 1
  AND country = 1
  AND state = 45
  AND city = 35859
  AND visible = 1
ORDER BY price ASC
LIMIT 0, 10
Now, category numbers represent different listing types, i.e. 1 = For Sale, 2 = For Rent, 3 = Pending Sale, and so on. 1 (For Sale) is by far the most popular, representing at least 75% of all listings in the table; probably more. country, state and city are always in the query. visible is either 1 or 0: 1 meaning the listing has not expired, and 0 meaning it has. Probably 85%+ of the rows in the table have visible = 1.
Now, there are also other WHERE conditions that are possible, like WHERE price = 300000, or bedrooms = 2, or bathrooms = 3, etc. And who knows what the user is going to enter.
Then, there is the ORDER BY on top of that.
What I have been looking for is how to accommodate these kinds of variables when there are so many possible scenarios. When I do an ORDER BY price DESC, it can take 4-5 seconds, maybe more… I guess that's not the end of the world, but our traffic is always growing, and that will end up being unacceptable.
Any thoughts?
– Batman
Hi,
This is a hard one, especially if you add ranges, as MySQL will not be able to both use a range for row selection and sort by a different column. You may end up with large scans and sorts, which can use many resources. Sphinx (http://www.sphinxsearch.com) is often a good alternative in such cases, even if you're not using full text search queries.
Thanks for this article!! It was what I was looking for.
Thanks for the quick response. VEry Nice
Hello Peter,
I am going through your article its found pretty much inserting. I need one help as i am displaying my property query which is displaying properly and ordering the result by following 4 column
p.is_move_up DESC,REPLACE(IFNULL(p.is_feature,””),”2015-12-31″,””) DESC,
p.listing_order DESC,p.propid DESC
In order site, properties are been displayed in this order: Moveup paid, Paid featured, paid, trial then expired. However, we want to show within each group the properties random and make them rotate. Now,
the properties do not rotate so it is not fair for users to see always the same properties in same order. So, we need the rotation to be within same group not mixing all.
Query :-
SELECT DISTINCT(p.propid), p.category_id, p.people_sleep, p.bedrooms, p.bathrooms, p.children,
  p.airport, p.features, p.pets, p.smoke, p.wheelchair, p.prop_title, p.property_type, prop_date,
  display_calender, other_url_calender, is_feature
FROM property p
LEFT JOIN category ON (p.category_id = category.id)
WHERE p.active IN ('yes','d') AND p.is_feature >= SYSDATE()
ORDER BY p.is_move_up DESC, REPLACE(IFNULL(p.is_feature,''),'2015-12-31','') DESC,
  p.listing_order DESC, p.propid DESC
Can you please help me with this?
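(One way to get rotation within each group, sketched: keep the group-defining columns first in the ORDER BY and swap the fixed p.propid tiebreaker for a random one:

ORDER BY p.is_move_up DESC, REPLACE(IFNULL(p.is_feature,''),'2015-12-31','') DESC,
  p.listing_order DESC, RAND()

Note that RAND() in ORDER BY forces a filesort on every request, so this only makes sense while the matched set stays small.)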
Hi Peter,
I have one question regarding your comment
“Sort by column in leading table if you have JOIN with ORDER BY … LIMIT you should try hard to have sorting column(s) to be in the leading table. If ORDER BY is going by field from the table which is not first in the join order index can’t be used.”
I have a similar situation where I cannot have the sorting columns in the leading table; in such a case, please suggest a workaround.
Thanks in anticipation.
Rbk
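(A workaround that sometimes applies, sketched with hypothetical tables t1 and t2 where the sort column lives in the non-leading table t2: sort and limit t2 in a derived table first, then join:

SELECT t1.*, d.*
FROM (SELECT * FROM t2 ORDER BY sort_col DESC LIMIT 10) AS d
JOIN t1 ON t1.id = d.t1_id;

This only returns the right rows when the join cannot eliminate rows from the derived table, e.g. a foreign key that always matches; otherwise the inner query has to over-fetch and the outer one re-limit.)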
Thank you so much for this great post! It was a huge help and exactly what I needed to know and understand right now. It is always great how you present a topic so logically and make it look so easy and understandable, even for not yet highly experienced people like me! 🙂
Good Article. Thank you for that!
Peter, in the 2nd example, where the query was "SELECT * FROM sites WHERE category_id=5 ORDER BY date_created DESC LIMIT 10", you suggested that using an index on (date_created, category_id) might speed up the lookup. But if you use this index, queries won’t be able to use it for category_id alone, because it comes 2nd in the compound index. So won’t this offset the gain from using the compound index?
@Cupidvogel:
You should create another index on category_id in this case. There is always a tradeoff between performance and space, and space is usually cheap compared to the performance gain.
Thanks.
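(A sketch of that suggestion in SQL, index names illustrative:

ALTER TABLE sites ADD INDEX idx_date_cat (date_created, category_id);
ALTER TABLE sites ADD INDEX idx_cat (category_id);

Worth noting: with the columns the other way around, (category_id, date_created), a query filtering on category_id alone can already use the leftmost prefix of the compound index, so the separate index is only needed when date_created comes first.)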
I made a simple test case comparing performance of the LIMIT approach vs the JOIN workaround. You can see the results here: http://devoluk.com/sql-pagination.html
Hello everybody,
We tried the solution based on a query seen in an older post:
SELECT * FROM sites WHERE category_id=5 ORDER BY date_created DESC LIMIT 10
changed to
select * FROM (SELECT * FROM sites WHERE category_id=5 ORDER BY date_created DESC) as my_table LIMIT 10
It works for us!
Super helpful and clear explanation! I was able to use the “sort by column in leading table” and “force index” approaches to avoid a file sort. Yay!
We have one query written in our legacy system (using MySQL 5.5). Now, with growing data, the query mentioned below is taking a huge amount of time.
Our system holds roughly 2,000,000,000 rows (2 billion), approximately 650 GB of data.
The table is partitioned per day (which means the query below fetches data from 30 partitions).
=> innodb_buffer_pool_size is set to 16GB.
=> There is an index on START_TIME.
Query-1
SELECT * FROM ( SELECT a,b,c,d,e,f,g,h,i,j,k,l FROM TEST WHERE START_TIME BETWEEN '2013-11-14 00:00:01' AND '2013-12-14 23:59:59' ORDER BY START_TIME DESC) AS TEST_DATA LIMIT 10000;
The above query selects all the columns for one month of data, sorts all of it, and finally shows 10000 records to the end user.
Now, my doubt concerns Query-2:
SELECT a,b,c,d,e,f,g,h,i,j,k,l FROM TEST WHERE START_TIME BETWEEN '2013-11-14 00:00:01' AND '2013-12-14 23:59:59' ORDER BY START_TIME DESC LIMIT 10000;
The above query selects all the columns from one month of data, sorts, and displays the result as soon as 10000 records are sorted (no sorting and buffering of all records).
Comparing Query-1 and Query-2:
1> Will these two queries return different result sets, or the same?
2> Is there any other impact on performance?
Because in Query-1 we sort all records and then display 10k, whereas in Query-2 we display 10k sorted records.
Thanks a lot for your help.
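(One way to answer both doubts empirically, as a sketch: EXPLAIN each form and compare the rows and Extra columns — the derived-table form materializes and sorts the entire month before the outer LIMIT applies, so it will typically show the filesort over the whole range, while the direct form can stop early:

EXPLAIN SELECT a,b,c,d,e,f,g,h,i,j,k,l FROM TEST
WHERE START_TIME BETWEEN '2013-11-14 00:00:01' AND '2013-12-14 23:59:59'
ORDER BY START_TIME DESC LIMIT 10000;

The result sets themselves should be identical, up to ties in START_TIME.)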
In the software for the directory, all categories are written the same way.
One of the categories keeps failing when selected, although 100% of the sub-categories and listings in the parent category are available.
I keep getting the following errors in response to attempts to access MySQL:
[09-07-2014 07:25:38pm] Fatal Error: MySQL error. Query: SELECT id, friendly_url, IF(date_update!='0000-00-00 00:00:00',date_update,date) AS date, priority FROM pmd_listings WHERE status='active' ORDER BY id ASC LIMIT 180000, 10000 Error: (3) Error writing file '/var/db/mysql-tmp/MYnTfXMV' (Errcode: 28) in /home/citydir/public_html/prevpmd/includes/class_database.php on line 132
[09-07-2014 07:27:38pm] Fatal Error: MySQL error. Query: SELECT SQL_CALC_FOUND_ROWS l.* FROM pmd_listings l INNER JOIN pmd_listings_categories lc ON l.id=lc.list_id WHERE lc.cat_id=192 AND l.status='active' ORDER BY priority DESC LIMIT 0,10 Error: (126) Incorrect key file for table '/var/db/mysql-tmp/#sql_e66b1_0.MYI'; try to repair it in /home/citydir/public_html/prevpmd/includes/class_database.php on line 132
[09-07-2014 07:28:55pm] Fatal Error: MySQL error. Query: SELECT id, friendly_url, IF(date_update!='0000-00-00 00:00:00',date_update,date) AS date, priority FROM pmd_listings WHERE status='active' ORDER BY id ASC LIMIT 1710000, 10000 Error: (3) Error writing file '/var/db/mysql-tmp/MYeKsGGZ' (Errcode: 28) in /home/citydir/public_html/prevpmd/includes/class_database.php on line 132
I can’t get a response from the developer who wrote the software, so I’m left to my own devices to find a resolution.
I would appreciate any help you may render.
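(For anyone hitting the same errors: Errcode 28 is the operating system’s "no space left on device" error — the huge offsets (LIMIT 1710000, 10000) make MySQL build enormous sort/temporary files in /var/db/mysql-tmp until the disk fills, and the "Incorrect key file" on the #sql temporary table is the same disk-full condition surfacing differently. Since the query already orders by id, a keyset rewrite avoids the temporary files entirely — a sketch, with @last_id standing for the highest id returned by the previous chunk:

SELECT id, friendly_url, IF(date_update!='0000-00-00 00:00:00',date_update,date) AS date, priority
FROM pmd_listings
WHERE status='active' AND id > @last_id
ORDER BY id ASC
LIMIT 10000;
)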
What about an ORDER BY on a SUM()ed column with a LIMIT on a table that has 18 million rows?
What would you recommend on something like that?
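(Hard to say without the schema, but ORDER BY on a SUM()ed column can never use an index directly — the sort key only exists after aggregation — so on 18 million rows the usual pattern is a pre-aggregated summary table that is sorted via its own index instead. A sketch, with a hypothetical orders(customer_id, amount) source table:

CREATE TABLE customer_totals (
  customer_id INT NOT NULL PRIMARY KEY,
  total DECIMAL(14,2) NOT NULL,
  KEY idx_total (total)
);
-- refresh periodically, or keep it updated alongside writes; then:
SELECT customer_id, total FROM customer_totals ORDER BY total DESC LIMIT 10;
)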
Thanks for this good information!
We were facing performance issues querying an InnoDB table on a MySQL 5.6 instance. This table has about 72 million rows and only 28 columns. There are indexes on the columns used in the WHERE and ORDER BY clauses of the query.
In the following query, col1 is a string (6 characters long), col2 a date, and col3 a time.
Indexes: idx_col1col2 on (col1, col2) and idx_col2col3 on (col2, col3)
select * from <table> force index (idx_col1col2) force index for order by (idx_col2col3) where col1 = <value> and col2 >= <date> order by col2 desc, col3 desc limit 1000;
The above query was taking about 30 seconds to a minute to return results. As you can see, we used force index hints for the WHERE as well as the ORDER BY.
Could you please advise?
Adding index on (col1, col2, col3) helped! Thanks.
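(That result matches what you would expect: with an equality on col1, the (col2, col3) portion of the composite index is already in ORDER BY col2 DESC, col3 DESC order when scanned backward, so MySQL reads pre-sorted rows instead of trying to serve the WHERE from one index and the ORDER BY from another. A sketch, with t standing for the unnamed table:

ALTER TABLE t ADD INDEX idx_col1col2col3 (col1, col2, col3);
)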
Hi Peter,
Could you suggest improvements I can make to the following query? It takes 8 seconds to execute, and I am afraid it will take longer as the table fills up with more data.
EXPLAIN SELECT
  w.Id,
  w.PartNumber,
  w.PartDescription,
  w.CreatedDate,
  w.ErpKey,
  w.IsActive,
  COALESCE(`w.LookupManufacturer`.Value, '') AS Manufacturer,
  COALESCE(`w.LookupUom`.Value, '') AS Uom,
  w.IsSerialized AS Serialized,
  w.IsIMU AS Imu,
  COALESCE(`w.LookupAtaChapter`.Value, '') AS AtaChapter,
  `w.LookupGroup`.Value AS `Group`,
  w.IsCharge
FROM tab.rfqpart AS w
LEFT JOIN tab.lookupgroup AS `w.LookupGroup`
  ON w.RefLookupGroupId = `w.LookupGroup`.Id
LEFT JOIN tab.lookupatachapter AS `w.LookupAtaChapter`
  ON w.RefLookupAtaChapterId = `w.LookupAtaChapter`.Id
LEFT JOIN tab.lookupuom AS `w.LookupUom`
  ON w.RefLookupUomId = `w.LookupUom`.Id
LEFT JOIN tab.lookupmanufacturer AS `w.LookupManufacturer`
  ON w.RefLookupManufacturerId = `w.LookupManufacturer`.Id
WHERE (w.RfqAccountId = 'eccd587f-9684-42a4-bae3-5b043fbea1f5')
  AND ((w.IsCharge = FALSE) AND (w.IsActive = TRUE))
ORDER BY w.PartNumber
LIMIT 20 OFFSET 10;
Possible_Keys: IX_AccountId_IsCharge_IsActive
Key: IX_AccountId_IsCharge_IsActive
Key_Length: 124
ref: const,const,const
rows: 570677
filtered: 100
Extra: Using temporary; Using filesort
I would appreciate any help you may render.
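(A pattern that often removes the "Using temporary; Using filesort" here, sketched with an illustrative index name: append the sort column to the existing equality columns, so the ~570k candidate rows come back already ordered by PartNumber and MySQL can stop after the 30 it needs:

ALTER TABLE tab.rfqpart
  ADD INDEX IX_AccountId_IsCharge_IsActive_PartNumber (RfqAccountId, IsCharge, IsActive, PartNumber);
)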