A suboptimal MySQL ORDER BY implementation, especially together with LIMIT, is often the cause of MySQL performance problems. Here is what you need to know about MySQL ORDER BY ... LIMIT optimization to avoid these problems.
MySQL ORDER BY with LIMIT is the most common use of ORDER BY in interactive applications that sort large data sets. On many web sites you will find top tags, recently registered users, etc., which often require ORDER BY with LIMIT in the back end. In general, this type of query looks like SELECT ... WHERE [conditions] ORDER BY [sort] LIMIT N, M.
Make sure it uses an index. It is very important to have ORDER BY with LIMIT executed without scanning and sorting the full result set, so it needs to use an index. In that case an index range scan is started, and query execution stops as soon as the required number of rows has been generated.
For example, if I do SELECT * FROM sites ORDER BY date_created DESC LIMIT 10; I would use an index on (date_created) to get the result set very fast.
Now, what if I have something like this: SELECT * FROM sites WHERE category_id=5 ORDER BY date_created DESC LIMIT 10;
In this case the index on date_created may also work, but it might not be the most efficient: if it is a rare category, a large portion of the table may be scanned to find 10 rows. So an index on (category_id, date_created) is a better idea.
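A minimal sketch of that indexing (the index name is arbitrary):

ALTER TABLE sites ADD INDEX category_date (category_id, date_created);

-- Rows for category 5 are now read straight from the index in date_created
-- order, so the scan can stop after the first 10 matching rows:
SELECT * FROM sites WHERE category_id=5 ORDER BY date_created DESC LIMIT 10;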
Let's take a look at a slightly more complex case: SELECT * FROM sites WHERE category_id IN (5,10,12) ORDER BY date_created DESC LIMIT 10;
Even though it looks quite similar to the previous one, it is quite different: there are multiple category_id values in the list now, so the index on (category_id, date_created) can't be used directly. An index on date_created alone would still work. The good option from a performance standpoint (even though a bit ugly) is the UNION workaround I already wrote about, sketched below.
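A sketch of that workaround, assuming the (category_id, date_created) index from above; each branch can use the index and stop after 10 rows, so the outer sort handles at most 30 rows:

(SELECT * FROM sites WHERE category_id=5 ORDER BY date_created DESC LIMIT 10)
UNION ALL
(SELECT * FROM sites WHERE category_id=10 ORDER BY date_created DESC LIMIT 10)
UNION ALL
(SELECT * FROM sites WHERE category_id=12 ORDER BY date_created DESC LIMIT 10)
ORDER BY date_created DESC
LIMIT 10;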
So what if you have an application which can perform a search on many different columns, with worse than perfect selectivity? Various social networking and dating sites are perfect examples of such queries:
SELECT * FROM people where gender='m' and age between 18 and 28 and country_id=5 and city_id=345 order by last_online desc limit 10;
There could be many possible limiting factors, with all of them being optional. This is a hard nut to crack, and I know high-end custom search solutions can be developed, but if we stick to plain MySQL, using multiple indexes on the most selective columns is a good idea for the performance of such queries.
For example, you may put an index on (gender, last_online), assuming most people will have gender specified, as well as on (country_id, city_id, last_online), assuming in most cases these will be specified. It takes a good look at the queries actually being run and at data selectivity to come up with a good set of indexes for such a case; it also may need to be adjusted in the future.
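As a sketch, that indexing would be (index names are arbitrary):

ALTER TABLE people
  ADD INDEX gender_online (gender, last_online),
  ADD INDEX location_online (country_id, city_id, last_online);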
The main thing to watch for, if you do not have the full WHERE clause resolved by the index, is how many rows you need to scan to resolve the ORDER BY (this can be found in the slow query log or by examining Handler statistics). If only 50 rows are examined to provide 10 rows of result set, you're in decent shape; but if it is 5000, you might need to rethink your indexing.
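One way to check this for a single query, assuming an otherwise idle session, is to compare the Handler counters before and after running it:

FLUSH STATUS;
SELECT * FROM people WHERE gender='m' AND country_id=5
ORDER BY last_online DESC LIMIT 10;
-- Handler_read_next shows how many index entries were actually scanned:
SHOW SESSION STATUS LIKE 'Handler_read%';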
Also note that the number of records scanned to provide the result set will be very dynamic, depending on the particular constants and other factors.
For example, for our dating case: if we use only the (last_online) index and look for people from the USA, we will likely find 10 people pretty quickly. If the country is small or simply has few members, e.g. Slovenia, the same kind of search might need to scan thousands of times more rows to provide the result set.
In the example above we sorted by the last column in the index. In fact, the index can be used for ORDER BY whenever sorting is done by its leading column(s). Note, however, that columns following the column(s) used for ORDER BY can't be used to restrict the result set. For example:
key(a,b,c) SELECT * FROM tbl WHERE c=5 ORDER BY a,b LIMIT 10. In this case the first two columns from the index can be used to satisfy the ORDER BY; the index, however, will not be helpful in checking c=5 (unless it is an index-covered query). An index on (c,a,b) would work better for the query above.
Do not sort by expressions. I guess this one is obvious: expressions or functions in the ORDER BY will block index usage for the sort.
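For example, with the sites table from above:

-- Cannot use the (date_created) index for the sort; full scan plus filesort:
SELECT * FROM sites ORDER BY DATE(date_created) DESC LIMIT 10;

-- Can use the index: sort on the bare column instead.
SELECT * FROM sites ORDER BY date_created DESC LIMIT 10;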
Sort by a column in the leading table. If you have a JOIN with ORDER BY ... LIMIT, you should try hard to have the sorting column(s) in the leading table. If the ORDER BY goes by a field from a table which is not first in the join order, the index can't be used. Sometimes this means breaking normalization and duplicating the column(s) you're going to use in the ORDER BY in the other table.
Here is an example where the ORDER BY is done on the second table and requires a filesort:
mysql> explain select test.i from test, test t where test.k=5 and test.i=t.k order by t.k limit 5;
+----+-------------+-------+------+---------------+------+---------+-------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref         | rows | Extra                           |
+----+-------------+-------+------+---------------+------+---------+-------------+------+---------------------------------+
|  1 | SIMPLE      | test  | ref  | PRIMARY,k     | k    | 4       | const       |    1 | Using temporary; Using filesort |
|  1 | SIMPLE      | t     | ref  | k             | k    | 4       | test.test.i |    1 | Using where; Using index        |
+----+-------------+-------+------+---------------+------+---------+-------------+------+---------------------------------+
2 rows in set (0.00 sec)
However, if the first table has the “const” or “system” access type, it is effectively removed from join execution (replaced with constants), so the ORDER BY can be optimized even if it is done on the second table:
mysql> explain select test.i from test, test t where test.i=5 and test.k=t.k order by t.k limit 5;
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------------+
|  1 | SIMPLE      | test  | const | PRIMARY,k     | PRIMARY | 4       | const |    1 |             |
|  1 | SIMPLE      | t     | ref   | k             | k       | 4       | const |    1 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------------+
2 rows in set (0.01 sec)
The difference between these cases is that “i” is the primary key while “k” is simply an indexed column.
Note: in some cases, even if it is possible to use an index to do ORDER BY with a JOIN, MySQL still will not be able to use it, as the optimizer is not yet smart enough to detect such cases:
mysql> explain select test.i from test, test t where test.k=5 and test.i=t.k order by test.k,t.j limit 5;
+----+-------------+-------+------+---------------+------+---------+-------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref         | rows | Extra                           |
+----+-------------+-------+------+---------------+------+---------+-------------+------+---------------------------------+
|  1 | SIMPLE      | test  | ref  | PRIMARY,k     | k    | 4       | const       |    1 | Using temporary; Using filesort |
|  1 | SIMPLE      | t     | ref  | k             | k    | 4       | test.test.i |    1 | Using where; Using index        |
+----+-------------+-------+------+---------------+------+---------+-------------+------+---------------------------------+
2 rows in set (0.00 sec)
In this case there is an index (k,j) on the table, so indexes could be used on each of the tables to optimize the ORDER BY; at the very least, a local sort could be used for each t.k=const value of the second table. Neither is done, however.
Sort in one direction. If you have ORDER BY col1, col2, it can be optimized using an index; the same is true for ORDER BY col1 DESC, col2 DESC. However, if you have ORDER BY col1, col2 DESC, MySQL will have to use a filesort. The classic solution for this would be an index that is sorted appropriately (ascending by col1 and descending by col2), but MySQL can't do that at this point. The workaround that can currently be used is a separate column which holds reversed values, so you can do ORDER BY col1, col2_reverse instead, as sketched below.
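A sketch of that workaround for a numeric col2 (column and index names are illustrative); the application, or a trigger, keeps col2_reverse equal to the negated value:

ALTER TABLE tbl
  ADD COLUMN col2_reverse INT,
  ADD INDEX col1_col2rev (col1, col2_reverse);

UPDATE tbl SET col2_reverse = -col2;

-- Equivalent to ORDER BY col1, col2 DESC, but resolvable by the index:
SELECT * FROM tbl ORDER BY col1, col2_reverse LIMIT 10;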
Beware of large LIMIT. Using an index to sort is efficient if you need the first few rows, even if some extra filtering takes place and you need to scan more rows by index than are requested by the LIMIT. However, if you're dealing with a LIMIT query with a large offset, efficiency will suffer: LIMIT 1000,10 is likely to be way slower than LIMIT 0,10. It is true that most users will not go further than the 10th page of results, but search engine bots may very well do so; I've seen bots looking at the 200th+ page in my projects. Also, for many web sites, failing to take care of this makes it very easy to launch a DOS attack: request pages with some large offset from a few connections and it is enough. If you do nothing else, make sure you block requests with too-large page numbers.
In some cases, for example if the results are static, it may make sense to precompute the results so you can query them by position. So instead of a query with LIMIT 1000,10 you will have WHERE position BETWEEN 1000 AND 1009, which has the same efficiency for any position (as long as it is indexed); a sketch follows.
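A sketch of this approach, assuming positions are recomputed periodically (the position column and the user-variable trick are illustrative):

ALTER TABLE sites ADD COLUMN position INT, ADD INDEX (position);

SET @pos := -1;
UPDATE sites SET position = (@pos := @pos + 1) ORDER BY date_created DESC;

-- Same rows as ORDER BY date_created DESC LIMIT 1000,10, at constant cost:
SELECT * FROM sites WHERE position BETWEEN 1000 AND 1009 ORDER BY position;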
Force an index if needed. In some cases the MySQL optimizer may prefer to use a different index, which has better selectivity or just better estimates, instead of the one which allows you to avoid the sort. For example, if you had indexes on (country_id,city_id) and on (country_id,last_online), then for the query SELECT * FROM people WHERE country_id=5 and city_id=6 order by last_online desc limit 10; the first index will likely be selected, even though it leads to a filesort.
The solution for this problem is either extending your indexes so the MySQL optimizer does not have to choose between a better sort and a better lookup, or using FORCE INDEX to make it use the appropriate index.
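For the example above, that could look like this (sketches; the index name used in FORCE INDEX is an assumption):

-- Option 1: extend the index so lookup and sort no longer compete:
ALTER TABLE people ADD INDEX country_city_online (country_id, city_id, last_online);

-- Option 2: force the index that matches the sort:
SELECT * FROM people FORCE INDEX (country_online)
WHERE country_id=5 AND city_id=6
ORDER BY last_online DESC LIMIT 10;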
Many of the tips I've mentioned here work for MySQL ORDER BY without LIMIT as well, but there are some differences. I should write another article about ORDER BY without LIMIT and large tables soon.
For queries that combine ORDER BY with LIMIT, the optimizer may switch to an index that applies to the ORDER BY. In some cases, the decision to switch was based on a heuristic rather than on cost. The optimizer now uniformly makes the decision whether to switch on a cost basis. This should result in better performance when switching would cause a query to read an entire index, or a large part of it, to find qualifying rows.
See also: Bug #78993, Bug #22108385, Bug #73837, Bug #19579507, Bug #16522053.
SELECT * FROM test
ORDER BY k DESC, j ASC;
The optimizer can use an index on (k, j) if k is descending and j is ascending. It can also use an index on those columns (with a backward scan) if k is ascending and j is descending.
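In MySQL 8.0 and later, such mixed-direction sorts can also be handled directly, since the server supports true descending indexes (a sketch):

CREATE INDEX k_desc_j_asc ON test (k DESC, j ASC);

-- The index order matches the sort, so no filesort is needed:
SELECT * FROM test ORDER BY k DESC, j ASC;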
Comments
Hi Peter,
I need to know the syntax if I want to use an index hint [USE INDEX FOR ORDER BY (idx_name)] when using joins in the query.
For example:
Select a.test_id, b.test_data, c.test_details
From test a
left join test_data b on a.test_id = b.test_id
left join test_details c on b.test_id = c.test_id
Where a.test_flag = 'Y'
Order by c.test_sum Desc limit 50;
Thanks.
Hi Peter,
one of our application queries (a book search) takes more than 3 mins to execute. Below I have given the query & cnf details.
Select identifier, sku_id, vol_num, TITLE, TITLE_OPT, AUTHOR_TITLE, AUTHOR_FNAME, AUTHOR_MNAME, AUTHOR_LNAME, AUTHOR_ALIAS, AUTHORS, ORIG_PUBLISH_DATE,
match(TITLE, AUTHOR_TITLE, AUTHOR_FNAME, AUTHOR_MNAME, AUTHOR_LNAME, AUTHOR_ALIAS, AUTHORS, SUBTITLE, ORIG_PUBLISH_DATE, DESCRIPTION, OTHER_INFO, CATEGORY, REMARKS) against ('+ history' in boolean mode) as score
from bp_search as bp
where match(TITLE, AUTHOR_TITLE, AUTHOR_FNAME, AUTHOR_MNAME, AUTHOR_LNAME, AUTHOR_ALIAS, AUTHORS, SUBTITLE, ORIG_PUBLISH_DATE, DESCRIPTION, OTHER_INFO, CATEGORY, REMARKS) against ('+ history' in boolean mode)
group by group_id order by score desc limit 0,10;
TABLE DESC
——————–
CREATE TABLE bp_search (
  BOOK_ID int(11) NOT NULL default '0',
  GROUP_ID varchar(128) default NULL,
  IDENTIFIER varchar(64) default NULL,
  SKU_ID varchar(15) default NULL,
  TITLE varchar(1600) default NULL,
  AUTHOR_TITLE varchar(100) default NULL,
  AUTHOR_FNAME varchar(256) default NULL,
  AUTHOR_MNAME varchar(256) default NULL,
  AUTHOR_LNAME varchar(2048) default NULL,
  AUTHOR_ALIAS varchar(128) default NULL,
  AUTHORS varchar(512) default NULL,
  SUBTITLE varchar(1600) default NULL,
  ORIG_PUBLISH_DATE varchar(64) default NULL,
  DESCRIPTION varchar(1024) default NULL,
  OTHER_INFO varchar(1024) default NULL,
  CATEGORY varchar(3072) default NULL,
  REMARKS varchar(1024) default NULL,
  VOL_NUM varchar(40) default NULL,
  SOURCE varchar(20) NOT NULL,
  TITLE_OPT varchar(1600) default NULL,
  LAST_UPDATE timestamp NOT NULL default '0000-00-00 00:00:00',
  KEY GROUP_ID_idx (GROUP_ID),
  FULLTEXT KEY book_semantics (TITLE, AUTHOR_TITLE, AUTHOR_FNAME, AUTHOR_MNAME, AUTHOR_LNAME, AUTHOR_ALIAS, AUTHORS, SUBTITLE, ORIG_PUBLISH_DATE, DESCRIPTION, OTHER_INFO, CATEGORY, REMARKS)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
CNF
——-
[mysqld]
socket = /var/lib/mysql/mysql.sock
skip-locking
key_buffer = 384M
max_allowed_packet = 256M
table_cache = 512M
sort_buffer_size = 2M
read_buffer_size = 2M
read_rnd_buffer_size = 1M
myisam_sort_buffer_size = 64M
thread_cache_size = 80
query_cache_size = 32M
thread_concurrency = 8
query_cache_type = 1
query_cache_limit = 32M
thread_stack=256K
myisam_data_pointer_size=6
net_buffer_length = 8K
log-slow-queries=/var/lib/mysql/astdb01.as.osti.local-slow.log
log-warnings=2
max_connect_errors=1000
max_connections=1000
max_heap_table_size=1073741824
local_infile=0
ft_min_word_len=3
tmpdir = /opt/
key_buffer_size=256M
connect_timeout=180
wait_timeout=360
tmp_table_size=851443712
Can you please help me with this?
Hello Peter,
I am going through your article and found it pretty interesting. I need one piece of help: I am displaying my property query, which displays properly and orders the result by the following 4 columns:
p.is_move_up DESC, REPLACE(IFNULL(p.is_feature,""),"2015-12-31","") DESC,
p.listing_order DESC, p.propid DESC
On the site, properties are displayed in this order: Moveup paid, paid featured, paid, trial, then expired. However, we want to show the properties within each group in random order and make them rotate. Right now the properties do not rotate, so it is not fair that users always see the same properties in the same order. So we need the rotation to be within the same group, not mixing all of them.
Query :-
SELECT DISTINCT(p.propid), p.category_id, p.people_sleep, p.bedrooms, p.bathrooms, p.children,
p.airport, p.features, p.pets, p.smoke, p.wheelchair, p.prop_title, p.property_type, prop_date,
display_calender, other_url_calender, is_feature
FROM property p
LEFT JOIN category ON (p.category_id = category.id)
WHERE p.active IN ("yes","d") AND p.is_feature >= SYSDATE()
ORDER BY p.is_move_up DESC, REPLACE(IFNULL(p.is_feature,""),"2015-12-31","") DESC,
p.listing_order DESC, p.propid DESC
Can you please help me with this?
Thanks for the quick response. Very nice.
Hi! Great article (as always). Just wanted to point out one quick exception to this:
“Do not sort by expressions”
On a FULLTEXT index, if you have a MATCH … AGAINST … in the WHERE expression as well as in the ORDER BY expression (for instance, for the purpose of ordering by relevancy), then the FULLTEXT index will indeed be used and the ORDER BY will be quite efficient, as it is *not* re-executed.
Cheers, and keep up the fantastic articles, Peter and Vadim!
Jay
Thanks for another great blog post!
“Do not sort by expressions”
Picking random rows is another thing that social networking pages use a lot (random x users that live in the same city as you; random x users that share some attribute with you).
ORDER BY RAND() is evil but what alternatives are there? Is there a best practice to tackle this issue with PHP5/MySQL5, especially when I only need little amounts (
Something got stripped away:
Is there a best practice to tackle this issue with PHP5/MySQL5, especially when I only need little amounts (less than 20) of random rows?
Good point Maarten, I hope somebody knows an alternative / best practice?
One way is to have something like a unique rowid in that table. Then in the script choose a random value and try to select a row matching that number.
One algorithm would then be:
1) SELECT MAX(rowid) AS maxrowid FROM t1 LIMIT 1; (we know that the lowest rowid is 0 since it's an unsigned int).
2) Run the random() function in your script, which will return a numeric value between 0 and MAX(rowid) (don't forget to seed).
3) SELECT col FROM t1 WHERE rowid >= "numeric value from 2)" LIMIT 1;
4) If 0 rows were returned, check in the other direction: SELECT col FROM t1 WHERE rowid <= "numeric value from 2)" LIMIT 1;
Uhm for some reason half my post vanished !?
Anyway…
5) If still no rows were returned then tell the client that no rows were found.
In the case of “random x user from city y” you can have the select as:
SELECT col FROM t1 WHERE city = y AND rowid >= "numeric value from 2)" LIMIT 1;
along with an index such as INDEX (city, rowid)
Jay,
What do you mean exactly? First, I did not speak about full text search indexes here but about BTREE ones. If you do full text search using natural language search, you do not need any extra ORDER BY clause – it will be automatically sorted by “relevance”:
mysql> select title, match (title,content) against ("internet") as score from cont where match (title,content) against ("internet") limit 10;
+---------------------------------------+-----------------+
| title                                 | score           |
+---------------------------------------+-----------------+
| Internet dynamics                     | 9.0042238235474 |
| Internet Generation                   | 8.8267345428467 |
| Neutral New Zealand Internet Exchange | 8.596981048584  |
| Regional Internet Registries          | 8.3543605804443 |
| Internet organizations                | 8.2909164428711 |
| Internet exchange points              | 8.2456722259521 |
| Internet Explorer shells              | 8.2083368301392 |
| Internet celebrities                  | 8.0902004241943 |
| IPv4 internet                         | 7.9001092910767 |
| Botswana Internet Exchange            | 7.8775291442871 |
+---------------------------------------+-----------------+
10 rows in set (0.08 sec)
But you must NOT include ORDER BY score DESC if you want your query executed in decent time. It will require a filesort, which can take very long:
mysql> select title, match (title,content) against ("internet") as score from cont where match (title,content) against ("internet") order by score desc limit 10;
+---------------------------------------+-----------------+
| title                                 | score           |
+---------------------------------------+-----------------+
| Internet dynamics                     | 9.0042238235474 |
| Internet Generation                   | 8.8267345428467 |
| Neutral New Zealand Internet Exchange | 8.596981048584  |
| Regional Internet Registries          | 8.3543605804443 |
| Internet organizations                | 8.2909164428711 |
| Internet exchange points              | 8.2456722259521 |
| Internet Explorer shells              | 8.2083368301392 |
| Internet celebrities                  | 8.0902004241943 |
| IPv4 internet                         | 7.9001092910767 |
| Botswana Internet Exchange            | 7.8775291442871 |
+---------------------------------------+-----------------+
10 rows in set (1 min 46.73 sec)
In fact, it is one of the major weaknesses of the built-in MySQL full text search – the inability to efficiently combine it with sorting by anything other than relevance. For example, if I wanted to run a boolean full text search for our dating example and sort by date registered or last login time, it would crawl. This is where external solutions come into play.
I’ve now modified the post to include the Beware of large LIMIT section, which is something I forgot to write about during the initial revision.
ORDER BY RAND() is indeed a troublemaker you can meet in many applications. As the workarounds are not trivial, many people simply decide to leave it this way; it will not scale for large data sets, however.
Apachez gives one decent workaround: if you just need one random item, you have sequential row ids (or only a few holes), and you do not care about a slightly skewed distribution, use this approach – pick a random id in the range and find the row closest to it.
If you need multiple rows, or there are no sequential integer ids, you can use another approach:
– add an indexed column called “random” and populate it with random values
– do ORDER BY random instead of ORDER BY RAND()
– once you’ve selected a row, update its “random” value to a new random value
– periodically run a process to reset the random values for all rows; otherwise, if a row got a very large value on update, it would never be selected
This works well for a large data set but a small number of selects, as you need to do a lot of updates in this case; a sketch follows.
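A sketch of that approach (table and column names are illustrative):

ALTER TABLE profiles ADD COLUMN random_val FLOAT, ADD INDEX (random_val);

-- Populate initially (and periodically reset, as described above):
UPDATE profiles SET random_val = RAND();

-- Index-resolved "random" pick instead of ORDER BY RAND():
SELECT * FROM profiles ORDER BY random_val LIMIT 1;

-- After using a row, push it back with a new random value:
UPDATE profiles SET random_val = RAND() WHERE id = 12345;  -- id of the row just used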
Another approach would be to pre-create a sequence of random values and use it. For example, generate 1,000,000 random rows at once. Now keep a simple counter of how many of the random rows you've used, i.e. first take the first 10 rows, then the second 10 rows, etc. For many applications it is enough to run in circles; for others, you can generate more rows as soon as they are used up.
The good thing about this last technique is that you can get the same distribution as with ORDER BY RAND() with very limited overhead.
If you know/calculate the number of your rows, you could try this as a faster alternative to RAND() (example in PHP):
$var = row("SELECT COUNT(id) FROM mytable");
$var = rand(0, $var - 1);
$result = query("SELECT random_item FROM mytable WHERE ... ORDER BY ... LIMIT $var,1");
A dirty UNION query could produce a large result set.
Speeple,
This is a pretty bad idea. You did not write what you ORDER BY, but it may require a filesort. And even if you do it by index, on average 50% of the rows will have to be scanned because of the LIMIT. LIMIT 50000,1 is bad – meaning 50,000 rows will be generated and thrown away.
Apachez provided correct solution for the case if you have id column 🙂
Whoops, the ORDER BY is redundant in such a query.
Better 🙂 But even without ORDER BY, a LIMIT with a non-zero offset is expensive.
Try LIMIT 10000000,1 on some large table 🙂
On small tables everything may look fast – full table scans, joins without indexing and this is often the root cause of scalability problems later on.
I’ve just tried the query on a table of several tens of gigabytes with 12 million rows:
SELECT * FROM index_archive LIMIT 1000000,1
True, it is very slow, especially from a web application view point.
I can’t remember where I read that this was a good method… But it’s obviously a poor option, even if it is marginally faster than RAND().
I’ve implemented this in a social network project I’m working on for “random blog” & “random profile”… guess i’ll be giving Apachez’s method a try!
Yeah. Random blog and random profile are indeed easy, as they often have sequential integer ids. There could be holes due to removed accounts, but it is still good enough.
Yes, the only flaw I have found with Apachez’s method is when there’s an integer jump, e.g. 1 > 5, and id 3 is selected, resulting in an error.
Well, that is the whole point. If your random value was 3, you do WHERE id >= 3 ORDER BY id ASC LIMIT 1; which gives you the row with value 3, or the closest to it if it does not exist.
I just want to disagree with Jay’s point (1st post). I’m using MySQL 5 and am having some fun adapting to the new optimization engine. I have a table that I use a full text index on; when I run a query with no ORDER BY on it, it is incredibly fast. HOWEVER, when I use an ORDER BY MATCH … AGAINST … LIMIT, the query is painstakingly slow, as it uses filesort for some reason that I do not understand.
—
SELECT * FROM sites
WHERE MATCH(title, url, stripped) AGAINST('test' IN BOOLEAN MODE) AND stat = 'A'
ORDER BY match(title, url, stripped) AGAINST ('test')
DESC
LIMIT 0, 50
27 rows fetched (60.67 sec)
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE sites fulltext stat,match_url_title_stripped match_url_title_stripped 0 1 Using where; Using filesort
— Versus —
SELECT * FROM sites
WHERE MATCH(title, url, stripped) AGAINST('test' IN BOOLEAN MODE) AND stat = 'A'
LIMIT 0, 50
27 rows fetched (0.75 sec)
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE sites fulltext stat,match_url_title_stripped match_url_title_stripped 0 1 Using where
Panos,
Jay is wrong on this. See my comment above.
The behavior you’re getting is quite expected. Check my new presentation on full text search performance to see other things which will be painfully slow:
http://www.mysqlperformanceblog.com/files/presentations/EuroOSCON2006-High-Performance-FullText-Search.pdf
All the workarounds for ORDER BY RAND() usually assume you are looking for a single record, or some small number. What if you want to randomly select an arbitrary number of rows (say between 500 and 15000)? If someone has a faster alternative to ORDER BY RAND() for this situation, I would be interested in hearing it. Thanks.
Rob, are you asking about a random number of random rows?
I mentioned one workaround which may work for you – you can have a separate pre-created table which holds a sequence of random row IDs, something like:
id rnd
1 456
2 123
3 905
So you have a sequential row number and random values. Now when you need N random rows you can do: select rnd from tbl where id > [your current position] LIMIT N;
If N is random – so compute it on your application.
Now you just need to advance as you use data from this table. Also you need something when all random data is used – for some application looping is enough for others you need to make sure you generate enough random data.
Hi there….
Great blog…
I have a difficult (for me) MySQL optimisation problem. Basically, I want to optimise a query that uses a fulltext search and orders by a static field, i.e.:
SELECT * FROM WHERE MATCH() AGAINST(”) ORDER BY score DESC
Where score is a precomputed score field.
I have indexes built on the fulltext field and on score separately. I can’t combine them as they are different types. However, as MySQL only uses the fulltext index, it is forced to filesort for the ORDER BY. Any way round this?
Thanks!
Sorry, sql should read:
SELECT * FROM <table> WHERE MATCH(text_field) AGAINST('text') ORDER BY score DESC
Fulltext search is special. First, it automatically sorts by relevance unless you’re doing a BOOLEAN search, so you’d better just remove the ORDER BY from this query.
Second – are you sure you do not want to use LIMIT ? Getting all matches will be very slow for large data sets.
Hi.
At first I have to say this is the excellent site! 🙂
And now my troubles. I discovered that with both a range condition and an ORDER BY, the index can serve only one of them:
Suppose I have a table with a, b and c fields and SQL query
SELECT * FROM table WHERE a=1 AND b>10 ORDER BY c
key(a,b) works, but filesort is used.
key(a,c) works, but b range goes sequentially.
key(a,b,c) doesn’t work (only a and b are used).
Do you have any idea how to go around this? How to define index for a, b and c?
Thank you anyway,
Dan
Dan,
You’re right. This is the challenge and there is no solution which works for everything.
If b range is small you can use this trick:
http://www.mysqlperformanceblog.com/2006/08/14/mysql-followup-on-union-for-query-optimization-query-profiling/
Hi
Your articles posted here are great. I’m a beginner in MySQL optimization and I have a “maybe stupid” question.
Let’s say I have a table mytable(a,b,c,d,e …) and I want to perform ORDER BY with LIMIT on more than 60-70% of the columns (I want to have many sorts available).
example queries:
select a,b,c FROM mytable ORDER BY a;
select a,b,c FROM mytable ORDER BY b;
… etc…
is it ok to define indexes for every column used in ORDER BY? When are the indexes defined on a table considered to be too many (assuming they are not redundant and are used for different purposes…)?
Thank you.
Having sorts by many fields is challenging. You basically have only two choices: you either have a lot of indexes, which slows down insertions/updates and requires space on disk and in caches, or you use filesort, which does not scale well with large data sets.
You say “Sort by column in leading table”, but could it be a workaround to create a view by joining 2 tables and then sort by any column you want (from the view)? Or is it the same thing as a SELECT with a join and ORDER BY?
Not really,
A view is a virtual entity, so creating a view does not magically make it behave like a table from an indexing standpoint – the optimizer looks at the base tables instead.
use a subquery in FROM, like
select * from (select * from tab where … limit 0, 20) order by rand() limit 0,1
Steve,
What is your advice about? Your query would not be faster than the query in the FROM clause, and you’re selecting a random row out of 20 rows selected by the first query, which is not entirely the same as fully random 🙂
Peter,
It is semi- or fake random, to minimize sorting and table scans. I have 5 robot clients running at the same time; it is to make sure the 5 clients don’t pick up the same record out of 10,000 records.
It is a random order within the first 20, instead of a random order over all rows (10,000 in my case). Performance is much better.
For 5 clients, 5 would be OK instead of 20 in the SQL; even with 100 there the performance won’t be much different. I put 20 there in case I increase the robot number.
Yes, in this case that works. A similar approach would be to create 20 rows, store them in a separate table and do a select with ORDER BY RAND() from it. The benefit in such a case is that you can regenerate the table every few minutes and get a different set for rotation.
Good article. I didn’t realize the join order would affect the ORDER BY clause.
How to optimize in this kind:
update targets set status=’D’ where status=’C’ order by schedule limit 1;
with index of (
status
,schedule
), and it was really slow. AndSelect * from targets where status=’C’ order by schedule limit 1;
update targets set status=’D’ where id=’id’;
are very fast as expected.
how to solve it?
//Bow
Davies,
The problem in this case is that you want the UPDATE to use the same key as you’re updating, which can cause “loops” for certain updates. Consider for example:
update tbl set i=i+10 where i>5
Updating value 10 will make it 20, and it will again match the clause, etc. So MySQL can’t use such an index for query execution. It could use a temporary table the same way you do it, but I guess it does not do that right now.
Hey guys, here’s another good solution. I have a users table where I want to select five people at random who have a certain setting on their profile. I can’t use max ID because the IDs won’t necessarily be in sequence, so here’s what I do:
Select all the user IDs that match my params and put them into an array…
$query = "SELECT user_id FROM users WHERE profile_privacy='0' AND profile_display_name != ''";
… Then use PHP to shuffle the resulting array and take the first five items:
shuffle($random_profile_ids);
$random_profile_ids = array_slice($random_profile_ids,0,5);
… Then do an IN query on the user_id field for the five items left in your array:
$in_sql = "'" . implode($random_profile_ids, "','") . "'";
$query = "SELECT * FROM users WHERE user_id IN ($in_sql) LIMIT 5";
On my 30k row table the new solution ran in a combined .0147 secs, compared to .0770 secs for ORDER BY RAND(). Not a bad savings if you ask me. Enjoy,
Rob
Rob,
That won't work well on large datasets (it will eat all memory sooner or later as the data set grows). Actually, it seems to me that using RAND() in this case would be more acceptable, even on small data sets, because the performance gain is not that big.
Btw, you can improve your code by *not* using shuffle, but instead getting 5 values from rand(0, count($random_profile_ids) - 1) (make sure they do not overlap) and then just picking the appropriate values from the array.
I have a question regarding order by and limit..
I used to use this system to page the results 100 rows at a time, but found it to be very slow as the page numbers grew larger and larger.
What I thought of to help it scale better is to have something like this:
SELECT inverted_timestamp FROM whatever WHERE visible = 'yes' ORDER BY torrents.inverted_timestamp LIMIT 100,1
the inverted_timestamp field and the ORDER BY go together, as I'm sorting by a few different columns (it's like a spreadsheet table which you can page and sort by a few fields – I have added indexes for all the used cases). The 100 is just currentpage*perpage, which happens to be 100 since I'm on the 2nd page (page 1) and have 100 rows per page.
then i do the main query to fetch the actual results:
SELECT users.username, bla bla ... FROM whatever LEFT JOIN categories ON category = categories.id LEFT JOIN users ON torrents.owner = users.id WHERE visible = 'yes' AND inverted_timestamp >= '3111748733' ORDER BY inverted_timestamp LIMIT 100
which I have indexed as (visible, inverted_timestamp) (and similarly for the rest of the columns I order by).
The value >= '311...' comes from the 1st query, which I use essentially to find the first row of that page; I then use that value in the main query to prevent the main one (which fetches all the results) from having to do an ORDER BY over the entire data set. From my testing it seems to run very well up to around 1-2 million rows; from then on the second query is still very fast, but the first one starts to take the hit. Previously I had just the second query with no >= condition and used LIMIT offset,perpage, which was OK up until 100k rows and then it was really slow.
It seems this solution is an improvement. My question really is: is there a way to make it better?
to illustrate what i meant above these are the queries generated with a different ordering:
SELECT size FROM whatever WHERE visible = 'yes' ORDER BY size DESC LIMIT 3800,1
SELECT users.username,bla bla FROM whatever LEFT JOIN categories ON category = categories.id LEFT JOIN users ON owner = users.id WHERE visible = ‘yes’ AND size
it got cut off:
AND size= value_found) which worked well but only when i ordered by id (so it was limited – though i can still probably use this way when the ordering is done via an integer that has sequential properties)..
the problem really lies in how to speed up the other order by’s beyond what i have done already.
hmm it doesnt like my posts.. it cut it again :/
size
ffs i guess its the single quote that fucks up … i wont bother trying again .. i believe u have understood what i m trying to say
Here is my logic to do random.
$entries = array();
$count = 10;
while ($count > 0) {
  $DB->query("SELECT col1,col2 FROM entries AS e1 INNER JOIN members ON userid=id AND active=1 JOIN (SELECT ROUND(RAND() * (SELECT MAX(entryid) FROM entries)) AS entryid) AS e2 WHERE e1.entryid >= e2.entryid AND e1.photos!='a:0:{}' ORDER BY e1.entryid ASC LIMIT 1");
  if ($DB->get_num_rows() > 0) {
    $count--;
    $entries['entries'][] = $this->__processEntry($DB->fetch_array());
  }
}
It works pretty well for my standards
Great weblog; I have often visited it with good results, but now I'm totally stuck on the simplest thing.
EXPLAIN SELECT p.products_id
FROM products p
ORDER BY p.manufacturers_id ASC
LIMIT 3
Gives me a
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE p index NULL idx_manufacturers_id 5 NULL 3490
could anyone give me a hint why it steps through 3490 rows, even when using idx_manufacturers_id?
thanks for any idea (using mysql5) && sorry if this is too stupid post 🙂
Chris
That was extremely helpful. I was having a hard time locating good and clear resources on limit and order optimization. Thanks!
Mike's solution is by far the fastest: http://jan.kneschke.de/projects/mysql/order-by-rand
my choice was to combine the ORDER BY RAND() and the other one.
SELECT * FROM
(SELECT * FROM comments
WHERE count > $number
AND topic = $topic
LIMIT 100)
AS r1 ORDER BY RAND() LIMIT 5
The resulting speed is acceptable. The first LIMIT can be use to adapt the level of random in the query.
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 5 Using temporary; Using filesort
2 DERIVED comments ALL NULL NULL NULL NULL 140809 Using where
The query, as you can see, goes through 140,000 rows in 0.0792 seconds. Not as fast as the other one, but a lot more flexible and open to other WHERE statements.
Once again a post has been cut into pieces.
But it has one problem: what if you intend to look for multiple rows with other criteria?
Let's say you have 100,000 rows and want to output 5 randomized rows, and of the 100,000 rows, 200 qualify to be in the result. You cannot run the query 5 times because you might end up with the same row 5 times. If there is a large gap between two qualifying results, it can happen that you generate a random number that falls in the gap, thus giving you the same row over and over again.
Secondly, you may hit the end of the set, thus generating no output at all. Some people suggested running the query backwards and forwards: i.id <= a.id and i.id > a.id.
This brings trouble as well because the random number is generated inside the query resulting with two different random numbers. This can be overcome by making a pre-query to generate a random number in your application, still leaving you with the first problem.
Addendum:
I forgot to create an index over the columns concerned. Now the efficiency has increased a lot.
But I just found a problem: the results are not spread evenly. They always cluster around the beginning of the table.
Any way to alter this behaviour?
Great resource Peter, and also thanks to Jan (comment #48) for his “compromise” solution to the ORDER BY RAND() dilemma. I’ve now managed to get a query (testing with 500k rows, requiring < 10 random rows) down from 0.8s execution time to just 0.05s. 🙂
SELECT *, MATCH(title, url, stripped) AGAINST('test' IN BOOLEAN MODE) as score FROM sites
WHERE stat = 'A'
HAVING score > 0
ORDER BY score
Hello Peter,
in your presentation on FULLTEXT SEARCH you say:
– Be careful with the extra where clause if it filters out a lot of matches.
select title from cont where match (title,content) against ("global service company") and date_added > unix_timestamp("2006-07-18 18:00:00") limit 10;
– Takes 26 min 43.59 sec
I think maybe it was the unix_timestamp function that slowed things down? Because maybe the unix timestamp had to be calculated for each of the millions of records, and therefore no index could be used?
Tobias,
In this case unix_timestamp is called on a constant, so it is not executed millions of times. It also would not block index usage. The problem is that once you're using the full text search index, you can't use another index to do the sorting.
Peter, Thanks….
This is where I found very useful matters about optimization which I have been looking for so long.
Thanks again.
Mathu Mou
How do I optimize this query in PHP?
$sqls = "select id,state from states";
$rss = mysql_query($sqls) or die(mysql_error());
$num_rowss = mysql_num_rows($rss);
$count = 1;
while ($num_rowss && $datas = mysql_fetch_assoc($rss))
{
  $state_id = $datas['id'];
  $state = $datas['state'];
  $sqlc = "select * from companies,package where comp_status=1 and comp_account_site=0 and comp_balance>0 and comp_mem_pac=pac_id order by pac_amount DESC, rand()";
  $rsc = mysql_query($sqlc) or die(mysql_error());
  $num_rowsc = mysql_num_rows($rsc);
  $checkonlyonetime = 1; // header
  while ($num_rowsc && $datac = mysql_fetch_assoc($rsc)) // loop of company
  {
    $comp_id = $datac['comp_id'];
    $comp_name = $datac['comp_name'];
    $comp_mapped_city = $datac['comp_mapped_city'];
    @$comp_mapped_city = explode(",", trim($comp_mapped_city, ","));
  }
}
… LIMIT optimization 🙂
SELECT *
FROM forum_posts AS pa
LEFT JOIN forum_posts_text AS pb ON pa.post_id = pb.post_id
LEFT JOIN forum_users ON user_id = post_poster
WHERE post_topic_id = '450'
ORDER BY pa.post_id ASC
LIMIT 5475 , 15
versus
SELECT *
FROM forum_posts AS pa
LEFT JOIN forum_posts_text AS pb ON pa.post_id = pb.post_id
LEFT JOIN forum_users ON user_id = post_poster
WHERE post_topic_id = '224'
AND pa.post_id >= (
SELECT MAX( post_id )
FROM (
SELECT post_id
FROM forum_posts
WHERE post_topic_id = '224'
ORDER BY post_id ASC
LIMIT 6075 , 1
) AS tmp )
ORDER BY pa.post_id ASC
LIMIT 15
~ 0.09 sec versus ~ 0.004 sec
Tables with posts: 2×292,000+ rows (25MB + 89MB)
Topic: 6092 rows (rows that meet the condition WHERE post_topic_id = '224')
Source: http://www.easemarry.com/blog/mysql-limit-optimization/
If you found something better, I would be glad to hear.
Here the SELECT MAX(post_id) FROM (SELECT …) is not necessary; it's redundant (that's obvious because the subquery extracts only one post_id – see LIMIT x, 1):
SELECT *
FROM forum_posts AS pa
LEFT JOIN forum_posts_text AS pb ON pa.post_id = pb.post_id
LEFT JOIN forum_users ON user_id = post_poster
WHERE post_topic_id = '224'
AND pa.post_id >= (
SELECT post_id
FROM forum_posts
WHERE post_topic_id = '224'
ORDER BY post_id ASC
LIMIT 6075 , 1)
ORDER BY pa.post_id ASC
LIMIT 15
I have this query, but the ORDER BY and the last 2 LEFT JOINs hurt performance considerably.
Any idea how I can optimize this query?
Thanks.
SELECT *
FROM message
LEFT JOIN accounts a ON message_account = a.id
LEFT JOIN opportunities o ON message_project = o.id
LEFT JOIN leads l1 ON (UCASE(message_from) LIKE CONCAT('%', UCASE(l1.email1), '%')
  AND l1.email1 <> '') OR (UCASE(message_from) LIKE CONCAT('%',
  UCASE(l1.email2), '%') AND l1.email2 <> '')
LEFT JOIN leads l2 ON (UCASE(message_to) LIKE CONCAT('%', UCASE(l2.email1), '%')
  AND l2.email1 <> '') OR (UCASE(message_to) LIKE CONCAT('%',
  UCASE(l2.email2), '%') AND l2.email2 <> '')
WHERE message_visibility
AND message_mailbox = UCASE('user')
AND message_visibility <> 1
ORDER BY message_ocode DESC
One tweak that we do for large limit offsets on pagination is to first select columns that are part of the index; typically this would be the ids. After that we use an “IN” to select the other data columns from the same table. We find that there is a vast improvement on selects that use large offsets.
Example (category,id index)
select photo.id, photo.description from photo where category=1 limit 30000000,20
Query time: 4 seconds
To speed things up we split the call into 2 queries
select photo.id from photo where category=1 limit 30000000,20
Query time: 0.030 (this just uses the index so no traversing the data table)
Select photo.id, photo.description from photo where photo.id IN (x1,x2…x20)
The x1 – x20 are the results the first query returns
Query time: 0.020
So with a simple change like this we sped things up by a couple of orders of magnitude and allow our users to use large pagination.
I assume it has something to do with the fact that the first query only uses the index to get the ids and does not have to jump around getting a larger data set.
Cheers.
Radek, and what about if I want to order the resultset?
eg: select id …. order by id desc ..
Have you tried?
I’m Joining multiple tables, so the join must be done in the second query..
Unfortunately, with your solution I have to use 3 queries:
– SELECT COUNT(*) .. (for total rows that meets the criteria blabla)
– SELECT id … LIMIT x,y
– SELECT id, name .. IN (x1,x2…x20)
Anyway.. thank you for the tweak!
Yes Sergio, you don't need to do the count, as usually you have the page number or offset in the GET variable as you go from page to page. Typically we do a count first to create all the pagination stuff: current page, total pages, etc.
Example of it in action here. http://www.pinkbike.com/photo/list/?date=all&page=6
You can page to page 100,000 (24 per page, so the offset is 2.4 million) without any slowdown. If this was done in one call, it literally takes seconds to generate offsets of 2,000,000.
Yes, you can order by id in the first query, no problem. The id is indexed, so no penalty there. Your result set can also be ordered, but that is fast as you only have 24 items, or however many you show on the page.
Cheers
Hey Radek,
Yes, I was talking about this: “Typically we do a count first to create all the pagination stuff. Current page, total pages, etc.” – you have to create a counter for pagination information.
Thank you for your solution, maybe I will try it on other projects. Right now I’m fine with solution posted above by me.
Hi, I was wondering if there is a way to stop a SELECT COUNT(*) from continuing once it has reached a max number of rows. E.g., I want to do something like:
SELECT count(*) FROM table_a LIMIT 1000;
So if the table has more than 1k rows, it should just return and say “1000”. What I'm trying to avoid is having the entire table scanned, in particular if it has millions of rows. Basically the question I want to ask is: “Does table X have at least N rows?”
Hi Radek,
I am trying replay this from your post:
# select photo.id from photo where category=1 limit 30000000,20
# Query time: 0.030 (this just uses the index so no traversing the data table)
but withou success. Can you help me?
In MySQL I do:
> create table c (id integer auto_increment, a integer, b integer, primary key(id));
> insert into c (a,b) value (1,2),(5,6),(3,9),(122,22),(32,54);
> insert into c (a,b) value (1,2),(5,6),(3,9),(122,22),(32,54);
I ran this about 21 times to fill the table:
> insert into c (a,b) select a,b from c;
So I have:
> select count(*) from c;
+----------+
| count(*) |
+----------+
| 20971520 |
+----------+
1 row in set (0.00 sec)
> show index from c;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| c     | 0          | PRIMARY  | 1            | id          | A         | 20971520    | NULL     | NULL   |      | BTREE      |         |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
1 row in set (0.00 sec)
And finally:
> explain select id from c limit 20000000,20;
+----+-------------+-------+-------+---------------+---------+---------+------+----------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows     | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+----------+-------------+
|  1 | SIMPLE      | c     | index | NULL          | PRIMARY | 4       | NULL | 20971520 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+----------+-------------+
1 row in set (0.00 sec)
> select id from c limit 20000000,20;
+----------+
| id       |
+----------+
| 20000001 |
........
| 20000020 |
+----------+
20 rows in set (4.58 sec)
So even if I use indexes, pagination on a huge table is very slow 🙁 Any idea why?
My MySQL is 5.0.51a-18
PC: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz, bogomips: 4776.06, 8GB RAM, and system including mysql id on SSD hard drive
What I am doing wrong?
Thanks
bozek, is it possible that your key_buffer size is set too small, and therefore the whole index for your 30 million records does not fit and is not cached? On my application this is set larger than the total index sizes for ALL tables, not just this one.
If you have something like SELECT * FROM table LIMIT 10, it works the same as SELECT * FROM table LIMIT 0,10.
Is there any way to accomplish the same thing as this query while avoiding a temporary table?
select ID_MEMBER_STARTED, COUNT(*) as hits from smf_topics group by ID_MEMBER_STARTED order by hits desc limit 20;
There is an index on ID_MEMBER_STARTED and the table type is InnoDB.
Thanks dude that is rather helpful!
My query performance is very slow for a large database. Table A has 2050702 records in it. ID is the primary key, and there is an index on Time.
SELECT ID, Name, Time from TableA WHERE Time >= '2009-03-24 12:53:00' and ID > 2050702
Order By ID DESC LIMIT 100
I would like to know ways to optimize the query above, so as to retrieve the last 100 records from Table A matching the criteria in the WHERE clause.
Add a query on (ID, Time) and re-order the WHERE clause to “ID > 2050702 and Time >= 2009-03-24 12:53:00”.
MySQL can only use one index in a query (basically), so you need an index that contains all the columns you are filtering and ordering by, in the same order.
Sorry, that should be “Add an index on” not “Add a query on”.
I have a query that takes 14 seconds to execute:
SELECT posts.* FROM posts force index (idx_post_date) INNER JOIN follow ON follow.followuserid = posts.userid WHERE follow.userid='61585' ORDER BY date DESC LIMIT 0, 20;
The table posts has ~1000000 records and the number of records in the posts table belonging to userid 61585 is 0.
The table follow has 63537 records.
This is how my indexes are set:-
mysql> show index from follow;
+--------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| Table  | Non_unique | Key_name | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| follow | 0          | PRIMARY  | 1            | userid       | A         | NULL        | NULL     | NULL   |      | BTREE      |         |
| follow | 0          | PRIMARY  | 2            | followuserid | A         | 63537       | NULL     | NULL   |      | BTREE      |         |
+--------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
mysql> show index from posts;
mysql> show index from posts;
+-------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name         | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| posts | 0          | PRIMARY          | 1            | id          | A         | 1047467     | NULL     | NULL   |      | BTREE      |         |
| posts | 1          | idx_post_date    | 1            | date        | A         | 1047467     | NULL     | NULL   |      | BTREE      |         |
| posts | 1          | idx_posts_userid | 1            | userid      | A         | 22771       | NULL     | NULL   |      | BTREE      |         |
+-------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
mysql> explain SELECT posts.* FROM posts force index (idx_post_date) INNER JOIN follow ON follow.followuserid = posts.userid WHERE follow.userid='61585' ORDER BY date DESC LIMIT 0, 20;
+----+-------------+--------+--------+---------------+---------------+---------+-------------------------------+------+-------------+
| id | select_type | table  | type   | possible_keys | key           | key_len | ref                           | rows | Extra       |
+----+-------------+--------+--------+---------------+---------------+---------+-------------------------------+------+-------------+
|  1 | SIMPLE      | posts  | index  | NULL          | idx_post_date | 4       | NULL                          |   20 |             |
|  1 | SIMPLE      | follow | eq_ref | PRIMARY       | PRIMARY       | 8       | const,wwwproject.posts.userid |    1 | Using index |
+----+-------------+--------+--------+---------------+---------------+---------+-------------------------------+------+-------------+
Is there a way to optimize this query? I have been trying for quite some time now but could not find a solution; any help would be very much appreciated!
Thanks for the article, but I have an additional feature to throw at you.
Suppose I request to select, say, 10 rows, ordered some way. If none of the 10 fit any additional constraints I may have, I would like to fetch the second-best 10, and so on until I get the record that fits all my constraints.
How do I ensure the first 10 will not show up?
Let me just add that the additional constraints may eventually be associated with features that are outside the database.
Cheers,
Carlos
I have a query :
In this table category_id, date_created is indexed.
EXPLAIN select newsid from news where category_id=53 and approval='yes' order by date_created desc limit 0,10
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE news index PRIMARY,category_id date_created 9 NULL 216 Using where
the above query examines 216 rows to fetch only 10 records.
Whereas the below query examines 10 rows which it should, (here i am not searching by category_id):
EXPLAIN select newsid from news where approval='yes' order by date_created desc limit 0,10
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE news index NULL date_created 9 NULL 10 Using where
Why does the first query examine 216 rows? Is this fine, or does the query need to be optimised further?
Please advice, thanks in advance.
Reshma,
you seem to have a poor index setup to begin with. You have an index scan, while you would have ref for all these queries if you had an index on (category_id, approval, date_created).
If the index is scanned, then the more selective the WHERE clause is, the more rows have to be scanned to provide 10 results.
Hello Peter,
Thanks for the quick response.
I have removed the index on category_id. After removing it Explain for the below select query is as follows:
EXPLAIN select newsid from news where category_id=53 and approval='yes' order by date_created desc limit 0,10
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE news index NULL date_created 9 NULL 10 Using where
This is fine i guess.
Another question: if there are 56 records for category_id=53 and I want to display 100 records for all given categories in the news table, why does the below query examine 100 records for category_id=53 when there are only 56 of them?
EXPLAIN select newsid from news where category_id=53 and approval='yes' order by date_created desc limit 0,100
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE news index NULL date_created 9 NULL 100 Using where
Most of the rows are displayed based on category_id, so I want to know if I am doing it correctly.
If this is confusing I could send the table structures and queries.
In this post you have mentioned a DOS attack; do you mean paging through records can enable a DOS attack?
Any help would be highly appreciated.
Hi Peter
We have a tool which processes 30 million records; it is a migration project where we are using a Perl script.
We need to compare two tables, and we are doing it like this:
we loop using a counter with a chunk size of 50,000 records, and then we run
select column1, column2 from table order by table limit index_num(21 million), (500,000)
Initially, for smaller data sets, things work fine (26 secs to 50 secs), but as index_num grows, it takes almost 30 mins.
We want to optimize this.
As per your recent posts, we need to have an index; for column_1 we are creating indexes… still it is not optimal.
Hey Peter,
I’ve been reading your articles for some time now, especially since I’ve been running into issues with wrapping my head around this whole MySQL optimization stuff.
Anyway, I get the gist of it. However, I run a real estate website (it displays listings, etc., in a search results page), and there are just too many variables to try and cover an index for each scenario… I'd end up with 20 indexes, and I fear that any update/insert/delete queries to that table would blow up the internet.
I’m not looking for anybody to write me anything, I’m just looking to get a better understanding of this:
SELECT
  id, ... more *necessary* columns ...
FROM listings
WHERE category = 1
  AND country = 1
  AND state = 45
  AND city = 35859
  AND visible = 1
ORDER BY price ASC
LIMIT 0, 10
Now, category numbers represent different listing types, i.e. 1 = For Sale, 2 = For Rent, 3 = Pending Sale, and so on. 1 (For Sale) is by far the most popular, representing at least 75% of all listings in the table; probably more. country, state and city are always in the query. visible is either 1 or 0: 1 meaning the listing has not expired, and 0 meaning it has. Probably 85%+ of the rows in the table have visible = 1.
Now, there are also other WHERE conditions that are possible, like WHERE price = 300000, or bedrooms = 2, or bathrooms = 3, etc. And who knows what the user is going to enter.
Then, there is the ORDER BY on top of that.
What I have been looking for is how to accommodate these kinds of variables when there are so many possible scenarios. When I do an ORDER BY price DESC, it can take 4-5 seconds, maybe more… I guess that's not the end of the world, but our traffic is always growing, and that will end up being unacceptable.
Any thoughts?
– Batman
Hi,
This is a hard one, especially if you add ranges, as MySQL will not be able to both use a range for row selection and sort by a different column. You may end up with large scans and sorts, which can use many resources. Sphinx (http://www.sphinxsearch.com) is often a good alternative in such cases, even if you're not using full text search queries.
Thanks for this article!! It was what I was looking for.
Thanks for the quick response. VEry Nice
Hello Peter,
I am going through your article its found pretty much inserting. I need one help as i am displaying my property query which is displaying properly and ordering the result by following 4 column
p.is_move_up DESC,REPLACE(IFNULL(p.is_feature,””),”2015-12-31″,””) DESC,
p.listing_order DESC,p.propid DESC
In order site, properties are been displayed in this order: Moveup paid, Paid featured, paid, trial then expired. However, we want to show within each group the properties random and make them rotate. Now,
the properties do not rotate so it is not fair for users to see always the same properties in same order. So, we need the rotation to be within same group not mixing all.
Query :-
SELECT DISTINCT(p.propid), p.category_id, p.people_sleep, p.bedrooms, p.bathrooms, p.children,
  p.airport, p.features, p.pets, p.smoke, p.wheelchair, p.prop_title, p.property_type, prop_date,
  display_calender, other_url_calender, is_feature
FROM property p
LEFT JOIN category ON (p.category_id = category.id)
WHERE p.active IN ('yes','d') AND p.is_feature >= SYSDATE()
ORDER BY p.is_move_up DESC, REPLACE(IFNULL(p.is_feature,''),'2015-12-31','') DESC,
  p.listing_order DESC, p.propid DESC
Can you please help me with this?
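(One way to get rotation within each group, sketched: keep the group-defining columns first in the ORDER BY and swap the fixed p.propid tiebreaker for a random one:

ORDER BY p.is_move_up DESC, REPLACE(IFNULL(p.is_feature,''),'2015-12-31','') DESC,
  p.listing_order DESC, RAND()

Note that RAND() in ORDER BY forces a filesort on every request, so this only makes sense while the matched set stays small.)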
Hi Peter,
I have one question regarding your comment
“Sort by column in leading table if you have JOIN with ORDER BY … LIMIT you should try hard to have sorting column(s) to be in the leading table. If ORDER BY is going by field from the table which is not first in the join order index can’t be used.”
I have a similar situation where I cannot have the sorting columns in the leading table; in such a case, please suggest a workaround.
Thanks in anticipation.
Rbk
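(A workaround that sometimes applies, sketched with hypothetical tables t1 and t2 where the sort column lives in the non-leading table t2: sort and limit t2 in a derived table first, then join:

SELECT t1.*, d.*
FROM (SELECT * FROM t2 ORDER BY sort_col DESC LIMIT 10) AS d
JOIN t1 ON t1.id = d.t1_id;

This only returns the right rows when the join cannot eliminate rows from the derived table, e.g. a foreign key that always matches; otherwise the inner query has to over-fetch and the outer one re-limit.)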
Thank you so much for this great post! It was a huge help and exactly what I needed to know and understand right now. It is always great how you present a topic so logically and make it look so easy and understandable, even for not yet highly experienced people like me! 🙂
Good Article. Thank you for that!
Peter, in the 2nd example, where the query was "SELECT * FROM sites WHERE category_id=5 ORDER BY date_created DESC LIMIT 10", you suggested that using an index on (date_created, category_id) might speed up the lookup. But if you use this index, queries won’t be able to use it for category_id alone, because it comes 2nd in the compound index. So won’t this offset the gain from using the compound index?
@Cupidvogel:
You should create another index on category_id in this case. There is always a tradeoff between performance and space, and space is usually cheap compared to the performance gain.
Thanks.
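(A sketch of that suggestion in SQL, index names illustrative:

ALTER TABLE sites ADD INDEX idx_date_cat (date_created, category_id);
ALTER TABLE sites ADD INDEX idx_cat (category_id);

Worth noting: with the columns the other way around, (category_id, date_created), a query filtering on category_id alone can already use the leftmost prefix of the compound index, so the separate index is only needed when date_created comes first.)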
I made a simple test case comparing performance of the LIMIT approach vs the JOIN workaround. You can see the results here: http://devoluk.com/sql-pagination.html
Hello everybody,
We tried the solution based on a query seen in an older post:
SELECT * FROM sites WHERE category_id=5 ORDER BY date_created DESC LIMIT 10
changed to
select * FROM (SELECT * FROM sites WHERE category_id=5 ORDER BY date_created DESC) as my_table LIMIT 10
It works for us!
Super helpful and clear explanation! I was able to use the “sort by column in leading table” and “force index” approaches to avoid a file sort. Yay!
We have one query written in our legacy system (using MySQL 5.5). Now, with growing data, the query mentioned below is taking a huge amount of time.
Our system holds roughly 2,000,000,000 rows (2 billion), approximately 650 GB of data.
The table is partitioned per day (which means the query below fetches data from 30 partitions).
=> innodb_buffer_pool_size is set to 16GB.
=> There is an index on START_TIME.
Query-1
SELECT * FROM ( SELECT a,b,c,d,e,f,g,h,i,j,k,l FROM TEST WHERE START_TIME BETWEEN '2013-11-14 00:00:01' AND '2013-12-14 23:59:59' ORDER BY START_TIME DESC) AS TEST_DATA LIMIT 10000;
The above query selects all the columns for one month of data, sorts all of it, and finally shows 10000 records to the end user.
Now, my doubt concerns Query-2:
SELECT a,b,c,d,e,f,g,h,i,j,k,l FROM TEST WHERE START_TIME BETWEEN '2013-11-14 00:00:01' AND '2013-12-14 23:59:59' ORDER BY START_TIME DESC LIMIT 10000;
The above query selects all the columns from one month of data, sorts, and displays the result as soon as 10000 records are sorted (no sorting and buffering of all records).
Comparing Query-1 and Query-2:
1> Will these two queries return different result sets, or the same?
2> Is there any other impact on performance?
Because in Query-1 we sort all records and then display 10k, whereas in Query-2 we display 10k sorted records.
Thanks a lot for your help.
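(One way to answer both doubts empirically, as a sketch: EXPLAIN each form and compare the rows and Extra columns — the derived-table form materializes and sorts the entire month before the outer LIMIT applies, so it will typically show the filesort over the whole range, while the direct form can stop early:

EXPLAIN SELECT a,b,c,d,e,f,g,h,i,j,k,l FROM TEST
WHERE START_TIME BETWEEN '2013-11-14 00:00:01' AND '2013-12-14 23:59:59'
ORDER BY START_TIME DESC LIMIT 10000;

The result sets themselves should be identical, up to ties in START_TIME.)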
In the software for the directory, all categories are written the same way.
One of the categories keeps failing when selected, although 100% of the sub-categories and listings in the parent category are available.
I keep getting the following errors in response to attempts to access MySQL:
[09-07-2014 07:25:38pm] Fatal Error: MySQL error. Query: SELECT id, friendly_url, IF(date_update!='0000-00-00 00:00:00',date_update,date) AS date, priority FROM pmd_listings WHERE status='active' ORDER BY id ASC LIMIT 180000, 10000 Error: (3) Error writing file '/var/db/mysql-tmp/MYnTfXMV' (Errcode: 28) in /home/citydir/public_html/prevpmd/includes/class_database.php on line 132
[09-07-2014 07:27:38pm] Fatal Error: MySQL error. Query: SELECT SQL_CALC_FOUND_ROWS l.* FROM pmd_listings l INNER JOIN pmd_listings_categories lc ON l.id=lc.list_id WHERE lc.cat_id=192 AND l.status='active' ORDER BY priority DESC LIMIT 0,10 Error: (126) Incorrect key file for table '/var/db/mysql-tmp/#sql_e66b1_0.MYI'; try to repair it in /home/citydir/public_html/prevpmd/includes/class_database.php on line 132
[09-07-2014 07:28:55pm] Fatal Error: MySQL error. Query: SELECT id, friendly_url, IF(date_update!='0000-00-00 00:00:00',date_update,date) AS date, priority FROM pmd_listings WHERE status='active' ORDER BY id ASC LIMIT 1710000, 10000 Error: (3) Error writing file '/var/db/mysql-tmp/MYeKsGGZ' (Errcode: 28) in /home/citydir/public_html/prevpmd/includes/class_database.php on line 132
I can’t get a response from the developer who wrote the software, so I’m left to my own devices to find a resolution.
I would appreciate any help you may render.
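(For anyone hitting the same errors: Errcode 28 is the operating system’s "no space left on device" error — the huge offsets (LIMIT 1710000, 10000) make MySQL build enormous sort/temporary files in /var/db/mysql-tmp until the disk fills, and the "Incorrect key file" on the #sql temporary table is the same disk-full condition surfacing differently. Since the query already orders by id, a keyset rewrite avoids the temporary files entirely — a sketch, with @last_id standing for the highest id returned by the previous chunk:

SELECT id, friendly_url, IF(date_update!='0000-00-00 00:00:00',date_update,date) AS date, priority
FROM pmd_listings
WHERE status='active' AND id > @last_id
ORDER BY id ASC
LIMIT 10000;
)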
What about an ORDER BY on a SUM()ed column with a LIMIT on a table that has 18 million rows?
What would you recommend on something like that?
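(Hard to say without the schema, but ORDER BY on a SUM()ed column can never use an index directly — the sort key only exists after aggregation — so on 18 million rows the usual pattern is a pre-aggregated summary table that is sorted via its own index instead. A sketch, with a hypothetical orders(customer_id, amount) source table:

CREATE TABLE customer_totals (
  customer_id INT NOT NULL PRIMARY KEY,
  total DECIMAL(14,2) NOT NULL,
  KEY idx_total (total)
);
-- refresh periodically, or keep it updated alongside writes; then:
SELECT customer_id, total FROM customer_totals ORDER BY total DESC LIMIT 10;
)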
Thanks for this good information!
We were facing performance issues querying an InnoDB table on a MySQL 5.6 instance. This table has about 72 million rows and only 28 columns. There are indexes on the columns used in the WHERE and ORDER BY clauses of the query.
In the following query, col1 is a string (6 characters long), col2 a date, and col3 a time.
Indexes: idx_col1col2 on (col1, col2) and idx_col2col3 on (col2, col3)
select * from <table> force index (idx_col1col2) force index for order by (idx_col2col3) where col1 = <value> and col2 >= <date> order by col2 desc, col3 desc limit 1000;
The above query was taking about 30 seconds to a minute to return results. As you can see, we used force index hints for the WHERE as well as the ORDER BY.
Could you please advise?
Adding index on (col1, col2, col3) helped! Thanks.
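(That result matches what you would expect: with an equality on col1, the (col2, col3) portion of the composite index is already in ORDER BY col2 DESC, col3 DESC order when scanned backward, so MySQL reads pre-sorted rows instead of trying to serve the WHERE from one index and the ORDER BY from another. A sketch, with t standing for the unnamed table:

ALTER TABLE t ADD INDEX idx_col1col2col3 (col1, col2, col3);
)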
Hi Peter,
Could you suggest improvements I can make to the following query? It takes 8 seconds to execute, and I am afraid it will take longer as the table fills up with more data.
EXPLAIN SELECT
  w.Id,
  w.PartNumber,
  w.PartDescription,
  w.CreatedDate,
  w.ErpKey,
  w.IsActive,
  COALESCE(`w.LookupManufacturer`.Value, '') AS Manufacturer,
  COALESCE(`w.LookupUom`.Value, '') AS Uom,
  w.IsSerialized AS Serialized,
  w.IsIMU AS Imu,
  COALESCE(`w.LookupAtaChapter`.Value, '') AS AtaChapter,
  `w.LookupGroup`.Value AS `Group`,
  w.IsCharge
FROM tab.rfqpart AS w
LEFT JOIN tab.lookupgroup AS `w.LookupGroup`
  ON w.RefLookupGroupId = `w.LookupGroup`.Id
LEFT JOIN tab.lookupatachapter AS `w.LookupAtaChapter`
  ON w.RefLookupAtaChapterId = `w.LookupAtaChapter`.Id
LEFT JOIN tab.lookupuom AS `w.LookupUom`
  ON w.RefLookupUomId = `w.LookupUom`.Id
LEFT JOIN tab.lookupmanufacturer AS `w.LookupManufacturer`
  ON w.RefLookupManufacturerId = `w.LookupManufacturer`.Id
WHERE (w.RfqAccountId = 'eccd587f-9684-42a4-bae3-5b043fbea1f5')
  AND ((w.IsCharge = FALSE) AND (w.IsActive = TRUE))
ORDER BY w.PartNumber
LIMIT 20 OFFSET 10;
Possible_Keys: IX_AccountId_IsCharge_IsActive
Key: IX_AccountId_IsCharge_IsActive
Key_Length: 124
ref: const,const,const
rows: 570677
filtered: 100
Extra: Using temporary; Using filesort
I would appreciate any help you may render.
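(A pattern that often removes the "Using temporary; Using filesort" here, sketched with an illustrative index name: append the sort column to the existing equality columns, so the ~570k candidate rows come back already ordered by PartNumber and MySQL can stop after the 30 it needs:

ALTER TABLE tab.rfqpart
  ADD INDEX IX_AccountId_IsCharge_IsActive_PartNumber (RfqAccountId, IsCharge, IsActive, PartNumber);
)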