September 17, 2014

ORDER BY … LIMIT Performance Optimization

A suboptimal ORDER BY implementation, especially when used together with LIMIT, is often the cause of MySQL performance problems. Here is what you need to know about ORDER BY … LIMIT optimization to avoid these problems.

ORDER BY with LIMIT is the most common use of ORDER BY in interactive applications with large data sets being sorted. On many web sites you will find top tags, recently registered users, etc., which often require ORDER BY with LIMIT in the back end. In general this type of ORDER BY looks like: SELECT ….. WHERE [conditions] ORDER BY [sort] LIMIT N,M

Make sure it uses an index. It is very important to have ORDER BY with LIMIT executed without scanning and sorting the full result set, so it is important for it to use an index; in this case an index range scan will be started, and query execution will stop as soon as the required number of rows has been generated.

For example, if I do SELECT * FROM sites ORDER BY date_created DESC LIMIT 10; I would use an index on (date_created) to get the result set very fast.

Now what if I have something like SELECT * FROM sites WHERE category_id=5 ORDER BY date_created DESC LIMIT 10;

In this case an index on date_created may also work, but it might not be the most efficient: if it is a rare category, a large portion of the table may be scanned to find the 10 rows. So an index on (category_id, date_created) is a better idea.
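
As a quick illustration (a sketch only; the index name is an assumption, not from the original post), the composite index could be added like this, after which EXPLAIN should show "ref" access on category_id with rows already read in date_created order:

ALTER TABLE sites ADD INDEX idx_category_created (category_id, date_created);

-- Both the filter and the sort are now satisfied from one index:
EXPLAIN SELECT * FROM sites WHERE category_id=5 ORDER BY date_created DESC LIMIT 10;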

Let's take a look at a slightly more complex case: SELECT * FROM sites WHERE category_id IN (5,10,12) ORDER BY date_created DESC LIMIT 10;

Even though it looks quite similar to the previous one, it is quite different: there are multiple category_id values in the list now, so an index on (category_id, date_created) can't be used directly. An index on date_created alone would still work. A good option from a performance standpoint (even though a bit ugly) is the UNION workaround I already wrote about.
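
Here is a minimal sketch of that workaround, assuming the same sites table and the (category_id, date_created) index. Each UNION branch is an equality lookup that can use the composite index and stop after 10 rows, so the outer sort only has to order at most 30 rows:

(SELECT * FROM sites WHERE category_id=5 ORDER BY date_created DESC LIMIT 10)
UNION ALL
(SELECT * FROM sites WHERE category_id=10 ORDER BY date_created DESC LIMIT 10)
UNION ALL
(SELECT * FROM sites WHERE category_id=12 ORDER BY date_created DESC LIMIT 10)
ORDER BY date_created DESC LIMIT 10;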

So what if you have an application which can perform search on many different columns, with worse than perfect selectivity? Various social networking and dating sites are a perfect example of such queries:

SELECT * FROM people WHERE gender='m' AND age BETWEEN 18 AND 28 AND country_id=5 AND city_id=345 ORDER BY last_online DESC LIMIT 10;

There could be many possible limiting factors, with all of them being optional. This is a hard nut to crack; I know high-end custom search solutions can be developed, but if we stick to plain MySQL, using multiple indexes on the most selective columns is a good idea for the performance of such queries.

For example, you may put an index on (gender, last_online), assuming most people will have gender specified, as well as one on (country_id, city_id, last_online), assuming that in most cases these will be specified. It takes a good look at the queries actually being run and at data selectivity to come up with a good set of indexes for such a case, and they may also need to be adjusted in the future.
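
For instance (the index names here are hypothetical), that pair of indexes could be created as:

ALTER TABLE people
  ADD INDEX idx_gender_online (gender, last_online),
  ADD INDEX idx_location_online (country_id, city_id, last_online);

The optimizer then picks whichever index matches the WHERE clause best; because each one ends in last_online, the ORDER BY … LIMIT can stop early whenever the leading columns are fixed by equality.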

The main thing to watch for, if you do not have the full WHERE clause resolved by the index, is how many rows you need to scan to resolve the ORDER BY (this can be found in the slow query log or by examining Handler statistics). If only 50 rows are examined to provide 10 rows of result set you're in decent shape, but if it is 5000 you might need to rethink your indexing.
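
A quick way to check this interactively (these are standard MySQL status counters, not something specific to this example) is to reset the session counters, run the query, and look at the Handler_read values:

FLUSH STATUS;
SELECT * FROM people WHERE country_id=5 ORDER BY last_online DESC LIMIT 10;
SHOW SESSION STATUS LIKE 'Handler_read%';

-- Handler_read_key plus Handler_read_next roughly show how many rows were
-- actually examined to produce the 10 result rows.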

Also note: the number of records scanned to provide the result set will be very dynamic, depending on the particular constants and other factors. For our dating example, if we use only the (last_online) index and look for people from the USA, we will likely find 10 people pretty quickly; if the country is small or simply has few members, e.g. Slovenia, the same kind of search might need to scan thousands of times more rows to provide the result set.

In the example above we did ORDER BY by the last column of the index; in fact, an index can be used for ORDER BY whenever sorting is done by its leading column(s). Note, however, that columns following the column(s) used for ORDER BY can't be used to restrict the result set. For example:

key(a,b,c) SELECT * FROM tbl WHERE c=5 ORDER BY a,b LIMIT 10 – in this case the first two columns from the index can be used to satisfy the ORDER BY; the index, however, will not be helpful in checking c=5 (unless it is an index-covered query). An index on (c,a,b) would work better for the query above.
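
To spell that out (tbl is the hypothetical table from above), with the second index the equality on c fixes the leading column, so rows come out already ordered by (a,b):

ALTER TABLE tbl ADD INDEX idx_cab (c, a, b);

-- c=5 is resolved by the index prefix, and within c=5 the entries are stored
-- in (a,b) order, so no filesort is needed:
SELECT * FROM tbl WHERE c=5 ORDER BY a, b LIMIT 10;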

Do not sort by expressions. I guess this one is obvious: expressions or functions in the ORDER BY will block index usage for the sort.
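
A common workaround (a sketch; the products table and its columns are hypothetical) is to precompute the expression into an indexed column that is maintained on write:

-- This forces a filesort, because the sort key is computed per row:
SELECT * FROM products ORDER BY price * (1 + tax_rate) DESC LIMIT 10;

-- Instead, store the computed value and index it, keeping it up to date
-- from the application on INSERT/UPDATE:
ALTER TABLE products
  ADD COLUMN total_price DECIMAL(10,2),
  ADD INDEX idx_total_price (total_price);
SELECT * FROM products ORDER BY total_price DESC LIMIT 10;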

Sort by a column in the leading table. If you have a JOIN with ORDER BY … LIMIT, you should try hard to have the sorting column(s) in the leading table. If the ORDER BY goes by a field from a table which is not first in the join order, the index can't be used. Sometimes this means breaking normalization and duplicating the column(s) you're going to use in the ORDER BY into the other table.

Here is an example where the ORDER BY is done by the second table, which requires a filesort:
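
(The original EXPLAIN output did not survive reposting; the following is a reconstruction consistent with the note below, using hypothetical tables t1 and t2 where i is the primary key and k is an indexed column.)

EXPLAIN SELECT * FROM t1, t2 WHERE t1.k=5 AND t2.i=t1.i ORDER BY t2.k LIMIT 10;
-- t1 is accessed through the secondary index on k ("ref" access), so it stays
-- a real first table in the join, and sorting by t2.k requires a filesort.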

However, if the first table has the “const” or “system” access type, it is effectively removed from join execution (replaced with constants), and so the ORDER BY can be optimized even if it is done by the second table:
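
A matching sketch for the “const” case, under the same assumptions:

EXPLAIN SELECT * FROM t1, t2 WHERE t1.i=5 AND t2.i=t1.i ORDER BY t2.k LIMIT 10;
-- t1 is accessed by primary key ("const") and is replaced with constants,
-- so the index on t2.k can drive the ORDER BY and no filesort is needed.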

The difference between these cases is that “i” is the primary key while “k” is simply an indexed column.

Note: in some cases, even if it is possible to use an index to do ORDER BY with a JOIN, MySQL still will not be able to use it, as the optimizer is not yet smart enough to detect such cases:
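
(This example was also lost in reposting; here is a sketch of the shape being described, assuming the index (k,j) exists on both hypothetical tables:)

EXPLAIN SELECT * FROM t1, t2 WHERE t2.k=t1.k ORDER BY t1.k, t2.j LIMIT 10;
-- t1 rows could be read in k order via the index, and for each t1.k value the
-- matching t2 rows come out of (k,j) already ordered by j, yet MySQL filesorts.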

In this case there is an index (k,j) on the table, so indexes could be used on each of the tables to optimize the ORDER BY, or at least a local sort could be used for each t.k=const value for the second table. This is not done, however.

Sort in one direction. If you have ORDER BY col1, col2 it can be optimized using an index; ORDER BY col1 DESC, col2 DESC is the same thing. However, if you have ORDER BY col1, col2 DESC, MySQL will have to use a filesort. The classic solution for this would be an index which is sorted appropriately (ascending by col1 and descending by col2), but MySQL can't do that at this point. The workaround which can currently be used is a separate column holding reversed values, so you can do ORDER BY col1, col2_reverse instead.
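
A sketch of that reversed-column trick (table and column names hypothetical): for a numeric col2 the simplest reversal is negation, maintained on write:

ALTER TABLE tbl
  ADD COLUMN col2_reverse INT,
  ADD INDEX idx_mixed (col1, col2_reverse);

-- Keep col2_reverse = -col2 up to date on INSERT/UPDATE, then:
SELECT * FROM tbl ORDER BY col1, col2_reverse LIMIT 10;
-- returns rows in the same order as ORDER BY col1 ASC, col2 DESC would.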

Beware of large LIMIT. Using an index to sort is efficient if you need the first few rows, even if some extra filtering takes place so you need to scan more rows by index than requested by the LIMIT. However, if you're dealing with a LIMIT query with a large offset, efficiency will suffer. LIMIT 1000,10 is likely to be way slower than LIMIT 0,10. It is true most users will not go further than the 10th page in results; however, search engine bots may very well do so. I've seen bots looking at the 200th+ page in my projects. Also, for many web sites failing to take care of this makes it very easy to launch a DoS attack: request pages with some large numbers from a few connections and it is enough. If you do not do anything else, make sure you block requests with too-large page numbers.

For some cases, for example if the results are static, it may make sense to precompute the results so you can query them by position. So instead of a query with LIMIT 1000,10 you will have WHERE position BETWEEN 1000 AND 1009, which has the same efficiency for any position (as long as it is indexed).
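
A minimal sketch of this (the results table and its position column are hypothetical, rebuilt by a periodic batch job that assigns positions in the desired order):

SELECT * FROM results WHERE position BETWEEN 1000 AND 1009 ORDER BY position;
-- With an index on position this is a 10-row range read at any offset.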

Force the index if needed. In some cases the MySQL optimizer may prefer to use a different index, one which has better selectivity or just better estimates, instead of the one which allows you to avoid the sort. For example, if you had indexes on (country_id,city_id) and on (country_id,last_online), then for the query SELECT * FROM people WHERE country_id=5 AND city_id=6 ORDER BY last_online DESC LIMIT 10; the first index will likely be selected, even though it leads to a filesort.

The solution for this problem is either extending your indexes so the MySQL optimizer does not have to choose between a better sort and a better lookup, or using FORCE INDEX to force it to use the appropriate index.
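
For the example above, both options might look like this (index names are hypothetical):

-- Option 1: extend the index so one index serves both lookup and sort:
ALTER TABLE people ADD INDEX idx_loc_online (country_id, city_id, last_online);

-- Option 2: force the index that allows sorting, i.e. (country_id, last_online):
SELECT * FROM people FORCE INDEX (idx_country_online)
WHERE country_id=5 AND city_id=6
ORDER BY last_online DESC LIMIT 10;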

One more note about ORDER BY … LIMIT: it produces scary-looking EXPLAIN statements and may end up in the slow query log as a query which does not use indexes, even if it is quite fast:
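
(The original EXPLAIN was lost in reposting; schematically, for a query of the form below against a table of about 1.6 million rows, the rows column shows the full index scan estimate even though execution stops after 5 rows:)

EXPLAIN SELECT * FROM tbl ORDER BY indexed_col DESC LIMIT 5;
-- type: index, key on indexed_col, rows: ~1600000, yet only 5 rows are read.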

See – “rows” shows an estimated 1.6 million rows to be scanned, while we know well it will be just 5 in this case.

Many of the tips I've mentioned here work for ORDER BY without LIMIT as well, but there are some differences. I should write another article about ORDER BY without LIMIT on large tables soon.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. Jazz says:

    Hi Peter,
    one of our application queries (a book search) takes more than 3 minutes to execute. Below I have given the query & cnf details.

    Select identifier,sku_id,vol_num,TITLE,TITLE_OPT,AUTHOR_TITLE,AUTHOR_FNAME,AUTHOR_MNAME,AUTHOR_LNAME, AUTHOR_ALIAS,AUTHORS, ORIG_PUBLISH_DATE, match(TITLE,AUTHOR_TITLE,AUTHOR_FNAME,AUTHOR_MNAME,AUTHOR_LNAME,AUTHOR_ALIAS, AUTHORS,SUBTITLE,ORIG_PUBLISH_DATE,DESCRIPTION,OTHER_INFO,CATEGORY,REMARKS) against (‘+ history’ in boolean mode) as score
    from bp_search as bp
    where match(TITLE,AUTHOR_TITLE,AUTHOR_FNAME,AUTHOR_MNAME,AUTHOR_LNAME,AUTHOR_ALIAS,AUTHORS,SUBTITLE,ORIG_PUBLISH_DATE,DESCRIPTION, OTHER_INFO,CATEGORY,REMARKS) against (‘+ history’ in boolean mode) group by group_id order by score desc limit 0,10;

    TABLE DESC
    ——————–
    CREATE TABLE bp_search (
    BOOK_ID int(11) NOT NULL default ’0′,
    GROUP_ID varchar(128) default NULL,
    IDENTIFIER varchar(64) default NULL,
    SKU_ID varchar(15) default NULL,
    TITLE varchar(1600) default NULL,
    AUTHOR_TITLE varchar(100) default NULL,
    AUTHOR_FNAME varchar(256) default NULL,
    AUTHOR_MNAME varchar(256) default NULL,
    AUTHOR_LNAME varchar(2048) default NULL,
    AUTHOR_ALIAS varchar(128) default NULL,
    AUTHORS varchar(512) default NULL,
    SUBTITLE varchar(1600) default NULL,
    ORIG_PUBLISH_DATE varchar(64) default NULL,
    DESCRIPTION varchar(1024) default NULL,
    OTHER_INFO varchar(1024) default NULL,
    CATEGORY varchar(3072) default NULL,
    REMARKS varchar(1024) default NULL,
    VOL_NUM varchar(40) default NULL,
    SOURCE varchar(20) NOT NULL,
    TITLE_OPT varchar(1600) default NULL,
    LAST_UPDATE timestamp NOT NULL default ’0000-00-00 00:00:00′,
    KEY GROUP_ID_idx (GROUP_ID),
    FULLTEXT KEY book_semantics (TITLE,AUTHOR_TITLE,AUTHOR_FNAME,AUTHOR_MNAME,AUTHOR_LNAME,AUTHOR_ALIAS,AUTHORS,SUBTITLE,ORIG_PUBLISH_DATE,DESCRIPTION,OTHER_INFO,CATEGORY,REMARKS)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8

    CNF
    ——-

    [mysqld]
    socket = /var/lib/mysql/mysql.sock
    skip-locking
    key_buffer = 384M
    max_allowed_packet = 256M
    table_cache = 512M
    sort_buffer_size = 2M
    read_buffer_size = 2M
    read_rnd_buffer_size = 1M
    myisam_sort_buffer_size = 64M
    thread_cache_size = 80
    query_cache_size = 32M
    thread_concurrency = 8
    query_cache_type = 1
    query_cache_limit = 32M
    thread_stack=256K
    myisam_data_pointer_size=6
    net_buffer_length = 8K
    log-slow-queries=/var/lib/mysql/astdb01.as.osti.local-slow.log
    log-warnings=2
    max_connect_errors=1000
    max_connections=1000
    max_heap_table_size=1073741824
    local_infile=0
    ft_min_word_len=3
    tmpdir = /opt/
    key_buffer_size=256M
    connect_timeout=180
    wait_timeout=360
    tmp_table_size=851443712

    Can you please help me with this?

  2. Dieter says:

    Thanks for the quick response. Very nice.

  4. james says:

    Hello Peter,

    I am going through your article and found it pretty interesting. I need one piece of help: I am displaying my property query, which displays properly, ordering the result by the following 4 columns

    p.is_move_up DESC,REPLACE(IFNULL(p.is_feature,””),”2015-12-31″,””) DESC,
    p.listing_order DESC,p.propid DESC

    On our site, properties are displayed in this order: Moveup paid, Paid featured, paid, trial, then expired. However, we want to show the properties within each group in random order and make them rotate. Right now the properties do not rotate, so it is not fair for users to always see the same properties in the same order. So we need the rotation to be within the same group, not mixing all of them.

    Query :-

    SELECT DISTINCT(p.propid),p.category_id,p.people_sleep,p.bedrooms,p.bathrooms,p.children,
    p.airport,p.features,p.pets,p.smoke, p.wheelchair,p.prop_title,p.property_type,prop_date,
    display_calender,other_url_calender,is_feature
    FROM property p
    LEFT JOIN category ON (p.category_id=category.id)
    WHERE p.active IN(“yes”,”d”) AND p.is_feature >= SYSDATE()
    ORDER BY p.is_move_up DESC,REPLACE(IFNULL(p.is_feature,””),”2015-12-31″,””) DESC,
    p.listing_order DESC,p.propid DESC

    Can you please help me into this.

  5. Manasi Save says:

    Hi Peter,

    I need to know the syntax if I want to use index hint. [USE INDEX FOR ORDER BY (idx_name)] when we are using Joins in the query.

    For Ex:- Select a.test_id, b.test_data, c.test_details From test a left join test_data b on a.test_id = b.test_id left join test_details c on b.test_id = c.test_id
    Where a.test_flag = ‘Y’
    Order by c.test_sum Desc limit 50;

    Thanks.

  6. Jay Pipes says:

    Hi! Great article (as always). Just wanted to point out one quick exception to this:

    “Do not sort by expressions”

    On a FULLTEXT index, if you have a MATCH … AGAINST … in the WHERE expression as well as the ORDER BY expression (for instance, for the purpose of ordering by relevancy), then the FULLTEXT index will indeed be used and the ORDER BY will be quite efficient, as it is *not* reexecuted.

    Cheers, and keep up the fantastic articles, Peter and Vadim!

    Jay

  7. Thanks for another great blog post!

    “Do not sort by expressions”

    Picking random rows is another thing that social networking pages use a lot (random x users that live in the same city as you; random x users that share some attribute with you).

    ORDER BY RAND() is evil but what alternatives are there? Is there a best practice to tackle this issue with PHP5/MySQL5, especially when I only need little amounts (

  8. Something got stripped away:

    Is there a best practice to tackle this issue with PHP5/MySQL5, especially when I only need little amounts (less than 20) of random rows?

  9. Tinus says:

    Good point Maarten, I hope somebody knows an alternative / best practice?

  10. Apachez says:

    One way is to have something like a unique rowid in that table. Then in the script choose a random value and try to select a row matching that number.

    One algorithm would then be:

    1) SELECT MAX(rowid) AS maxrowid FROM t1 LIMIT 1; (we know that lowest rowid is 0 since its unsigned int).

    2) Run the random() function in your script which will return a numeric value between 0 and MAX(rowid) (dont forget to seed).

    3) SELECT col FROM t1 WHERE rowid >= “numeric value from 2)” LIMIT 1;

    4) If 0 rows were returned, check in the other direction: SELECT col FROM t1 WHERE rowid <= “numeric value from 2)” ORDER BY rowid DESC LIMIT 1;

  11. Apachez says:

    Uhm for some reason half my post vanished !?

    Anyway…

    5) If still no rows were returned then tell the client that no rows were found.

    In the case of “random x user from city y” you can have the select as:

    SELECT col FROM t1 WHERE city = y AND rowid >= “numeric value from 2)” LIMIT 1;

    along with an index such as INDEX (city, rowid)

  12. peter says:

    Jay,

    What do you mean exactly? First, I did not speak about full text search indexes here but about BTREE ones. If you do full text search using natural language search, you do not need any extra ORDER BY clause – it will be automatically sorted by “relevance”:

    mysql> select title, match (title,content) against ("internet") as score from cont where match (title,content) against ("internet") limit 10;
    +---------------------------------------+-----------------+
    | title                                 | score           |
    +---------------------------------------+-----------------+
    | Internet dynamics                     | 9.0042238235474 |
    | Internet Generation                   | 8.8267345428467 |
    | Neutral New Zealand Internet Exchange | 8.596981048584  |
    | Regional Internet Registries          | 8.3543605804443 |
    | Internet organizations                | 8.2909164428711 |
    | Internet exchange points              | 8.2456722259521 |
    | Internet Explorer shells              | 8.2083368301392 |
    | Internet celebrities                  | 8.0902004241943 |
    | IPv4 internet                         | 7.9001092910767 |
    | Botswana Internet Exchange            | 7.8775291442871 |
    +---------------------------------------+-----------------+
    10 rows in set (0.08 sec)

    But you must NOT include ORDER BY score DESC if you want your query executed in decent time. It will require a filesort, which can take very long:

    mysql> select title, match (title,content) against ("internet") as score from cont where match (title,content) against ("internet") order by score desc limit 10;
    +---------------------------------------+-----------------+
    | title                                 | score           |
    +---------------------------------------+-----------------+
    | Internet dynamics                     | 9.0042238235474 |
    | Internet Generation                   | 8.8267345428467 |
    | Neutral New Zealand Internet Exchange | 8.596981048584  |
    | Regional Internet Registries          | 8.3543605804443 |
    | Internet organizations                | 8.2909164428711 |
    | Internet exchange points              | 8.2456722259521 |
    | Internet Explorer shells              | 8.2083368301392 |
    | Internet celebrities                  | 8.0902004241943 |
    | IPv4 internet                         | 7.9001092910767 |
    | Botswana Internet Exchange            | 7.8775291442871 |
    +---------------------------------------+-----------------+
    10 rows in set (1 min 46.73 sec)

    In fact it is one of the major weaknesses of the built-in MySQL full text search – the inability to efficiently combine it with sorting by anything other than relevance. For example, if I wanted to run a boolean full text search for our dating example and sort by date registered or last login time, it would crawl. This is where external solutions come into play.

  13. peter says:

    I’ve now modified the post to include the Beware of large LIMIT section, which is something I forgot to write about during the initial revision.

  14. peter says:

    ORDER BY RAND() is indeed a troublemaker you can meet in many applications. As the workarounds are not trivial, many people simply decide to leave it this way. It will not scale for large data sets, however.

    Apachez gives one decent workaround – if you just need one random item, and you have sequential row ids or there are few holes and you do not care about a slightly skewed distribution, use this approach: pick a random id in the range and find the row closest to it.

    If you need multiple rows or there are no sequential integer ids, you can use another approach:

    - add an indexed column called “random” and populate it with random values
    - do ORDER BY random instead of ORDER BY RAND()
    - once you’ve selected a row, update its “random” value to a new random value
    - periodically run a process to reset the random values for everyone; otherwise, if a row got a very large value on update, it would never be selected.

    This works well for large data sets but a small number of selects, as you need to do a lot of updates in this case.

    Another approach would be to pre-create a sequence of ordered random values and use it. For example, generate 1,000,000 random rows at once. Now keep a simple counter of how many of the random rows you’ve used, i.e. first take the first 10 rows, then the next 10 rows, etc. For many applications it is enough to run in circles; for others you can generate more rows as soon as they are used up.

    The good thing about this last technique is that you can get the same distribution as with ORDER BY RAND() with very limited overhead.

  15. Speeple says:

    If you know/calculate the size of your rows you could try this for a faster alternative to RAND() (example in PHP):

    $var = row(“SELECT COUNT(id) FROM mytable”);
    $var = rand(0, $var – 1);
    $result = query(“SELECT random_item FROM mytable WHERE … ORDER BY … LIMIT $var,1″);

    A dirty UNION query could produce a large result set.

  16. peter says:

    Speeple,

    This is a pretty bad idea. You did not write what you ORDER BY, but it may require a filesort. And even if you do it by index, on average 50% of the rows will have to be scanned because of the LIMIT.

    LIMIT 50000,1 is bad – meaning 50,000 rows will be generated and thrown away.

    Apachez provided the correct solution for the case when you have an id column :)

  17. Speeple says:

    Wups, the ORDER BY is redundant in such a query.

  18. peter says:

    Better :) But even without ORDER BY, LIMIT with a non-zero offset is expensive.

    Try LIMIT 10000000,1 on some large table :)

    On small tables everything may look fast – full table scans, joins without indexing – and this is often the root cause of scalability problems later on.

  19. Speeple says:

    I’ve just tried the query on a table of several tens of gigabytes with 12 million rows:

    SELECT * FROM index_archive LIMIT 1000000,1

    True, it is very slow, especially from a web application view point.

    I can’t remember where I read that this was a good method… But it’s obviously a poor option, even if it is marginally faster than RAND().

    I’ve implemented this in a social network project I’m working on for “random blog” & “random profile”… guess i’ll be giving Apachez’s method a try!

  20. peter says:

    Yeah. Random blog and random profile are indeed easy as they often have sequential integer ids. There could be holes due to removed accounts, but it is still good enough.

  21. Speeple says:

    Yes, the only flaw I have found with Apachez’s method is when there’s an integer jump, e.g. 1 > 5, and id 3 is selected, resulting in an error.

  22. peter says:

    Well, that is the whole point. If your random value was 3,

    you do where id>=3 order by id asc limit 1;

    which gives you value 3, or the closest to it if it does not exist.

  23. Panos says:

    I just want to disagree with Jay’s point (1st post). I’m using MySQL 5 and am having some fun adapting to their new optimization engine. I have a table with a full text index on it; when I run a query with no ORDER BY, it is incredibly fast. HOWEVER, when I use ORDER BY MATCH … AGAINST … LIMIT, the query is painstakingly slow, as it uses filesort for some reason that I do not understand.

    SELECT * FROM sites
    WHERE MATCH(title, url, stripped) AGAINST(‘test’ IN BOOLEAN MODE) AND stat = ‘A’
    ORDER BY match(title, url, stripped) AGAINST (‘test’ )
    DESC
    LIMIT 0, 50

    27 rows fetched (60.67 sec)

    id select_type table type possible_keys key key_len ref rows Extra
    1 SIMPLE sites fulltext stat,match_url_title_stripped match_url_title_stripped 0 1 Using where; Using filesort

    — Versus —

    SELECT * FROM sites
    WHERE MATCH(title, url, stripped) AGAINST(‘test’ IN BOOLEAN MODE) AND stat = ‘A’
    LIMIT 0, 50

    27 rows fetched (0.75 sec)

    id select_type table type possible_keys key key_len ref rows Extra
    1 SIMPLE sites fulltext stat,match_url_title_stripped match_url_title_stripped 0 1 Using where

  24. peter says:

    Panos,

    Jay is wrong on this. See my comment above.

    The behavior you’re getting is quite expected. Check my new presentation on FullText Search performance to see other things which will be painfully slow:
    http://www.mysqlperformanceblog.com/files/presentations/EuroOSCON2006-High-Performance-FullText-Search.pdf

  25. rob says:

    All the workarounds for ORDER BY RAND() usually assume you are looking for a single record, or some small number. What if you want to randomly select an arbitrary number of rows (say between 500 and 15000)? If someone has a faster alternative to ORDER BY RAND() for this situation, I would be interested in hearing it. Thanks.

  26. peter says:

    Rob, are you asking about a random number of random rows?

    I mentioned one workaround which may work for you – you can have a separate pre-created table which holds a sequence of random row IDs, something like:

    id rnd
    1 456
    2 123
    3 905

    So you have a sequential row number and random values. Now when you need N random rows you can do select rnd from tbl where id> LIMIT N;

    If N is random, compute it in your application.

    Now you just need to advance the counter as you use data from this table. You also need to do something when all the random data is used – for some applications looping is enough, for others you need to make sure you generate enough random data.

  27. Hi there….

    Great blog…

    I have a difficult (for me) mysql optimisation problem. Basically I want to optimise a query that uses a fulltext search and orders by a static field. I.e.

    SELECT * FROM WHERE MATCH() AGAINST(”) ORDER BY score DESC

    Where score is a precomputed score field.

    I have indexes built on the fulltext and score separately. I can’t combine them as they are different types. However, as mysql only uses the fulltext index, it is forced to filesort for the order by. Any way round this?

    Thanks!

  28. Sorry, sql should read:

    SELECT * FROM WHERE MATCH(text_field) AGAINST(‘text’) ORDER BY score DESC

  29. peter says:

    Fulltext search is special. First, it automatically does sorting by relevance unless you’re doing a BOOLEAN search, so you had better just remove the order by from this query.

    Second – are you sure you do not want to use LIMIT ? Getting all matches will be very slow for large data sets.

  30. Dan Musil says:

    Hi.

    At first I have to say this is the excellent site! :)

    And now my troubles. I discovered range and order by field in index is the last one used.
    Suppose I have a table with a, b and c fields and SQL query

    SELECT * FROM table WHERE a=1 AND b>10 ORDER BY c

    key(a,b) works, but filesort is used.
    key(a,c) works, but b range goes sequentially.
    key(a,b,c) doesn’t work (only a and b is used).

    Do you have any idea how to go around this? How to define index for a, b and c?

    Thank you anyway,
    Dan

  31. peter says:

    Dan,

    You’re right. This is the challenge and there is no solution which works for everything.
    If the b range is small you can use this trick:

    http://www.mysqlperformanceblog.com/2006/08/14/mysql-followup-on-union-for-query-optimization-query-profiling/

  32. Anca says:

    Hi

    Your articles posted here are great. I’m a beginner in mysql optimization and i have a “maybe stupid” question.
    Let’s say i have a table mytable(a,b,c,d,e …) and i want to perform ORDER BY with LIMIT on more than 60-70% of the columns (i want to have many sorts available)
    example queries:
    select a,b,c FROM mytable ORDER BY a;
    select a,b,c FROM mytable ORDER BY b;
    … etc…

    is it ok to define indexes for every column used in order by? When are the indexes defined on a table considered to be too many (assuming they are not redundant and are used for different purposes…)?

  33. peter says:

    Thank you.

    Having sorts by many fields is challenging. You basically have only two choices: you either have a lot of indexes, which slows down insertions/updates and requires space on disk and in caches, or you use filesort, which does not scale well with large data sets.

  34. Anca says:

    You say “Sort by column in leading table”, but could it be a workaround to create a view by joining 2 tables and then sort by any column you want (from the view)? Or is it the same thing as SELECT with join and ORDER BY?

  35. peter says:

    Not really.

    A view is a virtual entity, so creating a view does not magically make it behave like a table from an indexing standpoint – the optimizer looks at the base tables instead.

  36. steve jin says:

    use a subquery in FROM, like

    select * from (select * from tab where … limit 0, 20) t order by rand() limit 0,1

  37. peter says:

    Steve,

    What is your advice about? Your query would not be faster than the query in the FROM clause, and you’re selecting a random row out of the 20 rows selected by the first query, which is not entirely the same as fully random :)

  38. steve jin says:

    Peter,

    It is semi- or fake random, to minimize sorting and table scans. I have 5 robot clients running at the same time. It is to make sure the 5 clients don’t pick up the same record out of 10,000 records.

    It is random order within the first 20, instead of random order over all rows (10,000 in my case). Performance is much better.

    For 5 clients, 5 would be OK instead of 20 in the sql; even with 100 there the performance won’t be much different. I put 20 there in case I increase the robot number.

  39. peter says:

    Yes, in this case that works. A similar approach would be to create 20 rows, store them in a separate table, and do select … order by rand() from it. The benefit in such a case is that you can regenerate the table every few minutes and get a different set for rotation.

  40. Good article. I didn’t realize the join order would affect the ORDER BY clause.

  41. Davies says:

    How to optimize in this kind:

    update targets set status=’D’ where status=’C’ order by schedule limit 1;

    with index of (status, schedule), and it was really slow. And

    Select * from targets where status=’C’ order by schedule limit 1;
    update targets set status=’D’ where id=’id’;

    are very fast as expected.

    how to solve it?
    //Bow

  42. peter says:

    Davies,

    The problem in this case is that you want the UPDATE to use the same key it is updating, which can cause “loops” for certain updates; consider for example

    update tbl set i=i+10 where i>5

    Updating value 10 will make it 20 and it will again match the clause, etc.

    So MySQL can’t use such an index for query execution. It could use a temporary table in the same way you do it, but I guess it does not do that right now.

  43. Rob says:

    Hey guys, here’s another good solution. I have a users table where I want to select five people at random who have a certain setting on their profile. I can’t use max ID because the IDs won’t necessarily be in sequence, so here’s what I do:

    Select all the user IDs that match my params and put them into an array…

    $query = “SELECT user_id FROM users WHERE profile_privacy=’0′ AND profile_display_name””;

    … Then use PHP to shuffle the resulting array and take the first five items:

    shuffle($random_profile_ids);
    $random_profile_ids = array_slice($random_profile_ids,0,5);

    … Then do an IN query on the user_id field for the five items left in your array:

    $in_sql = “‘” . implode($random_profile_ids,”‘,’”) . “‘”;
    $query = “SELECT * FROM users WHERE user_id IN ($in_sql) LIMIT 5″;

    On my 30k row table the new solution ran in a combined .0147 secs, compared to .0770 secs for ORDER BY RAND(). Not a bad savings if you ask me. Enjoy,

    Rob

  44. marrtins says:

    Rob,

    That won’t work well on large datasets (it will eat all memory sooner or later as the data set grows). Actually it seems to me that using RAND() in this case would be more acceptable, even on small data sets, because the performance gain is not that big.

    Btw, you can improve your code by *not* using shuffle, but instead getting 5 values from rand(0, count($random_profile_ids) – 1) (make sure they do not overlap) and then just picking the appropriate values from the array.

  45. John says:

    I have a question regarding order by and limit..

    i used to use this system in order to page the results 100 rows at a time but found it to be very slow as the page numbers grew larger and larger..

    what i thought to help it scale better is to have something like this:

    SELECT inverted_timestamp FROM whatever WHERE visible = ‘yes’ ORDER BY torrents.inverted_timestamp LIMIT 100,1

    the inverted_timestamp field and the orderby go together as i m sorting by a few different columns (its like a spreadsheet table which u can page and sort by a few fields – i have added indexes for all the used cases).. the 100 is just the currentpage*perpage which happens to be 100 since i m on the 2nd page (page 1) and have 100 rows per page.

    then i do the main query to fetch the actual results:

    SELECT users.username,bla bla….. FROM whatever LEFT JOIN categories ON category = categories.id LEFT JOIN users ON torrents.owner = users.id WHERE visible = ‘yes’ AND inverted_timestamp>=’3111748733′ ORDER BY inverted_timestamp LIMIT 100

    which i have indxed as (visible,inverted_timestamp) (and similarly for the rest of the columns i order by)
    the value >=’311..’ comes from the 1st query which i use to essentially find the first row of that page and then use that value in the main query to prevent the main one (which fetches all the results) to have to do an order by in the entire data set… From my testing it seems to be running very well upto around 1-2 million rows.. from then on the second query is still very fast but the first one is starting to take the hit.. previously i had just teh second query with no >= condition and i used limit offset,perpage which was ok up until 100k rows and then it was really slow…

    it seems this solution is an improvement .. my question really is, is there a way to make it better?

    to illustrate what i meant above these are the queries generated with a different ordering:
    SELECT size FROM whatever WHERE visible = ‘yes’ ORDER BY size DESC LIMIT 3800,1
    SELECT users.username,bla bla FROM whatever LEFT JOIN categories ON category = categories.id LEFT JOIN users ON owner = users.id WHERE visible = ‘yes’ AND size

  46. John says:

    it got cut off:
    AND size= value_found) which worked well but only when i ordered by id (so it was limited – though i can still probably use this way when the ordering is done via an integer that has sequential properties)..

    the problem really lies in how to speed up the other order by’s beyond what i have done already.

  47. John says:

    hmm it doesnt like my posts.. it cut it again :/

    size

  48. John says:

    ffs i guess its the single quote that fucks up … i wont bother trying again .. i believe u have understood what i m trying to say

  49. Mike says:

    Here is my logic to do random.

    $entries = array();
    $count = 10;
    while ($count > 0) {
    $DB->query(“SELECT col1,col2 FROM entries AS e1 INNER JOIN members ON userid=id AND active=1 JOIN (SELECT ROUND(RAND() * (SELECT MAX(entryid) FROM entries)) AS entryid) AS e2 WHERE e1.entryid >= e2.entryid AND e1.photos!=’a:0:{}’ ORDER BY e1.entryid ASC LIMIT 1″);
    if ($DB->get_num_rows() > 0) {
    $count–;
    $entries['entries'][] = $this->__processEntry($DB->fetch_array());
    }
    }

    It works pretty well for my standards

  50. chris says:

    Great Weblog, I did often visit it with good results but now, I totally stuck in the simplest thing.

    EXPLAIN SELECT p.products_id
    FROM products p
    ORDER BY p.manufacturers_id ASC
    LIMIT 3

    Gives me a

    id select_type table type possible_keys key key_len ref rows Extra
    1 SIMPLE p index NULL idx_manufacturers_id 5 NULL 3490

    could anyone give me a hint why it steps thure 3490 rows, even when using the idx_manufacturers_id?
    thanks for any idea (using mysql5) && sorry if this is too stupid post :)
    Chris

  51. Nick Jag says:

    That was extremely helpful. I was having a hard time locating good and clear resources on limit and order optimization. Thanks!

  52. Jan says:

    Mikes solution is by far the fastest.
    http://jan.kneschke.de/projects/mysql/order-by-rand

    my choice was to combine the ORDER BY RAND() and the other one.

    SELECT * FROM
    (SELECT * FROM comments
    WHERE count > $number
    AND topic = $topic
    LIMIT 100)
    AS r1 ORDER BY RAND() LIMIT 5

    The resulting speed is acceptable. The first LIMIT can be use to adapt the level of random in the query.
    id select_type table type possible_keys key key_len ref rows Extra
    1 PRIMARY ALL NULL NULL NULL NULL 5 Using temporary; Using filesort
    2 DERIVED comments ALL NULL NULL NULL NULL 140809 Using where

    The query as you can see goes through 140.000 rows in 0.0792 seconds. Not as fast as the other one but a lot more flexible and open for other WHERE statements

  53. Jan says:

    Once again a post has been cut into pieces.

    But it has one problem. What if you intend to look for multiple rows with another criteria.
    Lets say you have 100.000 rows and want to output 5 randomized rows. From the 100.000 rows 200 qualify to be in the resulting rows. You can not run the query 5 times because you might end up with the same row 5 times. If there is a large gap between two qualifying results it can happen that you generate a random number that is in the gap thus giving you the same row over and over again.
    Secondly you may hit the end of the set thus generating no output at all. Some ppl suggested to run the query backwards and forwards. i.id (smaller-equal) a.id and the i.id (greater) a.id
    This brings trouble as well because the random number is generated inside the query resulting with two different random numbers. This can be overcome by making a pre-query to generate a random number in your application, still leaving you with the first problem.

  54. Jan says:

    Addendum:

    I forgot to create an index over the columns concerned. Now the efficiency has increased a lot.
    But I just found a problem. The results are not spread evenly. They always cluster around the beginning of the table.

    Any way to alter this behaviour?

  55. Simon says:

    Great resource Peter, and also thanks to Jan (comment #48) for his “compromise” solution to the ORDER BY RAND() dilemma. I’ve now managed to get a query (testing with 500k rows, requiring < 10 random rows) down from 0.8s execution time to just 0.05s. :)

  56. miha says:

    SELECT *, MATCH(title, url, stripped) AGAINST('test' IN BOOLEAN MODE) as score FROM sites
    WHERE stat = 'A'
    HAVING score > 0
    ORDER BY score

  57. Tobias says:

    Hello Peter,

    in your presentation on FULLTEXT SEARCH you say:

    ● Be careful with an extra where clause if it filters out a lot of matches.
    – select title from cont where match (title,content) against (“global service
    company”) and date_added>unix_timestamp(“2006-07-18 18:00:00″)
    limit 10;
    ● Takes 26 min 43.59 sec

    I think maybe it was the unix_timestamp function that slowed things down? Because maybe the unix timestamp had to be calculated for each of the millions of records, and therefore no index could be used?

  58. peter says:

    Tobias,

    In this case unix_timestamp is called on a constant, so it is not executed millions of times. It also would not block index usage. The problem is that once you’re using the full text search index you can’t use another index to do the sorting.

  59. Mathuvathanan Mou says:

    Peter, Thanks….

    This is where I found very useful matters about optimization which I have been looking for so long.

    Thanks again.

    Mathu Mou

  60. prashant says:

    How to optimized this query in PHP

    $sqls=”select id,state from states”;
    $rss=mysql_query($sqls) or die(mysql_error());
    $num_rowss=mysql_num_rows($rss);

    $count=1;

    while($num_rowss && $datas=mysql_fetch_assoc($rss))
    {

    $state_id=$datas['id'];
    $state=$datas['state'];
    $sqlc=”select * from companies,package where comp_status=1 and comp_account_site=0 and comp_balance >0 and comp_mem_pac=pac_id order by pac_amount DESC, rand()”;

    $rsc=mysql_query($sqlc) or die(mysql_error());
    $num_rowsc=mysql_num_rows($rsc);

    $checkonlyonetime=1; // header
    while($num_rowsc && $datac=mysql_fetch_assoc($rsc)) // loop of company
    {

    $comp_id=$datac['comp_id'];

    $comp_name=$datac['comp_name'];
    $comp_mapped_city=$datac['comp_mapped_city'];
    @$comp_mapped_city=explode(“,”,trim($comp_mapped_city,”,”));

    }
    }

  61. Sergio says:

    … LIMIT optimization :)

    SELECT *
    FROM forum_posts AS pa
    LEFT JOIN forum_posts_text AS pb ON pa.post_id = pb.post_id
    LEFT JOIN forum_users ON user_id = post_poster
    WHERE post_topic_id = ’450′
    ORDER BY pa.post_id ASC
    LIMIT 5475 , 15

    versus

    SELECT *
    FROM forum_posts AS pa
    LEFT JOIN forum_posts_text AS pb ON pa.post_id = pb.post_id
    LEFT JOIN forum_users ON user_id = post_poster
    WHERE post_topic_id = ’224′
    AND pa.post_id >= (
    SELECT MAX( post_id )
    FROM (
    SELECT post_id
    FROM forum_posts
    WHERE post_topic_id = ’224′
    ORDER BY post_id ASC
    LIMIT 6075 , 1
    ) AS tmp )
    ORDER BY pa.post_id ASC
    LIMIT 15

    ~ 0.09 sec versus ~ 0.004 sec

    Tables with posts: 2×292,000+ rows (25MB + 89MB)
    Topic: 6092 rows (rows that meet condition WHERE post_topic_id = ’224′)

    Source: http://www.easemarry.com/blog/mysql-limit-optimization/

    If you found something better, I would be glad to hear.

  62. Sergio says:

    Here SELECT MAX(post_id) FROM (SELECT …) is not necessary. It’s redundant
    (it’s obvious ‘cos that subquery extracts only one post_id – see LIMIT x, 1)

    SELECT *
    FROM forum_posts AS pa
    LEFT JOIN forum_posts_text AS pb ON pa.post_id = pb.post_id
    LEFT JOIN forum_users ON user_id = post_poster
    WHERE post_topic_id = ’224′
    AND pa.post_id >= (
    SELECT post_id
    FROM forum_posts
    WHERE post_topic_id = ’224′
    ORDER BY post_id ASC
    LIMIT 6075 , 1)
    ORDER BY pa.post_id ASC
    LIMIT 15

  63. Alex says:

    I have this query, but the orderby and the last 2 left joins affect considerably to the performance.
    Any idea how I can optimize this query?

    Thanks.

    SELECT
    *
    FROM
    message

    LEFT JOIN
    accounts a ON message_account = a.id

    LEFT JOIN
    opportunities o ON message_project = o.id

    LEFT JOIN
    leads l1 ON (UCASE(message_from) LIKE CONCAT(\’%\’, UCASE(l1.email1), \’%\’)
    AND l1.email1 \’\’) OR (UCASE(message_from) LIKE CONCAT(\’%\’,
    UCASE(l1.email2), \’%\’) AND l1.email2 \’\’)

    LEFT JOIN
    leads l2 ON (UCASE(message_to) LIKE CONCAT(\’%\’, UCASE(l2.email1), \’%\’)
    AND l2.email1 \’\’) OR (UCASE(message_to) LIKE CONCAT(\’%\’,
    UCASE(l2.email2), \’%\’) AND l2.email2 \’\’)

    WHERE
    message_visibility
    AND message_mailbox = UCASE(\’user\’)
    AND message_visibility 1

    ORDERBY
    message_ocode DESC

  64. Radek says:

    One tweak that we do for large limit offsets on pagination is to first select columns that are part of the index. Typically this would be the id’s. After that we use an “IN” to select the the other data columns from the same table. We find that there is a vast improvement on selects that use large offsets.

    Example (category,id index)
    select photo.id, photo.description from photo where category=1 limit 30000000,20
    Query time: 4 seconds

    To speed things up we split the call into 2 queries

    select photo.id from photo where category=1 limit 30000000,20
    Query time: 0.030 (this just uses the index so no traversing the data table)
    Select photo.id, photo.description from photo where photo.id IN (x1,x2…x20)
    The x1 – x20 are the results the first query returns
    Query time: 0.020

    So with a simple change like this we sped things up by a couple of orders of magnitude and allow our users to use large pagination.

    I assume it has something to do with the fact that the first query only uses the index to get the ids and does not have to jump around getting a larger data set.

    Cheers.

  65. Sergio says:

    Radek, and what about if I want to order the resultset?
    eg: select id …. order by id desc ..

    Have you tried?

    I’m Joining multiple tables, so the join must be done in the second query..

    Unfortunately, with your solution I have to use 3 queries:
    - SELECT COUNT(*) .. (for total rows that meets the criteria blabla)
    - SELECT id … LIMIT x,y
    - SELECT id, name .. IN (x1,x2…x20)

    Anyway.. thank you for the tweak!

  66. Radek says:

    Yes Sergio, you don’t need to do the count, as usually you have the page number or offset in the GET variable as you go from page to page. Typically we do a count first to create all the pagination stuff: current page, total pages, etc.

    Example of it in action here. http://www.pinkbike.com/photo/list/?date=all&page=6
    You can page to page 100,000 (24 per page, so the offset is 2.4 million) without any slowdown. If this was done in one call it would literally take seconds to generate offsets of 2,000,000.

    Yes, you can order by id in the first query, no problem. The id is indexed so there is no penalty there. Your result set can also be ordered, but that is fast as you only have 24 items, or however many you show on the page.

    Cheers

  67. Sergio says:

    Hey Radek,

    Yes, I was talking about this: “Typically we do a count first to create all the pagination stuff. Current page, total pages, etc.” – you have to create a counter for pagination information.

    Thank you for your solution, maybe I will try it on other projects. Right now I’m fine with solution posted above by me.

  68. noob says:

    Hi, was wondering if there is a way to stop a SELECT COUNT(*) from continuing once it reached a max number of rows. Eg, i want to do something like:

    SELECT count(*) FROM table_a LIMIT 1000;

    So if the table has more than 1k rows, it should just return and say “1000″. What i’m trying to avoid is have the entire table scanned, in particular if it has millions of rows. Basically the question i want to ask is: “Does table X have at least N rows?”.

  69. bozek says:

    Hi Radek,
    I am trying to reproduce this from your post:
    # select photo.id from photo where category=1 limit 30000000,20
    # Query time: 0.030 (this just uses the index so no traversing the data table)
    but without success. Can you help me?
    In MySQL I do:
    > create table c (id integer auto_increment, a integer, b integer, primary key(id));
    > insert into c (a,b) value (1,2),(5,6),(3,9),(122,22),(32,54);
    > insert into c (a,b) value (1,2),(5,6),(3,9),(122,22),(32,54);
    I run this about 21 times to fill the table:
    > insert into c (a,b) select a,b from c;
    So I have:
    > select count(*) from c;
    +----------+
    | count(*) |
    +----------+
    | 20971520 |
    +----------+
    1 row in set (0.00 sec)
    > show index from c;
    +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
    +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    | c     | 0          | PRIMARY  | 1            | id          | A         | 20971520    | NULL     | NULL   |      | BTREE      |         |
    +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    1 row in set (0.00 sec)

    And finally:
    > explain select id from c limit 20000000,20;
    +----+-------------+-------+-------+---------------+---------+---------+------+----------+-------------+
    | id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows     | Extra       |
    +----+-------------+-------+-------+---------------+---------+---------+------+----------+-------------+
    | 1  | SIMPLE      | c     | index | NULL          | PRIMARY | 4       | NULL | 20971520 | Using index |
    +----+-------------+-------+-------+---------------+---------+---------+------+----------+-------------+
    1 row in set (0.00 sec)

    > select id from c limit 20000000,20;
    +----------+
    | id       |
    +----------+
    | 20000001 |
    | ...      |
    | 20000020 |
    +----------+
    20 rows in set (4.58 sec)

    So even if I use indexes, pagination on a huge table is very slow :( Any idea why?
    My MySQL is 5.0.51a-18
    PC: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz, bogomips: 4776.06, 8GB RAM, and system including mysql id on SSD hard drive
    What I am doing wrong?
    Thanks

  70. radek says:

    bozek, is it possible that your key_buffer size is set too small, so the whole index for your millions of records does not fit and the index is not being cached? Note: on my application this is set larger than the total index sizes for ALL tables, not just this one.

  71. DIA says:

    If you have something like SELECT * FROM table LIMIT 10,
    it works the same as SELECT * FROM table LIMIT 0,10.

  72. Mark Rose says:

    Is there any way to accomplish the same thing as this query while avoiding a temporary table?

    select ID_MEMBER_STARTED, COUNT(*) as hits from smf_topics group by ID_MEMBER_STARTED order by hits desc limit 20;

    There is an index on ID_MEMBER_STARTED and the table type is InnoDB.

  73. Yang Yang says:

    Thanks dude that is rather helpful!

  74. Kalyani says:

    My query performance is very slow for a large database. Table A has 2050702 records in it. ID is the primary key and there is an index on Time.
    SELECT ID, Name, Time from TableA WHERE Time >= ’2009-03-24 12:53:00′ and ID > 2050702
    Order By ID DESC LIMIT 100

    I would like to know ways to optimize the query above, so it retrieves the last 100 records from Table A matching the criteria in the where clause.

  75. Mark Rose says:

    Add a query on (ID, Time) and re-order the WHERE clause to “ID > 2050702 and Time >= 2009-03-24 12:53:00″.

    MySQL can only use one index in a query (basically), so you need an index that contains all the columns you are filtering and ordering by, in the same order.

  76. Mark Rose says:

    Sorry, that should be “Add an index on” not “Add a query on”.

  77. mohamed says:

    I have a query that takes 14 seconds to execute:

    SELECT posts.* FROM posts force index (idx_post_date) INNER JOIN follow ON follow.followuserid = posts.userid WHERE follow.userid=’61585′ ORDER BY date DESC LIMIT 0, 20;

    The table posts has ~1000000 records and the number of records in the posts table belonging to userid 61585 is 0.

    The table follow has 63537 records.

    This is how my indexes are set:-

    mysql> show index from follow;
    +--------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
    | Table  | Non_unique | Key_name | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
    +--------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
    | follow | 0          | PRIMARY  | 1            | userid       | A         | NULL        | NULL     | NULL   |      | BTREE      |         |
    | follow | 0          | PRIMARY  | 2            | followuserid | A         | 63537       | NULL     | NULL   |      | BTREE      |         |
    +--------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+

    mysql> show index from posts;
    +-------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    | Table | Non_unique | Key_name         | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
    +-------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    | posts | 0          | PRIMARY          | 1            | id          | A         | 1047467     | NULL     | NULL   |      | BTREE      |         |
    | posts | 1          | idx_post_date    | 1            | date        | A         | 1047467     | NULL     | NULL   |      | BTREE      |         |
    | posts | 1          | idx_posts_userid | 1            | userid      | A         | 22771       | NULL     | NULL   |      | BTREE      |         |
    +-------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

    mysql> explain SELECT posts.* FROM posts force index (idx_post_date) INNER JOIN follow ON follow.followuserid = posts.userid WHERE follow.userid='61585' ORDER BY date DESC LIMIT 0, 20;
    +----+-------------+--------+--------+---------------+---------------+---------+-------------------------------+------+-------------+
    | id | select_type | table  | type   | possible_keys | key           | key_len | ref                           | rows | Extra       |
    +----+-------------+--------+--------+---------------+---------------+---------+-------------------------------+------+-------------+
    | 1  | SIMPLE      | posts  | index  | NULL          | idx_post_date | 4       | NULL                          | 20   |             |
    | 1  | SIMPLE      | follow | eq_ref | PRIMARY       | PRIMARY       | 8       | const,wwwproject.posts.userid | 1    | Using index |
    +----+-------------+--------+--------+---------------+---------------+---------+-------------------------------+------+-------------+

    Is there a way to optimize this query? I have now been trying for quite some time but could not find a solution. Any help would be very much appreciated!

  79. Carlos says:

    Thanks for the article, but I have an additional feature to throw at you.

    Suppose I request to select, say, 10 rows, ordered some way. If none of the 10 fit any additional constraints I may have, I would like to fetch the second-best set of 10, and so on until I get the record that fits all my constraints.

    How do I ensure the first 10 will not show up?

    Let me just add that the additional constraints may eventually be associated with features that are outside the database.

    Cheers,

    Carlos

  80. reshma says:

    I have a query :

    In this table category_id, date_created is indexed.

    EXPLAIN select newsid from news where category_id=53 and approval=’yes’ order by date_created desc limit 0,10

    id select_type table type possible_keys key key_len ref rows Extra
    1 SIMPLE news index PRIMARY,category_id date_created 9 NULL 216 Using where

    the above query examines 216 rows to fetch only 10 records.

    Whereas the below query examines 10 rows which it should, (here i am not searching by category_id):

    EXPLAIN select newsid from news where approval=’yes’ order by date_created desc limit 0,10

    id select_type table type possible_keys key key_len ref rows Extra
    1 SIMPLE news index NULL date_created 9 NULL 10 Using where

    Why does the first query examine 216 rows? Is this fine or does the query need to be optimised futher.

    Please advice, thanks in advance.

  81. peter says:

    Reshma,

    you seem to have a poor index setup to begin with. You have an index scan while you should have “ref”, which you would get with an index on (category_id, approval, date_created) for all these queries.
    When the index is scanned, the more selective the where clause is, the more rows have to be scanned to provide 10 results.

  82. reshma says:

    Hello Peter,

    Thanks for the quick response.

    I have removed the index on category_id. After removing it Explain for the below select query is as follows:

    EXPLAIN select newsid from news where category_id=53 and approval=’yes’ order by date_created desc limit 0,10

    id select_type table type possible_keys key key_len ref rows Extra
    1 SIMPLE news index NULL date_created 9 NULL 10 Using where

    This is fine i guess.
    Another question is: If there are 56 records for category_id=53 and i want to display 100 records for all given categories in news table, why does below query examine 100 records for category_id=53 when there are only 56 of them?

    EXPLAIN select newsid from news where category_id=53 and approval=’yes’ order by date_created desc limit 0,100

    id select_type table type possible_keys key key_len ref rows Extra
    1 SIMPLE news index NULL date_created 9 NULL 100 Using where

    Most of the rows are displayed based on category_id, so I want to know if I am doing it correctly.

    If this is confusing I could send the table structures and queries.

    In this post you have mentioned a DoS attack; do you mean paging records can cause a DoS attack?

    Any help would be highly appreciated.

  83. Naveen says:

    Hi Peter

    We have a tool which is processing 30 million records, this tool is a migration project where we are using perl script to do that

    we need to compare two tables where we are doing like this

    for using counter of 50,000 records in chunk size

    and then we are using

    select column1, column2 from table order by table limit index_num(21 million), (500,000)

    initially for smaller dataset, things are working fine (26 secs to 50 secs)

    when index_num is growing, it is taking almost 30 mins to do that

    We want to optimize this one.

    As per your recent posts,

    we need to have index, for column_1, we are creating indexes ..still it is not optimal.

  84. Batman says:

    Hey Peter,

    I’ve been reading your articles for some time now, especially since I’ve been running into issues with wrapping my head around this whole MySQL optimization stuff.

    Anyways, I get the jist of it, however, I run a real estate website (displays listings, etc, in a search results page), and there are just too many variables to try and cover an index on each scenario… I’d end up with 20 indexes, and I fear that any update/insert/delete queries to that table would blow up the internet.

    I’m not looking for anybody to write me anything, I’m just looking to get a better understanding of this:

    SELECT
    id,
    … more *necessary* columns …
    FROM
    listings
    WHERE
    category = 1
    AND country = 1
    AND state = 45
    AND city = 35859
    AND visible = 1
    ORDER BY
    price ASC
    LIMIT 0, 10

    Now, category numbers represent different listing types, ie. 1 = For Sale, 2 = For Rent, 3 = Pending Sale, and so on. 1 (For Sale is by the far the most popular, representing at least 75% of all listings in the table; probably more).

    country, state, city are always in the query. visible is either a 1 or 0. 1 meaning the listing has not expired, and 0 meaning it has. Probably 85%+ instances of visible = 1 in the table.

    Now, there are also other WHERE conditions that are possible, like WHERE price = 300000, or bedrooms = 2, or bathrooms = 3, etc. And who knows what the user is going to enter in.

    Then, there is the order by on top of that.

    What I have been trying to figure out is how to accommodate these kinds of variables when there are so many possible scenarios. When I do an ORDER BY price DESC, it can take 4-5 seconds, maybe more… I guess that's not the end of the world, but our traffic is always growing, and that will end up being unacceptable.

    Any thoughts?

    - Batman

  85. Hi,

    This is a hard one, especially if you add ranges, as MySQL will not be able to both use a range for row selection and sort by a different column. You may end up with large scans and sorts, which can use a lot of resources. Sphinx (http://www.sphinxsearch.com) is often a good alternative in such cases, even if you're not running full-text search queries.
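    For the equality-only query in comment 84, one option (a sketch; the index name is hypothetical) is a compound index on the equality columns with the sort column last, so the ORDER BY ... LIMIT can read rows already in price order and stop after 10:

    ALTER TABLE listings
    ADD INDEX loc_cat_vis_price (country, state, city, category, visible, price);

    As soon as the user adds a range condition on another column (bedrooms >= 2 and so on), the index can no longer deliver rows pre-sorted by price, which is exactly the limitation described above.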

  86. Peter says:

    Thanks for this article!! It was just what I was looking for.

  87. Dieter says:

    Thanks for the quick response. Very nice.

  88. james says:

    Hello Peter,

    I am going through your article and find it pretty interesting. I need some help: I am displaying my property query, which displays properly and orders the result by the following 4 columns:

    p.is_move_up DESC,REPLACE(IFNULL(p.is_feature,""),"2015-12-31","") DESC,
    p.listing_order DESC,p.propid DESC

    On our site, properties are displayed in this order: move-up paid, paid featured, paid, trial, then expired. However, we want to show the properties within each group in random order and make them rotate. Right now
    the properties do not rotate, so it is not fair that users always see the same properties in the same order. We need the rotation to stay within the same group, not mix them all.

    Query :-

    SELECT DISTINCT(p.propid),p.category_id,p.people_sleep,p.bedrooms,p.bathrooms,p.children,
    p.airport,p.features,p.pets,p.smoke, p.wheelchair,p.prop_title,p.property_type,prop_date,
    display_calender,other_url_calender,is_feature
    FROM property p
    LEFT JOIN category ON (p.category_id=category.id)
    WHERE p.active IN("yes","d") AND p.is_feature >= SYSDATE()
    ORDER BY p.is_move_up DESC,REPLACE(IFNULL(p.is_feature,""),"2015-12-31","") DESC,
    p.listing_order DESC,p.propid DESC

    Can you please help me with this?
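    One possible approach, sketched below and not tested against this schema, is to keep the sort keys that define the groups and replace the deterministic tiebreakers with RAND(), so rows rotate randomly within each group:

    ORDER BY p.is_move_up DESC,
    REPLACE(IFNULL(p.is_feature,""),"2015-12-31","") DESC,
    RAND()

    Note that RAND() forces a sort on every execution and makes pagination unstable, so the same property may appear on two consecutive pages.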

  89. Rbk says:

    Hi Peter,

    I have one question regarding your comment

    “Sort by column in leading table if you have JOIN with ORDER BY … LIMIT you should try hard to have sorting column(s) to be in the leading table. If ORDER BY is going by field from the table which is not first in the join order index can’t be used.”

    I have a similar situation where I cannot have the sorting columns in the leading table; in such a case, could you please suggest a workaround?

    Thanks in anticipation.
    Rbk
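    One workaround, sketched here with hypothetical table names, is to force the table that owns the sort column to lead the join with STRAIGHT_JOIN, provided the WHERE clause still allows that table to drive the plan:

    SELECT STRAIGHT_JOIN t2.*, t1.*
    FROM t2
    JOIN t1 ON t1.t2_id = t2.id
    ORDER BY t2.created DESC
    LIMIT 10;

    If the join order cannot be changed, the other common option is to denormalize: copy the sort column into the leading table and index it there.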

  90. Chris says:

    Thank you so much for this great post! It was a huge help and exactly what I needed to know and understand right now. It is always great how you present a topic so logically and make it look easy and understandable, even for not yet highly experienced people like me! :)

  91. Good Article. Thank you for that!

  92. Cupidvogel says:

    Peter, in the 2nd example, where the query was "SELECT * FROM sites WHERE category_id=5 ORDER BY date_created DESC LIMIT 10", you suggested that using an index on (date_created,category_id) might speed up the lookup. But if you use this index, queries won't be able to use it for category_id alone, because that column comes 2nd in the compound index. So won't this offset the gain earned by using the compound index?

  93. Humble Pie says:

    @Cupidvogel:

    You should create another index for category_id in this case. There is always a tradeoff between performance and space, and space will always remain cheap compared to the performance gain.
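    For illustration, a minimal sketch of keeping both indexes on the sites table, following the column order discussed in this thread:

    ALTER TABLE sites ADD INDEX (date_created, category_id); -- serves the ORDER BY ... LIMIT query
    ALTER TABLE sites ADD INDEX (category_id);               -- serves lookups on category_id alone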

  94. Cupidvogel says:

    Thanks.

  95. I made a simple test case comparing performance of the LIMIT approach vs the JOIN workaround. You can see the results here: http://devoluk.com/sql-pagination.html
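    For readers who have not seen it, the JOIN workaround referred to here is usually the deferred-join pattern; a minimal sketch, assuming sites has a PRIMARY KEY id:

    SELECT s.*
    FROM sites s
    JOIN (SELECT id FROM sites ORDER BY date_created DESC LIMIT 100000, 10) AS page
    ON s.id = page.id
    ORDER BY s.date_created DESC;

    The inner query walks only the much smaller date_created index to skip the offset, and full rows are fetched for just the 10 ids that survive.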

  96. Laurent says:

    Hello everybody,

    We tried a solution based on the query seen in an older post:
    SELECT * FROM sites WHERE category_id=5 ORDER BY date_created DESC LIMIT 10

    changed to

    select * FROM (SELECT * FROM sites WHERE category_id=5 ORDER BY date_created DESC) as my_table LIMIT 10

    It works for us!

  97. Eric Chiang says:

    Super helpful and clear explanation! I was able to use the “sort by column in leading table” and “force index” approaches to avoid a file sort. Yay!

  98. Gaurav says:

    We have one query written in our legacy system (using MySQL 5.5).
    Now, with growing data, the query mentioned below is taking a huge amount of time.
    Our system holds about 2,000,000,000 rows (2 billion), approx. 650 GB of data.
    The table is partitioned per day (which means the query below fetches data from 30 partitions).

    => 16GB is allocated to innodb_buffer_pool_size.
    => There is an index on START_TIME.

    Query-1
    SELECT * FROM ( SELECT a,b,c,d,e,f,g,h,i,j,k,l FROM TEST WHERE START_TIME between '2013-11-14 00:00:01' and '2013-12-14 23:59:59' ORDER BY START_TIME DESC) as TEST_DATA LIMIT 10000;

    The above query selects all the columns for one month of data, sorts all of it, and at last shows 10000 records to the end user.

    Now, my doubt is about Query-2:
    SELECT a,b,c,d,e,f,g,h,i,j,k,l FROM TEST WHERE START_TIME between '2013-11-14 00:00:01' and '2013-12-14 23:59:59' ORDER BY START_TIME DESC limit 10000;

    The above query selects all the columns from one month of data, performs the sort, and displays the result as soon as 10000 records are sorted (no sorting and buffering of all records).

    Comparing Query-1 and Query-2:
    1> Will these 2 queries display different result sets, or the same one?
    2> Is there any other impact on performance?

    Because in Query-1 we sort all records and then display 10k,
    whereas in Query-2 we display 10k sorted records.

    Thanks a lot for your help.

  99. Thomas says:

    In the software for the directory, all categories are written the same.
    One of the categories keeps failing when selected, although 100% of the sub-categories and listings in the parent category are available.
    I keep getting the following errors in response to attempts at accessing MySQL:

    [09-07-2014 07:25:38pm] Fatal Error: MySQL error. Query: SELECT id, friendly_url, IF(date_update!='0000-00-00 00:00:00',date_update,date) AS date, priority FROM pmd_listings WHERE status='active' ORDER BY id ASC LIMIT 180000, 10000 Error: (3) Error writing file '/var/db/mysql-tmp/MYnTfXMV' (Errcode: 28) in /home/citydir/public_html/prevpmd/includes/class_database.php on line 132

    [09-07-2014 07:27:38pm] Fatal Error: MySQL error. Query: SELECT SQL_CALC_FOUND_ROWS l.* FROM pmd_listings l INNER JOIN pmd_listings_categories lc ON l.id=lc.list_id WHERE lc.cat_id=192 AND l.status='active' ORDER BY priority DESC LIMIT 0,10 Error: (126) Incorrect key file for table '/var/db/mysql-tmp/#sql_e66b1_0.MYI'; try to repair it in /home/citydir/public_html/prevpmd/includes/class_database.php on line 132

    [09-07-2014 07:28:55pm] Fatal Error: MySQL error. Query: SELECT id, friendly_url, IF(date_update!='0000-00-00 00:00:00',date_update,date) AS date, priority FROM pmd_listings WHERE status='active' ORDER BY id ASC LIMIT 1710000, 10000 Error: (3) Error writing file '/var/db/mysql-tmp/MYeKsGGZ' (Errcode: 28) in /home/citydir/public_html/prevpmd/includes/class_database.php on line 132

    I can't get a response from the developer who wrote the software, so I'm left to my own devices in finding a resolution.

    I would appreciate any help you may render.
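    For context: Errcode 28 is the operating system's "No space left on device" error (MySQL's perror utility decodes it: perror 28), which suggests the sorts for these large-OFFSET queries are spilling big temporary files into /var/db/mysql-tmp and filling it up. A quick way to confirm where those files are written:

    SHOW VARIABLES LIKE 'tmpdir'; -- the directory MySQL uses for temporary/filesort files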
