What does Using filesort mean in MySQL?

PREVIOUS POST
NEXT POST

If you were interviewing to work at Percona, and I asked you “what does Using filesort mean in EXPLAIN,” what would you say?

I have asked this question in a bunch of interviews so far, with smart people, and not one person has gotten it right. So I consider it to be a bad interview question, and I’m going to put the answer here. If anyone gets it wrong from now on, I know they don’t read this blog!

The usual answer is something like “rows are being placed into a temporary table which is too big to fit in memory, so it gets sorted on disk.” Unfortunately, this is not the same thing. First of all, this is Using temporary. Secondly, temporary tables may go to disk if they are too big, but EXPLAIN doesn’t show that. (If I interview you, I might ask you what “too big” means, or I might ask you the other reason temporary tables go to disk!)

The truth is, filesort is badly named. Anytime a sort can’t be performed from an index, it’s a filesort. It has nothing to do with files. Filesort should be called “sort.” It is quicksort at heart.

If the sort is bigger than the sort buffer, it is performed a bit at a time, and then the chunks are merge-sorted to produce the final sorted output. There is a lot more to it than this. I refer you to Sergey Petrunia’s article on How MySQL executes ORDER BY. You can also read about it in our book, but if you read Sergey’s article you won’t need to.

PREVIOUS POST
NEXT POST

Comments

  1. Baron Schwartz says

    I don’t think so. I think the lack of indexing is the problem in that query. It will potentially examine an estimated 3 * 36 * 45 * 61 * 4 * 4 * 4 * 4 * 4 * 99 * 4 rows. Fix the cross joins (ALL) and it will almost surely run a lot faster.

  2. Karthikeyan says

    This EXPLAIN could be the right fit to the above note. Even though its a 25 record, it takes prolonged time.

    query result
    id select_type table type possible_keys key key_len ref rows Extra
    1 SIMPLE sol ALL (NULL) (NULL) (NULL) (NULL) 3 Using temporary; Using filesort
    1 SIMPLE dom ALL (NULL) (NULL) (NULL) (NULL) 36
    1 SIMPLE str ALL (NULL) (NULL) (NULL) (NULL) 45
    1 SIMPLE ctd ALL (NULL) (NULL) (NULL) (NULL) 61
    1 SIMPLE ctf eq_ref PRIMARY PRIMARY 4 lds.ctd.ctd_cou_id 1
    1 SIMPLE cli eq_ref PRIMARY PRIMARY 4 lds.ctf.ctf_cli_id 1
    1 SIMPLE cog eq_ref PRIMARY PRIMARY 4 lds.ctf.ctf_cog_id 1
    1 SIMPLE vis eq_ref PRIMARY PRIMARY 4 lds.ctf.ctf_vis_id 1
    1 SIMPLE ind eq_ref PRIMARY PRIMARY 4 lds.ctf.ctf_ind_id 1
    1 SIMPLE cts ALL (NULL) (NULL) (NULL) (NULL) 99 Using where
    1 SIMPLE cb eq_ref PRIMARY,cou_id PRIMARY 4 lds.ctf.ctf_cou_id 1 Using where

  3. Milos says

    You should give tips like this more often. :)

    p.s. Does me using a calculator to calculate my CAPTCHA question mae me any less smart?
    .. Maybe because I have a dedicated button for it on my laptop. :)

  4. William Newton says

    I’m sort of surprised more people didn’t get that right. Its often seen with using temporary ( on a poorly optimized query), but still. Oh wait, File – Sort. So they think its going to a File, file.

  5. George says

    Creating a temp table still requires there to be a table definition file created in the tmpdir.

  6. Salle says

    Trivial one ..

    My standard answer is “filesort refers to the algorithm MySQL is using and has nothing to do with files on disk” and if I’m in a mood to explain further or obliged to I add “you always get filesort with ORDER BY for example”. Yes it’s one of many bad names you can find in MySQL.

    It’s bad question for job interviews though especially the way you present it here.

    I will hire at once candidate who says “I’m under impression it means writing temp table to disk, but I might be wrong and will check it” and will show the door to a candidate who says “it’s badly named because your blog says so”.

    Matter of style indeed. Cheers.

  7. Ryan Lowe says

    @Salle Not to be pedantic you will not always get filesort with ORDER BY:

    mysql> CREATE TABLE t1 (id int unsigned NOT NULL);
    Query OK, 0 rows affected (0.05 sec)

    mysql> INSERT INTO t1 VALUES (1),(2);
    Query OK, 2 rows affected (0.00 sec)
    Records: 2 Duplicates: 0 Warnings: 0

    mysql> EXPLAIN SELECT * FROM t1 ORDER BY id\G
    *************************** 1. row ***************************
    id: 1
    select_type: SIMPLE
    table: t1
    type: ALL
    possible_keys: NULL
    key: NULL
    key_len: NULL
    ref: NULL
    rows: 2
    Extra: Using filesort
    1 row in set (0.00 sec)

    mysql> ALTER TABLE t1 ADD INDEX (id);
    Query OK, 2 rows affected (0.03 sec)
    Records: 2 Duplicates: 0 Warnings: 0

    mysql> EXPLAIN SELECT * FROM t1 ORDER BY id\G
    *************************** 1. row ***************************
    id: 1
    select_type: SIMPLE
    table: t1
    type: index
    possible_keys: NULL
    key: id
    key_len: 4
    ref: NULL
    rows: 2
    Extra: Using index
    1 row in set (0.00 sec)

    It is, as Baron noted, when an index cannot be used.

    — Ryan Lowe

  8. says

    How much space is required for the filesort? Presumably it needs all the columns in the result plus any used by HAVING?

    I am fairly convinced from observing its behaviour that it uses fixed-length records even for VARCHAR etc, which is very bad from the disc usage (and sort_buffer usage) perspective.

  9. says

    It does use fixed-length buffers, you’re right. And it allocates enough space to hold the worst case, so utf8 VARCHAR may use an unexpectedly large amount of space. It will be interesting to see what approach the Drizzle developers take to this problem, since the last word I saw on the mailing list said everything is going to be utf8.

  10. Salle says

    @Ryan

    You are right indeed. I missed couple of words in that post. Maybe not the words you expect because there are cases when “you always get filesort with ORDER BY”. There’s example below without the table structure which I leave to you to guess.

    The truth is that “always” and “never” are dangerous words in RDBMS realm. The proper answer in almost all cases is “it depends”, but even that is not always true.

    Example:

    mysql> INSERT INTO fs (id) VALUES (1),(2);
    Query OK, 2 rows affected (0.06 sec)
    Records: 2 Duplicates: 0 Warnings: 0

    mysql> EXPLAIN SELECT * FROM t1 ORDER BY id\G
    *************************** 1. row ***************************
    id: 1
    select_type: SIMPLE
    table: t1
    type: ALL
    possible_keys: NULL
    key: NULL
    key_len: NULL
    ref: NULL
    rows: 3
    Extra: Using filesort
    1 row in set (0.03 sec)

    mysql> ALTER TABLE t1 ADD INDEX (id);
    Query OK, 3 rows affected (0.09 sec)
    Records: 3 Duplicates: 0 Warnings: 0

    mysql> EXPLAIN SELECT * FROM t1 ORDER BY id\G
    *************************** 1. row ***************************
    id: 1
    select_type: SIMPLE
    table: t1
    type: ALL
    possible_keys: NULL
    key: NULL
    key_len: NULL
    ref: NULL
    rows: 3
    Extra: Using filesort
    1 row in set (0.00 sec)

  11. says

    @Salle

    If t1 has a compound primary key which doesn’t include id, then the index you’re creating won’t be used for the optimization.
    Here’s an example with another schema:

    mysql> create table t1 (var int not null, id int not null, data int not null, primary key (var,data));
    Query OK, 0 rows affected (0.27 sec)

    mysql> insert into t1 values (1,1,1),(1,1,5),(1,2,4),(2,1,3);
    Query OK, 4 rows affected (0.00 sec)
    Records: 4 Duplicates: 0 Warnings: 0

    mysql> explain select * from t1 order by id\G
    *************************** 1. row ***************************
    id: 1
    select_type: SIMPLE
    table: t1
    type: ALL
    possible_keys: NULL
    key: NULL
    key_len: NULL
    ref: NULL
    rows: 4
    Extra: Using filesort
    1 row in set (0.00 sec)

    mysql> alter table t1 add index(id);
    Query OK, 4 rows affected (0.01 sec)
    Records: 4 Duplicates: 0 Warnings: 0

    mysql> explain select * from t1 order by id\G
    *************************** 1. row ***************************
    id: 1
    select_type: SIMPLE
    table: t1
    type: ALL
    possible_keys: NULL
    key: NULL
    key_len: NULL
    ref: NULL
    rows: 4
    Extra: Using filesort
    1 row in set (0.00 sec)

    You can hint the optimizer:

    mysql> explain select * from t1 force index (id) order by id\G
    *************************** 1. row ***************************
    id: 1
    select_type: SIMPLE
    table: t1
    type: index
    possible_keys: NULL
    key: id
    key_len: 4
    ref: NULL
    rows: 4
    Extra:
    1 row in set (0.00 sec)

    Regards,

  12. says

    It is called “filesort.” There is no “big sorting.” And it is quicksort on chunks, followed by a k-way merge. It is not like binary search.

  13. Dmitry Dulepov says

    If people do not answer this question incorrectly, it simply means they do not read. The answer is clean and clear at the MySQL web site:

    =========================
    Using filesort

    MySQL must do an extra pass to find out how to retrieve the rows in sorted order. The sort is done by going through all rows according to the join type and storing the sort key and pointer to the row for all rows that match the WHERE clause. The keys then are sorted and the rows are retrieved in sorted order.
    =========================

    http://dev.mysql.com/doc/refman/5.0/en/using-explain.html

    In fact, such quaestions are VERY bad on the interview. It is ok to test skills during the interview but it is bad to ask for very low details. Much better to test if the person can ~find~ the answer to the question. Someone who knows is not necessarily the best. Someone who does not know but can find the answer is typically better.

  14. Salle says

    Dmitry,

    You will be amazed how many MySQL “experts” don’t read the manual. They either “know” or assume.

    At the other had digging out such information is difficult. MySQL manual is not easy to navigate and it changes all the time. What’s in there now could be added just couple of hours ago.

    So don’t blame people who didn’t know how to find such information.

    As for such questions being bad for interviews it depends on what do you want to understand about the candidate and even more what job you are interviewing for.

    If it’s about on-site consulting you definitely prefer one being able to answer everything on top of his head.

  15. thomas says

    Gee, I wonder what it is like to be in an interview and get this question from Baron?

    I think I always had the general idea, but reading these posts gave an extra layer.

    T

  16. rob says

    @Salle

    Reading through this thread what scares me is it appears you also have some misconceptions about MySQL and yet you “show people to the door” who in this case seem to have a good grasp of the answer. Hope I don’t have to cross paths I wouldn’t want to waste my time.

  17. says

    @Salle: Your Post #8

    I see you insert two records in the table ‘fs’ then EXPLAIN table ‘t1′ which has three records.
    In my opinion your example is not very clear. Moreover I get the behaviour as explained by Baron.
    Am I missing something?

    >>>
    (root@localhost) [test] CREATE TABLE t1 (id int unsigned NOT NULL);
    Query OK, 0 rows affected (0.13 sec)

    (root@localhost) [test] INSERT INTO t1 (id) VALUES (1),(2);
    Query OK, 2 rows affected (0.14 sec)
    Records: 2 Duplicates: 0 Warnings: 0

    (root@localhost) [test] EXPLAIN SELECT * FROM t1 ORDER BY id\G
    *************************** 1. row ***************************
    id: 1
    select_type: SIMPLE
    table: t1
    type: ALL
    possible_keys: NULL
    key: NULL
    key_len: NULL
    ref: NULL
    rows: 2
    Extra: Using filesort
    1 row in set (0.00 sec)

    (root@localhost) [test] ALTER TABLE t1 ADD INDEX (id);
    Query OK, 2 rows affected (0.33 sec)
    Records: 2 Duplicates: 0 Warnings: 0

    (root@localhost) [test] EXPLAIN SELECT * FROM t1 ORDER BY id\G
    *************************** 1. row ***************************
    id: 1
    select_type: SIMPLE
    table: t1
    type: index
    possible_keys: NULL
    key: id
    key_len: 4
    ref: NULL
    rows: 2
    Extra: Using index
    1 row in set (0.00 sec)

    (root@localhost) [test]
    <<<

  18. says

    This was very interesting to find, and it lead me to Percona for a consult. I knew what the filesort meant, but am having a hard time composing the index to reduce the number of rows searched.

    I would very much like the chance to interview to become a freelance consultant for Percona. Although my regular business keeps me very busy, I’m always willing to help other people when I can, and I have a son in college – so I could use the extra income!

  19. Mikhail says

    Here’s a trivial but interesting example:

    EXPLAIN SELECT 20 AS C UNION SELECT 10 AS C ORDER BY C

    id select_type table type possible_keys key key_len ref rows Extra
    1 PRIMARY NULL NULL NULL NULL NULL NULL NULL No tables used
    2 UNION NULL NULL NULL NULL NULL NULL NULL No tables used
    NULL UNION RESULT ALL NULL NULL NULL NULL NULL Using filesort

    Obviously there’s no merge-sorting here, and no hard drive files are in use.

  20. says

    I looked at the “ORDER BY OPTIMIZATION” section in MySQL manual but could not find a reason why MySQL does not use index in the following scenario:

    CREATE TABLE table1 (
    id bigint(20) NOT NULL AUTO_INCREMENT,
    a varchar(100) DEFAULT NULL,
    b varchar(100) DEFAULT NULL,
    PRIMARY KEY (id),
    KEY (a),
    KEY (b)
    ) ENGINE=InnoDB

    Table 1 has 512000 randomly generated rows. This query uses index (type: index, key: a, key_len: 103, Extra: ”):

    EXPLAIN SELECT * FROM table1 ORDER BY a LIMIT 1698, 1

    This one uses filesort (Extra: Using filesort):

    EXPLAIN SELECT * FROM table1 ORDER BY a LIMIT 1699, 1

    Can someone EXPLAIN if there is a rule that MySQL uses to determine whether to use index or filesort? On mysql manual I notice that the above query satisfies all conditions required for MySQL to use an index but it does not. I just see this note:

    With EXPLAIN SELECT … ORDER BY, you can check whether MySQL can use indexes to resolve the query. It cannot if you see Using filesort in the Extra column.

  21. human says

    Salman: Your experience is probably because your index does not fix in memory – read the manual with respect to innodb buffer and check your config.

Leave a Reply

Your email address will not be published. Required fields are marked *