Downloads

Blog

Non-Deterministic Order for SELECT with LIMIT

April 7, 2017

Author

Alexander Rubin

Insight for DBAs

MySQL

Share this Post:

In this blog, we’ll look at how queries in systems with parallel processing can return rows in a non-deterministic order (and how to fix it).

Short story:

Do not rely on the order of your rows if your query does not use ORDER BY. Even with ORDER BY, rows with the same values can be sorted differently. To fix this issue, always add ORDER BY ... ID when you have LIMIT N.

Long story:

While playing with MariaDB ColumnStore and Yandex ClickHouse, I came across a very simple case. In MariaDB ColumnStore and Yandex ClickHouse, the simple query (which I used for testing) select * from <table> where ... limit 10 returns results in a non-deterministic order.

This is totally expected. SELECT * from <table> WHERE ... LIMIT 10 means “give me any ten rows, and as there is no order they can be anything that matches the WHERE condition.” What we used to get in vanilla MySQL + InnoDB, however, is different: SELECT * from <table> WHERE ... LIMIT 10 gives us the rows sorted by primary key. Even with MyISAM in MySQL, if the data doesn’t change, the results are repeatable:

mysql> select * from City where CountryCode = 'USA' limit 10;
+------+--------------+-------------+--------------+------------+
| ID   | Name         | CountryCode | District     | Population |
+------+--------------+-------------+--------------+------------+
| 3793 | New York     | USA         | New York     |    8008278 |
| 3794 | Los Angeles  | USA         | California   |    3694820 |
| 3795 | Chicago      | USA         | Illinois     |    2896016 |
| 3796 | Houston      | USA         | Texas        |    1953631 |
| 3797 | Philadelphia | USA         | Pennsylvania |    1517550 |
| 3798 | Phoenix      | USA         | Arizona      |    1321045 |
| 3799 | San Diego    | USA         | California   |    1223400 |
| 3800 | Dallas       | USA         | Texas        |    1188580 |
| 3801 | San Antonio  | USA         | Texas        |    1144646 |
| 3802 | Detroit      | USA         | Michigan     |     951270 |
+------+--------------+-------------+--------------+------------+
10 rows in set (0.01 sec)

mysql> select * from City where CountryCode = 'USA' limit 10;
+------+--------------+-------------+--------------+------------+
| ID   | Name         | CountryCode | District     | Population |
+------+--------------+-------------+--------------+------------+
| 3793 | New York     | USA         | New York     |    8008278 |
| 3794 | Los Angeles  | USA         | California   |    3694820 |
| 3795 | Chicago      | USA         | Illinois     |    2896016 |
| 3796 | Houston      | USA         | Texas        |    1953631 |
| 3797 | Philadelphia | USA         | Pennsylvania |    1517550 |
| 3798 | Phoenix      | USA         | Arizona      |    1321045 |
| 3799 | San Diego    | USA         | California   |    1223400 |
| 3800 | Dallas       | USA         | Texas        |    1188580 |
| 3801 | San Antonio  | USA         | Texas        |    1144646 |
| 3802 | Detroit      | USA         | Michigan     |     951270 |
+------+--------------+-------------+--------------+------------+
10 rows in set (0.00 sec)

mysql> select * from City where CountryCode = 'USA' limit 10;

+------+--------------+-------------+--------------+------------+

+------+--------------+-------------+--------------+------------+

| 3793 | New York | USA | New York | 8008278 |

| 3794 | Los Angeles | USA | California | 3694820 |

| 3795 | Chicago | USA | Illinois | 2896016 |

| 3796 | Houston | USA | Texas | 1953631 |

| 3797 | Philadelphia | USA | Pennsylvania | 1517550 |

| 3798 | Phoenix | USA | Arizona | 1321045 |

| 3799 | San Diego | USA | California | 1223400 |

| 3800 | Dallas | USA | Texas | 1188580 |

| 3801 | San Antonio | USA | Texas | 1144646 |

| 3802 | Detroit | USA | Michigan | 951270 |

+------+--------------+-------------+--------------+------------+

10 rows in set (0.01 sec)

mysql> select * from City where CountryCode = 'USA' limit 10;

+------+--------------+-------------+--------------+------------+

+------+--------------+-------------+--------------+------------+

| 3793 | New York | USA | New York | 8008278 |

| 3794 | Los Angeles | USA | California | 3694820 |

| 3795 | Chicago | USA | Illinois | 2896016 |

| 3796 | Houston | USA | Texas | 1953631 |

| 3797 | Philadelphia | USA | Pennsylvania | 1517550 |

| 3798 | Phoenix | USA | Arizona | 1321045 |

| 3799 | San Diego | USA | California | 1223400 |

| 3800 | Dallas | USA | Texas | 1188580 |

| 3801 | San Antonio | USA | Texas | 1144646 |

| 3802 | Detroit | USA | Michigan | 951270 |

+------+--------------+-------------+--------------+------------+

10 rows in set (0.00 sec)

The results are ordered by ID here. In most cases, when the data doesn’t change and the query is the same, the order of results will be deterministic: open the file, read ten lines from the beginning, close the file. (When using indexes it can be different if different indexes are selected. For the same query, the database will probably select the same index if the data is static.)

But this is still not guaranteed. Here’s why: imagine we now introduce parallelism, split our table into ten pieces and run ten threads. Each will work on its own piece. Then, unless we specifically wait on each thread to finish and order the results, it will give us a random order of results. Let’s simulate this in a bash script:

for y in {2000..2010}
do
  sql="select YearD, count(*), sum(ArrDelayMinutes) from ontime where yeard=$y and carrier='DL' limit 1"
  mysql -Nb ontime -e "$sql" &
done
wait

for y in {2000..2010}

sql="select YearD, count(*), sum(ArrDelayMinutes) from ontime where yeard=$y and carrier='DL' limit 1"

mysql -Nb ontime -e "$sql" &

done

wait

The script’s purpose is to perform aggregation faster by taking advantage of multiple CPU cores on the server in parallel. It opens ten connections to MySQL and returns results as they arrive:

$ ./parallel_test.sh 
2009    428007  5003632
2007    475889  5915443
2008    451931  5839658
2006    506086  6219275
2003    660617  5917398
2004    687638  8384465
2002    728758  7381821
2005    658302  8143431
2010    732973  9169167
2001    835236  8339276
2000    908029  11105058
$ ./parallel_test.sh 
2009    428007  5003632
2008    451931  5839658
2007    475889  5915443
2006    506086  6219275
2005    658302  8143431
2003    660617  5917398
2004    687638  8384465
2002    728758  7381821
2010    732973  9169167
2001    835236  8339276
2000    908029  11105058

$ ./parallel_test.sh

2009 428007 5003632

2007 475889 5915443

2008 451931 5839658

2006 506086 6219275

2003 660617 5917398

2004 687638 8384465

2002 728758 7381821

2005 658302 8143431

2010 732973 9169167

2001 835236 8339276

2000 908029 11105058

$ ./parallel_test.sh

2009 428007 5003632

2008 451931 5839658

2007 475889 5915443

2006 506086 6219275

2005 658302 8143431

2003 660617 5917398

2004 687638 8384465

2002 728758 7381821

2010 732973 9169167

2001 835236 8339276

2000 908029 11105058

In this case, the faster queries arrive first and are on top, with the slower on the bottom. If the network was involved (think about different nodes in a cluster connected via a network), then the response time from each node can be much more random due to non-deterministic network latency.

In the case of MariaDB ColumnStore or Yandex Clickhouse, where scans are performed in parallel, the order of the results can also be non-deterministic. An example for ClickHouse:

:) select * from wikistat where project = 'en' limit 1;

SELECT *
FROM wikistat
WHERE project = 'en'
LIMIT 1

┌───────date─┬────────────────time─┬─project─┬─subproject─┬─path─────┬─hits─┬──size─┐
│ 2008-07-11 │ 2008-07-11 14:00:00 │ en      │            │ Retainer │   14 │ 96857 │
└────────────┴─────────────────────┴─────────┴────────────┴──────────┴──────┴───────┘

1 rows in set. Elapsed: 0.031 sec. Processed 2.03 million rows, 41.40 MB (65.44 million rows/s., 1.33 GB/s.)

:) select * from wikistat where project = 'en' limit 1;

SELECT *
FROM wikistat
WHERE project = 'en'
LIMIT 1

┌───────date─┬────────────────time─┬─project─┬─subproject─┬─path─────────┬─hits─┬───size─┐
│ 2008-12-15 │ 2008-12-15 14:00:00 │ en      │            │ Graeme_Obree │   18 │ 354504 │
└────────────┴─────────────────────┴─────────┴────────────┴──────────────┴──────┴────────┘

1 rows in set. Elapsed: 0.023 sec. Processed 1.90 million rows, 68.19 MB (84.22 million rows/s., 3.02 GB/s.)

:) select * from wikistat where project = 'en' limit 1;

SELECT *

FROM wikistat

WHERE project = 'en'

LIMIT 1

┌───────date─┬────────────────time─┬─project─┬─subproject─┬─path─────┬─hits─┬──size─┐

│ 2008-07-11 │ 2008-07-11 14:00:00 │ en │ │ Retainer │ 14 │ 96857 │

└────────────┴─────────────────────┴─────────┴────────────┴──────────┴──────┴───────┘

1 rows in set. Elapsed: 0.031 sec. Processed 2.03 million rows, 41.40 MB (65.44 million rows/s., 1.33 GB/s.)

:) select * from wikistat where project = 'en' limit 1;

SELECT *

FROM wikistat

WHERE project = 'en'

LIMIT 1

┌───────date─┬────────────────time─┬─project─┬─subproject─┬─path─────────┬─hits─┬───size─┐

│ 2008-12-15 │ 2008-12-15 14:00:00 │ en │ │ Graeme_Obree │ 18 │ 354504 │

└────────────┴─────────────────────┴─────────┴────────────┴──────────────┴──────┴────────┘

1 rows in set. Elapsed: 0.023 sec. Processed 1.90 million rows, 68.19 MB (84.22 million rows/s., 3.02 GB/s.)

An example for ColumnStore:

MariaDB [wikistat]> select * from wikistat limit 1
date: 2008-01-18
time: 2008-01-18 06:00:00
project: en
subproject: NULL
path: Doctor_Who:_Original_Television_Soundtrack
hits: 2
size: 2
1 row in set (1.63 sec)
MariaDB [wikistat]> select * from wikistat limit 1
date: 2008-01-31
time: 2008-01-31 10:00:00
project: de
subproject: NULL
path: Haramaki
hits: 1
size: 1
1 row in set (1.58 sec)

MariaDB [wikistat]> select * from wikistat limit 1

date: 2008-01-18

time: 2008-01-18 06:00:00

project: en

subproject: NULL

path: Doctor_Who:_Original_Television_Soundtrack

size: 2

1 row in set (1.63 sec)

MariaDB [wikistat]> select * from wikistat limit 1

date: 2008-01-31

time: 2008-01-31 10:00:00

project: de

subproject: NULL

path: Haramaki

size: 1

1 row in set (1.58 sec)

In another case (bug#72076) we use ORDER BY, but the rows being sorted are the same. MySQL 5.7 contains the “ORDER BY” + LIMIT optimization:

If multiple rows have identical values in the ORDER BY columns, the
server is free to return those rows in any order, and may do so
differently depending on the overall execution plan. In other words,
the sort order of those rows is nondeterministic with respect to
the nonordered columns.

If multiple rows have identical values in the ORDER BY columns, the

server is free to return those rows in any order, and may do so

differently depending on the overall execution plan. In other words,

the sort order of those rows is nondeterministic with respect to

the nonordered columns.

Conclusion
In systems that involve parallel processing, queries like select * from table where ... limit N can return rows in a random order (even if the data doesn’t change between the calls). This is due to the async nature of the parallel calls: whoever serves results faster wins. In MySQL, you run select * from table limit 1 three times and get the same data in the same order (especially if the table data doesn’t change), but the response time will be slightly different. In a massively parallel system, the difference in the response times can cause the rows to be ordered differently.

To fix: always add ORDER BY ... ID when you have LIMIT N.

0 0 votes

Article Rating

2 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Rick James

9 years ago

SELECT with out an ORDER BY — in MyISAM without parallelism — will deliver results based on the order in the .MYD file. This will be either the order they were originally inserted into the table, or something more random if there have been updates/deletes since then.

In any Engine, if there is a “covering” index, then the rows are likely to be in the order of some such index.

One form of apparent parallelism is use of UNION. But, there is no parallelism; the first table is read, then the second, etc. So, it is deterministic. However, there are new optimisms that will break this assumption. So, again, do not depend on it.

Parallelism is virtually useless if you are I/O-bound.

And, as you found out, ColumnStore has a mind of its own. It has parallelism, but it avoids some of the I/O in two ways: by filtering at the “chunk” level, and by using good compression.

If “any 10” is OK, use LIMIT 10. Example: Give me a feel for what is in the table. Note that some UIs do this implicitly.

If a predictable 10 in needed, add an ORDER BY.

Adding an ORDER BY can cause severely worse performance — in the cases where no index can be used to handle the ORDER BY. In this case, all the relevant rows must be fetched into a temp table, then sorted. All this before peeling off 10.

When using UNION, put LIMIT 10 on each subquery and the UNION as a whole. Ditto for ORDER BY.

Øystein Grøvlen

9 years ago

Thanks Alexander, for pointing this out.

At least once a year, someone will file a bug report about non-deterministic ordering for queries where ORDER BY clause does not specify a deterministic ordering. The issue is most of the time related to paging. E.g., first a query does LIMIT 10, and then another query does OFFSET 10 LIMIT 10. If the optimizer choose to use different query plans for the two queries, the same row may appear in both result sets.

The query optimizer can not read people’s mind. If people need a deterministic ordering, they need to specify that.