Non-Deterministic Order for SELECT with LIMIT

April 7, 2017
Author
Alexander Rubin
Share this Post:

Non-Deterministic OrderIn this blog, we’ll look at how queries in systems with parallel processing can return rows in a non-deterministic order (and how to fix it).

Short story:

Do not rely on the order of your rows if your query does not use ORDER BY. Even with ORDER BY, rows with the same values can be sorted differently. To fix this issue, always add ORDER BY ... ID when you have LIMIT N.

Long story:

While playing with MariaDB ColumnStore and Yandex ClickHouse, I came across a very simple case. In MariaDB ColumnStore and Yandex ClickHouse, the simple query (which I used for testing) select * from <table> where ... limit 10  returns results in a non-deterministic order.

This is totally expected. SELECT * from <table> WHERE ... LIMIT 10 means “give me any ten rows, and as there is no order they can be anything that matches the WHERE condition.” What we used to get in vanilla MySQL + InnoDB, however, is different: SELECT * from <table> WHERE ... LIMIT 10 gives us the rows sorted by primary key. Even with MyISAM in MySQL, if the data doesn’t change, the results are repeatable:

The results are ordered by ID here. In most cases, when the data doesn’t change and the query is the same, the order of results will be deterministic: open the file, read ten lines from the beginning, close the file. (When using indexes it can be different if different indexes are selected. For the same query, the database will probably select the same index if the data is static.)

But this is still not guaranteed. Here’s why: imagine we now introduce parallelism, split our table into ten pieces and run ten threads. Each will work on its own piece. Then, unless we specifically wait on each thread to finish and order the results, it will give us a random order of results. Let’s simulate this in a bash script:

The script’s purpose is to perform aggregation faster by taking advantage of multiple CPU cores on the server in parallel. It opens ten connections to MySQL and returns results as they arrive:

In this case, the faster queries arrive first and are on top, with the slower on the bottom. If the network was involved (think about different nodes in a cluster connected via a network), then the response time from each node can be much more random due to non-deterministic network latency.

In the case of MariaDB ColumnStore or Yandex Clickhouse, where scans are performed in parallel, the order of the results can also be non-deterministic. An example for ClickHouse:

An example for ColumnStore:

In another case (bug#72076) we use  ORDER BY, but the rows being sorted are the same. MySQL 5.7 contains the “ORDER BY” + LIMIT optimization:

Conclusion
In systems that involve parallel processing, queries like select * from table where ... limit N can return rows in a random order (even if the data doesn’t change between the calls). This is due to the async nature of the parallel calls: whoever serves results faster wins. In MySQL, you run select * from table limit 1 three times and get the same data in the same order (especially if the table data doesn’t change), but the response time will be slightly different. In a massively parallel system, the difference in the response times can cause the rows to be ordered differently.

To fix: always add ORDER BY ... ID  when you have LIMIT N.

Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments

Far
Enough.

Said no pioneer ever.
MySQL, PostgreSQL, InnoDB, MariaDB, MongoDB and Kubernetes are trademarks for their respective owners.
© 2026 Percona All Rights Reserved