Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

To SQL_CALC_FOUND_ROWS or not to SQL_CALC_FOUND_ROWS?

August 28, 2007

Author

Alexey Kovyrin

Benchmarks

Insight for Developers

Share this Post:

When we optimize clients’ SQL queries I pretty often see a queries with SQL_CALC_FOUND_ROWS option used. Many people think, that it is faster to use this option than run two separate queries: one – to get a result set, another – to count total number of rows. In this post I’ll try to check, is this true or not and when it is better to run two separate queries.

For my tests I’ve created following simple table:

CREATE TABLE `count_test` (
  `a` int(10) NOT NULL auto_increment,
  `b` int(10) NOT NULL,
  `c` int(10) NOT NULL,
  `d` varchar(32) NOT NULL,
  PRIMARY KEY  (`a`),
  KEY `bc` (`b`,`c`)
) ENGINE=MyISAM

CREATE TABLE `count_test` (

`a` int(10) NOT NULL auto_increment,

`b` int(10) NOT NULL,

`c` int(10) NOT NULL,

`d` varchar(32) NOT NULL,

PRIMARY KEY (`a`),

KEY `bc` (`b`,`c`)

) ENGINE=MyISAM

Test data has been created with following script (which creates 10M records):

mysql_connect("127.0.0.1", "root");
mysql_select_db("test");

for ($i = 0; $i < 10000000; $i++) {
    $b = $i % 1000;
    mysql_query("INSERT INTO count_test SET b=$b, c=ROUND(RAND()*10), d=MD5($i)");
}

mysql_connect("127.0.0.1", "root");

mysql_select_db("test");

for ($i = 0; $i < 10000000; $i++) {

$b = $i % 1000;

mysql_query("INSERT INTO count_test SET b=$b, c=ROUND(RAND()*10), d=MD5($i)");

}

First of all, let’s try to perform some query on this table using indexed column b in where clause:

mysql> SELECT SQL_NO_CACHE SQL_CALC_FOUND_ROWS * FROM count_test WHERE b = 555 ORDER BY c LIMIT 5;

1	mysql> SELECT SQL_NO_CACHE SQL_CALC_FOUND_ROWS * FROM count_test WHERE b = 555 ORDER BY c LIMIT 5;

Results with SQL_CALC_FOUND_ROWS are following: for each b value it takes 20-100 sec to execute uncached and 2-5 sec after warmup. Such difference could be explained by the I/O which required for this query – mysql accesses all 10k rows this query could produce without LIMIT clause.

Let’s check, how long it’d take if we’ll try to use two separate queries:

mysql> SELECT SQL_NO_CACHE * FROM count_test WHERE b = 666 ORDER BY c LIMIT 5;

1	mysql> SELECT SQL_NO_CACHE * FROM count_test WHERE b = 666 ORDER BY c LIMIT 5;

The results are following: it takes 0.01-0.11 sec to run this query first time and 0.00-0.02 sec for all consecutive runs.

And now – we need too check how long our COUNT query would take:

mysql> SELECT SQL_NO_CACHE count(*) FROM count_test WHERE b = 666;

1	mysql> SELECT SQL_NO_CACHE count(*) FROM count_test WHERE b = 666;

Result is really impressive here: 0.00-0.04 sec for all runs.

So, as we can see, total time for SELECT+COUNT (0.00-0.15 sec) is much less than execution time for original query (2-100 sec). Let’s take a look at EXPLAINs:

mysql> explain SELECT SQL_CALC_FOUND_ROWS * FROM count_test WHERE b = 999 ORDER BY c LIMIT 5;
+----+-------------+------------+------+---------------+------+---------+-------+-------+-------------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref   | rows  | Extra       |
+----+-------------+------------+------+---------------+------+---------+-------+-------+-------------+
|  1 | SIMPLE      | count_test | ref  | bc            | bc   | 4       | const | 75327 | Using where | 
+----+-------------+------------+------+---------------+------+---------+-------+-------+-------------+
1 row in set (0.00 sec)

mysql> explain SELECT SQL_NO_CACHE count(*) FROM count_test WHERE b = 666;
+----+-------------+------------+------+---------------+------+---------+-------+------+-------------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref   | rows | Extra       |
+----+-------------+------------+------+---------------+------+---------+-------+------+-------------+
|  1 | SIMPLE      | count_test | ref  | bc            | bc   | 4       | const | 5479 | Using index | 
+----+-------------+------------+------+---------------+------+---------+-------+------+-------------+
1 row in set (0.00 sec)

mysql> explain SELECT SQL_CALC_FOUND_ROWS * FROM count_test WHERE b = 999 ORDER BY c LIMIT 5;

+----+-------------+------------+------+---------------+------+---------+-------+-------+-------------+

+----+-------------+------------+------+---------------+------+---------+-------+-------+-------------+

| 1 | SIMPLE | count_test | ref | bc | bc | 4 | const | 75327 | Using where |

+----+-------------+------------+------+---------------+------+---------+-------+-------+-------------+

1 row in set (0.00 sec)

mysql> explain SELECT SQL_NO_CACHE count(*) FROM count_test WHERE b = 666;

+----+-------------+------------+------+---------------+------+---------+-------+------+-------------+

+----+-------------+------------+------+---------------+------+---------+-------+------+-------------+

| 1 | SIMPLE | count_test | ref | bc | bc | 4 | const | 5479 | Using index |

+----+-------------+------------+------+---------------+------+---------+-------+------+-------------+

1 row in set (0.00 sec)

Here is why our count was much faster – MySQL accessed our table data when calculated result set size even when this was not needed (after the first 5 rows specified in LIMIT clause). With count(*) it used index scan inly which is much faster here.

Just to be objective I’ve tried to perform this test without indexes (full scan) and with index on b column. Results were following:

1. Full-scan:
  - - 7 seconds for SQL_CALC_FOUND_ROWS.
  - - 7+7 seconds in case when two queries used.

1. Filesort:
  - - 1.8 seconds for SQL_CALC_FOUND_ROWS.
  - - 1.8+0.05 seconds in case when two queries used.

So, obvious conclusion from this simple test is: when we have appropriate indexes for WHERE/ORDER clause in our query, it is much faster to use two separate queries instead of one with SQL_CALC_FOUND_ROWS.

0 0 votes

Article Rating

50 Comments

Oldest

Newest Most Voted

Dathan Pattishall

18 years ago

Have you tried this same test under high concurrency. Say 12 active threads? I did a benchmark a while back (2 years ago using INNODB of course) under concurrency the trend of using SQL_CALC_FOUND_ROWS was a bit faster, but then again this finding may have been tainted by the INNODB scalability bug you found.

Nima

18 years ago

IMHO comparison is not true, in “EXPLAIN SELECT SQL_NO_CACHE count(*) FROM count_test WHERE b = 666;”, count is performed on index without access physical data. But in “EXPLAIN SELECT SQL_CALC_FOUND_ROWS * FROM count_test WHERE b = 999 ORDER BY c LIMIT 5;”, mysql uses index to determine number of rows to examine (through compound index on b and c) and returns a result set by picking physical data which takes some time (because columns are not included in index). I think use SQL_CALC_FOUND_ROWS in queries that use complex where (for example search queries with many parameters and paging result set) is much faster than sending two queries. I use a stored procedure that accept a dynamic sql query, an offset and a limit then use prepared statement to execute query and assigns FOUND_ROW() to a output parameter then give it to paging class to build paging links. I tested that my approach is much faster than issuing two queries. because second query which accepts limit and offset do the same thing but one with SQL_CALC_FOUND_ROWS just stores number of records before restricting rows. If I am wrong, please clarify it for me.
thanks.

Maarten

18 years ago

Which version of mysql server did you use? Does this apply to all current versions or is the SQL_CALC_FOUND_ROWS option more optimized in mysql 5 ?

Depending on the availability of indexes, mysql should be able to optimize a SQL_CALC_FOUND_ROWS query in such a way that users do not need to think about using a separate count(*) query anymore.

Admin

Peter Zaitsev

18 years ago

Dathan,

The things can be well different for different situations and different storage engines. In this case there is simply much more work which needs to be done if sql_calc_found_rows is used. And it is frequently so. In some cases you may have the fact you need only one real query instead of 2 (select found_rows() does not touch data) may affect things ie if innodb had to spend a lot of time in the queue.

Admin

Peter Zaitsev

18 years ago

Nima,

There are two very different things – using index to restict amount of rows examined and using index ONLY to get all data query needs.
For count(*) the data is not even touched, for original select it has to be accessed and accessed for each row rather than only few 5 rows which is where the difference comes from.

There are different cases – filesort and full table scan are mentioned examples.

use of Innodb primary key is another case worth to mention – it effectively has all columns so touching data does not add much.

Belive me SQL_CALC_FOUND_ROWS can make things slower in many cases – if this is your case I do not know, check it what makes sense. Just do not assume it is ALWAYS faster as I’ve seen people doing.

Admin

Peter Zaitsev

18 years ago

Marten,

This issues of being unable to switch to traversing index only exists in MySQL 5.0 and 5.1 In MySQL 5.2 there are some optimizations in similar area so I have not tested if it is the case.

Alexey Kovyrin

18 years ago

Marten,

All tests were performed on MySQL 5.0.45 on MacOS X 10.4.8.

Ryan

18 years ago

take a look at http://bugs.mysql.com/bug.php?id=18454. This was submitted off a support case I opened with mysql last year. It’s currectly marked as an enhancement

Perrin Harkins

18 years ago

I tested this on an application a few months ago and found the same: SQL_CALC_FOUND_ROWS was always much slower than two queries. The reason appears to be that MySQL does a good job of optimizing LIMIT queries by stopping them when the desired number of rows has been found. Using SQL_CALC_FOUND_ROWS defeats this optimization.

Sinisa Milivojevic

18 years ago

I happen to be the one who has introduced this feature in the first place. Even for MyISAM there are cases when using this option is faster then two queries. For InnoDB this option is even more usefull. But in any case, every query should be checked out for few different ranges. Largest problem that I have percieved with some customers is that they forget to remove this option when (due to application change) number of rows is no longer required.

Admin

Peter Zaitsev

18 years ago

Sure Sinisa, in some cases this feature makes sense – when filesort is done or temporary table is used for count(*) query as well as select query and few others – but it is too frequently assumed it always makes sense to use instead of 2 queries, which is how it logically should be. Unfortunately this is not the case.

And you’re right. Forgetting sql_calc_found_rows when you do not need it any more is a bad mistake to do.

Yvan

18 years ago

Hello,

You seem to compare 2 different queries (I’m not talking about the ORDER), as you filter on 2 different values. One is reading 75327 rows (the slower of course, and that’s your SQL_CALC_FOUND_ROWS), the other has only 5479 rows. So it’s obviously faster for the second one, even if you’re using indexes.
Can you try again the same test with the very same amount of rows limited by the WHERE clause?

Sergio

17 years ago

Yeap, I’ve tested and found that “2 query mode” (0,015 sec*) are about 10x times faster than “one with SQL_CALC_FOUND_ROWS” (0,153 sec*).

The application that I’ve tested has 164.520 rows (42MB table).

* generation time of the page where the queries were executed. That time is the minimal value which was gained after some page refreshes.

Sunfox

17 years ago

I have to say that COUNT(*) is faster so far.

Query with LIMIT and SQL_CALC_FOUND_ROWS: 0.80 – 1.00 seconds
Query without SQL_CALC_FOUND_ROWS: 0.06 seconds
Added COUNT(*) query: 0.09-0.12 seconds

vlado107

17 years ago

In first sql you are using “ORDER BY c”, this column is not indexed and mysql must pass all rows to order result

Mart

17 years ago

I guess the post is quite old, but the topic will stay alive…

@Nirma, it depends. I got a live example from a new service of mine, which applies tables with a few millions of rows. Something was killing it and I recognized it was SQL_CALC_FOUND_ROWS. After some analysis I found the following:

I’m retrieving paginated results from the quite large tables (WHERE column like pattern ORDER BY column LIMIT small OFFSET huge). When using SQL_CALC_FOUND_ROWS, the query will always need to calculate the complete result set for “column linke pattern ORDER BY column”. But actually the tables are updated in regular intervals, and the number of results would stay constant in the mean time. So I decided to drop SQL_CALC_FOUND_ROWS. Instead I will do a second query on a (stored) function, which delivers a count(*) for the pattern or, if the pattern was used before, a cached value of the count. Queries are now simply: quick.

In short: when using large tables, which stay constant for a while, avoid SQL_CALC_FOUND_ROWS; cache the counts of the result sets instead.

george

17 years ago

Could you please repeat this test with a proper case featuring 2x JOIN, a WHERE clause, a GROUP_BY, and an ORDER_BY, please?
my point is, your test is most probably misleading.

Brian

17 years ago

In my case COUNT(*) was the hog that’s been driving my search slow, and SQL_CALC_FOUND_ROWS saved me, and like george suggests I’ve been using ORDER BY, JOIN, not some childish case like the one presented here.

Menno

17 years ago

I found that when doing complex multitable selects, SQL_CALC_FOUND_ROWS adds virtually no extra execution time to the query.
Like Brian said, the example mentioned in this blog post is about as far from a real world scenario as possible.

Baron Schwartz

17 years ago

I think the comments on this blog can be answered with “your mileage may vary.” In the “real world” I have seen severe performance degradation many times. I’m sure there are cases where it doesn’t degrade performance, and I don’t notice because I’m looking at what IS the performance problem on a server.

ibrahim oÄŸuz

16 years ago

select count(*) from users group by city
for this query if you use count this not will give you true result. in this case you have to use mysql_num_rows or SQL_CALC_FOUND_ROWS , mysql_num_rows runs very slow. we have only one option it is SQL_CALC_FOUND_ROWS.

if any one have a solution let me know?

thanks.

Ani

16 years ago

Hello,

I would like to state a simple point here. Please visit the mysql official documentation here http://dev.mysql.com/doc/refman/5.1/en/information-functions.html

Please have a look at FOUND_ROWS() function. They have mentioned that it is much faster to use the query with sql_calc_found_rows rather than using a query again. And I don’t find any reason not to trust them.

Ani

Prasanna

16 years ago

One question or may be suggestion – count(*) will be slower than count(primary_key). What you think?

artyv

16 years ago

Prasanna, http://www.mysqlperformanceblog.com/2007/04/10/count-vs-countcol/

Matt

16 years ago

COUNT(field) for counting.
SQL_COUNT_FOUND_ROWS and FOUND_ROWS() for paging.

Simples.

Nadeem

16 years ago

I’m agree with “Matt” comment, yes actually SQL_COUNT_FOUND_ROWS use when someone required result + paging with single query in associative array so this always make a good sense to use SQL_COUNT_FOUND_ROWS otherwise COUNT(field with Key) use to take the result of your desire.

Hope this contribute!

Jim Heng

16 years ago

Agree with Nadeem

If you are using SELECT SQL_CALC_FOUND_ROWS, MySQL must calculate how many rows are in the full result set. However, this is faster than running the query again without LIMIT, because the result set need not be sent to the client.

Daniel

16 years ago

Hi.

FYI, in my benchmarks it’s about 50% faster to SELECT “null” instead of “*” if you just want to do a FOUND_ROWS():

SELECT /*! SQL_CALC_FOUND_ROWS */ NULL FROM aaa LIMIT 0 UNION ALL SELECT NULL FROM bbb LIMIT 0
instead of
SELECT /*! SQL_CALC_FOUND_ROWS */ * FROM aaa LIMIT 0 UNION ALL SELECT * FROM bbb LIMIT 0;

and then
SELECT FOUND_ROWS();

Peter Lieverdink

16 years ago

I’ve noticed for some time that I actually get opposite results to what the blog suggests. Running two queries is pretty much never faster than SQL_CALC_FOUND_ROWS when I use it.

I’ve run a bunch of benchmarks, and on larger datasets it’s pretty much twice as fast as COUNT(*) and then a SELECT.

http://cafuego.net/2010/05/26/fast-paging-real-world

zeojex

16 years ago

For a current project that i need to optimize, i use SQL_CALC_FOUND_ROWS.
I did some tests too and this solution was faster than using 2 queries even if i use appropriate indexes.
But my query are complex : 10 table-relations, almost 10 conditions in where clause, with group by and limit…so a very bad query !
You probably right on your test because you use simple query…??
I will do other tests to be sure.

Thx from france !

Vili

15 years ago

On a table with about 5 million rows (average row length is 71, engine: MyISAM) these are my measurements:

17.000s – SELECT SQL_CALC_FOUND_ROWS * FROM tab WHERE id>600000 LIMIT 10000
00.100s – SELECT * FROM tab WHERE id>600000 LIMIT 10000

02.250s – SELECT SQL_CALC_FOUND_ROWS * FROM tab WHERE id<600000 LIMIT 10000
00.100s – SELECT * FROM tab WHERE id<600000 LIMIT 10000

00.062s – SELECT SQL_CALC_FOUND_ROWS * FROM tab LIMIT 10000
00.062s – SELECT * FROM tab LIMIT 10000

The primary id field is INT(10) UNSIGNED NOT NULL AUTO_INCREMENT.

If there is a WHERE the SCFR is slowing down the query.
But the time difference between id… is also remarkable.

Paul

15 years ago

Hello,

I tried the SQL_CALC_FOUND_ROWS/FOUND_ROWS query method vs the select/count() query method, both using WHERE … LIKE ‘%…%’ , ORDER BY, and LIMIT X,Y on a real table with ~72,000 entries (MYISAM).

Result: The SQL_CALC_FOUND_ROWS/FOUND_ROWS method is 90% faster than the select/count() method in this case!

Scott Noyes

14 years ago

I just repeated the test shown here (except that I got bored and killed the initial set of INSERTs at 8M instead of 10M). After warming up the cache, I get 0.06 seconds for the SQL_CALC_FOUND_ROWS approach, and 0.02 seconds for the COUNT(*) approach. The EXPLAIN plans are identical.

Roy N

14 years ago

I just deployed sql_calc_rows on a high traffic large server (16gb ram ssd, yada yada, high availability setup with multiple slaves, etc) and it completely brought the server to its knees… select tables were running for hundreds of seconds “copying data into tmp table” or something like that…

in our pre-production tests, time for access was much better, but under load it could not perform unfortunately…

back to using double queries…

Roy N

14 years ago

PS: the query was fairly complex with a couple of joins (all indexed) despite not being extremely large. the datasets returned were about 100 – 500 matches on average, and the main table <100k rows. innodb.

Angelin

14 years ago

Here is a beautiful solution;

SELECT COUNT(*)
FROM (SELECT 1 FROM test WHERE some_conditions LIMIT any_no) alias_table;

This results count of total rows wihtout any optimizations.

Ryan

14 years ago

Using SQL_CALC_FOUND_ROWS defiantly appears to be situational on the query (at least now on Percona 5.5 server)

I have a complex query w. joins on a table with 10+ million rows. Results are

Data & Count Query = 1.8 seconds
1) .90 for the data
2) .90 for the count

Data & FOUND_ROWS = .913
1) .90 for the data with SQL_CALC_FOUND_ROWS. No performance impact when using SQL_CALC_FOUND_ROWS
2) .013 for FOUND_ROWS() query

Arvids

13 years ago

I had a run in with the SQL_CALC_FOUND_ROWS on a highload project (local social network) and the use case for the SQL_CALC_FOUND_ROWS is pretty easy – use it for queries that make a full scan of a table, in this case there is no overhead from the SQL_CALC_FOUND_ROWS because the server already preforms the scan. Any other query that uses indexes to filter data – stick with the second query with COUNT() – that will be much faster.

Leda Ferreira

13 years ago

I was doing queries against a 400MB+ database with phpMyAdmin, and the queries were taking minutes do run (most of times I got a timeout). I thought that I need to optimize the database and indexes. Then, I went to the terminal and issued a SHOW PROCESSLIST during one of these queries done in phpMyAdmin, and I found that phpMyAdmin was automatically inserting this SQL_CALC_FOUND_ROWS in my queries, without my knowledge, and that’s what caused the long delay.
When I made the same queries directly in the terminal, without SQL_CALC_FOUND_ROWS, the results came almost instantly.

ivanhoe

13 years ago

Keep in mind that this test is a bit outdated (2007.) and also covers only a very specific situation, using MyIsam engine and relatively small limit with a simple query on a big DB. I’ve tried replicating this tests, but using a client’s real-life (big) news publishing site with MySql 5.1 and InnoDB engine and real life queries (that I was optimizing anyway), and I’ve got a completely opposite results… which makes sense sort of, since SQL_CALC_FOUND_ROWS was added to optimize queries in the first place… my guess is that the code handling it got better since 2007., but please don’t trust me on this either, one should always run tests on an actual system to make sure the theory still applies…

Arvids

13 years ago

‘@ivanhoe that actually realy depends on the query. If you have a complex WHERE statement – SQL_CALC_FOUND_ROWS would be better. But if you have a query with a simple WHERE with indexed columns – two queries would be faster. The basic rule is – if you have “Temporary table” in your EXPLAIN – SQL_CALC_FOUND_ROWS will work fine.

discus fish facebook

12 years ago

When some one searches for his essential thing, so he/she wants
to be available that in detail, thus that thing is maintained
over here.

diskusfische

12 years ago

This is a topic that’s near to my heart… Cheers!

Exactly where are your contact details though?

ip checker

12 years ago

caused we confuse to calc rows. I found this article..
thanks

Pierre-Olivier Vares

11 years ago

Very interseting article.
I totally agree with your “don’t assume SCFR will be faster” sentence.
But you can’t neither assume that SCFR will be slower – and the conclusion seems to be written in this sense.

It seems it’s really, really depends of what your query is.
Depending of your query, the Explain for a SCFR will be different, or not.

I would note that
– the test table is a particular case. A table filled with more representative data could lead to different results
– the test case (WHERE, ORDER BY AND LIMIT clauses) is a particular case.
in fact, there are 2 measures that enter competitions in your test :
– the selectivity of the “where b = ” clause
– the selectivity of a “where c = 1” clause, (because that’s what you do when ordering by c, then limiting)
– you order by c (which has only 10 different values) and then limit. For consistency, you should always have a sort combination that makes it unique (sort by id in last field if necessary). What you’re getting is only “5 rows where c = 1 and b = 666”. Given your filling algorithm, there are roughly 1000 corresponding rows. Since you can’t assume mysql internal algorithms, you just get 5 random rows between these 1000.

It would be interesting to :
– create more random data, with a more representative distribution
– test different queries :
– more complex, as suggested in the comments
– with where and order clauses on different columns (with different selectivity)
– test with an offset to the limit : when on the last page of a paginator, mysql will anyway have to calculate all the rows. So use of SCFR can’t be slower than adding a COUNT query…

In conclusion, there is no perfect answer :
If Mysql needs to calculate all rows to do the query, then a SCFR will (at least should) be faster.
But in certain cases, as presented in this article (which, although my remarks, may be not-so-rare in practice), mysql will be able to optimize each of the 2 queries – and then 2 queries will (at least should) be faster than one with the SCFR.
And, moreover, prepare to get results as unattended as “for the 3 first pages, use 2 queries; for the following pages, use a SCFR” !

Charly

7 years ago

Thank you for helping !

Jadel

7 years ago

This is a real helping, thank’s

Maurice Bertio

7 years ago

***

sainty

7 years ago

Maybe the test case is not sufficient enough. Compare these two test case may get different result:
1.SELECT SQL_NO_CACHE SQL_CALC_FOUND_ROWS a, b, c, d FROM count_test WHERE b = 555 ORDER BY c LIMIT 5;

2. SELECT SQL_NO_CACHE a, b, c, d FROM count_test WHERE b = 555 ORDER BY c LIMIT 5;
SELECT count(*) FROM count_test WHERE b = 555 ORDER BY c LIMIT 5;

sainty

7 years ago

Reply to sainty

My last comment may have some problem, here is the correction:
Compare these two test case may get different result:
1.SELECT SQL_NO_CACHE SQL_CALC_FOUND_ROWS a, b, c FROM count_test WHERE b = 555 ORDER BY c LIMIT 5;

2. SELECT SQL_NO_CACHE a, b, c FROM count_test WHERE b = 555 ORDER BY c LIMIT 5;
SELECT count(*) FROM count_test WHERE b = 555 ORDER BY c LIMIT 5;

The difference between these two case is whether all the select keys are in the index.