Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

COUNT(*) vs COUNT(col)

April 10, 2007

Author

Peter Zaitsev

Insight for Developers

Share this Post:

Looking at how people are using COUNT(*) and COUNT(col) it looks like most of them think they are synonyms and just using what they happen to like, while there is substantial difference in performance and even query result.

Lets look at the following series of examples:

CREATE TABLE `fact` (
  `i` int(10) unsigned NOT NULL,
  `val` int(11) default NULL,
  `val2` int(10) unsigned NOT NULL,
  KEY `i` (`i`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1


mysql> select count(*) from fact;
+----------+
| count(*) |
+----------+
|  7340032 |
+----------+
1 row in set (0.00 sec)


mysql> select count(val) from fact;
+------------+
| count(val) |
+------------+
|    7216582 |
+------------+
1 row in set (1.17 sec)


mysql> select count(val2) from fact;
+-------------+
| count(val2) |
+-------------+
|     7340032 |
+-------------+
1 row in set (0.00 sec)

CREATE TABLE `fact` (

`i` int(10) unsigned NOT NULL,

`val` int(11) default NULL,

`val2` int(10) unsigned NOT NULL,

KEY `i` (`i`)

) ENGINE=MyISAM DEFAULT CHARSET=latin1

mysql> select count(*) from fact;

+----------+

| count(*) |

+----------+

| 7340032 |

+----------+

1 row in set (0.00 sec)

mysql> select count(val) from fact;

+------------+

| count(val) |

+------------+

| 7216582 |

+------------+

1 row in set (1.17 sec)

mysql> select count(val2) from fact;

+-------------+

| count(val2) |

+-------------+

| 7340032 |

+-------------+

1 row in set (0.00 sec)

As this is MYISAM table MySQL has cached number of rows in this table. This is why it is able to instantly answer COUNT(*) and
COUNT(val2) queries, but not COUNT(val). Why ? Because val column is not defined as NOT NULL there can be some NULL values in it and so MySQL have to perform table scan to find out. This is also why result is different for the second query.

So COUNT(*) and COUNT(col) queries not only could have substantial performance performance differences but also ask different question.

MySQL Optimizer does good job in this case doing full table scan only if it is needed because column can be NULL.

Now lets try few more queries:

mysql> select count(*) from fact where i<10000;
+----------+
| count(*) |
+----------+
|   733444 |
+----------+
1 row in set (0.40 sec)

mysql> explain select count(*) from fact where i<10000 G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: fact
         type: range
possible_keys: i
          key: i
      key_len: 4
          ref: NULL
         rows: 691619
        Extra: Using where; Using index
1 row in set (0.00 sec)


mysql> select count(val) from fact where i<10000;
+------------+
| count(val) |
+------------+
|     720934 |
+------------+
1 row in set (1.29 sec)

mysql> explain select count(val) from fact where i<10000 G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: fact
         type: range
possible_keys: i
          key: i
      key_len: 4
          ref: NULL
         rows: 691619
        Extra: Using where
1 row in set (0.00 sec)

mysql> select count(val2) from fact where i<10000;
+-------------+
| count(val2) |
+-------------+
|      733444 |
+-------------+
1 row in set (1.30 sec)

mysql> explain select count(val2) from fact where i<10000 G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: fact
         type: range
possible_keys: i
          key: i
      key_len: 4
          ref: NULL
         rows: 691619
        Extra: Using where
1 row in set (0.00 sec)

mysql> select count(*) from fact where i<10000;

+----------+

| count(*) |

+----------+

| 733444 |

+----------+

1 row in set (0.40 sec)

mysql> explain select count(*) from fact where i<10000 G

*************************** 1. row ***************************

id: 1

select_type: SIMPLE

table: fact

type: range

possible_keys: i

key: i

key_len: 4

ref: NULL

rows: 691619

Extra: Using where; Using index

1 row in set (0.00 sec)

mysql> select count(val) from fact where i<10000;

+------------+

| count(val) |

+------------+

| 720934 |

+------------+

1 row in set (1.29 sec)

mysql> explain select count(val) from fact where i<10000 G

*************************** 1. row ***************************

id: 1

select_type: SIMPLE

table: fact

type: range

possible_keys: i

key: i

key_len: 4

ref: NULL

rows: 691619

Extra: Using where

1 row in set (0.00 sec)

mysql> select count(val2) from fact where i<10000;

+-------------+

| count(val2) |

+-------------+

| 733444 |

+-------------+

1 row in set (1.30 sec)

mysql> explain select count(val2) from fact where i<10000 G

*************************** 1. row ***************************

id: 1

select_type: SIMPLE

table: fact

type: range

possible_keys: i

key: i

key_len: 4

ref: NULL

rows: 691619

Extra: Using where

1 row in set (0.00 sec)

As you can see even if you have where clause performance for count(*) and count(col) can be significantly different. In fact this example shows just 3 times performance difference because all data fits in memory, for IO bound workloads you frequently can see 10 and even 100 times performance difference in this case.

The thing is count(*) query can use covering index even while count(col) can’t. Of course you can extend index to be (i,val) and get query to be index covered again but I would use this workaround only if you can’t change the query (ie it is third party application) or in case column name is in the query for reason, and you really need count of non-NULL values.

It is worth to note in this case MySQL Optimizer does not do too good job optimizing the query. One could notice (val2) column is not null so count(val2) is same as count(*) and so the query could be run as index covered query. It does not and both queries have to perform row reads in this case.

mysql> alter table fact drop key i, add key(i,val);
Query OK, 7340032 rows affected (37.15 sec)
Records: 7340032  Duplicates: 0  Warnings: 0


mysql> select count(val) from fact where i<10000;
+------------+
| count(val) |
+------------+
|     720934 |
+------------+
1 row in set (0.78 sec)

mysql> alter table fact drop key i, add key(i,val);

Query OK, 7340032 rows affected (37.15 sec)

Records: 7340032 Duplicates: 0 Warnings: 0

mysql> select count(val) from fact where i<10000;

+------------+

| count(val) |

+------------+

| 720934 |

+------------+

1 row in set (0.78 sec)

As you can see extending index helps in this case but it makes query about 2 times slower compared to count(*) one. This is probably because index becomes about two times longer in this case.

0 0 votes

Article Rating

42 Comments

Oldest

Newest Most Voted

Peter Forret

19 years ago

And what about “SELECT count(1) FROM fact;” – is that treated the same way as count(*) ?
I was taught (15 yrs ago) that count(1) was faster

Doki

19 years ago

What about innodb ? innodb doesnt have row count cached or ?

Author

Peter Zaitsev

19 years ago

Right. Innodb does not have cached row count. so count(*) without where clause is slow with Innodb tables.

Author

Peter Zaitsev

19 years ago

Peter,

I can’t see any reason why count(1) would be faster. May be it was specific to the database you was using.

SELECT 1 FROM TABLE WHERE ID=123 is faster way to check if row exists than SELECT * but it is other story.

Ken Jacobs

19 years ago

The statement that “COUNT(*) with InnoDB is slow without a WHERE clause because InnoDB doesn’t cache the row count” only tells part of the story. No, it doesn’t cache the row count. Wish it were that simple. The whole situation is more complex. The points Peter makes about use of indexes, covering indexes and NOT NULL are true, and apply to InnoDB as well as MyISAM. But there is more to the story.

It is easy for MyISAM to maintain the count of rows since only one transaction executes at any time, because MyISAM does table-level locking. After a transaction commits, it is easy for MyISAM to update the row count for a table (taking into account inserts and deletes done by the transaction) before the next transaction starts. Of course, the downside to table-level locking is a limit on throughput in the presence of updates (including deletes and inserts).

However, with InnoDB, which uses row-level locking, there are multiple concurrently executing (and as-yet uncommitted) transactions. This obviously improves throughput when transactions are making changes in the tables. To ensure consistency, each transaction has to see the table (including the number of rows in the table) as of the beginning of the transaction, plus its own changes. Thus the number of rows (and indeed the data) in the table is obviously potentially different for each concurrent transaction. Therefore there is no single “correct” total number of rows at any given time (unless there are no update transactions running). With multiple concurrent transactions, it is not really possible to cache the total number of rows in a table, without creating another point of serialization (making it so transactions execute one at a time).

Now one must also ask the question whether or not it is a valuable thing to know the accurate number of rows in an entire table at any time. What is the business/application reason for needing to know this? Is it more often the case that you WILL have a WHERE clause, to count the number of rows for a specific key (e.g., number of posts in a blog by a certain author)? That information isn’t cached, of course, and as the discussion above makes clear, in most cases you need to have the proper indexes to ensure good performance of queries that use a WHERE clause.

Further, there may be usable application-level design changes (e.g., triggers) to maintain a total count of rows in a “side table” if the number of inserts and delete transactions is limited, and you can accept the serialization of such transactions.

So this all implies that it is not best, in most situations, to choose a storage engine because of differences in the performance of counting all the rows in a table. This discussion is a good example of Peter’s other comments today … simple answers are fine as far as they go. But it always is a good idea to have a more complete understanding of the technology, so you can make the best decisions on how to apply it to your particular problem.

Author

Peter Zaitsev

19 years ago

Ken,

Than you for commenting I know this is your favorite topic.
In fact I already wrote a lot on this matter so I simply put it to “does not store row count” without too much explanations of why it is not as trivial as it seems.

http://www.mysqlperformanceblog.com/2006/12/01/count-for-innodb-tables/

Interesting enough with Solid, PBXT, Falcon transactional storage engines with multi versioning, will any of them implement fast count(*) without where clause ? We shall see.

Author

Peter Zaitsev

19 years ago

Oh also about triggers and serialization,

There is nice trick which helps you to get accurate count(*) via triggers but avoid serialization. Instead of 1 counter row have for example 10, update random of them and run select sum(count) to get the value.

danielj

19 years ago

so what is the answer to the first question, is COUNT(1) similar to COUNT(*)?

klaus

19 years ago

count(*) is similar to count(0) as you cas se here:

count(1) is similar to count(1) as you can se here:

So in conclusion count(*) == count(0) and count(*) != count(1)

Hope it helps.

Vadim Voituk

19 years ago

There is a post with similar problem but in russian
http://voituk.kiev.ua/2006/07/27/mysql-right-count-statement/

Raj

19 years ago

Just saw this thread and started wondering about how the fact that innodb uses clustered indexes would play into the equation. I find that, with a judicious choice of (col), count(col) may actually be a lot faster than count(*).

Consider this example (restarting mysql each time to flush the cache, and using O_DIRECT to bypass OS caching)

mysql> select count(*) from wp_comments;
+———-+
| count(*) |
+———-+
| 230120 |
+———-+
1 row in set (32.57 sec)

and if you look at the buffer pool used for the count(*):

Total memory allocated 589194920; in additional pool allocated 714496
Buffer pool size 32768
Free buffers 14104
Database pages 18664
Modified db pages 0

restart mysql and check the count of (rajid):

mysql> select count(rajid) from wp_comments;
+————–+
| count(rajid) |
+————–+
| 230120 |
+————–+
1 row in set (0.28 sec)

Much faster. Why? Because it only reads the index data, not the row data for the table when you use a secondary index. Check the buffer pool usage:

Total memory allocated 589194920; in additional pool allocated 714496
Buffer pool size 32768
Free buffers 32332
Database pages 436
Modified db pages 0

The column rajid btw, is just a copy of the PK column comment_ID with a unique index:

mysql> show create table wp_commentsG
*************************** 1. row ***************************
Table: wp_comments
Create Table: CREATE TABLE wp_comments (
comment_ID bigint(20) unsigned NOT NULL auto_increment,
comment_post_ID int(11) NOT NULL default ‘0’,
comment_author tinytext NOT NULL,
comment_author_email varchar(100) NOT NULL default ”,
comment_author_url varchar(200) NOT NULL default ”,
comment_author_IP varchar(100) NOT NULL default ”,
comment_date datetime NOT NULL default ‘0000-00-00 00:00:00’,
comment_date_gmt datetime NOT NULL default ‘0000-00-00 00:00:00’,
comment_content text NOT NULL,
comment_karma int(11) NOT NULL default ‘0’,
comment_approved enum(‘0′,’1′,’spam’) NOT NULL default ‘1’,
comment_agent varchar(255) NOT NULL default ”,
comment_type varchar(20) NOT NULL default ”,
comment_parent bigint(20) NOT NULL default ‘0’,
user_id bigint(20) NOT NULL default ‘0’,
rajid bigint(20) unsigned default NULL,
PRIMARY KEY (comment_ID),
UNIQUE KEY rajidx (rajid)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql>

mysql> select count(*) from wp_comments;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect…
Connection id: 2
Current database: test

+———-+
| count(*) |
+———-+
| 230120 |
+———-+
1 row in set (1.19 sec)

mysql>
and the buffer pool usage:

Total memory allocated 589194920; in additional pool allocated 721920
Buffer pool size 32768
Free buffers 32449
Database pages 319
Modified db pages 0

there’s only 3 distinct values in the table for this column, so its not actually counting the values but the pointers from this index:
mysql> select count( distinct (comment_approved) ) from wp_comments;
+————————————–+
| count( distinct (comment_approved) ) |
+————————————–+
| 3 |
+————————————–+
1 row in set (0.32 sec)

mysql>

cheers!
–Raj.

Author

Peter Zaitsev

19 years ago

Raj,

Good point. It looks like a bug to me as Innodb automatically picks up “shorter” indexes while in practice even first index you tried is shorter.

The other interesting point I should note – smaller index may not be faster for huge tables, because if you insert data in order primary key tree is often sequential while secondary indexes may be scattered having few sequential pages.

Pavel

19 years ago

What about count(id) (id is primary key)?
Is count(id) slower than count(*)?

Using innodb + where and myisam + where;

Author

Peter Zaitsev

19 years ago

Pavel,

For primary key it should be the same as primary key can’t have null values.
It is same as count(val2) in my example.

howard

19 years ago

hello, can i say:

the rule of thumb is…

1. use count(primary_key)
2. use count(indexed_field_and_not_null)
3. use count(indexed_field)
4. use count(*)

how abt the length of the index, does it matter?

thanks.

dalin

16 years ago

It should be noted that often when people use COUNT they really just want to know if there’s anything there and aren’t particularly interested in a accurate count. In this case a far more performant approach is
SELECT 1 FROM foo WHERE bar=”baz”
The query completes as soon as the first row is found where bar=”baz”.

dalin

16 years ago

And to correct my own typo I meant
SELECT 1 FROM foo WHERE bar=â€bazâ€ LIMIT 1

the limit being very important

kene

16 years ago

mysql> SELECT count(val2) FROM fact; it is true ?

kene

16 years ago

54.mysql> EXPLAIN SELECT count(val2) FROM fact WHERE i<10000 G
55.*************************** 1. row ***************************
56. id: 1
57. select_type: SIMPLE
58. TABLE: fact
59. type: range
60.possible_keys: i
61. KEY: i
62. key_len: 4
63. ref: NULL
64. rows: 691619
65. Extra: USING WHERE
66.1 row IN SET (0.00 sec)

it is true

TerryC

16 years ago

So can I say in this way: Always use count(*)?

Baron Schwartz

16 years ago

TerryC, if you are trying to count the number of rows, then YES.

ibrahim oguz

16 years ago

you say count(*) fast because it founds if there are indexes. for auto increment primary keys also is index. then if we use primary keys select count(id) from users; then it has to be fast too.
how about group by statement. nobody asked about it. select count(*) from users where country=’USA’ group by CITY;
how can we use it in innodb and myisam.
thanks.

ricky

16 years ago

How about select count(1) vs select count(id)

Which one is better?

Ilan Hazan

16 years ago

If you need to count num results but up to a specific limit (to improve performance), you can use the Limited-Count solution.
You can read more about it here: http://www.mysqldiary.com/limited-select-count/

Rishabh Jain

15 years ago

can I use count(*)>3

Foobarista

15 years ago

In InnoDB, does a simple count(*) do anything evil with the database cache? I ask because some database engine’s I’ve seen don’t do caching carefully and tablescans (or count(*) that does a tablescan) can end up pulling the whole table into the cache (and pushing more useful pages out). Smarter db engines will just reuse the same buffer page to do table scans (after making sure it isn’t in-cache already), but many don’t.

Author

Peter Zaitsev

15 years ago

Hi,

You’re looking at full index scan or full table scan in this case. Innodb has some protection from full table scan polluting cache, though it is done differently than using same page

deepa

19 years ago

1. SELECT 1 FROM table_name WHERE condition LIMIT 1
2. SELECT count(*) FROM table_name WHERE condition

In Innodb which query will be faster.

Author

Peter Zaitsev

19 years ago

Hi,

You’re looking at full index scan or full table scan in this case. Innodb has some protection from full table scan polluting cache, though it is done differently than using same page

Foobarista

19 years ago

OnLine

14 years ago

Hy,
I have 2 tables.
Can I make a single Select query to return the total number of rows in the first table and the rows (their data) of the seccond table?
And if this is posible, how does this query look like?

muhammed sekertekin

14 years ago

very nice article i used to think that you wont notice a a difference if you use count by row i guess to have to recode a lot of my codes…. Thank you from muhammed sekertekin

saaj

14 years ago

MySQL 5.5 seems to be optimized in concern of COUNT(*) queries. At the same machine, for the table of ~120K records I’ve ~2.4 seconds for 5.1 and 0.08 seconds for 5.5 for “SELECT COUNT(*) FROM table” query.

eash khatri

14 years ago

Hi Peter,

For pagination stuff in high traffic websites having huge tables, we can do it in two ways:-

select count(*) from table where condition1 and condition2;
select col1,col2 from table where condition1 and condition2 limit 10,20;

Just fire a single query and get all the data and then get the count at PHP end like:-
select col1,col2 from table where condition1 and condition2;

Please suggest which one is better idea in both INNODB and MyIsam

rashid

13 years ago

hi what will be the command to count field value who digits less then 8 like your command >select count(val2) from fact where i<10000; in my case i want to count those values who length is equal to 8 or less then 8 digit.

Emad

13 years ago

Good tip, Thank You.

Cristian Weiser

13 years ago

Can you help me with a simple thing?

i have one table named dealviewer and i want to count values in column message_sent if they are set as true. If they are null leave.

A need to set this values to a php variable

Iam newbie, so please post kind a ready code if possible.
Thanks in advance.

Jose Alejandro Realza

13 years ago

Very interesting, the problem is when we have a request about interacting is with several patterns search inside, example, table1.field, table2.field from table1, table2 where table1.field = table2.field

Hujšanje

12 years ago

Kind of big pile to understand, but I always had problems with understanding of why count(val) does not work in some cases. Many thanks!

Bhumi

11 years ago

Very important for a beginner. Thanks for this valuable content.

Nell

11 years ago

Thanks Pete that was enlightening ! Definitely using count(*) only if it doesn’t give me those irritating php parsing errors. Still new to this baby. 😀

lefaver

10 years ago

I am doing a statistical series of counts, it is a rather complex query, so hope this is okay.

set @Hrs := 240;
Select @total := Count(e.SNID) ,@totalH := Count(Distinct(e.SNID))
from endorsements e where e.timestamp >= (SELECT max(timestamp) FROM endorsements)-(3600 * @Hrs);

select concat(If (SNID = ‘Grand Total’,@Rank, @Rank := @Rank+1),’ of ‘,@totalH) AS Rank
,SNID ,Ledger ,Seeder ,Mail ,ERC ,DEX ,Vault
,DMP ,ENS ,Profiles ,Ratings ,Scripts
,Totals, percent
From (Select * From
(SELECT IFNULL(SNID,’Grand Total’) as SNID

, COUNT(servicetype = ‘2’ or Null) As Ledger
, COUNT(servicetype = ‘4’ or Null) AS Seeder
, COUNT(servicetype = ‘8’ or Null) AS Mail
, COUNT(servicetype = ’16’ or Null) AS ERC
, Count(servicetype = ’32’ or Null) AS DEX
, Count(servicetype = ’64’ or Null) AS Vault
, COUNT(servicetype = ‘128’ or Null) AS DMP
, COUNT(servicetype = ‘256’ or Null) AS ENS
, COUNT(servicetype = ‘512’ or Null) AS Profiles
, COUNT(servicetype = ‘1024’ or Null) AS Ratings
, COUNT(servicetype = ‘2048’ or Null) AS Scripts
, count(*) as Totals
, concat(cast((count(*)*100.0/@total) as Decimal(5,2)),’ %’) AS percent

FROM endorsements a
WHERE a.timestamp >= (SELECT max(timestamp) FROM endorsements)-(3600 * @Hrs)
GROUP BY SNID WITH ROLLUP ) q
ORDER by q.Totals desc ) t1
,(Select @Rank :=0) t2 ;

The count (servicetype = ‘2’ or Null) as Ledger ….. etc does work correctly, and I have not been able to find out why it does. It does not use the IS Null syntax and yet this works in MySql.

Lloyd