COUNT(*) vs COUNT(col)

COUNT(*) vs COUNT(col)

PREVIOUS POST
NEXT POST

Looking at how people are using COUNT(*) and COUNT(col) it looks like most of them think they are synonyms and just using what they happen to like, while there is substantial difference in performance and even query result.

Lets look at the following series of examples:

As this is MYISAM table MySQL has cached number of rows in this table. This is why it is able to instantly answer COUNT(*) and
COUNT(val2) queries, but not COUNT(val). Why ? Because val column is not defined as NOT NULL there can be some NULL values in it and so MySQL have to perform table scan to find out. This is also why result is different for the second query.

So COUNT(*) and COUNT(col) queries not only could have substantial performance performance differences but also ask different question.

MySQL Optimizer does good job in this case doing full table scan only if it is needed because column can be NULL.

Now lets try few more queries:

As you can see even if you have where clause performance for count(*) and count(col) can be significantly different. In fact this example shows just 3 times performance difference because all data fits in memory, for IO bound workloads you frequently can see 10 and even 100 times performance difference in this case.

The thing is count(*) query can use covering index even while count(col) can’t. Of course you can extend index to be (i,val) and get query to be index covered again but I would use this workaround only if you can’t change the query (ie it is third party application) or in case column name is in the query for reason, and you really need count of non-NULL values.

It is worth to note in this case MySQL Optimizer does not do too good job optimizing the query. One could notice (val2) column is not null so count(val2) is same as count(*) and so the query could be run as index covered query. It does not and both queries have to perform row reads in this case.

As you can see extending index helps in this case but it makes query about 2 times slower compared to count(*) one. This is probably because index becomes about two times longer in this case.

PREVIOUS POST
NEXT POST

Share this post

Comments (49)

  • deepa Reply

    1. SELECT 1 FROM table_name WHERE condition LIMIT 1
    2. SELECT count(*) FROM table_name WHERE condition

    In Innodb which query will be faster.

    April 10, 2007 at 12:00 am
  • Peter Zaitsev Reply

    Hi,

    You’re looking at full index scan or full table scan in this case. Innodb has some protection from full table scan polluting cache, though it is done differently than using same page

    April 10, 2007 at 12:00 am
  • Foobarista Reply

    In InnoDB, does a simple count(*) do anything evil with the database cache? I ask because some database engine’s I’ve seen don’t do caching carefully and tablescans (or count(*) that does a tablescan) can end up pulling the whole table into the cache (and pushing more useful pages out). Smarter db engines will just reuse the same buffer page to do table scans (after making sure it isn’t in-cache already), but many don’t.

    April 10, 2007 at 12:00 am
  • Peter Forret Reply

    And what about “SELECT count(1) FROM fact;” – is that treated the same way as count(*) ?
    I was taught (15 yrs ago) that count(1) was faster

    April 10, 2007 at 2:02 pm
  • Doki Reply

    What about innodb ? innodb doesnt have row count cached or ?

    April 10, 2007 at 3:16 pm
  • peter Reply

    Right. Innodb does not have cached row count. so count(*) without where clause is slow with Innodb tables.

    April 11, 2007 at 2:01 am
  • peter Reply

    Peter,

    I can’t see any reason why count(1) would be faster. May be it was specific to the database you was using.

    SELECT 1 FROM TABLE WHERE ID=123 is faster way to check if row exists than SELECT * but it is other story.

    April 11, 2007 at 2:37 am
  • Ken Jacobs Reply

    The statement that “COUNT(*) with InnoDB is slow without a WHERE clause because InnoDB doesn’t cache the row count” only tells part of the story. No, it doesn’t cache the row count. Wish it were that simple. The whole situation is more complex. The points Peter makes about use of indexes, covering indexes and NOT NULL are true, and apply to InnoDB as well as MyISAM. But there is more to the story.

    It is easy for MyISAM to maintain the count of rows since only one transaction executes at any time, because MyISAM does table-level locking. After a transaction commits, it is easy for MyISAM to update the row count for a table (taking into account inserts and deletes done by the transaction) before the next transaction starts. Of course, the downside to table-level locking is a limit on throughput in the presence of updates (including deletes and inserts).

    However, with InnoDB, which uses row-level locking, there are multiple concurrently executing (and as-yet uncommitted) transactions. This obviously improves throughput when transactions are making changes in the tables. To ensure consistency, each transaction has to see the table (including the number of rows in the table) as of the beginning of the transaction, plus its own changes. Thus the number of rows (and indeed the data) in the table is obviously potentially different for each concurrent transaction. Therefore there is no single “correct” total number of rows at any given time (unless there are no update transactions running). With multiple concurrent transactions, it is not really possible to cache the total number of rows in a table, without creating another point of serialization (making it so transactions execute one at a time).

    Now one must also ask the question whether or not it is a valuable thing to know the accurate number of rows in an entire table at any time. What is the business/application reason for needing to know this? Is it more often the case that you WILL have a WHERE clause, to count the number of rows for a specific key (e.g., number of posts in a blog by a certain author)? That information isn’t cached, of course, and as the discussion above makes clear, in most cases you need to have the proper indexes to ensure good performance of queries that use a WHERE clause.

    Further, there may be usable application-level design changes (e.g., triggers) to maintain a total count of rows in a “side table” if the number of inserts and delete transactions is limited, and you can accept the serialization of such transactions.

    So this all implies that it is not best, in most situations, to choose a storage engine because of differences in the performance of counting all the rows in a table. This discussion is a good example of Peter’s other comments today … simple answers are fine as far as they go. But it always is a good idea to have a more complete understanding of the technology, so you can make the best decisions on how to apply it to your particular problem.

    April 11, 2007 at 10:40 am
  • peter Reply

    Ken,

    Than you for commenting I know this is your favorite topic.
    In fact I already wrote a lot on this matter so I simply put it to “does not store row count” without too much explanations of why it is not as trivial as it seems.

    http://www.mysqlperformanceblog.com/2006/12/01/count-for-innodb-tables/

    Interesting enough with Solid, PBXT, Falcon transactional storage engines with multi versioning, will any of them implement fast count(*) without where clause ? We shall see.

    April 11, 2007 at 2:16 pm
  • peter Reply

    Oh also about triggers and serialization,

    There is nice trick which helps you to get accurate count(*) via triggers but avoid serialization. Instead of 1 counter row have for example 10, update random of them and run select sum(count) to get the value.

    April 11, 2007 at 2:18 pm
  • danielj Reply

    so what is the answer to the first question, is COUNT(1) similar to COUNT(*)?

    April 13, 2007 at 6:37 am
  • blog.psax.de » COUNT(*) oder doch lieber COUNT(foo)? Reply

    […] und COUNT(foo) gibt. Antwort auf diese Frage findet man, wie könnte es anders sein, im MySQL Performance Blog. Da mir gerade die Zeit fehlt das ganze auf deutsch nieder zu schreiben müsst ihr euch mit der […]

    April 14, 2007 at 2:07 pm
  • PHP Performance » COUNT(*) gegen COUNT(spalte) Reply

    […] Als Argument nimmt diese Funktion entweder ein ‘*’ oder eine bestimmte Spalte. Im MySQL Performance Blog wird verglichen, ob man dabei COUNT(*) oder COUNT(spalte) verwenden […]

    April 15, 2007 at 11:38 am
  • klaus Reply

    count(*) is similar to count(0) as you cas se here:

    mysql> explain extended select count(*) from mytable;
    +—-+————-+——-+——+—————+——+———+——+——+——————————+
    | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
    +—-+————-+——-+——+—————+——+———+——+——+——————————+
    | 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
    +—-+————-+——-+——+—————+——+———+——+——+——————————+
    1 row in set, 1 warning (0.00 sec)

    mysql> show warnings;
    +——-+——+———————————————————+
    | Level | Code | Message |
    +——-+——+———————————————————+
    | Note | 1003 | select count(0) AS count(*) from database.mytable |
    +——-+——+———————————————————+
    1 row in set (0.00 sec)

    count(1) is similar to count(1) as you can se here:

    mysql> explain extended select count(1) from mytable;
    +—-+————-+——-+——+—————+——+———+——+——+——————————+
    | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
    +—-+————-+——-+——+—————+——+———+——+——+——————————+
    | 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
    +—-+————-+——-+——+—————+——+———+——+——+——————————+
    1 row in set, 1 warning (0.00 sec)

    mysql> show warnings;
    +——-+——+———————————————————+
    | Level | Code | Message |
    +——-+——+———————————————————+
    | Note | 1003 | select count(1) AS count(1) from database.mytable |
    +——-+——+———————————————————+
    1 row in set (0.00 sec)

    So in conclusion count(*) == count(0) and count(*) != count(1)

    Hope it helps.

    April 19, 2007 at 8:29 pm
  • Vadim Voituk Reply

    There is a post with similar problem but in russian
    http://voituk.kiev.ua/2006/07/27/mysql-right-count-statement/

    April 24, 2007 at 4:33 am
  • Raj Reply

    Just saw this thread and started wondering about how the fact that innodb uses clustered indexes would play into the equation. I find that, with a judicious choice of (col), count(col) may actually be a lot faster than count(*).

    Consider this example (restarting mysql each time to flush the cache, and using O_DIRECT to bypass OS caching)

    mysql> explain select count(*) from wp_comments;
    +—-+————-+————-+——-+—————+———+———+——+——–+————-+
    | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
    +—-+————-+————-+——-+—————+———+———+——+——–+————-+
    | 1 | SIMPLE | wp_comments | index | NULL | PRIMARY | 8 | NULL | 424672 | Using index |
    +—-+————-+————-+——-+—————+———+———+——+——–+————-+
    1 row in set (0.11 sec)

    mysql> explain select count(rajid) from wp_comments;
    +—-+————-+————-+——-+—————+——–+———+——+——–+————-+
    | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
    +—-+————-+————-+——-+—————+——–+———+——+——–+————-+
    | 1 | SIMPLE | wp_comments | index | NULL | rajidx | 9 | NULL | 424672 | Using index |
    +—-+————-+————-+——-+—————+——–+———+——+——–+————-+
    1 row in set (0.00 sec)

    mysql> select count(*) from wp_comments;
    +———-+
    | count(*) |
    +———-+
    | 230120 |
    +———-+
    1 row in set (32.57 sec)

    and if you look at the buffer pool used for the count(*):

    Total memory allocated 589194920; in additional pool allocated 714496
    Buffer pool size 32768
    Free buffers 14104
    Database pages 18664
    Modified db pages 0

    restart mysql and check the count of (rajid):

    mysql> select count(rajid) from wp_comments;
    +————–+
    | count(rajid) |
    +————–+
    | 230120 |
    +————–+
    1 row in set (0.28 sec)

    Much faster. Why? Because it only reads the index data, not the row data for the table when you use a secondary index. Check the buffer pool usage:

    Total memory allocated 589194920; in additional pool allocated 714496
    Buffer pool size 32768
    Free buffers 32332
    Database pages 436
    Modified db pages 0

    The column rajid btw, is just a copy of the PK column comment_ID with a unique index:

    mysql> show create table wp_comments\G
    *************************** 1. row ***************************
    Table: wp_comments
    Create Table: CREATE TABLE wp_comments (
    comment_ID bigint(20) unsigned NOT NULL auto_increment,
    comment_post_ID int(11) NOT NULL default ‘0’,
    comment_author tinytext NOT NULL,
    comment_author_email varchar(100) NOT NULL default ”,
    comment_author_url varchar(200) NOT NULL default ”,
    comment_author_IP varchar(100) NOT NULL default ”,
    comment_date datetime NOT NULL default ‘0000-00-00 00:00:00’,
    comment_date_gmt datetime NOT NULL default ‘0000-00-00 00:00:00’,
    comment_content text NOT NULL,
    comment_karma int(11) NOT NULL default ‘0’,
    comment_approved enum(‘0′,’1′,’spam’) NOT NULL default ‘1’,
    comment_agent varchar(255) NOT NULL default ”,
    comment_type varchar(20) NOT NULL default ”,
    comment_parent bigint(20) NOT NULL default ‘0’,
    user_id bigint(20) NOT NULL default ‘0’,
    rajid bigint(20) unsigned default NULL,
    PRIMARY KEY (comment_ID),
    UNIQUE KEY rajidx (rajid)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1
    1 row in set (0.00 sec)

    mysql>

    btw, if there is a smaller secondary index on the table, innodb is smart enough to use that for a count, even if it is not a unique index. For instance:
    mysql> explain select count(*) from wp_comments;
    +—-+————-+————-+——-+—————+———————-+———+——+——–+————-+
    | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
    +—-+————-+————-+——-+—————+———————-+———+——+——–+————-+
    | 1 | SIMPLE | wp_comments | index | NULL | comment_approved_idx | 1 | NULL | 228484 | Using index |
    +—-+————-+————-+——-+—————+———————-+———+——+——–+————-+
    1 row in set (0.23 sec)

    mysql> select count(*) from wp_comments;
    ERROR 2006 (HY000): MySQL server has gone away
    No connection. Trying to reconnect…
    Connection id: 2
    Current database: test

    +———-+
    | count(*) |
    +———-+
    | 230120 |
    +———-+
    1 row in set (1.19 sec)

    mysql>
    and the buffer pool usage:

    Total memory allocated 589194920; in additional pool allocated 721920
    Buffer pool size 32768
    Free buffers 32449
    Database pages 319
    Modified db pages 0

    there’s only 3 distinct values in the table for this column, so its not actually counting the values but the pointers from this index:
    mysql> select count( distinct (comment_approved) ) from wp_comments;
    +————————————–+
    | count( distinct (comment_approved) ) |
    +————————————–+
    | 3 |
    +————————————–+
    1 row in set (0.32 sec)

    mysql>

    cheers!
    –Raj.

    April 25, 2007 at 6:22 pm
  • peter Reply

    Raj,

    Good point. It looks like a bug to me as Innodb automatically picks up “shorter” indexes while in practice even first index you tried is shorter.

    The other interesting point I should note – smaller index may not be faster for huge tables, because if you insert data in order primary key tree is often sequential while secondary indexes may be scattered having few sequential pages.

    April 26, 2007 at 9:34 am
  • Pavel Reply

    What about count(id) (id is primary key)?
    Is count(id) slower than count(*)?

    Using innodb + where and myisam + where;

    April 28, 2007 at 4:02 pm
  • peter Reply

    Pavel,

    For primary key it should be the same as primary key can’t have null values.
    It is same as count(val2) in my example.

    April 29, 2007 at 7:05 pm
  • » Easy MySQL Performance Tips Reply

    […] Never do a COUNT(*) (or anything *, says Zach). Instead, replace the * with the name of the column you’re searching against (and is hopefully indexed). That way some queries can execute entirely in the keycache (while * forces MySQL to read every matching row from the table). […]

    June 6, 2007 at 5:12 am
  • howard Reply

    hello, can i say:

    the rule of thumb is…

    1. use count(primary_key)
    2. use count(indexed_field_and_not_null)
    3. use count(indexed_field)
    4. use count(*)

    how abt the length of the index, does it matter?

    thanks.

    June 23, 2007 at 3:33 am
  • لدى حلين لمعرفة عدد السجلات اى الحلول تنصحوتى - سوالف سوفت Reply

    […] هذا الموضوع بالنسبة Ù„ count أنصح بمتابعة هذه المدونه COUNT(*) vs COUNT(col) | MySQL Performance Blog __________________ I Love PHP (d4d@hotmail.com) […]

    September 12, 2008 at 4:31 am
  • 关于 count(*) å’Œ count(col) 的种种 - Pro Evolution Swine Reply

    […] MysqlPerformance的原文 […]

    April 30, 2009 at 1:24 am
  • Simple Pagination - CodeCall Programming Forum Reply

    […] from table" -"select count(*) from table" Check this out: COUNT(*) vs COUNT(col) | MySQL Performance Blog Good performance information there. __________________ CodeCall Blog | CodeCall Wiki | Write a […]

    October 1, 2009 at 5:47 am
  • dalin Reply

    It should be noted that often when people use COUNT they really just want to know if there’s anything there and aren’t particularly interested in a accurate count. In this case a far more performant approach is
    SELECT 1 FROM foo WHERE bar=”baz”
    The query completes as soon as the first row is found where bar=”baz”.

    November 2, 2009 at 6:54 pm
  • dalin Reply

    And to correct my own typo I meant
    SELECT 1 FROM foo WHERE bar=”baz” LIMIT 1

    the limit being very important

    November 2, 2009 at 6:55 pm
  • kene Reply

    mysql> SELECT count(val2) FROM fact; it is true ?

    December 28, 2009 at 4:55 am
  • kene Reply

    54.mysql> EXPLAIN SELECT count(val2) FROM fact WHERE i<10000 \G
    55.*************************** 1. row ***************************
    56. id: 1
    57. select_type: SIMPLE
    58. TABLE: fact
    59. type: range
    60.possible_keys: i
    61. KEY: i
    62. key_len: 4
    63. ref: NULL
    64. rows: 691619
    65. Extra: USING WHERE
    66.1 row IN SET (0.00 sec)

    it is true

    December 28, 2009 at 4:59 am
  • TerryC Reply

    So can I say in this way: Always use count(*)?

    December 28, 2009 at 6:39 am
  • Baron Schwartz Reply

    TerryC, if you are trying to count the number of rows, then YES.

    December 29, 2009 at 6:35 am
  • ibrahim oguz Reply

    you say count(*) fast because it founds if there are indexes. for auto increment primary keys also is index. then if we use primary keys select count(id) from users; then it has to be fast too.
    how about group by statement. nobody asked about it. select count(*) from users where country=’USA’ group by CITY;
    how can we use it in innodb and myisam.
    thanks.

    January 31, 2010 at 2:26 am
  • ricky Reply

    How about select count(1) vs select count(id)

    Which one is better?

    February 11, 2010 at 6:44 am
  • Ilan Hazan Reply

    If you need to count num results but up to a specific limit (to improve performance), you can use the Limited-Count solution.
    You can read more about it here: http://www.mysqldiary.com/limited-select-count/

    May 16, 2010 at 5:37 am
  • count(*) and count(column_name) in MySQL, what’s the diff? » Eric Lo Reply

    […] Performance Blog Post:COUNT(*) vs COUNT(col) Reference: […]

    May 20, 2010 at 8:29 pm
  • Rishabh Jain Reply

    can I use count(*)>3

    December 29, 2010 at 12:42 am
  • Foobarista Reply

    In InnoDB, does a simple count(*) do anything evil with the database cache? I ask because some database engine’s I’ve seen don’t do caching carefully and tablescans (or count(*) that does a tablescan) can end up pulling the whole table into the cache (and pushing more useful pages out). Smarter db engines will just reuse the same buffer page to do table scans (after making sure it isn’t in-cache already), but many don’t.

    March 22, 2011 at 2:21 pm
  • Peter Zaitsev Reply

    Hi,

    You’re looking at full index scan or full table scan in this case. Innodb has some protection from full table scan polluting cache, though it is done differently than using same page

    March 22, 2011 at 3:51 pm
  • OnLine Reply

    Hy,
    I have 2 tables.
    Can I make a single Select query to return the total number of rows in the first table and the rows (their data) of the seccond table?
    And if this is posible, how does this query look like?

    August 6, 2011 at 12:06 am
  • muhammed sekertekin Reply

    very nice article i used to think that you wont notice a a difference if you use count by row i guess to have to recode a lot of my codes…. Thank you from muhammed sekertekin

    October 15, 2011 at 5:48 pm
  • saaj Reply

    MySQL 5.5 seems to be optimized in concern of COUNT(*) queries. At the same machine, for the table of ~120K records I’ve ~2.4 seconds for 5.1 and 0.08 seconds for 5.5 for “SELECT COUNT(*) FROM table” query.

    January 22, 2012 at 5:31 am
  • eash khatri Reply

    Hi Peter,

    For pagination stuff in high traffic websites having huge tables, we can do it in two ways:-

    select count(*) from table where condition1 and condition2;
    select col1,col2 from table where condition1 and condition2 limit 10,20;

    OR

    Just fire a single query and get all the data and then get the count at PHP end like:-
    select col1,col2 from table where condition1 and condition2;

    Please suggest which one is better idea in both INNODB and MyIsam

    April 3, 2012 at 10:10 pm
  • rashid Reply

    hi what will be the command to count field value who digits less then 8 like your command >select count(val2) from fact where i<10000; in my case i want to count those values who length is equal to 8 or less then 8 digit.

    October 18, 2012 at 2:26 am
  • Emad Reply

    Good tip, Thank You.

    October 23, 2012 at 4:57 am
  • Cristian Weiser Reply

    Can you help me with a simple thing?

    i have one table named dealviewer and i want to count values in column message_sent if they are set as true. If they are null leave.

    A need to set this values to a php variable

    Iam newbie, so please post kind a ready code if possible.
    Thanks in advance.

    December 10, 2012 at 2:45 pm
  • Jose Alejandro Realza Reply

    Very interesting, the problem is when we have a request about interacting is with several patterns search inside, example, table1.field, table2.field from table1, table2 where table1.field = table2.field

    February 28, 2013 at 9:24 am
  • Hujšanje Reply

    Kind of big pile to understand, but I always had problems with understanding of why count(val) does not work in some cases. Many thanks!

    February 20, 2014 at 2:22 am
  • Bhumi Reply

    Very important for a beginner. Thanks for this valuable content.

    June 22, 2015 at 1:49 am
  • Nell Reply

    Thanks Pete that was enlightening ! Definitely using count(*) only if it doesn’t give me those irritating php parsing errors. Still new to this baby. 😀

    July 22, 2015 at 3:45 am
  • lefaver Reply

    I am doing a statistical series of counts, it is a rather complex query, so hope this is okay.

    set @Hrs := 240;
    Select @total := Count(e.SNID) ,@totalH := Count(Distinct(e.SNID))
    from endorsements e where e.timestamp >= (SELECT max(timestamp) FROM endorsements)-(3600 * @Hrs);

    select concat(If (SNID = ‘Grand Total’,@Rank, @Rank := @Rank+1),’ of ‘,@totalH) AS Rank
    ,SNID ,Ledger ,Seeder ,Mail ,ERC ,DEX ,Vault
    ,DMP ,ENS ,Profiles ,Ratings ,Scripts
    ,Totals, percent
    From (Select * From
    (SELECT IFNULL(SNID,’Grand Total’) as SNID

    , COUNT(servicetype = ‘2’ or Null) As Ledger
    , COUNT(servicetype = ‘4’ or Null) AS Seeder
    , COUNT(servicetype = ‘8’ or Null) AS Mail
    , COUNT(servicetype = ’16’ or Null) AS ERC
    , Count(servicetype = ’32’ or Null) AS DEX
    , Count(servicetype = ’64’ or Null) AS Vault
    , COUNT(servicetype = ‘128’ or Null) AS DMP
    , COUNT(servicetype = ‘256’ or Null) AS ENS
    , COUNT(servicetype = ‘512’ or Null) AS Profiles
    , COUNT(servicetype = ‘1024’ or Null) AS Ratings
    , COUNT(servicetype = ‘2048’ or Null) AS Scripts
    , count(*) as Totals
    , concat(cast((count(*)*100.0/@total) as Decimal(5,2)),’ %’) AS percent

    FROM endorsements a
    WHERE a.timestamp >= (SELECT max(timestamp) FROM endorsements)-(3600 * @Hrs)
    GROUP BY SNID WITH ROLLUP ) q
    ORDER by q.Totals desc ) t1
    ,(Select @Rank :=0) t2 ;

    The count (servicetype = ‘2’ or Null) as Ledger ….. etc does work correctly, and I have not been able to find out why it does. It does not use the IS Null syntax and yet this works in MySql.

    Lloyd

    January 30, 2016 at 9:40 pm

Leave a Reply