October 1, 2014

COUNT(*) for Innodb Tables

I guess note number one about MyISAM to Innodb migration is warning what Innodb is very slow in COUNT(*) queries. The part which I often however see omitted is fact it only applies to COUNT(*) queries without WHERE clause.

So if you have query like SELECT COUNT(*) FROM USER It will be much faster for MyISAM (MEMORY and some others) tables because they would simply read number of rows in the table from stored value. Innodb will however need to perform full table scan or full index scan because it does not have such counter, it also can’t be solved by simple singe counter for Innodb tables as different transactions may see different number of rows in the table.

If you have query like SELECT COUNT(*) FROM IMAGE WHERE USER_ID=5 this query will be executed same way both for MyISAM and Innodb tables by performing index rage scan. This can be faster or slower both for MyISAM and Innodb depending on various conditions.

In real applications there are much more queries of second type rather than first type so it is typically not as bad problem as it may look. Most typically count of rows is needed by admin tools which may show it in table statistics, it may also be used in application stats to show something like “We have 123.345 users which have uploaded 1.344.656 images” but these are normally easy to remove.

So remember Innodb is not slow for ALL COUNT(*) queries but only for very specific case of COUNT(*) query without WHERE clause. It does not mean I would not like to see it fixed though, it is pretty annoying.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. wesley says:

    You say count(*), would count(id) result in a table scan (MyISAM) since it doesn’t know if id is NULL or not? I’m not even sure if it counts NULL or not, I guess i should check it out.

  2. peter says:

    It depends on now field is defined if it is NOT NULL it will read the counter. If it can be NULL it will perform full table scan

  3. Wesley,

    COUNT ( column ) counts all non-NULL occurences (or should ;)

    COUNT ( * ) counts all rows.

  4. This just needs to be fixed. I forgot about this when I migrated a couple of table sot MyISAM. We were computing COUNT in order to graph DB stats. Bad idea….

    The issue is that most people don’t need a 100% accurate count. They just need lazy data. INNODB should add the ability to get this even if it’s not 100% correct.

  5. peter says:

    Kevin,

    I guess having counter such as MyISAM which would count “dirty” number of uncommitted rows could be possible but this would require some special option to enable as you can’t return wrong (approximate) result for query in default mode.

    It is quite interesting where other transactional storage engines would stay on this.

  6. Sean says:

    I agree with Kevin – unless you’re doing some sort of life or death calculation, it never needs to be totally accurate. Retarded decision on MySQL’s part…

  7. peter says:

    Sean,

    The question in this case is semantics. SELECT COUNT(*) must return accurate value by standards, There is no way it would be made to return wrong data by default as this may break some application. So we’re speaking about option which will need to be specified such as inaccurate_count=1 ? Still not everyone will be able to use it because single instance can be shared by multiple applications. Having Some form of SQL FLAG, such as SELECT APPROXIMATE COUNT(*) would be better but will require application changes and at the same time one could use info from SHOW TABLE STATUS.

  8. John says:

    I am seeing a query take up to 2 seconds (viewing the listing of members in my site) because at the top there’s a count. What steps can I take to speed this up? I’ve thought about doing a quick count total cache stored in a separate field, but that won’t work either, because depending on the search criteria, the count number can change at any time.

  9. kiran says:

    How to find the top values of table row?

  10. John Swapceinski says:

    If you want a quick and dirty count of the number of rows in a InnoDB table, this seems to work:

    explain select count(*) from Table;

    The result is fast and will give you a “rows” count that should be within 10% of the number of rows in the Table. So it seems the row count is being cached SOMEWHERE (I don’t know where). BTW, I am using MySQL v5.0.45.

  11. Kishore says:

    Dear Friends,
    I am getting headache with MySql. I am new to .net and MYSQl.
    At present lots of Issues occured in MySql …but the problem is slow..
    “Whenever We are executing the Query it is taking lot’z of time to execute. Our tables contains nearly 3000 records…”
    How can I optimize the Stored procedures..
    (or)
    How can I change this Sp into faster manner…
    “Thanks In advance”

  12. to Kishore.

    I you need to read upon SQL, usage of indexes etc. 3000 records is nothing for any DB and shouldn’t be slow in any case.
    Most likely your DB is not properly normalized and you don’t have proper indexes.

    Ries

  13. peter says:

    Ries, Kishore

    Indeed 3000 records is very small unless these are 10MB faxes stored directly in the database :) Wrong indexing or query which MySQL can’t optimize well is likely reason. If you report the EXPLAIN on our forums we might be able to help.

  14. Fekke says:

    3,000 records may be a problem. If you have three tables with 1,000 records each and you do a join involving the three tables with no indexes at all then you would have 1,000,000,000 records full scan to fetch the results.

    So the optimization you acquire by creating the proper indexes not only relies on how many records your tables have but also what your queries are.

  15. nop says:

    ready for a headache? try “select count($field) from $table;” and compare the time it takes to “select count($field) from $table use index($field);”
    ($field is the primary key (int autoincrement) in $table which was converted from myisam and has around 900k rows in it).

    for me, the standard count without use index takes around 8 seconds – the count with forced index takes below 0.1 seconds.

    any thoughts on why this is so?

  16. nop says:

    correction: replace “8 seconds” with “5 minutes, 8 seconds”.

  17. Wallace says:

    Why the performance gap?

    Running this slow query took 138 seconds (2 min 18.09 sec)
    SELECT COUNT(*) FROM dbmail_messageblks;
    +———-+
    | COUNT(*) |
    +———-+
    | 262788 |
    +———-+
    1 row in set (2 min 18.09 sec)

    After optimizing the SQL, it took 0.27 seconds.
    SELECT COUNT(*) FROM dbmail_messageblks use index(physmessage_id_index);
    +———-+
    | COUNT(*) |
    +———-+
    | 262796 |
    +———-+
    1 row in set (0.27 sec)

    > SHOW CREATE TABLE dbmail_messageblks\G
    *************************** 1. row ***************************
    Table: dbmail_messageblks
    Create Table: CREATE TABLE `dbmail_messageblks` (
    `messageblk_idnr` bigint(21) NOT NULL auto_increment,
    `physmessage_id` bigint(21) NOT NULL default ‘0’,
    `messageblk` longblob NOT NULL,
    `blocksize` bigint(21) NOT NULL default ‘0’,
    `is_header` tinyint(1) NOT NULL default ‘0’,
    PRIMARY KEY (`messageblk_idnr`),
    KEY `physmessage_id_index` (`physmessage_id`),
    KEY `physmessage_id_is_header_index` (`physmessage_id`,`is_header`),
    CONSTRAINT `dbmail_messageblks_ibfk_1` FOREIGN KEY (`physmessage_id`) REFERENCES `dbmail_physmessage` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
    ) ENGINE=InnoDB AUTO_INCREMENT=602519 DEFAULT CHARSET=utf8

    > EXPLAIN SELECT COUNT(*) FROM dbmail_messageblks\G
    *************************** 1. row ***************************
    id: 1
    select_type: SIMPLE
    table: dbmail_messageblks
    type: index
    possible_keys: NULL
    key: PRIMARY
    key_len: 8
    ref: NULL
    rows: 1930308
    Extra: Using index

    > EXPLAIN SELECT COUNT(*) FROM dbmail_messageblks use index(physmessage_id_index)\G
    *************************** 1. row ***************************
    id: 1
    select_type: SIMPLE
    table: dbmail_messageblks
    type: index
    possible_keys: NULL
    key: physmessage_id_index
    key_len: 8
    ref: NULL
    rows: 1930310
    Extra: Using index

  18. SC says:

    If your table has a primary key, you can also get the count from
    show index from , the cardinality of the key will give you count of the rows in the table.

  19. peter says:

    SC,

    Note these stats are approximate.

  20. Peter says:

    @John – pretty clever. I haven’t done any testing to see how close this number stays, or how slow it is to update, but it seems to be a great solution if you just need an approximation. Thanks!

  21. Josh Q says:

    I’ve tried the following metrics on one of our production MySQL servers and found some interesting results related to InnoDB count().

    We’re using XtraDB(InnoDB by Percona) tables (version 5.5.14-rel20.5-log), this is a web-mail system that stores all messages in a table, so it is fairly good sized.

    I did a simple count(*) on the messages table:

    mysql> select count(*) from messages;
    +———-+
    | count(*) |
    +———-+
    | 2849204 |
    +———-+
    1 row in set (5.82 sec)

    I did this a dozen times and it came back at around 6 seconds each time.

    Then I counted the primary key:
    mysql> select count(id) from messages;
    +———–+
    | count(id) |
    +———–+
    | 2849205 |
    +———–+
    1 row in set (6.05 sec)

    Basically the same results, even after a dozen iterations. (this is a production system so the count will increase).

    Then I counted a column that is a key from another table, not null :

    mysql> desc messages;
    +—————+————–+——+—–+———+—————-+
    | Field | Type | Null | Key | Default | Extra |
    +—————+————–+——+—–+———+—————-+
    | id | bigint(20) | NO | PRI | NULL | auto_increment |
    | id_acct | int(11) | NO | MUL | 0 | |

    mysql> select count(id_acct) from awm_messages;
    +—————-+
    | count(id_acct) |
    +—————-+
    | 2849213 |
    +—————-+
    1 row in set (3.40 sec)

    Each time, right around 3.5 seconds. I’m not sure if this is just more efficient for InnoDB/XtraDB, or if it’s due to the difference in field type (bigint vs int).

  22. Dave says:

    I seek a related clarification:

    What about the folk developer wisdom that “select count(1) …” is better than “select count(*) …” – does that apply?
    To Innodb? To MyISAM?
    Or is it actually only dependent on indexes being set?
    Or on the right index being invoked in the “select count () … ” statement?

  23. Leonid says:

    I am sorry if this question is rather stupid (or more).
    If it’s all about having the WHERE, why not just use AND WHERE 1?

  24. Andrea says:

    Hi guys, this blog is very useful, thanks and keep up the good work.
    My 2 cents: I had an autoincrement coloumn, so the last inserted value is (starting from 0) the exact number of rows. So making (on InnoDB table):

    SELECT MAX(id) FROM my_table;

    You obviously obtain the same result as COUNT(*) for that table.

    The explain result in this case was :

    id select_type table type possible_keys key key_len ref rows filtered Extra
    1 SIMPLE NULL NULL NULL NULL NULL NULL NULL NULL Select tables optimized away

    So the best way as it was MyIsam table

  25. MyXAMoP says:

    you can get approximate number of rows as fast in Innodb as in myisam tables using “SHOW TABLE STATUS” command
    it will varry from real number to small degree, but this is perfectly fine for statistics on a page:

    mysql> show table status like ‘customers’
    *************************** 1. row ***************************
    Name: customers
    Engine: InnoDB
    Version: 10
    Row_format: Compact
    Rows: 11861
    Avg_row_length: 222
    Data_length: 2637824
    Max_data_length: 0
    Index_length: 5718016
    Data_free: 5219811328
    Auto_increment: 56416
    Create_time: 2012-07-19 07:14:09
    Update_time: NULL
    Check_time: NULL
    Collation: utf8_general_ci
    Checksum: NULL
    Create_options:
    Comment: customer table
    1 row in set (0.00 sec)

    mysql> select count(*) from customers
    *************************** 1. row ***************************
    count(*): 11649
    1 row in set (0.03 sec)

  26. Meena says:

    Table row count is not accurate from information_Schema.tables & table status…
    How can I get accurate row count of all tables in a database?

Speak Your Mind

*