Reading Baron's post about the Kickfire appliance, and of course talking to them directly, I learned that a lot in their product is about beating the data-processing limitations of current systems.
This raises a valid question: how fast can MySQL process (filter) data using its current architecture?
I decided to test the simplest case: take an in-memory table with a very narrow row and run a simple query which needs to do simple filtering. How many rows per second will it be able to do?
CREATE TABLE `m` (
  `i` int(11) NOT NULL
) ENGINE=MEMORY DEFAULT CHARSET=latin1

mysql> select count(*) from m;
+----------+
| count(*) |
+----------+
|  1047684 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from m where i>0;
+----------+
| count(*) |
+----------+
|   349229 |
+----------+
1 row in set (0.15 sec)
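The post doesn't show how the table was populated. Judging from the counts above (roughly a third of the rows match i>0, and the AVG(i)=0.3333 result later on suggests i is simply 0 or 1), a minimal population sketch, purely my reconstruction, could look like:

-- Hypothetical population, not from the original post:
-- two zeros per one gives roughly 1/3 of rows matching i > 0.
INSERT INTO m VALUES (0), (0), (1);
-- Double the row count per pass; ~18-19 passes reach ~1M rows.
INSERT INTO m SELECT * FROM m;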
So it takes 0.15 sec to scan about 1,000,000 rows, which gives us a peak filtering speed of about 7M rows/sec on this Intel(R) Xeon(R) 5130 @ 2.00GHz. This number is per core; in theory this box, which has 4 cores, should be able to do up to 4 times more, though in practice the scaling factor is of course lower.
Interestingly enough, if we take a bigger table (so a smaller portion of the table fits in the CPU cache), the filtering speed stays about the same:
mysql> select count(*) from m3;
+----------+
| count(*) |
+----------+
| 10476840 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from m3 where i>0;
+----------+
| count(*) |
+----------+
|  3492290 |
+----------+
1 row in set (1.49 sec)
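The creation of m3 isn't shown either; presumably it is the same narrow schema holding roughly ten times the data. A sketch, assuming exactly that (note MEMORY tables are capped by max_heap_table_size, which defaults to 16MB and has to be raised before creating a table of this size):

-- Hypothetical: build m3 as ~10 copies of m (not shown in the post).
SET max_heap_table_size = 256*1024*1024;
CREATE TABLE m3 LIKE m;
INSERT INTO m3 SELECT * FROM m;  -- run ~10 times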
To check the completely in-cache scenario I created a table with just 10,000 rows and wrote a little stored procedure to make timing easier:
mysql> DELIMITER //
mysql> CREATE PROCEDURE test_read(pl INT)
    -> BEGIN
    -> DECLARE t INT;
    -> SET @x = 0;
    -> REPEAT SET @x = @x + 1;
    -> SELECT COUNT(*) FROM m1 WHERE i>0 INTO t;
    -> UNTIL @x > pl
    -> END REPEAT;
    -> END;
    -> //
Query OK, 0 rows affected (0.00 sec)

mysql> DELIMITER ;
mysql> call test_read(1000);
Query OK, 0 rows affected (1.37 sec)
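The procedure reads from m1, the 10,000-row table mentioned above, whose creation also isn't shown; presumably something along these lines:

-- Hypothetical: the 10,000-row in-cache table used by test_read.
CREATE TABLE m1 LIKE m;
INSERT INTO m1 SELECT * FROM m LIMIT 10000;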
So we get 10 million rows (1,000 iterations over the 10,000-row table) filtered in 1.37 sec, again giving a bit over 7M rows/sec.
This CPU runs at 2GHz, so that works out to some 280 CPU cycles per filtered row (2×10^9 cycles / ~7.3M rows/sec), which is not that bad considering the storage engine abstraction requires “row by row” processing, which means function calls for each row.
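As a rough cross-check of where those cycles go, MySQL's built-in BENCHMARK() function can evaluate an expression in a tight loop with no table access at all. This comparison is my own addition, not part of the original test, and the user variable is arbitrary:

-- Evaluate the comparison 10 million times without any storage engine calls.
SET @v = 1;
SELECT BENCHMARK(10000000, @v > 0);

If this completes much faster than the table scan above, the gap hints that most of the ~280 cycles go into the per-row handler interface rather than into the comparison itself.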
Let's see what happens if we apply some row functions to those 10,000,000 rows (to keep it simple).
(In reality I did multiple runs to get accurate results, but I show only one here)
mysql> select sum(i) from m3;
+---------+
| sum(i)  |
+---------+
| 3492290 |
+---------+
1 row in set (1.86 sec)

mysql> select avg(i) from m3;
+--------+
| avg(i) |
+--------+
| 0.3333 |
+--------+
1 row in set (1.97 sec)

mysql> select avg(i+sqrt(i+1)+abs(i)) from m3;
+-------------------------+
| avg(i+sqrt(i+1)+abs(i)) |
+-------------------------+
|         1.8047401583918 |
+-------------------------+
1 row in set (2.56 sec)
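Working the timings out: 10,476,840 rows / 1.86 sec ≈ 5.6M rows/sec for SUM(), 10,476,840 / 1.97 ≈ 5.3M rows/sec for AVG(), and 10,476,840 / 2.56 ≈ 4.1M rows/sec once SQRT() and ABS() enter the expression, down from the ~7M rows/sec of the plain filter.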
So as you see, as we add some math the row scan speed is significantly affected.
I also decided to see how longer rows affect performance and created the following table:
CREATE TABLE `m4` (
  `c` char(128) NOT NULL
) ENGINE=MEMORY DEFAULT CHARSET=latin1

mysql> show table status like "m4" \G
*************************** 1. row ***************************
           Name: m4
         Engine: MEMORY
        Version: 10
     Row_format: Fixed
           Rows: 5000000
 Avg_row_length: 129
    Data_length: 685609952
Max_data_length: 948528873
   Index_length: 0
      Data_free: 0
 Auto_increment: NULL
    Create_time: NULL
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.00 sec)

mysql> select count(*) from m4 where c>"a";
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (1.16 sec)
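How m4 was filled isn't shown; since the c>"a" filter matches zero rows, the data presumably sorts below "a" in latin1_swedish_ci. A hypothetical sketch (again, max_heap_table_size would have to be raised before creating the table, since the data alone is ~650MB):

SET max_heap_table_size = 1024*1024*1024;
-- Hypothetical fill: digit strings sort below "a" in latin1_swedish_ci,
-- so count(*) where c > "a" returns 0, as above.
INSERT INTO m4 VALUES (REPEAT('0', 128));
INSERT INTO m4 SELECT * FROM m4;  -- doubling; ~22-23 passes reach ~5M rows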
So with a bit longer rows we instantly drop to 4.3M rows/sec (5,000,000 rows / 1.16 sec). And if we look at the amount of memory consumed by the table, the filtering speed comes out to about 600MB/sec (685,609,952 bytes / 1.16 sec ≈ 590MB/sec), which is surely a small fraction of what the memory bus capacity of this system can deliver.
Comments (3)
Sorry, to correct the query above:
SELECT COUNT(*) FROM table WHERE id = x AND cat_id = y;
Hello... I have a query:
SELECT COUNT(*) WHERE id = x AND cat_id = y
It’s very slow at 1 million rows, even though id and cat_id each have an index. Please, how can I make it fast? I need the answer. Sorry about my English, I’m from Asia.
I see you use ENGINE=MEMORY. This also means that the count(*) you are doing is not transaction-safe, as I understand it…

Which in turn means that if you have multiple deletes/inserts/updates going on on your table, you are never sure whether the count(…) value you get is correct at that point in time?

Also, I do feel that doing a count(*) is a weird form of ‘testing’. I think what you do a bit below with the aggregates is more valid, because then MySQL really needs to scan the table rather than looking at some index (if that is optimized).
I am also wondering why you didn’t use an