I am working on a customer’s system where the requirement is to store a lot of timeseries data from different sensors.
For performance reasons we are going to use SSD, and therefore there is a list of requirements for the architecture:
Looking on these requirements I actually think that TokuDB might be a good fit for this task.
There are several aspects to consider. This time I want to compare TokuDB vs InnoDB on an initial load time and space consumption.
Let’s assume the schema is following
|
1 |
CREATE TABLE `sensordata` (<br> `ts` int(10) unsigned NOT NULL DEFAULT '0',<br> `sensor_id` int(10) unsigned NOT NULL,<br> `data1` double NOT NULL,<br> `data2` double NOT NULL,<br> `data3` double NOT NULL,<br> `data4` double NOT NULL,<br> `data6` double NOT NULL,<br> `cnt` int(10) unsigned NOT NULL,<br> PRIMARY KEY (`sensor_id`,`ts`)<br>) |
where sensor_id is in a range from 1 to about 1000 and ts is monotonically increasing timestamp.
This schema exploits both TokuDB and InnoDB clustering primary key, and all inserts are “almost” sequential, which guarantee that all inserts will not require disk access and work with data in memory.
The same for SELECTS – select queries on the most recent time periods will be executed only by a memory access.
I am doing this research on the Dell PowerEdge R420 box with 48GB of memory (40GB for InnoDB buffer pool size, and default memory allocation for TokuDB, which is 24GB for tokudb cache). The storage is a very fast PCI-e Flash card.
The test export CSV file, suitable for LOAD DATA INFILE is 40GB in size and contains over 1 bln records (1.238.201.948 exactly)
MySQL Versions:
So, first, let’s load data into InnoDB, again, I am using LOAD DATA INFILE statement
Now for TokuDB:
So TokuDB is the obvious leader in both load time and compression. Of course just these are not enough, and now we need to see the performance of further INSERTs and SELECTs queries. This is what I am running right now and will post the results when I have them.