Often, the first step in evaluating or deploying a database is to load an existing dataset into it. The latest version of TokuDB uses multi-core parallelism to speed up loading (and new index creation). Using the loader, MySQL tables using TokuDB load 5x-8x faster than with previous versions of TokuDB.
We generated several different datasets to measure the performance of TokuDB when running a LOAD DATA INFILE … command. To characterize performance, we vary the number of keys and the row length.
All generated keys, including the primary key, are random 8-byte values. The remaining data, needed to pad each row out to the specified length, is text.
Two files are produced as part of data generation.
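As a rough illustration of the row layout described above, here is a minimal Python sketch that builds one row of test data. It is not the actual generator: the function name `make_row`, the tab-delimited output (the default field separator for LOAD DATA INFILE), and the lowercase-letter padding are all assumptions. The row length is taken to mean the stored size, counting 8 bytes per key.

```python
import random
import string

KEY_BYTES = 8  # each key is a random 8-byte (64-bit) unsigned value


def make_row(num_keys: int, row_len: int, rng: random.Random) -> str:
    """Return one tab-delimited row: num_keys random keys plus text padding.

    Hypothetical helper; the pad length is whatever remains of row_len
    after reserving 8 bytes for each key.
    """
    pad_len = row_len - num_keys * KEY_BYTES
    if pad_len < 0:
        raise ValueError("row_len is too small to hold the keys")
    # Random 64-bit unsigned integers, written as decimal text.
    keys = [str(rng.getrandbits(64)) for _ in range(num_keys)]
    # Random lowercase letters used as filler text (an assumption).
    pad = "".join(rng.choices(string.ascii_lowercase, k=pad_len))
    return "\t".join(keys + [pad])


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    print(make_row(3, 256, rng))
```

Because the keys are drawn independently at random, rows produced this way arrive in no particular key order.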
For instance, if the number of keys is 3 and the row length is 256 bytes, the following SQL statement is produced:
```sql
CREATE TABLE load_table (
  val0 BIGINT UNSIGNED NOT NULL,
  val1 BIGINT UNSIGNED NOT NULL,
  val2 BIGINT UNSIGNED NOT NULL,
  pad VARCHAR(232) NOT NULL,
  PRIMARY KEY (val0),
  KEY valkey1 (val1),
  KEY valkey2 (val2)
) ENGINE=tokudb
```
We can make the data generation program available if anyone is interested.
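In the meantime, the schema follows a simple pattern that can be sketched in a few lines of Python. This is a reconstruction from the example statement, not the actual generator: the function name `create_table_sql` and its parameters are hypothetical, and the pad width is assumed to be the row length minus 8 bytes per key (256 - 3 × 8 = 232 in the example).

```python
KEY_BYTES = 8  # each BIGINT UNSIGNED key column occupies 8 bytes


def create_table_sql(num_keys: int, row_len: int, table: str = "load_table") -> str:
    """Build a CREATE TABLE statement shaped like the example in the post.

    Hypothetical helper: val0 is the primary key, every further valN
    column gets a secondary index named valkeyN, and a VARCHAR pad
    column fills the rest of the row.
    """
    pad_len = row_len - num_keys * KEY_BYTES
    if num_keys < 1 or pad_len < 1:
        raise ValueError("need at least one key and room for padding")
    columns = [f"  val{i} BIGINT UNSIGNED NOT NULL," for i in range(num_keys)]
    columns.append(f"  pad VARCHAR({pad_len}) NOT NULL,")
    indexes = ["  PRIMARY KEY (val0)"]
    indexes += [f"  KEY valkey{i} (val{i})" for i in range(1, num_keys)]
    body = "\n".join(columns) + "\n" + ",\n".join(indexes)
    return f"CREATE TABLE {table} (\n{body}\n) ENGINE=tokudb"


if __name__ == "__main__":
    print(create_table_sql(3, 256))
```

Calling `create_table_sql(3, 256)` yields a statement with three key columns, a `VARCHAR(232)` pad, one primary key, and two secondary indexes.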
A simple shell script
For the experiments to be meaningful, we created datasets that do not fit in memory.
We ran our benchmark on an Amazon Web Services c1.large node with 8 cores and 7 GB of memory. The test loads 100M rows (not pre-sorted). The data file was on a 2-disk RAID-0 array; the MySQL database files were on a different 2-disk RAID-0 array.
| Keys | Row Len (bytes) | v3 rows/s | v4 rows/s | Speedup |
|---|---|---|---|---|
| 1 | 64 | 27K | 142K | 5.1 |
| 4 | 64 | 13K | 82K | 6.2 |
| 1 | 256 | 7K | 54K | 7.2 |
| 4 | 256 | 5K | 43K | 8.2 |
Several metrics can be used to measure performance: rows per second, key-value pairs per second, and megabytes per second.
Metrics for TokuDB v4:
| Keys | Row Len (bytes) | Rows/sec | KV-pairs/sec | MB/sec |
|---|---|---|---|---|
| 1 | 64 | 142K | 142K | 9.1 |
| 4 | 64 | 82K | 330K | 5.3 |
| 1 | 256 | 54K | 54K | 13.9 |
| 4 | 256 | 43K | 173K | 11.1 |
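The three columns appear to be related by simple arithmetic. The sketch below assumes, based only on the numbers in the table, that KV-pairs/sec is rows/sec times the number of keys and that MB/sec is rows/sec times the row length, with 1 MB taken as 10^6 bytes. The function name `derived_metrics` is hypothetical.

```python
def derived_metrics(keys: int, row_len: int, rows_per_sec: float) -> tuple[float, float]:
    """Derive KV-pairs/sec and MB/sec from a rows/sec figure.

    Assumed relationships (inferred from the table, not stated in the post):
      - each row contributes one key-value pair per index, so
        KV-pairs/sec = rows/sec * keys
      - MB/sec = rows/sec * row length in bytes / 1e6
    """
    kv_pairs_per_sec = rows_per_sec * keys
    mb_per_sec = rows_per_sec * row_len / 1e6
    return kv_pairs_per_sec, mb_per_sec


if __name__ == "__main__":
    # (keys, row length in bytes, v4 rows/sec) taken from the table above
    for keys, row_len, rate in [(1, 64, 142_000), (4, 64, 82_000),
                                (1, 256, 54_000), (4, 256, 43_000)]:
        kv, mb = derived_metrics(keys, row_len, rate)
        print(f"{keys} keys, {row_len} B: {kv / 1e3:.0f}K KV-pairs/s, {mb:.1f} MB/s")
```

Because the rows/sec figures in the table are rounded to the nearest thousand, values computed this way can differ slightly from the published ones (for example, 4 × 82K gives 328K rather than 330K).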
These results show
We will report further results, especially speedups on larger CPU count machines, as they become available.