Loading Tables with TokuDB 4.0

September 2, 2010

Author

Tokutek

MySQL

Often, the first step in evaluating and deploying a database is to load an existing dataset into it. In the latest version, TokuDB makes use of multi-core parallelism to speed up loading (and new index creation). With the new loader, MySQL tables that use TokuDB load 5x-8x faster than with previous versions of TokuDB.

Measuring Load Performance

We generated several different datasets to measure the performance of TokuDB when doing a LOAD DATA INFILE … command. To characterize performance, we vary

rows to load

keys per row

row length (including keys)

All generated keys, including the primary key, are random 8-byte values. The remaining data, needed to pad the row out to the specified length, is text.

Two files are produced as part of data generation.

data file, containing ‘|’ separated fields

sql file, containing the CREATE TABLE command corresponding to the generated data

For instance, if the number of keys is 3 and the row length is 256 bytes, the following SQL statement is produced:

     CREATE TABLE load_table (
         val0 BIGINT UNSIGNED NOT NULL,
         val1 BIGINT UNSIGNED NOT NULL,
         val2 BIGINT UNSIGNED NOT NULL,
         pad VARCHAR(232) NOT NULL,
         PRIMARY KEY (val0),
         KEY valkey1 (val1),
         KEY valkey2 (val2)
         ) ENGINE=tokudb

We can make the data generation program available if anyone is interested.
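For readers who want to reproduce a similar dataset, here is a minimal Python sketch of such a generator. It is not the program we used: the function names, the file suffixes, and the choice of lowercase letters for the padding are assumptions made for illustration. The column names and the rule that the pad length equals the row length minus 8 bytes per key follow the example above (256 - 3 x 8 = 232).

```python
import random
import string


def create_table_sql(num_keys, row_length, table="load_table"):
    """Build a CREATE TABLE statement for num_keys 8-byte keys plus padding."""
    pad_len = row_length - 8 * num_keys  # e.g. 256 - 3 * 8 = 232
    if num_keys < 1 or pad_len < 1:
        raise ValueError("row_length must exceed 8 bytes per key")
    defs = [f"val{i} BIGINT UNSIGNED NOT NULL" for i in range(num_keys)]
    defs.append(f"pad VARCHAR({pad_len}) NOT NULL")
    defs.append("PRIMARY KEY (val0)")
    defs += [f"KEY valkey{i} (val{i})" for i in range(1, num_keys)]
    body = ",\n    ".join(defs)
    return f"CREATE TABLE {table} (\n    {body}\n) ENGINE=tokudb"


def generate_row(num_keys, row_length, rng):
    """Return one '|'-separated row: random 64-bit keys, then text padding."""
    keys = [str(rng.getrandbits(64)) for _ in range(num_keys)]
    pad_len = row_length - 8 * num_keys
    pad = "".join(rng.choices(string.ascii_lowercase, k=pad_len))
    return "|".join(keys + [pad])


def write_dataset(prefix, num_rows, num_keys, row_length, seed=0):
    """Write <prefix>.sql and <prefix>.dat (hypothetical file names)."""
    rng = random.Random(seed)
    with open(prefix + ".sql", "w") as f:
        f.write(create_table_sql(num_keys, row_length) + ";\n")
    with open(prefix + ".dat", "w") as f:
        for _ in range(num_rows):
            f.write(generate_row(num_keys, row_length, rng) + "\n")
```

Calling write_dataset("load_table", 1000, 3, 256) would produce a small sample to inspect. The sketch does not check for duplicate primary keys; with random 64-bit values a collision is unlikely but possible in a very large run.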

Load Test

A simple shell script

creates the test table

performs a LOAD DATA INFILE <datafile> INTO TABLE load_table FIELDS TERMINATED BY '|'

returns execution time
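The three steps above can be sketched roughly as follows. Our test used a shell script; this Python version is only an illustration of the same flow. It assumes the mysql command-line client is on the PATH and can connect without further options, and the database name "test" and the helper names are hypothetical.

```python
import subprocess
import time


def load_statement(datafile, table="load_table"):
    """Build the LOAD DATA INFILE statement for a '|'-separated file."""
    return (f"LOAD DATA INFILE '{datafile}' INTO TABLE {table} "
            "FIELDS TERMINATED BY '|'")


def run_sql(sql, database):
    # Assumption: the `mysql` client is installed and credentials are
    # already configured (for example in an option file).
    subprocess.run(["mysql", f"--execute={sql}", database], check=True)


def timed_load(sql_file, datafile, database="test"):
    """Create the table, run the load, and return the load time in seconds."""
    with open(sql_file) as f:
        run_sql(f.read(), database)          # step 1: create the test table
    start = time.perf_counter()
    run_sql(load_statement(datafile), database)  # step 2: load the data
    return time.perf_counter() - start       # step 3: execution time
```

Only the LOAD DATA INFILE step is timed here, so table creation does not count toward the result. Note that the data file path is read by the server, so it must be readable from the machine running mysqld.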

For the experiments to be meaningful, we created datasets that do not fit in memory.

Results

We ran our benchmark on an Amazon Web Services c1.xlarge node with 8 cores and 7 GB of memory. The test loads 100M rows (not pre-sorted). The data file was on a 2-disk RAID-0 and the MySQL database files were on a separate 2-disk RAID-0.
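As a rough sanity check that a dataset of this size does not fit in memory, the raw data size can be estimated as rows times row length. The sketch below uses the 256-byte row length from the earlier CREATE TABLE example purely as an illustrative value; it ignores index overhead and any compression, and the function names are hypothetical.

```python
def dataset_bytes(num_rows, row_length):
    """Raw size of the generated data, ignoring delimiters and indexes."""
    return num_rows * row_length


def fits_in_memory(num_rows, row_length, memory_bytes):
    """True if the raw data would fit entirely in the given memory."""
    return dataset_bytes(num_rows, row_length) <= memory_bytes


rows = 100_000_000        # 100M rows, as in the test
memory = 7 * 2**30        # 7 GB of memory on the node
print(dataset_bytes(rows, 256) / 10**9)   # 25.6 (GB of raw data)
print(fits_in_memory(rows, 256, memory))  # False
```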