TokuDB has a big advantage over B-trees when trickle loading data into existing tables. When bulk loading into empty tables or creating new indexes, however, the data can be preprocessed before it is inserted. TokuDB release 4 uses a parallel algorithm to speed up these types of bulk insertions. How does the parallel loader's performance compare with the serial loader's? To gain some insight, we use the Air Traffic Control (ATC) data and queries described in a Percona blog post and also used in an earlier experiment with TokuDB 2.1.0.
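To make the idea of preprocessing concrete, here is a minimal sketch of why a bulk load into an empty table parallelizes well: each index can be produced by sorting the rows on that index's key, and those sorts do not depend on one another. This is an illustration only, not TokuDB's loader. The toy rows, the index definitions, and the function names (`build_index`, `serial_load`, `parallel_load`) are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy rows: (flight_id, carrier, origin, year).
# This is NOT the real ATC schema.
ROWS = [
    (3, "AA", "BOS", 2008),
    (1, "UA", "SFO", 2007),
    (2, "AA", "JFK", 2009),
    (4, "DL", "ATL", 2007),
]

# Hypothetical index definitions: index name -> column positions forming the key.
# Secondary keys end with the primary key so every entry is unique.
INDEXES = {
    "primary": (0,),
    "carrier": (1, 0),
    "origin": (2, 0),
    "year": (3, 0),
}


def build_index(rows, key_columns):
    """Extract each row's key and return the keys in sorted order."""
    return sorted(tuple(row[c] for c in key_columns) for row in rows)


def serial_load(rows, indexes):
    """Build the indexes one after another."""
    return {name: build_index(rows, cols) for name, cols in indexes.items()}


def parallel_load(rows, indexes, workers=4):
    """Build the indexes concurrently; each sort is independent of the others."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            name: pool.submit(build_index, rows, cols)
            for name, cols in indexes.items()
        }
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    for name, entries in parallel_load(ROWS, INDEXES).items():
        print(name, entries)
```

Both functions produce the same sorted indexes; only the scheduling differs. Note that CPython threads will not speed up CPU-bound sorting, so the sketch shows the structure of the work rather than a real performance gain.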
Our ATC data set has about 122M rows, is stored in a 40GiB CSV file, and can be found in our public Amazon S3 bucket; see the end of this post for details. We use a table schema with 8 indices to speed up the ATC queries. The loader inserts the data into the primary fractal tree and into one fractal tree for each of the 8 keys.
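A quick back-of-the-envelope calculation gives a feel for the size of the job. The sketch below uses only the figures quoted above; since the row count is approximate, the results are too. The function names are made up for illustration.

```python
ROWS = 122_000_000        # "about 122M rows", so treat results as approximate
CSV_BYTES = 40 * 2**30    # 40 GiB CSV file
SECONDARY_INDEXES = 8     # the 8 keys in the schema


def bytes_per_row(csv_bytes, rows):
    """Average size of one CSV row in bytes."""
    return csv_bytes / rows


def total_tree_entries(rows, secondary_indexes):
    """Each row yields one entry in the primary tree plus one per secondary key."""
    return rows * (1 + secondary_indexes)


if __name__ == "__main__":
    print(f"{bytes_per_row(CSV_BYTES, ROWS):.0f} bytes per CSV row on average")
    print(f"{total_tree_entries(ROWS, SECONDARY_INDEXES):,} entries across all trees")
```

With these inputs, a row averages roughly 352 bytes of CSV, and the loader has to place about 1.1 billion entries across the 9 fractal trees.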
The load was run on two (old) machines:
Load Times for the ATC Database

| TokuDB 2.1.0 and MySQL 5.1.36 | TokuDB 4.1.1 and MySQL 5.1.46 | TokuDB Speedup |
TokuDB data sizes (including indices):
Query Times for the ATC Database

| TokuDB 2.1.0 | TokuDB 4.1.1 |
The ATC CSV data files, the schema, and the queries can be retrieved from our public Amazon S3 bucket called tokutek-pub. Here are the Amazon S3 keys: