Air traffic queries in MyISAM and Tokutek (TokuDB)

November 6, 2009
Author
Vadim Tkachenko
Share this Post:

This is next post in series
Analyzing air traffic performance with InfoBright and MonetDB
Air traffic queries in LucidDB
Air traffic queries in InfiniDB: early alpha

Let me explain the reason of choosing these engines. After initial three posts I am often asked “What is baseline ? Can we compare results with standard MySQL engines ?”. So there come MyISAM to consider it as base point to see how column-oriented-analytic engines are better here.

However, take into account, that for MyISAM we need to choose proper indexes to execute queries effectively, and there is pain coming with indexes: – load of data is getting slower; – to design proper indexes is additional research, especially when MySQL optimizer is not smart in picking best one.

The really nice thing about MonetDB, InfoBright, InfiniDB is that they do not need indexes, so you may not worry about maintaining them and picking best one. I am not sure about LucidDB, I was told indexes are needed, but creating new index was really fast even on full database, so I guess, it’s not B-Tree indexes. So this my reflexion on indexes turned me onto TokuDB direction.

What is so special about TokuDB ? There two things: indexes have special structure and are “cheap”, by “cheap” I mean the maintenance cost is constant and independent on datasize. With regular B-Tree indexes cost grows exponentially on datasize (Bradley Kuszmaul from Tokutek will correct me if I am wrong in this statement). Another point with TokuDB, it uses compression, so I expect less size of loaded data and less IO operations during query execution.

So what indexes we need for queries. To recall you details, the schema is available in this post
https://www.percona.com/blog/2009/10/02/analyzing-air-traffic-performance-with-infobright-and-monetdb/, and
queries I posted on sheet “Queries” in my summary Spreadsheet.

With Bradley’s help we chose next indexes:

And I measured load time for both MyISAM and TokuDB in empty table with created indexes.

Load time for MyISAM: 16608 sec
For TokuDB: 19131 sec

Datasize (including indexes)

MyISAM: 36.7GB
TokuDB: 6.7GB

I am a bit surprised that TokuDB is slower loading data, but my guess it is related to compression, and I expect with bigger amount of data TokuDB will be faster MyISAM.

Now to queries. Bradley pointed me that query Q5

can be rewritten as

( I name it as Query Q5i)

The summary table with queries execution time (in sec, less is better):

 

Query MyISAM TokuDB
Q0 72.84 50.25
Q1 61.03 55.01
Q2 98.12 58.36
Q3 123.04 66.87
Q4 6.92 6.91
Q5 13.61 11.86
Q5i 7.68 6.96
Q6 123.84 69.03
Q7 187.22 159.62
Q8 (1y) 8.75 7.59
Q8 (2y) 102.17 64.95
Q8 (3y) 104.7 69.76
Q8 (4y) 107.05 70.46
Q8 (10y) 119.54 84.64
Q9 69.05 47.67

For reference I used 5.1.36-Tokutek-2.1.0 for both MyISAM and TokuDB tests.

And if you are interested to compare MyISAM with previous engines:

 

Query MyISAM MonetDB InfoBright LucidDB InfiniDB
Q0 72.84 29.9 4.19 103.21 NA
Q1 61.03 7.9 12.13 49.17 6.79
Q2 98.12 0.9 6.73 27.13 4.59
Q3 123.04 1.7 7.29 27.66 4.96
Q4 6.92 0.27 0.99 2.34 0.75
Q5 13.61 0.5 2.92 7.35 NA
Q6 123.84 12.5 21.83 78.42 NA
Q7 187.22 27.9 8.59 106.37 NA
Q8 (1y) 8.75 0.55 1.74 6.76 8.13
Q8 (2y) 102.17 1.1 3.68 28.82 16.54
Q8 (3y) 104.7 1.69 5.44 35.37 24.46
Q8 (4y) 107.05 2.12 7.22 41.66 32.49
Q8 (10y) 119.54 29.14 17.42 72.67 70.35
Q9 69.05 6.3 0.31 76.12 9.54

The all results are available in summary Spreadsheet

I especially do not put TokuDB in the same table with analytic oriented databases, to highlight TokuDB is OLTP engine for general purposes.
As you see it is doing better than MyISAM in all queries.

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments

Far
Enough.

Said no pioneer ever.
MySQL, PostgreSQL, InnoDB, MariaDB, MongoDB and Kubernetes are trademarks for their respective owners.
© 2026 Percona All Rights Reserved