As Calpont announced availability of InfiniDB I surely couldn’t miss a chance to compare it with previously tested databases in the same environment.
See my previous posts on this topic:
Analyzing air traffic performance with InfoBright and MonetDB
Air traffic queries in LucidDB
I could not run all queries against InfiniDB and I met some hiccups during my experiment, so it was less plain experience than with other databases.
So let’s go by the same steps:
Load data
InfiniDB supports MySQL’s
|
1 |
LOAD DATA |
statement and it’s own
|
1 |
colxml / cpimport |
utilities. As
|
1 |
LOAD DATA |
is more familiar for me, I started with that, however after issuing LOAD DATA on 180MB file ( for 1989 year, 1st month) very soon it caused extensive swapping (my box has 4GB of RAM) and statement failed with
|
1 |
ERROR 1 (HY000) at line 1: CAL0001: Insert Failed: St9bad_alloc |
Alright,
|
1 |
colxml / cpimport |
was more successful, however it has less flexibility in syntax than
|
1 |
LOAD DATA |
, so I had to transform the input files into a format that
|
1 |
cpimport |
could understand.
Total load time was 9747 sec or 2.7h (not counting time spent on files transformation)
I put summary data into on load data time, datasize and query time to Google Spreadsheet so you can easy compare with previous results. There are different sheets for queries, datasize and time of load.
Datasize
Size of database after loading is another confusing point. InfiniDB data directory has complex structure like
|
1 |
<br>./000.dir/000.dir/003.dir/233.dir<br>./000.dir/000.dir/003.dir/233.dir/000.dir<br>./000.dir/000.dir/003.dir/233.dir/000.dir/FILE000.cdf<br>./000.dir/000.dir/003.dir/241.dir<br>./000.dir/000.dir/003.dir/241.dir/000.dir<br>./000.dir/000.dir/003.dir/241.dir/000.dir/FILE000.cdf<br>./000.dir/000.dir/003.dir/238.dir<br>./000.dir/000.dir/003.dir/238.dir/000.dir<br>./000.dir/000.dir/003.dir/238.dir/000.dir/FILE000.cdf<br>./000.dir/000.dir/003.dir/235.dir<br>./000.dir/000.dir/003.dir/235.dir/000.dir<br>./000.dir/000.dir/003.dir/235.dir/000.dir/FILE000.cdf<br> |
so it’s hard to day what files are related to table. But after load, the size of 000.dir is 114G, which is as twice big as original data files. SHOW TABLE STATUS does not really help there, it shows
|
1 |
<br> Name: ontime<br> Engine: InfiniDB<br> Version: 10<br> Row_format: Dynamic<br> Rows: 2000<br> Avg_row_length: 0<br> Data_length: 0<br>Max_data_length: 0<br> Index_length: 0<br> Data_free: 0<br> Auto_increment: NULL<br> Create_time: NULL<br> Update_time: NULL<br> Check_time: NULL<br> Collation: latin1_swedish_ci<br> Checksum: NULL<br> Create_options: <br> Comment: <br> |
with totally misleading information.
So I put 114GB as size of data after load, until someone points me how to get real size, and also explains what takes so much space.
Queries
First count start query
|
1 |
SELECT count(*) FROM ontime |
took 2.67 sec, which shows that InfiniDB does not store counter of records, however calculates it pretty fast.
Q0:
|
1 |
select avg(c1) from (select year,month,count(*) as c1 from ontime group by YEAR,month) t; |
Another bumper, on this query InfiniDB complains
|
1 |
<br>ERROR 138 (HY000): <br>The query includes syntax that is not supported by InfiniDB. Use 'show warnings;' to get more information. Review the Calpont InfiniDB Syntax guide for additional information on supported distributed syntax or consider changing the InfiniDB Operating Mode (infinidb_vtable_mode).<br>mysql> show warnings;<br>+-------+------+------------------------------------------------------------+<br>| Level | Code | Message |<br>+-------+------+------------------------------------------------------------+<br>| Error | 9999 | Subselect in From clause is not supported in this release. |<br>+-------+------+------------------------------------------------------------+<br> |
Ok, so InfiniDB does not support DERIVED TABLES, which is big limitation from my point of view.
As workaround I tried to create temporary table, but got another error:
|
1 |
<br>mysql> create temporary table tq2 as (select Year,Month,count(*) as c1 from ontime group by Year, Month);<br>ERROR 122 (HY000): Cannot open table handle for ontime.<br> |
As warning suggests I turned
|
1 |
infinidb_vtable_mode = 2 |
, which is:
|
1 |
<br>2) auto-switch mode: InfiniDB will attempt to process the query internally, if it <br>cannot, it will automatically switch the query to run in row-by-row mode.<br> |
but query took 667 sec :
so I skip queries Q5, Q6, Q7 from consideration, which are also based on DERIVED TABLES, as not supported by InfiniDB.
Other queries: (again look on comparison with other engines in Google Spreadsheet or in summary table at the bottom)
Query Q1:
|
1 |
mysql> SELECT DayOfWeek, count(*) AS c FROM ontime WHERE Year BETWEEN 2000 AND 2008 GROUP BY DayOfWeek ORDER BY c DESC; |
7 rows in set (6.79 sec)
Query Q2:
|
1 |
mysql> SELECT DayOfWeek, count(*) AS c FROM ontime WHERE DepDelay>10 AND Year BETWEEN 2000 AND 2008 GROUP BY DayOfWeek ORDER BY c DESC;<br> |
7 rows in set (4.59 sec)
Query Q3:
|
1 |
SELECT Origin, count(*) AS c FROM ontime WHERE DepDelay>10 AND Year BETWEEN 2000 AND 2008 GROUP BY Origin ORDER BY c DESC LIMIT 10;<br> |
4.96 sec
Query Q4:
|
1 |
mysql> SELECT Carrier, count(*) FROM ontime WHERE DepDelay > 10 AND YearD=2007 GROUP BY Carrier ORDER BY 2 DESC;<br> |
I had another surprise with query, after 15 min it did not return results, I check system and it was totally idle, but query stuck. I killed query, restarted mysqld but could not connect to mysqld anymore. In processes I see that InfiniDB started couple external processes:
|
1 |
ExeMgr, DDLProc, PrimProc, controllernode fg, workernode DBRM_Worker1 fg |
which cooperate each with other using IPC shared memory and semaphores. To clean system I rebooted server, and only after that mysqld was able to start.
After that query Q4 took 0.75 sec
Queries Q5-Q7 skipped.
Query Q8:
|
1 |
SELECT DestCityName, COUNT( DISTINCT OriginCityName) FROM ontime WHERE YearD BETWEEN 2008 and 2008 GROUP BY DestCityName ORDER BY 2 DESC LIMIT 10;<br> |
And times for InfiniDB:
1y: 8.13 sec
2y: 16.54 sec
3y: 24.46 sec
4y: 32.49 sec
10y: 1 min 10.35 sec
Query Q9:
Q9:
|
1 |
select Year ,count(*) as c1 from ontime group by Year;<br> |
Time: 9.54 sec
Ok, so there is summary table with queries times (in sec, less is better)
| Query | MonetDB | InfoBright | LucidDB | InfiniDB |
| Q0 | 29.9 | 4.19 | 103.21 | NA |
| Q1 | 7.9 | 12.13 | 49.17 | 6.79 |
| Q2 | 0.9 | 6.73 | 27.13 | 4.59 |
| Q3 | 1.7 | 7.29 | 27.66 | 4.96 |
| Q4 | 0.27 | 0.99 | 2.34 | 0.75 |
| Q5 | 0.5 | 2.92 | 7.35 | NA |
| Q6 | 12.5 | 21.83 | 78.42 | NA |
| Q7 | 27.9 | 8.59 | 106.37 | NA |
| Q8 (1y) | 0.55 | 1.74 | 6.76 | 8.13 |
| Q8 (2y) | 1.1 | 3.68 | 28.82 | 16.54 |
| Q8 (3y) | 1.69 | 5.44 | 35.37 | 24.46 |
| Q8 (4y) | 2.12 | 7.22 | 41.66 | 32.49 |
| Q8 (10y) | 29.14 | 17.42 | 72.67 | 70.35 |
| Q9 | 6.3 | 0.31 | 76.12 | 9.54 |
Conclusions
|
1 |
Server version: 5.1.39-community InfiniDB Community Edition 0.9.4.0-5-alpha (GPL) |