In my previous rounds with DataWarehouse oriented engines I used single table without joins, and with small (as for DW) datasize (see https://www.percona.com/blog/2009/10/02/analyzing-air-traffic-performance-with-infobright-and-monetdb/, https://www.percona.com/blog/2009/10/26/air-traffic-queries-in-luciddb/, https://www.percona.com/blog/2009/11/02/air-traffic-queries-in-infinidb-early-alpha/). Addressing these issues, I took Star Schema Benchmark, which is TPC-H modification, and tried run queries against InfoBright, InfiniDB, LucidDB and MonetDB. I did not get results for MonetDB, will explain later why. Again primary goal for test was not to get just numbers, but understand specifics of each engine and their ability to handle amount of data and execute queries.
All details I have are available on our Wiki https://www.percona.com/docs/wiki/benchmark:ssb:start and the specification of benchmarks you can get there https://www.percona.com/docs/wiki/_media/benchmark:ssb:starschemab.pdf.
I generated data with scale factor = 1000, which gave me 610GB of data in raw format and loaded into each engine.
There difference in engines gets into play. While InfoBright and InfiniDB does not need indexes at all (you actually can’t create indexes here), they needed for LucidDB and MonetDB, and it changes load time and datasize after load significantly. The numbers
I put in results do not include indexing time, but it also should be considered.
And indexes are exactly reason why I could not get results for MonetDB, there I faced issue
I was not prepared for. MonetDB requires that index fits into memory during indexing procedure, and for 610GB the index may get to 120GB size, and I did not have that amount of memory ( the box is only 16GB of RAM). MonetDB experts recommended me to extended
swap partition to 128GB, but my partitions layout was not really prepared for, I just did not expect I need big swap partition.
So load time.
InfiniDB can really utilize all available cores/cpus in systems ( I run benchmark on 8 cores box), and it allowed to load data faster than other engines. Though LucidDB and MonetDB
are also have multi-thread loaders, only InfoBright ICE used single core.
InfiniDB: 24 010 sec
MonetDB: 42 608 sec (without indexes)
InfoBright: 51 779 sec
LucidDB: 140 736 sec (without indexes)
I should note that time to create indexes in LucidDB was also significant and exceeded loading time. Full report on indexes are available here https://www.percona.com/docs/wiki/benchmark:ssb:luciddb:start
Size after load is also interesting factor. InfoBright is traditionally good with compression,
though compression rate is less than in case with AirTraffic table. I was told this is because
lineorder table comes not in sorted order, which one would expect in real life. Actually
the same complain I heard from InfiniDB experts – if put lineorder data in sorted order, loading
time can decrease significantly.
Datasize after load:
LucidDB: 120GB (without indexes)
MonetDB: 650GB (without indexes)
Now on queries time.
Full results you can find on page https://www.percona.com/docs/wiki/benchmark:ssb:start,
and graph is below. There couple comments from me.
InfoBright was fully 1 CPU bound during all queries. I think the problem
that engine can use only single cpu/core is getting significant limitation
for them. For query 3.1 I got the surprising result, after 36h of work I got
error that query can’t be resolved by InfoBright optimizer and I need
to enable MySQL optimizer.
InfiniDB is otherwise was IO-bound, and processed data fully utilizing
sequential reads and reading data with speed 120MB/s. I think it allowed
InfiniDB to get the best time in the most queries.
LucidDB on this stage is also can utilize only singe thread with results sometime better,
sometime worse than InfoBright.
|Q1.1||48 min 21.67 sec (2901.67 sec)||24 min 26.05 sec (1466.05 sec)||3503.792 sec|
|Q1.2||44 min 55.37 sec (2695.37 sec)||24 min 25.83 sec (1465.83 sec)||2889.903 sec|
|Q1.3||45 min 53.49 sec (2753.49 sec)||24 min 27.25 sec (1467.25 sec)||2763.464 sec|
|Q2.1||1 hour 54 min 27.74 sec (6867.74)||19 min 44.35 sec (1184.35 sec)||9694.534 sec|
|Q2.2||1 hour 13 min 33.15 sec (4413.15)||19 min 49.56 sec (1189.56 sec)||9399.965 sec|
|Q2.3||1 hour 8 min 23.41 sec (4103.41)||19 min 52.27 sec (1192.25 sec)||8875.349 sec|
|Q3.1||NA||19 min 11.23 sec (1151.23 sec)||16376.93 sec|
|Q3.2||3 hours 30 min 17.64 sec (12617.64 sec)||19 min 28.55 sec (1168.55 sec)||5560.977 sec|
|Q3.3||2 hours 58 min 18.87 sec (10698.87 sec)||19 min 58.29 sec (1198.29 sec)||2517.621 sec|
|Q3.4||1 hour 41 min 41.29 sec (6101.29 sec)||12 min 57.96 sec (777.96 sec)||686.202 sec|
|Q4.1||8 hours 53 min 52.55 sec (32032.55 sec)||32 min 57.49 sec (1977.49 sec )||19843.213 sec|
|Q4.2||5 hours 38 min 7.60 sec / 5 hours 36 min 35.69 sec (20195.69 sec)||33 min 35.45 sec (2015.45 sec)||15292.648 sec|
|Q4.3||12 hours 58 min 4.27 sec (46684.27 sec)||33 min 47.32 sec (2027.32 sec)||7241.791 sec|
Graph with results (time in sec, less time is better)
- InfiniDB is doing just great using available CPU cores full IO bandwidth reading from disk. You can see more details on InfiniDB scalability on InfiniDB’s blog http://infinidb.org/infinidb-blog/mysql-parallel-query-processing-of-ssb-queries-via-infinidb-.html
- SSB benchmark may be not good for InfoBright, the synthetic nature of benchmark
does not allow InfoBright to show better results. But I hope InfoBright will be able to reuse multi-cores / multi-disks soon.
- I’d like MonetDB is able to use disk to build indexes, not only rely on available memory
- Taking complains on SSB I am looking to get another more realistic dataset and
compare bigger set of available DW solutions