Testing MySQL column storesJustin Swanhart
Recently I had the opportunity to do some testing on a large data set against two MySQL column-store storage engines. I’d like to note that this effort was sponsored by Infobright, but this analysis reflects my independent testing from an objective viewpoint.
I performed two different types of testing. The first focused on core functionality and compatibility of ICE (Infobright Community Edition) compared with MyISAM on a small data set. The second part of my testing compared the performance and accuracy of ICE with InfiniDB Community Edition on a 950GB data set.
The first first part of my analysis focused on testing specific MySQL functionally with Infobright’s storage engine. A lot of my tests involved corner or edge cases since I have experience developing such cases for MySQL column-based storage engines in the past. I reported any bugs that I found, and contributed my test cases to ICE. In fact, some of the issues have already been addressed in the most recent software release. An example of such a problem would be “select avg(char_column) from table” where the column contains a mix of ascii data, such as names and numeric data. This is an example of implicit casting that probably wouldn’t happen in a real application.
Importantly, I didn’t find significant defects in Infobright that would be “show stoppers” for typical OLAP analysis queries. These tests were intended to approximate what would happen if you ported a MyISAM OLAP application to ICE. My testing suggests that with some basic testing, an application could be ported to ICE with a good chance of success. Of course, a good test environment is something that I think every operations team should insist on.
The second part of my analysis focused on testing a total of 29 queries on the large data set. I compared a number of different factors between the two databases, including:
- Ease of installation
- Loading capability and speed of loading
- Accuracy of results of queries over the large data set
- Speed of results of queries over the large data set
- Basic security aspects
The report that I produced may be found here.
It should be noted that the second set of tests included 29 different queries, some of which were provided by Infobright and others which I contributed. InfiniDB does not support as many data types and aggregate functions as ICE, and therefore it could not run some of the queries. ICE supports almost all of the MySQL aggregation functions. Notably, GROUP_CONCAT is not supported, which is something I hope they rectify in a future version.
In addition, I was not able to get accurate results for all of the queries on InfiniDB. In particular the query “select count(*) from carsales.sales_fact” when run on InfiniDB returned a number that was higher than it should have been, and several GROUP BY queries returned unexpected results as well. I did not change any “out of the box” settings for Infobright. Even after I modified the configuration settings on a 16GB box, one query did not have enough memory to complete on InfiniDB.
Overall, I would say that ICE is more “ready for prime time” than InfiniDB given the inconsistencies that I encountered. I will try to reproduce the problems on InfiniDB into easily reproducible test cases which I may contribute to them, but this is difficult given the size of the data set involved. ICE was able to execute queries quickly, and with accurate results.