At the recent OpenSQL Camp in Charlottesville, VA, Tokutek offered a challenge to the MySQL community – who can insert a billion rows into MySQL the fastest? We will post the results on our website and the winner gets a $100 Starbucks card, along with valuable bragging rights.
Tokutek’s technical founders (Michael A. Bender, Martin Farach-Colton, and I), in our academic roles (at Stony Brook, Rutgers, and MIT, respectively) have been investigating how to maintain indexes for large databases. Part of the challenge for this kind of research is to figure out what to measure.
Some other benchmarks, such as TPC-H and SSB, measure bulk load time rather than insertions. We are interested in the case where you must insert a small number of rows at a time at a high rate, and keep the index up-to-date. Indexed insertions are interesting in situations with high incoming data rates and a desire to concurrently query on new data without waiting for periodic batch loads. We wrote, with the help of Tokuteknologist Vincenzo Liberatore, a simple open source benchmark named iiBench, specifically designed to stress indexed insertion performance. Using iiBench, we tested InnoDB and MyISAM, and found that insertion rates for both storage engines drop off dramatically as the database grows.
This benchmark is a work in progress. It has problems, and we’re looking for feedback on how to improve it. The benchmark is essentially to insert a billion rows into a table, maintaining an interesting primary key plus two interesting indexes.
- You may modify the benchmark code.
- The database should be fully indexed after every 1000 insertions. Our intent is that loaded data should be fully indexed as it is loaded, and queries should reflect up-to-the-minute data.
- The benchmark has the obvious flaw that it includes no queries. Again, our intent is that data should be queryable. I would consider a solution that provides fast insertions at the expense of poor query performance to be worse, but I don’t know how to formalize that idea. So for now we just say “you must insert with indexes”, and point out that a bulk load followed by an alter table isn’t satisfying the rules.
- The machine we used is a fairly inexpensive server: A 2-socket quad-core 3.16GHz Xeon with 16GB of main memory and 8 146GB 10,000 RPM SAS drives and a hardware raid 5 controller.
- We used MySQL 5.1 on Linux.
- We’re interested in how to get the best performance on InnoDB or MyISAM.
Although our research is on how to improve insertions, this contest isn’t about how much faster I can solve this problem with some other storage engine. The contest is more like a “peer review” to demonstrate that we’ve gotten as much out of MyISAM and InnoDB as we can. Admittedly, we may not have found the optimal MySQL parameters, so we are sponsoring a contest to see who can insert 1B rows into MySQL the fastest using iiBench. We’re hoping to learn just how fast MySQL can do indexed insertions given better tuning or via other innovative techniques. An overview of the contest with ground rules along with the iiBench source is available at http://www.tokutek.com/contest.php
We want to improve iiBench. Some day it may be good enough to be a useful benchmark.
Take a look and submit an entry by 31 Dec 2008. If you have any questions, please e-mail us at firstname.lastname@example.org.