We modified the iiBench benchmark to perform deletions as well as insertions, and compared InnoDB to Tokutek’s Fractal TreeTM storage engine, both running on MySQL 5.1. I’ll post the revised iiBench tarball soon.
Here is what the performance looks like:
The iiBench-with-deletions benchmark works as follows. The benchmark employs a fact table with an autoincremented primary key. First it inserts 250M rows (1000 rows per INSERT statement). Then it treats the fact table as a FIFO: every time it inserts 1000 rows at the end, it deletes 1000 rows from the beginning. The benchmark maintains three indexes as data is inserted and deleted. As usual for iiBench, the three indexes exhibit a lot of entropy (that is, they are essentially random).
In the graph above, the X-axis shows the number of items inserted so far. The Y axis shows the number of insertions per second (insertions/s), where each data point averages a million insertions. The Tokutek engine starts at around 20,000 insertions/s, and slowly drops down to around 15,000 insertions/s after 250M insertions. Then the deletions start executing, and the insertion rate drops to about 5,000 insertions/s. That’s 5,000 insertions plus 5,000 deletions per second. In the Tokutek engine, deletions are about twice as expensive as insertions, which explains the performance change.
For InnoDB we have collected only 387M rows so far. For the most recent 10M rows, InnoDB averaged 204 rows/s. Assuming it doesn’t slow down further, it will take about 6 more days to complete 500M rows.
We quit after 500M rows because the Tokutek performance leveled off. For the last 10M rows, Tokutek averaged 5,088 insertions/s, about 25x faster than InnoDB.
We ran iiBench on the same hardware as usual, a Sun x4150, 8 cores @ 3.16GHz, 16GB memory, 6 SAS disk HW RAID 0.
For the insertion-only measurements that I reported last week, after the 1B insertion point, InnoDB was completing 876 insertions/s, and Tokutek was completing 11,220 insertions/s, a ratio of 12.8x.
We added this “fifo-delete” pattern to the benchmark because we frequently meet database users who want to maintain a rolling window of data. For example, one user we talked to wants to maintain the most recent 60 days of data, inserting new data as it arrives, and deleting the old data. Using MyISAM or InnoDB, one way to solve the problem is to partition (shard) the database by time, one partition per day. Sharding is good for insertions and deletions (which can simply drop the partition), but slows cross-date queries.
We believe that a storage engine that can maintain rich indexes while inserting and deleting can provide improved performance and simplify MySQL administration, but that’s a topic for another posting.