I have always believed that TokuMX’s Fractal Tree indexes are an ideal fit for MongoDB’s sharding model, especially when it comes time for balancing to occur. At a very high level, balancing is needed when one shard contains more chunks than another. The actual formula is well described in the MongoDB documentation. Balancing shards impacts performance, so MongoDB implemented a scheduler to allow control of when the balancing process is allowed to occur, hopefully during a window of low activity.
Chunk migration is a 6 step process, but the most expensive operations are as follows:
The above workload is ideal for TokuMX, especially if your working set fits in memory but your full data set is larger than RAM, here is why…
1. Get all the documents for a given chunk from the donor shard.
TokuMX creates a clustering index on the shard key, so the documents in a chunk are located together on disk, so the range scan is fast and the IO is more sequential than random.
2. Insert all the documents from step 1 into the receiving shard.
TokuMX’s indexed insertion performance is unmatched, see the iiBench benchmark.
3. Delete all the documents from step 1 on the donor shard.
TokuMX’s deletes require very little IO, plus TokuMX supports concurrent writers which eliminates blocking the running workload, see the Sysbench benchmark.
Benchmark Description and Environment
The Java code to run this benchmark is available here, I’ll have it on GitHub soon.
Benchmark Results
Always worth noting is insert performance. I inserted 120 million documents into the collection, with 12 concurrent threads. Overall insert throughput for MongoDB was 4,596 inserts per second versus TokuMX’s 43,769 inserts per second.
Once the data was loaded I began the traditional Sysbench workload (point queries, range queries, aggregation, update, delete, insert), and gated the throughput for MongoDB at 15 Sysbench “transactions” per second. TokuMX was gated at 30 transactions per second, double that of MongoDB.
After ten minutes (600 seconds), I performed 3 sh.moveChunk() operations; moving a single chunk from shard0001 to shard0002, then one from shard0002 to shard0000, and finally one from shard0000 to shard0001. The impact of this operation is clearly shown in the following graph.
The TokuMX cluster was running twice as fast as MongoDB, yet was unaffected by the chunk migrations. The MongoDB cluster performance was measurably impacted, sometimes down over 50% for a 10 second interval.
This benchmark specifically highlights a few of the performance advantages of TokuMX (clustering indexes, concurrent writers, low IO index maintenance) but there are others, plus TokuMX includes high compression and transactions. Try it on your workload today by visiting our download page.
Resources
RELATED POSTS