Why a Partitioned Collection Cannot Be Sharded
Zardosht.Kasheff
In TokuMX 1.5, we introduced partitioned collections for non-sharded clusters. That is, one can have a partitioned collection in a replica set, but one cannot shard a partitioned collection. In this post, I explain why.
As I mentioned in an earlier post, partitioned collections are useful for time-series data where we would like to keep a rolling period of data (e.g. the last 6 months’ worth). In all likelihood, most writes go to the last partition and, periodically (yet infrequently), the first partition is efficiently dropped to reclaim space. So, partitioned collections are designed to efficiently maintain a finite amount of data (kind of like capped collections). But this does not explain why we don’t allow a partitioned collection to be sharded.
One can envision reasons to have a partitioned collection be sharded. Suppose a user wants to use a partitioned collection to drop data efficiently, but the write load on the last partition is so great that a single machine cannot keep up. With a normal collection, one would shard, so naturally, one would like to shard in this scenario as well.
So why do we not allow this? In short, partitioning a non-sharded collection is a separate feature from partitioning a sharded collection, and we’ve decided to do just one before the other. Or, put another way, putting sharding on top of a partitioned collection is not performant, so if we want sharded+partitioned collections, we likely need to put partitioning on top of sharded collections.
Let me elaborate.
Suppose we want to allow the sharding of a partitioned collection. Data migrations over the partitioned collection would not be performant. Here is why. With a partitioned collection, the partition key is likely a key with a right-most insertion pattern, like a timestamp. With a sharded collection, the shard key is likely a key that distributes writes across shards, or, in other words, random. Therefore, the partition key is not the same as the shard key. So, each chunk will be distributed across all partitions. This makes splitting chunks, and eventually migrating data, challenging, because the information about each chunk will be divided amongst partitions.
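To make the chunk-versus-partition mismatch concrete, here is a minimal Python sketch. It is a toy model, not TokuMX code: the constants, the `user_id` shard key, and the helper names are all assumptions chosen for illustration. Documents are partitioned by a timestamp range and assigned to chunks by a hash of the shard key, and the sketch records which partitions each chunk ends up touching.

```python
import hashlib
from collections import defaultdict

# Hypothetical sizes, chosen only for illustration.
NUM_PARTITIONS = 6   # e.g. one partition per month
NUM_CHUNKS = 4       # chunks over a hashed shard key
NUM_DOCS = 6000      # documents arrive in timestamp order


def partition_of(ts):
    """Partition by timestamp range: each partition holds 1000 consecutive timestamps."""
    return ts * NUM_PARTITIONS // NUM_DOCS


def chunk_of(user_id):
    """Chunk by a hash of the (assumed) shard key, so writes spread across chunks."""
    digest = hashlib.md5(str(user_id).encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_CHUNKS


def partitions_touched_per_chunk():
    """Map each chunk to the set of partitions that hold some of its documents."""
    touched = defaultdict(set)
    for ts in range(NUM_DOCS):
        user_id = ts * 7919 % 1000  # arbitrary deterministic stand-in for a random key
        touched[chunk_of(user_id)].add(partition_of(ts))
    return dict(touched)


if __name__ == "__main__":
    for chunk, partitions in sorted(partitions_touched_per_chunk().items()):
        print(f"chunk {chunk} has documents in partitions {sorted(partitions)}")
```

In this model every chunk has documents in every partition, so splitting or migrating a single chunk would have to read from all of them.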
In my opinion, the proper way to support a sharded and partitioned collection is as follows. Instead of building sharding on top of partitioning, build partitioning on top of sharding. That is, have config servers and mongos conceptually understand a sharded collection that is partitioned, the way TokuMX’s mongod understands a non-sharded collection that is partitioned. This way, each individual partition is its own sharded collection.
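The layering described above can be sketched as a toy router in Python. This is only an illustration of the idea, not how config servers or mongos are implemented: the class names, the `ts` and `user_id` fields, and the modulo shard routing are all hypothetical. The router first picks a partition by timestamp, then lets that partition, which is an ordinary sharded collection, route by shard key.

```python
from bisect import bisect_right


class ShardedCollection:
    """Stand-in for an ordinary sharded collection: routes documents by shard key."""

    def __init__(self, name, num_shards):
        self.name = name
        self.shards = [[] for _ in range(num_shards)]

    def insert(self, doc):
        # Hypothetical routing rule: shard key modulo the number of shards.
        shard = doc["user_id"] % len(self.shards)
        self.shards[shard].append(doc)
        return shard


class PartitionedShardedCollection:
    """Router-level view: an ordered list of partitions, each its own sharded collection."""

    def __init__(self, num_shards):
        self.num_shards = num_shards
        self.lower_bounds = []  # inclusive lower timestamp bound of each partition
        self.partitions = []

    def add_partition(self, lower_bound):
        if self.lower_bounds and lower_bound <= self.lower_bounds[-1]:
            raise ValueError("partitions must be added in increasing order")
        self.lower_bounds.append(lower_bound)
        self.partitions.append(ShardedCollection(f"p{lower_bound}", self.num_shards))

    def insert(self, doc):
        # Step 1: choose the partition by timestamp. Step 2: shard within it.
        idx = bisect_right(self.lower_bounds, doc["ts"]) - 1
        if idx < 0:
            raise KeyError("no partition covers this timestamp")
        partition = self.partitions[idx]
        return partition.name, partition.insert(doc)

    def drop_oldest(self):
        # Dropping a partition drops one whole sharded collection; others are untouched.
        self.lower_bounds.pop(0)
        return self.partitions.pop(0)


if __name__ == "__main__":
    coll = PartitionedShardedCollection(num_shards=3)
    coll.add_partition(0)
    coll.add_partition(100)
    print(coll.insert({"ts": 5, "user_id": 7}))
    print(coll.insert({"ts": 150, "user_id": 8}))
    print(coll.drop_oldest().name)
```

Because each partition owns its own chunks in this arrangement, splitting or migrating a chunk stays within one partition, and dropping the oldest partition does not touch the chunks of the others.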
As one can tell, this feature is its own separate work item from what we’ve done. So, we at Tokutek had two options. Option 1 was to wait until we developed both features before releasing either. Option 2 was to get the first feature, non-sharded/partitioned collections, out to the world so it can be used by the many applications that don’t require sharding, and implement sharded/partitioned collections once we received feedback showing it was worthwhile. We chose option 2.