Since we released TokuMX, one of the most frequent requests has been for a migration tool. TokuMX has a completely different storage format from MongoDB, so you have to actually move all of your data out of MongoDB and into TokuMX; you can’t just switch out the servers and use the same data. This is a problem for live applications: you don’t want to take your whole site down or make it read-only for the entire duration of the migration.
With the new mongo2toku tool in TokuMX 1.0.3, you can migrate your data in a totally seamless way, with as little downtime as you’d have doing a normal failover to a secondary. In fact, this tool basically lets TokuMX act like a secondary to a MongoDB replica set.
I’ve recorded an asciicast demo walking through a full migration process: ascii.io/a/4285. It’s about 15 minutes long. You can see the scripts I use and the presentation itself on GitHub.

Basically, mongo2toku acts like a bridge: it reads oplog entries on the MongoDB set and replays them (with TokuMX semantics) on another cluster. You need to pick the right starting point in the oplog (an OpTime), but you can write this down when you take a snapshot of your data and then use it after you’ve restored the snapshot into TokuMX. You can shut mongo2toku off for a while (it’ll tell you how far it has synced) and turn it back on later, as long as the sync point hasn’t fallen off the end of the vanilla replica set’s oplog.
The migration process is pretty simple for replica sets. It’s four steps:
1. Use mongodump to get a snapshot of the data, and write down the optime that the secondary has synced to (rs.status() can tell you this).
2. Use mongorestore to load this dump into TokuMX, and bring up TokuMX. You can start adding secondaries to the TokuMX set at this point.
3. Run mongo2toku to get the TokuMX replica set caught up with the vanilla set, and keep it in sync.
4. Stop mongo2toku and take down the vanilla set.
These are described in detail in the Users’ Guide and in the migration demo above. There are also advanced strategies for migrating with limited resources and for sharded clusters.
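For the first step, the bookkeeping amounts to pulling one member’s optime out of the replica set status. Here is a rough sketch of that lookup against a hand-written, rs.status()-shaped dictionary. The helper `secondary_optime`, the host names, and the `{"t": ..., "i": ...}` encoding of the optime are assumptions for illustration; a real driver may hand you a timestamp object instead.

```python
from typing import Any, Dict, Tuple


def secondary_optime(status: Dict[str, Any], member_name: str) -> Tuple[int, int]:
    """Return (seconds, increment) for the named member of an rs.status()-like document."""
    for member in status.get("members", []):
        if member.get("name") == member_name:
            optime = member["optime"]
            # Assumption: the optime is a plain dict with "t" (seconds) and "i" (increment).
            return (optime["t"], optime["i"])
    raise KeyError(f"no member named {member_name!r}")


# Hypothetical sample, written by hand; not output from a real replica set.
sample_status = {
    "set": "rs0",
    "members": [
        {"name": "db1.example.com:27017", "stateStr": "PRIMARY",
         "optime": {"t": 1371585940, "i": 3}},
        {"name": "db2.example.com:27017", "stateStr": "SECONDARY",
         "optime": {"t": 1371585937, "i": 1}},
    ],
}

# Record the optime of the secondary you are about to dump from.
seconds, increment = secondary_optime(sample_status, "db2.example.com:27017")
print(f"snapshot optime: seconds={seconds} increment={increment}")
```

Keep the recorded value next to the dump files, so the starting point for mongo2toku travels with the snapshot it belongs to.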
As always, please let us know what you think and if there’s anything we can do to make it better.