Since we released TokuMX, one of the most frequent requests has been for a migration tool. TokuMX has a completely different storage format than MongoDB, which means that you have to actually move all of your data out of MongoDB and into TokuMX; you can’t just switch out the servers and use the same data. This is a problem for live applications. You don’t want to take your whole site down or make it read-only for the entire duration of the migration process.
With the new mongo2toku tool in TokuMX 1.0.3, you can migrate your data in a totally seamless way, with as little downtime as you’d have doing a normal failover to a secondary. In fact, this tool basically lets TokuMX act like a secondary to a MongoDB replica set.
How does it work?
mongo2toku acts like a bridge, reading oplog entries on the MongoDB set, and replaying them (with TokuMX semantics) on another cluster. You need to pick the right starting point in the oplog (an OpTime), but you can write this down when you take a snapshot of your data and then use it after you’ve restored the snapshot into TokuMX. You can shut it off for a while (it’ll tell you how far it has synced) and turn it back on later, as long as the sync point hasn’t fallen off the end of the vanilla replica set’s oplog.
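To make the resume behavior concrete, here is a minimal toy model in Python. It is not how mongo2toku is implemented: the capped oplog is simulated with a bounded deque, the OpTime is assumed to be a (seconds, increment) pair, and the names OplogEntry, SyncPointLost, and replay are hypothetical, invented only for this illustration.

```python
from collections import deque
from typing import NamedTuple, Optional


class OplogEntry(NamedTuple):
    optime: tuple                # assumed (seconds, increment); tuples compare in order
    op: str                      # toy operations only: "set" or "delete"
    key: str
    value: Optional[int] = None


class SyncPointLost(Exception):
    """The sync point is older than the oldest entry still in the oplog."""


def replay(oplog, target, synced_to):
    """Apply every entry newer than synced_to to target; return the new sync point."""
    # Conservative check: if the oldest surviving entry is already newer than
    # the sync point, entries may have been dropped and we cannot resume.
    if oplog and oplog[0].optime > synced_to:
        raise SyncPointLost(f"sync point {synced_to} fell off the oplog")
    for entry in oplog:
        if entry.optime <= synced_to:
            continue             # already contained in the snapshot or replayed earlier
        if entry.op == "set":
            target[entry.key] = entry.value
        elif entry.op == "delete":
            target.pop(entry.key, None)
        synced_to = entry.optime
    return synced_to


# A capped oplog that keeps only the 4 most recent entries.
oplog = deque(maxlen=4)
oplog.append(OplogEntry((100, 1), "set", "a", 1))
oplog.append(OplogEntry((100, 2), "set", "b", 2))

# The snapshot was taken at optime (100, 1), so it already contains "a".
target = {"a": 1}
synced = replay(oplog, target, (100, 1))

# The bridge is shut off, more writes arrive, and it is turned back on later.
oplog.append(OplogEntry((101, 1), "delete", "a"))
synced = replay(oplog, target, synced)
```

Because replay returns the last OpTime it applied, stopping and restarting only requires remembering that one value. If the source keeps writing while the bridge is off for too long, the bounded deque drops the old entries and the next replay raises SyncPointLost, which mirrors the sync point falling off the end of the oplog.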
How should you use it?
The migration process is pretty simple for replica sets. It’s four steps:
- Dump: Take a vanilla secondary out of the set, use vanilla mongodump to get a snapshot of the data, and write down the optime that the secondary has synced to (rs.status() can tell you this).
- Restore: Use TokuMX mongorestore to load this dump into TokuMX, and bring up TokuMX. You can start adding secondaries to the TokuMX set at this point.
- Catchup: Use mongo2toku to get the TokuMX replica set caught up with the vanilla set, and keep it in sync.
- Switch: Take your application offline for a moment, make sure TokuMX is fully synced up, and then restart your application pointed toward TokuMX. Then you can stop mongo2toku and take down the vanilla set.
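The condition behind the Switch step can be written down as a small check. This is only a sketch of the reasoning, not part of any tool: safe_to_switch is a hypothetical helper, and both optimes are assumed to be (seconds, increment) pairs that you have read off yourself, one from the vanilla set and one from what mongo2toku reports as synced.

```python
def safe_to_switch(app_writes_stopped, source_last_optime, bridge_synced_optime):
    """Return True when the application can be pointed at the new replica set.

    Both optimes are assumed to be (seconds, increment) tuples, which compare
    in the same order as the operations they label.
    """
    # While the application is still writing, the source can move ahead again
    # right after the comparison, so writes must be stopped first.
    if not app_writes_stopped:
        return False
    # The bridge must have replayed everything the source ever logged.
    return bridge_synced_optime >= source_last_optime
```

For example, safe_to_switch(True, (200, 3), (200, 2)) is False because one operation has not been replayed yet, which is why the application only needs to be offline for as long as it takes the bridge to apply the last few entries.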
These steps are described in detail in the Users’ Guide and in the migration demo above. There are also advanced strategies for migrating with limited resources and for sharded clusters.
As always, please let us know what you think and if there’s anything we can do to make it better.