
TokuMX 1.0.3: Seamless Migrations from MongoDB

July 23, 2013 | Posted In: Tokutek, TokuView


Since we released TokuMX, one of the most frequent requests has been for a migration tool. TokuMX has a completely different storage format than MongoDB, which means that you have to actually move all of your data out of MongoDB and into TokuMX; you can’t just switch out the servers and reuse the same data. This is a problem for live applications. You don’t want to take your whole site down or make it read-only for the entire duration of the migration process.

With the new mongo2toku tool in TokuMX 1.0.3, you can migrate your data in a totally seamless way, with as little downtime as you’d have doing a normal failover to a secondary. In fact, this tool basically lets TokuMX act like a secondary to a MongoDB replica set.

I’ve recorded an asciicast demo walking through a full migration process: ascii.io/a/4285. It’s about 15 minutes long. You can see the scripts I use and the presentation itself on GitHub.

[Image: Migrating from MongoDB to TokuMX]

How does it work?

Basically, mongo2toku acts like a bridge, reading oplog entries on the MongoDB set, and replaying them (with TokuMX semantics) on another cluster. You need to pick the right starting point in the oplog (an OpTime), but you can write this down when you take a snapshot of your data and then use it after you’ve restored the snapshot into TokuMX. You can shut it off for a while (it’ll tell you how far it has synced) and turn it back on later, as long as the sync point hasn’t fallen off the end of the vanilla replica set’s oplog.
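To find that starting OpTime, you can inspect rs.status() on the vanilla secondary before you snapshot it. A minimal sketch in the mongo shell (the optime field layout shown here matches 2.4-era MongoDB output and is an assumption for your version):

```javascript
// Run in the mongo shell connected to the vanilla secondary you plan
// to dump from. rs.status() reports each member's replication state;
// the entry with self: true is the server you are connected to.
var status = rs.status();
status.members.forEach(function (m) {
    if (m.self) {
        // In 2.4-era servers, optime is a Timestamp with seconds (t)
        // and an increment (i). Write this pair down; you will hand it
        // to mongo2toku after restoring the snapshot into TokuMX.
        print("synced to optime: " + m.optime.t + ":" + m.optime.i);
    }
});
```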

How should you use it?

The migration process is pretty simple for replica sets. It’s four steps:

  1. Dump: Take a vanilla secondary out of the set, use vanilla mongodump to get a snapshot of the data, and write down the optime that the secondary has synced to (rs.status() can tell you this).
  2. Restore: Use TokuMX mongorestore to load this dump into TokuMX, and bring up TokuMX. You can start adding secondaries to the TokuMX set at this point.
  3. Catchup: Use mongo2toku to get the TokuMX replica set caught up with the vanilla set, and keep it in sync.
  4. Switch: Take your application offline for a moment, make sure TokuMX is fully synced up, and then restart your application pointed toward TokuMX. Then you can stop mongo2toku and take down the vanilla set.
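The four steps above can be sketched as a command sequence. Host names, the example optime, and the exact mongo2toku flag spellings (--ts, --from, --host) are assumptions here; check the Users’ Guide for your version before running anything:

```shell
# 1. Dump: snapshot a vanilla secondary (taken out of the set first),
#    and note the optime it has synced to from rs.status(),
#    e.g. 1374600000:1 (hypothetical value).
mongodump --host vanilla-secondary:27017 --out /backups/migration

# 2. Restore: load the dump into TokuMX with TokuMX's mongorestore,
#    then bring up the TokuMX set (and any secondaries you want).
mongorestore --host tokumx-primary:27017 /backups/migration

# 3. Catchup: replay the vanilla set's oplog into TokuMX starting at
#    the optime recorded in step 1, and leave it running to stay in sync.
mongo2toku --ts=1374600000:1 --from vanilla-primary:27017 \
           --host tokumx-primary:27017

# 4. Switch: once mongo2toku reports it is fully caught up, briefly stop
#    the application, repoint it at tokumx-primary:27017, then stop
#    mongo2toku and retire the vanilla set.
```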

These are described in detail in the Users’ Guide and in the migration demo above. There are also advanced strategies for migrating with limited resources and for sharded clusters.

As always, please let us know what you think and if there’s anything we can do to make it better.


6 Comments

  • Hi –
What’s the best way to set default compression settings for the mongorestore initial load? Is there a way to globally set the default collection and index compression algorithms?
    Thanks!

• There isn’t one, and that’s a great idea. I just put this in our issue tracker; you can track its progress there: http://github.com/Tokutek/mongo/issues/377.

For now, you may be able to modify the dumped data files by hand, but I’m pretty sure you’ll only be able to change the options for the secondary indexes, not the _id index.

      • Got it, thanks a lot. Migration seems fairly painless given this tool!

        FWIW, I got around the compression defaulting issue by scripting pre-creation of all my collections using db.runCommand with the compression tag on them. I then ran mongorestore without the drop option enabled so it loaded using that compression scheme.
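The commenter’s workaround could look something like the following mongo shell sketch. The collection names are hypothetical, and I’m assuming TokuMX’s create command accepts a compression option (with values like "zlib") as the comment describes:

```javascript
// Run against the TokuMX server BEFORE mongorestore. Pre-create each
// collection with the desired compression; collection names below are
// placeholders for your own schema.
var collections = ["users", "events", "sessions"];
collections.forEach(function (name) {
    // Assumption: TokuMX's create command takes a per-collection
    // compression setting, as the commenter describes.
    var res = db.runCommand({ create: name, compression: "zlib" });
    printjson(res);
});
// Then run mongorestore WITHOUT the --drop option, so it loads into the
// pre-created collections and keeps their compression settings.
```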

        Thanks!

  • Do you need to do a mongodump/mongorestore or can you use mongo2toku and MongoDB’s normal initial syncing feature to do a complete sync?
