In this blog post, we will discuss how to migrate data from MongoDB Atlas to self-hosted MongoDB. Several third-party tools can migrate data from Atlas to Percona Server for MongoDB (PSMDB), such as MongoPush, Hummingbird, and MongoShake. Today, we will use MongoShake to migrate and sync data from Atlas to PSMDB.
NOTE: These tools are not officially supported by Percona.
MongoShake is a powerful tool that facilitates the migration of data from one MongoDB cluster to another. These are step-by-step instructions on how to install and utilize MongoShake for data migration from Atlas to PSMDB. So, let’s get started!
A MongoDB Atlas account. I created a test account (replica set) and loaded sample data with one click in Atlas:
    Atlas atlas-mhnnqy-shard-0 [primary] test> show dbs
    sample_airbnb 52.69 MiB
    sample_analytics 9.44 MiB
    sample_geospatial 1.23 MiB
    sample_guides 40.00 KiB
    sample_mflix 109.43 MiB
    sample_restaurants 6.42 MiB
    sample_supplies 1.05 MiB
    sample_training 46.77 MiB
    sample_weatherdata 2.59 MiB
    admin 336.00 KiB
    local 20.35 GiB
    Atlas atlas-mhnnqy-shard-0 [primary] test>
An EC2 instance with PSMDB installed. I installed PSMDB on the EC2 machine:
    rs0 [direct: primary] test>

    rs0 [direct: primary] test> show dbs
    admin 40.00 KiB
    config 12.00 KiB
    local 40.00 KiB
    rs0 [direct: primary] test>
Make sure Atlas and PSMDB run the same major MongoDB version; here, both are on 6.0 (I have also used this tool on MongoDB 4.2, which is already EOL).
PSMDB version:
    rs0 [direct: primary] test> db.version()
    6.0.9-7
    rs0 [direct: primary] test>
MongoDB Atlas version:
    Atlas atlas-mhnnqy-shard-0 [primary] test> db.version()
    6.0.10
    Atlas atlas-mhnnqy-shard-0 [primary] test>
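The version comparison above can also be scripted. The following is a minimal Python sketch, assuming that "same version" means the same major.minor release series (as with 6.0.10 and 6.0.9-7 here). The helper names `release_series` and `same_release_series` are hypothetical and not part of any MongoDB or Percona tooling.

```python
import re
from typing import Tuple


def release_series(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a db.version() string such as '6.0.9-7'."""
    match = re.match(r"^(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def same_release_series(source: str, target: str) -> bool:
    """True when both servers report the same major.minor series."""
    return release_series(source) == release_series(target)


if __name__ == "__main__":
    # Values taken from the db.version() output shown above.
    print(same_release_series("6.0.10", "6.0.9-7"))
```

Patch-level differences and the PSMDB build suffix (`-7`) are deliberately ignored in this sketch.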
To install MongoShake, follow these steps:
Step 1: Install Go
Ensure that Go is installed on your system. If not, download it from the official website and follow the installation instructions. I used Amazon Linux 2, so I used the command below to install Go:

    sudo yum install golang -y
Step 2: Install MongoShake
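Open the terminal and run the following command to clone the MongoShake repository. The collector binary used in Step 3 is built from this source; see the repository README for the build step.

    git clone https://github.com/alibaba/MongoShake.git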
Once you have installed MongoShake, you need to configure it for the migration process. Set the following parameters in conf/collector.conf:

    # Atlas connection string
    mongo_urls = mongodb+srv://gautam:****@cluster0.teeeayh.mongodb.net/
    # PSMDB connection string
    tunnel.address = mongodb://127.0.0.1:27017
    # default: incr
    sync_mode = all
    # default: /root/mongoshake/
    log.dir = /home/percona/MongoShake/log/
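Because a typo in a key name or connection string only surfaces once the collector starts, it can help to sanity-check the settings first. Below is a minimal Python sketch, assuming the configuration file consists of `key = value` lines with `#` comments on their own lines, and that `sync_mode` accepts `all`, `full`, or `incr`. The functions `parse_conf` and `check_conf` are hypothetical helpers for illustration, not part of MongoShake.

```python
import sys
from typing import Dict, List

# Settings discussed in this post; treated as required for this check only.
REQUIRED_KEYS = ("mongo_urls", "tunnel.address", "sync_mode", "log.dir")
URI_PREFIXES = ("mongodb://", "mongodb+srv://")
SYNC_MODES = ("all", "full", "incr")  # assumption: modes accepted by MongoShake


def parse_conf(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines, skipping blanks and '#' comment lines."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_conf(settings: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means no issue found."""
    problems: List[str] = []
    for key in REQUIRED_KEYS:
        if not settings.get(key):
            problems.append(f"missing setting: {key}")
    for key in ("mongo_urls", "tunnel.address"):
        value = settings.get(key, "")
        if value and not value.startswith(URI_PREFIXES):
            problems.append(f"{key} is not a MongoDB URI: {value}")
    mode = settings.get("sync_mode")
    if mode and mode not in SYNC_MODES:
        problems.append(f"unknown sync_mode: {mode}")
    return problems


if __name__ == "__main__":
    # Usage: python check_conf.py conf/collector.conf
    with open(sys.argv[1], encoding="utf-8") as handle:
        for problem in check_conf(parse_conf(handle.read())):
            print(problem)
```

Keys are matched case-sensitively, so a capitalized `Tunnel.address` or `Sync_mode` is reported as a missing setting.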
The configuration file has other parameters as well, which you can tune as needed. For example, if you want to read data from a Secondary node so that the reads do not overwhelm the Primary, set the parameter below:

    mongo_connect_mode = secondaryPreferred
Step 3: Once you are done with the configuration, run MongoShake in a screen session as shown below:

    ./bin/collector.linux -conf=conf/collector.conf -verbose 0
Step 4: Monitor the log file in the log directory to check the progress of migration.
Below is a sample log from MongoShake startup:

    [2023/09/25 21:09:13 UTC] [INFO] New session to mongodb+srv://gautam:***@cluster0.teeeayh.mongodb.net/ successfully
    [2023/09/25 21:09:13 UTC] [INFO] Close client with mongodb+srv://gautam:***@cluster0.teeeayh.mongodb.net/
    [2023/09/25 21:09:13 UTC] [INFO] New session to mongodb+srv://gautam:***@cluster0.teeeayh.mongodb.net/ successfully
    [2023/09/25 21:09:19 UTC] [INFO] Close client with mongodb+srv://gautam:***@cluster0.teeeayh.mongodb.net/
    [2023/09/25 21:09:19 UTC] [INFO] GetAllTimestamp biggestNew:{1695675385 26}, smallestNew:{1695675385 26}, biggestOld:{1695668185 9}, smallestOld:{1695668185 9}, MongoSource:[url[mongodb+srv://gautam:***@cluster0.teeeayh.mongodb.net/], name[atlas-mhnnqy-shard-0]], tsMap:map[atlas-mhnnqy-shard-0:{7282839399442677769 7282870323207208986}]
    [2023/09/25 21:09:19 UTC] [INFO] all node timestamp map: map[atlas-mhnnqy-shard-0:{7282839399442677769 7282870323207208986}] CheckpointStartPosition:{1 0}
    [2023/09/25 21:09:19 UTC] [INFO] New session to mongodb+srv://gautam:***@cluster0.teeeayh.mongodb.net/ successfully
    [2023/09/25 21:09:19 UTC] [INFO] atlas-mhnnqy-shard-0 Regenerate checkpoint but won't persist. content: {"name":"atlas-mhnnqy-shard-0","ckpt":1,"version":2,"fetch_method":"","oplog_disk_queue":"","oplog_disk_queue_apply_finish_ts":1}
    [2023/09/25 21:09:19 UTC] [INFO] atlas-mhnnqy-shard-0 checkpoint using mongod/replica_set: {"name":"atlas-mhnnqy-shard-0","ckpt":1,"version":2,"fetch_method":"","oplog_disk_queue":"","oplog_disk_queue_apply_finish_ts":1}, ckptRemote set? [false]
    [2023/09/25 21:09:19 UTC] [INFO] atlas-mhnnqy-shard-0 syncModeAll[true] ts.Oldest[7282839399442677769], confTsMongoTs[4294967296]
    [2023/09/25 21:09:19 UTC] [INFO] start running with mode[all], fullBeginTs[7282870323207208986[1695675385, 26]]
Once the full sync completes, you will see the log below, and the incremental (incr) sync starts; incr means MongoShake syncs live changes via the oplog:

    [2023/09/25 22:12:04 UTC] [INFO] GetAllTimestamp biggestNew:{1695679924 3}, smallestNew:{1695679924 3}, biggestOld:{1695677613 1}, smallestOld:{1695677613 1}, MongoSource:[url[mongodb+srv://gautam:***@cluster0.teeeayh.mongodb.net/], name[atlas-mhnnqy-shard-0]], tsMap:map[atlas-mhnnqy-shard-0::{7282879892394344449 7282889818063765507}]
    [2023/09/25 22:12:04 UTC] [INFO] ------------------------full sync done!------------------------
    [2023/09/25 22:12:04 UTC] [INFO] oldestTs[7282879892394344449[1695677613, 1]] fullBeginTs[7282889689214746625[1695679894, 1]] fullFinishTs[7282889818063765507[1695679924, 3]]
    [2023/09/25 22:12:04 UTC] [INFO] finish full sync, start incr sync with timestamp: fullBeginTs[7282889689214746625[1695679894, 1]], fullFinishTs[7282889818063765507[1695679924, 3]]
    [2023/09/25 22:12:04 UTC] [INFO] start incr replication
You will see log lines like these when both nodes are in sync (the lag is 0, i.e., tps=0):

    [2023/09/25 22:14:41 UTC] [INFO] [name=atlas-mhnnqy-shard-0, stage=incr, get=24, filter=24, write_success=0, tps=0, ckpt_times=0, lsn_ckpt={0[0, 0], 1970-01-01 00:00:00}, lsn_ack={0[0, 0], 1970-01-01 00:00:00}]]
    [2023/09/25 22:14:46 UTC] [INFO] [name=atlas-mhnnqy-shard-0, stage=incr, get=24, filter=24, write_success=0, tps=0, ckpt_times=0, lsn_ckpt={0[0, 0], 1970-01-01 00:00:00}, lsn_ack={0[0, 0], 1970-01-01 00:00:00}]]
    [2023/09/25 22:14:51 UTC] [INFO] [name=atlas-mhnnqy-shard-0, stage=incr, get=25, filter=25, write_success=0, tps=0, ckpt_times=0, lsn_ckpt={0[0, 0], 1970-01-01 00:00:00}, lsn_ack={0[0, 0], 1970-01-01 00:00:00}]]
    [2023/09/25 22:14:56 UTC] [INFO] [name=atlas-mhnnqy-shard-0, stage=incr, get=25, filter=25, write_success=0, tps=0, ckpt_times=0, lsn_ckpt={0[0, 0], 1970-01-01 00:00:00}, lsn_ack={0[0, 0], 1970-01-01 00:00:00}]]
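Watching for these lines by eye gets tedious on a long migration. Here is a minimal Python sketch that scans the collector log for status lines like the ones above and reports whether the most recent few all show `stage=incr` with `tps=0`, which is the in-sync criterion used in this post. The function names and the three-line window are assumptions chosen for illustration; the log format is taken only from the sample shown here.

```python
import re
import sys
from typing import Iterable, Optional, Tuple

# Matches the periodic status line, e.g. "... stage=incr, get=24, ..., tps=0, ..."
STATUS_RE = re.compile(r"\bstage=(\w+).*?\btps=(\d+)")


def parse_status(line: str) -> Optional[Tuple[str, int]]:
    """Return (stage, tps) for a status line, or None for any other line."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def is_caught_up(lines: Iterable[str], window: int = 3) -> bool:
    """True when the last `window` status lines are all incr with tps=0."""
    statuses = [s for s in map(parse_status, lines) if s is not None]
    recent = statuses[-window:]
    return len(recent) == window and all(
        stage == "incr" and tps == 0 for stage, tps in recent
    )


if __name__ == "__main__":
    # Usage: python sync_status.py <path to the collector log file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        print("in sync" if is_caught_up(handle) else "not in sync yet")
```

Requiring several consecutive idle samples, rather than one, reduces the chance of cutting over during a brief pause in writes.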
Once the full data replication process is complete and both clusters are in sync, you can stop pointing the application to Atlas. Check the MongoShake logs, and when the lag is 0, as in the logs above, stop MongoShake to end the replication from Atlas. Then verify that the data has been successfully migrated to PSMDB. You can use the MongoDB shell or any other client to connect to the PSMDB instance and verify this.
MongoDB Atlas databases and the document count of each collection:

    Database: sample_airbnb
    -----
    Collection 'listingsAndReviews' documents: 5555

    Database: sample_analytics
    -----
    Collection 'transactions' documents: 1746
    Collection 'accounts' documents: 1746
    Collection 'customers' documents: 500

    Database: sample_geospatial
    -----
    Collection 'shipwrecks' documents: 11095

    Database: sample_guides
    -----
    Collection 'planets' documents: 8

    Database: sample_mflix
    -----
    Collection 'embedded_movies' documents: 3483
    Collection 'users' documents: 185
    Collection 'theaters' documents: 1564
    Collection 'movies' documents: 21349
    Collection 'comments' documents: 41079
    Collection 'sessions' documents: 1

    Database: sample_restaurants
    -----
    Collection 'neighborhoods' documents: 195
    Collection 'restaurants' documents: 25359

    Database: sample_supplies
    -----
    Collection 'sales' documents: 5000

    Database: sample_training
    -----
    Collection 'posts' documents: 500
    Collection 'trips' documents: 10000
    Collection 'grades' documents: 100000
    Collection 'routes' documents: 66985
    Collection 'inspections' documents: 80047
    Collection 'companies' documents: 9500
    Collection 'zips' documents: 29470

    Database: sample_weatherdata
    -----
    Collection 'data' documents: 10000

    Atlas atlas-mhnnqy-shard-0 [primary] sample_weatherdata>
PSMDB databases and the document count of each collection:

    rs0 [direct: primary] test> show dbs
    admin 80.00 KiB
    config 240.00 KiB
    local 468.00 KiB
    mongoshake 56.00 KiB
    sample_airbnb 52.20 MiB
    sample_analytics 9.21 MiB
    sample_geospatial 984.00 KiB
    sample_guides 40.00 KiB
    sample_mflix 108.17 MiB
    sample_restaurants 5.57 MiB
    sample_supplies 980.00 KiB
    sample_training 40.50 MiB
    sample_weatherdata 2.39 MiB
    rs0 [direct: primary] test>

    Database: sample_airbnb
    -----
    Collection 'listingsAndReviews' documents: 5555

    Database: sample_analytics
    -----
    Collection 'transactions' documents: 1746
    Collection 'accounts' documents: 1746
    Collection 'customers' documents: 500

    Database: sample_geospatial
    -----
    Collection 'shipwrecks' documents: 11095

    Database: sample_guides
    -----
    Collection 'planets' documents: 8

    Database: sample_mflix
    -----
    Collection 'embedded_movies' documents: 3483
    Collection 'users' documents: 185
    Collection 'theaters' documents: 1564
    Collection 'movies' documents: 21349
    Collection 'comments' documents: 41079
    Collection 'sessions' documents: 1

    Database: sample_restaurants
    -----
    Collection 'neighborhoods' documents: 195
    Collection 'restaurants' documents: 25359

    Database: sample_supplies
    -----
    Collection 'sales' documents: 5000

    Database: sample_training
    -----
    Collection 'posts' documents: 500
    Collection 'trips' documents: 10000
    Collection 'grades' documents: 100000
    Collection 'routes' documents: 66985
    Collection 'inspections' documents: 80047
    Collection 'companies' documents: 9500
    Collection 'zips' documents: 29470

    Database: sample_weatherdata
    -----
    Collection 'data' documents: 10000

    rs0 [direct: primary] sample_weatherdata>
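Comparing two long listings by eye is error-prone, so the check can be automated. Below is a minimal Python sketch, assuming the third-party PyMongo driver is installed and both clusters are reachable from the machine running it. `collect_counts` and `diff_counts` are hypothetical helper names, and skipping the `mongoshake` database (visible in the `show dbs` output above) along with the system databases is an assumption of this sketch.

```python
from typing import Dict, List

Counts = Dict[str, Dict[str, int]]

# System databases plus MongoShake's own bookkeeping database (assumption).
SKIP_DBS = frozenset({"admin", "config", "local", "mongoshake"})


def collect_counts(uri: str) -> Counts:
    """Return {database: {collection: document count}} for one cluster."""
    from pymongo import MongoClient  # third-party driver, imported lazily

    client = MongoClient(uri)
    try:
        counts: Counts = {}
        for db_name in client.list_database_names():
            if db_name in SKIP_DBS:
                continue
            db = client[db_name]
            counts[db_name] = {
                name: db[name].count_documents({})
                for name in db.list_collection_names()
            }
        return counts
    finally:
        client.close()


def diff_counts(source: Counts, target: Counts) -> List[str]:
    """List every collection whose count differs or is missing on one side."""
    mismatches: List[str] = []
    for db_name in sorted(set(source) | set(target)):
        src = source.get(db_name, {})
        dst = target.get(db_name, {})
        for coll in sorted(set(src) | set(dst)):
            if src.get(coll) != dst.get(coll):
                mismatches.append(
                    f"{db_name}.{coll}: source={src.get(coll)} target={dst.get(coll)}"
                )
    return mismatches
```

A call such as `diff_counts(collect_counts(atlas_uri), collect_counts(psmdb_uri))` returns an empty list when every count matches. Matching counts do not prove the documents are identical, so treat this as a first-pass check only.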
Above, you can see we have verified data in PSMDB. Now, update the connection string of the application to point to PSMDB.
NOTE: Sometimes, during the migration process, some indexes may fail to replicate. So, during the data verification process, please verify the indexes, and if an index is missing, create it before the cutover.
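The index check can be sketched in Python as well. The helper below, `missing_indexes` (a hypothetical name), assumes its inputs have the shape returned by PyMongo's `Collection.index_information()`, that is, a mapping of index name to a dictionary with a `key` list of `(field, direction)` pairs. It compares key patterns only and ignores options such as `unique` or TTL settings, which would need a separate check.

```python
from typing import Any, Dict, List, Tuple

IndexInfo = Dict[str, Dict[str, Any]]


def _key_pattern(spec: Dict[str, Any]) -> Tuple[Tuple[Any, ...], ...]:
    """Normalize an index 'key' list into a hashable tuple of pairs."""
    return tuple(tuple(pair) for pair in spec["key"])


def missing_indexes(source: IndexInfo, target: IndexInfo) -> List[str]:
    """Names of source indexes whose key pattern is absent on the target."""
    target_patterns = {_key_pattern(spec) for spec in target.values()}
    return sorted(
        name
        for name, spec in source.items()
        if _key_pattern(spec) not in target_patterns
    )


if __name__ == "__main__":
    # Hand-written example data in the shape of index_information().
    atlas = {
        "_id_": {"key": [("_id", 1)]},
        "name_1": {"key": [("name", 1)]},
    }
    psmdb = {"_id_": {"key": [("_id", 1)]}}
    print(missing_indexes(atlas, psmdb))
```

Matching on key pattern rather than index name means an index that was recreated under a different name on PSMDB is not reported as missing.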
MongoShake simplifies the process of migrating MongoDB data from Atlas to self-hosted MongoDB. Percona experts can assist you with migration as well. By following the steps outlined in this blog, you can seamlessly install, configure, and utilize MongoShake for migrating your data from MongoDB Atlas.
To learn more about the enterprise-grade features available in the license-free Percona Server for MongoDB, we recommend going through our blog MongoDB: Why Pay for Enterprise When Open Source Has You Covered?
Percona Distribution for MongoDB is a freely available MongoDB database alternative, giving you a single solution that combines the best and most important enterprise components from the open source community, designed and tested to work together.