Companies often stick to specific database versions because they are proven performers or because it is hard to keep up with frequent releases. But lagging behind has serious downsides. When it's time to upgrade, is it better to step the binaries through each major revision or to skip versions?

TL;DR: Upgrading a MongoDB cluster using backups and skipping versions is not recommended, but this article demonstrates how to upgrade from v3.6 to v7.0 that way and the issues you might encounter along the way.

Introduction

Keeping database environments up to date is one of the main tasks of database administrators, since it is generally recommended to upgrade regularly to take advantage of the fixes and improvements that come with each major release. Still, it is common knowledge that some companies prefer to hold on to a specific version because it works well for them or because newer versions show some performance regression: as new features are added, the database can become more complex and slower.

It is sometimes hard to stay up to date when major versions are released frequently. We can find ourselves multiple versions behind or, worse, running a version in production that has reached its end of life, which means exposure to bugs, no new features, reduced support, and possible issues with certifications and audits.

Considering that the standard and recommended upgrade path for MongoDB is to update the binaries through each major version, if you find yourself on MongoDB v3.6 and want to reach something like v6.0 or v7.0, you can already picture going through 3.6 -> 4.0 -> 4.2 -> 4.4 -> 5.0 -> 6.0 (-> 7.0), with multiple failovers involved in the case of replica sets. If the environment can't have any downtime, numerous driver upgrades are needed, too. It is common to see administrators postpone this task indefinitely, especially when there are multiple servers and environments to upgrade.

Using backups and restores, you could skip the intermediate versions of the standard upgrade path, at the risk of running into various issues with your data during and after the restore, and with no official support or documentation for the process. Data size also becomes a huge factor, as logically backing up and restoring something like 5 TB can be a lengthy and painful task.

Environment

MongoDB version

For this experiment, I deployed:

  • One replica set with 3 PSMDB nodes in v3.6.13
  • One replica set with 3 PSMDB nodes in v7.0.11
  • One PBM Agent in v2.5.0 for each node, backing up data to a shared storage location. The ancient PSMDB version is not certified with any PBM version, so, again, we proceed at our own risk. I used another Percona blog post, Configuring Percona Backup for MongoDB in a Multi-Instances Environment, as the source for this task.

Data

I inserted data split into two collections containing binary subtypes 0, 2, 3, and 4. The second collection had a specific collation (es@collation=search). I also created four indexes and two views per collection.

Process

Preparing for the upgrade

  • Back up your MongoDB data to prevent data loss during the upgrade process.
  • Check the current MongoDB version with $ mongod --version (see the example after this list).
  • Ensure you have the latest version of the MongoDB drivers installed.
  • Plan the upgrade during a predefined maintenance window to minimize downtime.
  • Consider the compatibility of your applications with the new MongoDB version.
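
For illustration, the pre-upgrade version and compatibility checks could look like this (a sketch; the port and the use of the legacy shell for v3.6 are assumptions):

  $ mongod --version
  $ mongo --quiet --eval 'db.adminCommand({ getParameter: 1, featureCompatibilityVersion: 1 })'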

1. Start the v3.6 nodes and set up PBM

I used mlaunch for this, which helped me spin up three mongod processes quickly.
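
A minimal sketch of the idea, with ports, paths, and the storage config file as assumptions:

  # Spin up a 3-node replica set from the v3.6 binaries
  $ mlaunch init --replicaset --nodes 3 --name rs36 --port 27017 --binarypath /opt/psmdb-3.6.13/bin

  # Point PBM at the replica set and at the shared backup storage
  $ export PBM_MONGODB_URI="mongodb://localhost:27017,localhost:27018,localhost:27019/?replicaSet=rs36"
  $ pbm config --file /etc/pbm/storage.yaml
  # One pbm-agent runs per mongod node
  $ pbm-agent --mongodb-uri "mongodb://localhost:27017" &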

2. Insert data, views, and indexes

Here, I used a custom script that generates and inserts 500k sizeable documents into the two collections. Example of a document:
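
A minimal sketch of such a document, with one field per binary subtype (field names and payloads are illustrative):

  {
    counter: 1,
    bin_generic:  BinData(0, "sbWRDaIMe7fEO45+5OKtsg=="),  // subtype 0: generic binary
    bin_old:      BinData(2, "EAAAAbWRDaIMe7fEO45x5OI="),  // subtype 2: deprecated "old binary"
    bin_uuid_old: BinData(3, "sbWRDaIMe7fEO45+5OKtsg=="),  // subtype 3: legacy UUID
    bin_uuid:     BinData(4, "sbWRDaIMe7fEO45+5OKtsg==")   // subtype 4: UUID
  }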

Indexes and views:
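
Again as a sketch, with collection, field, and view names as assumptions:

  db.coll1.createIndex({ counter: 1 })
  db.coll1.createIndex({ bin_old: 1 })
  // coll2 was created with collation { locale: "es@collation=search" }
  db.coll2.createIndex({ counter: 1 }, { collation: { locale: "es@collation=search" } })
  db.createView("coll1_even", "coll1", [{ $match: { counter: { $mod: [2, 0] } } }])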

3. Take a logical backup
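
With PBM configured, the backup itself is a single command:

  $ pbm backup --type=logical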

A warning from PBM reminds us that the version is not certified, but the backup runs anyway.

4. Keep inserting data to create PITR chunks

As PITR is enabled in PBM, I inserted 500k more documents after the backup so there would be something in the oplog to replay.
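
For reference, PITR in PBM is enabled with a one-line config change:

  $ pbm config --set pitr.enabled=true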

5. Start up the v7.0 nodes and set up PBM
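
This mirrors step 1, only with the v7.0 binaries. For PBM on the new cluster to see the existing backups, its configuration must point to the same shared storage (ports and paths are again assumptions):

  $ mlaunch init --replicaset --nodes 3 --name rs70 --port 28017 --binarypath /opt/psmdb-7.0.11/bin
  $ export PBM_MONGODB_URI="mongodb://localhost:28017,localhost:28018,localhost:28019/?replicaSet=rs70"
  $ pbm config --file /etc/pbm/storage.yaml   # same storage config as the v3.6 cluster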

6. Restore the logical backup and the PITR

PBM has a command called pbm status that returns, among other information, the snapshots and point-in-time intervals available for restore:

So I restored to the latest point in time available to make sure the oplog would be replayed too:
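
A sketch of the restore command, with a placeholder timestamp in the format pbm status reports:

  $ pbm restore --time="2024-07-15T10:00:00"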

7. Check data on both sides

After the restore completed, I first checked whether the counts on both sides matched, using stats():
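
The check itself can be as simple as running the same expression against both clusters (ports, database, and collection names are assumptions):

  # v3.6 side (legacy shell) and v7.0 side (mongosh)
  $ mongo --port 27017 --quiet --eval 'db.getSiblingDB("test").coll1.stats().count'
  $ mongosh --port 28017 --quiet --eval 'db.getSiblingDB("test").coll1.stats().count'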

After the counts matched, I used another script to validate that the data itself was the same in both collections:
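
The script isn't reproduced here, but a minimal sketch of the idea, comparing documents by _id across the two clusters, could look like this (hosts and names are assumptions):

  // compare.js, run as: mongosh --port 28017 --quiet compare.js
  // (for a v3.6 source, the legacy mongo shell provides the same connect()/tojson() helpers)
  const src = connect("mongodb://localhost:27017/test");
  const dst = db.getSiblingDB("test");
  let mismatches = 0;
  src.coll1.find().forEach(doc => {
    const other = dst.coll1.findOne({ _id: doc._id });
    // tojson() serializes the full document, including BinData subtypes
    if (tojson(doc) !== tojson(other)) mismatches++;
  });
  print("mismatched documents: " + mismatches);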

Surprisingly or not, some of the data did not match:

When I visually compared the entries on both sides, it was clear that the binary field with subtype 2 had issues during the restore:

One interesting discovery here is that the mismatches start after the 500k counter, which ties the problem to the oplog replay process rather than to the dump and restore itself, since the first 500k documents were inserted before the dump and the last 500k after it.

Although investigating the oplog replay issue is out of scope for this post, I ran a couple of backups and restores (with and without PBM), and the problem does not occur when no oplog replay is involved.

I also redid the process using mongodump + mongorestore with oplog replay instead of PBM. The process finished successfully:
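
Roughly, with the native tools (URIs are assumptions, and the tool versions must match what each server supports):

  # Dump with an oplog slice included, then replay it on restore
  $ mongodump --uri="mongodb://localhost:27017" --oplog --out=/backups/dump36
  $ mongorestore --uri="mongodb://localhost:28017" --oplogReplay /backups/dump36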

But the data inconsistency problem also happened:

Post-upgrade tasks and best practices

  • Update your applications to use the latest MongoDB drivers.
  • Take a backup of your data after the upgrade.
  • Monitor the MongoDB instance for any issues or errors.
  • Consider implementing a regular backup and maintenance schedule.
  • Keep your MongoDB instance up-to-date with the latest security patches and updates.

Conclusion

Even when a restore into a different engine version appears to succeed, silent errors can happen and data can be changed along the way, and we may only discover it later, after incorrect data has already been used in our applications, reports, forms, and so on.

The tested, approved, supported, and recommended way is to follow the instructions in the documentation, and if you need help, Percona Experts are available. If you want to proceed down this path anyway, validate your data and test your applications with it thoroughly in a non-production environment before moving on to production.
