Recently, one of our customers reported a problem after upgrading a sharded cluster from MongoDB 5.0 to 6.0. The data-bearing nodes upgraded without incident, but the final part of the process, restarting the mongos routers on the new version, did not go well.
The applications were affected immediately: all queries suddenly started failing. Going through the logs, we found the following entry repeated many times:
Command find failed: Encountered non-retryable error during query :: caused by :: BSON field 'DatabaseVersion.timestamp' is missing but a required field
This seemed to point to a metadata mismatch between the mongos router and the config servers.
A close look at the upgrade guide shows that the MongoDB 6.0 release notes introduce an extra step: as a prerequisite for the upgrade, the cached routing table must be refreshed on every mongos router by running:
db.adminCommand("flushRouterConfig")
This step was not required when upgrading to earlier versions of MongoDB and, interestingly, it is not required when upgrading to MongoDB 7.0+ either.
After checking with the customer, it turned out that this extra step had been missed during the upgrade procedure. In the end, refreshing the cached routing table on each of the mongos routers solved the issue.
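When a cluster has more than a handful of routers, it is easy to miss one, so the refresh can be scripted. The following is a minimal mongosh sketch, not the procedure the customer used. The host names are hypothetical placeholders, the connection strings assume no authentication or TLS options are needed (add them as your deployment requires), and the `flushAllRouters` helper is a name introduced here purely for illustration. It connects to each mongos directly and issues `flushRouterConfig` through the admin database.

```javascript
// Hypothetical list of mongos routers; replace with your own host:port pairs.
const mongosHosts = [
  "mongos1.example.net:27017",
  "mongos2.example.net:27017",
];

// Runs flushRouterConfig against every router and records the outcome per host.
// `connect` is injectable so the helper can be exercised without a cluster;
// by default it uses the mongosh `Mongo` constructor.
function flushAllRouters(hosts, connect = (uri) => new Mongo(uri)) {
  const results = [];
  for (const host of hosts) {
    try {
      // Assumption: plain connection string, no credentials or TLS options.
      const admin = connect(`mongodb://${host}`).getDB("admin");
      const reply = admin.runCommand({ flushRouterConfig: 1 });
      results.push({ host, ok: reply.ok === 1 });
    } catch (err) {
      // Keep going so that one unreachable router does not hide the others.
      results.push({ host, ok: false, error: String(err) });
    }
  }
  return results;
}
```

After loading the snippet in mongosh, calling `printjson(flushAllRouters(mongosHosts))` prints one entry per router. Any entry with `ok: false` points to a router that still needs the refresh before the upgrade continues.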
This highlights the importance of a thorough review of the release notes for every new software version. Otherwise, we might be tempted to follow the same procedures we are used to and miss new required steps, which leads to unforeseen issues.