Diving into the MongoDB 4.2 Release Small Print

In my previous blog post “Percona’s View on MongoDB’s 4.2 Release – The Good, the Bad, and the Ugly…” I discussed the release of transaction support in sharded clusters, field-level encryption, search-engine integration, and the new update command syntax.

Those are all very important, but to me, making MongoDB easier to use and removing technical debt from the code base is even more exciting. These ‘small print’ developments are the ones that DBAs, site reliability engineers, and support engineers need the most.

MMAPv1 storage engine removed

Dearly beloved, we are here today to mourn the MMAPv1 storage engine: called into the world 11 years ago, and sadly gone without any descendant, “v2” or otherwise. The obituary, SERVER-35112, reads: “156 changed files with 89 additions and 33,895 deletions.”

A trip down memory lane

The first time I heard that the simple Unix MMAP API was the storage engine in a new database receiving great acclaim for its performance (amongst other things), I had some doubts. It was too simple – just make a file-backed mapping and call fsync() when you need to commit to disk?

But then I thought: “The kernel manages this, and the kernel is really, really efficient.” Then: “But fsync() is orders of magnitude too slow to be called as often as the db ops, so there must be a durability delay – and isn’t that risky?” So from there I considered the journal (a.k.a. the write-ahead log) and replication and … well, soon I wasn’t thinking about the MMAP part much. It all led nicely to an exploration of the mechanisms of the whole database daemon process.

MongoDB’s straight-and-plain use of MMAP confirmed (and maybe made a notable contribution to) the software-development community’s sea change in thinking about hardware. ‘Just put a fat SDRAM DIMM into your servers and use that RAM like it was disk.’

So there’s a bit of nostalgia here, but let us not forget how coarse the locking was. As far as I/O bottlenecks are concerned, it made all writes to a collection effectively single-threaded. This wasn’t as bad as it might seem – the writes get lined up efficiently from the multiple client-request-servicing threads, and you skip whole realms of conflict detection steps that MVCC methods need. But WiredTiger thrashed MMAP in most benchmarks, particularly ones covering the great majority of real-world cases.

A flat memory range (like the MMAP API provides) cannot support an MVCC mechanism by itself – you add that on top. So, for semantic reasons alone, there can be no transaction-supporting ‘MMAPvX engine’ in the future, either.

queryHash added to log files, currentOp and profiling docs (SERVER-23332)

Here’s one for you site reliability or technical service engineers, weary of trying to parse mongod log files to group queries of the same type.

The mongod/mongos logger is basically a C++ stream that always prepends four fields: timestamp, severity, component, and thread name. What comes after those, however, is a free-form string. It is impossible to write a log-parsing function that can comprehensively parse all the different messages in that free-form part, even if each developer thought that what they were putting there was quite logical.
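As a sketch of that fixed prefix, the one-liner below pulls the four reliable fields out with awk and leaves the free-form tail alone (the log line itself is made up for illustration, not taken from a real server):

```shell
# Illustrative 4.x-style log line: timestamp, severity, component,
# and [thread] are fixed; everything after them is free-form.
line='2019-08-13T10:00:01.123+0000 I COMMAND [conn12] command test.users appName: "app" ...'

# Only the fixed prefix can be split reliably; the tail stays opaque.
echo "$line" | awk '{ print "severity=" $2, "component=" $3, "thread=" $4 }'
```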

But thanks to this improvement, there is a new simple text token “queryHash:<8digitHex>” surfaced in OpDebug::report(), the function that prints the body of COMMAND log lines.

When you are troubleshooting slow performance and need to find the commands that are the main trouble-makers, you can now grep, awk, sort, and uniq -c so much more easily. I expect in time this will be leveraged to make tools such as mloginfo and mplotqueries run much faster (on 4.2+ log files only).
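For example, a minimal pipeline along those lines – the log excerpt, hash values, and timings below are all made up for illustration:

```shell
# Made-up 4.2-style COMMAND log excerpt; only the queryHash token matters.
cat > /tmp/mongod_sample.log <<'EOF'
2019-08-13T10:00:01.123+0000 I COMMAND [conn12] command test.users ... queryHash:1A2B3C4D ... 152ms
2019-08-13T10:00:02.456+0000 I COMMAND [conn13] command test.users ... queryHash:1A2B3C4D ... 187ms
2019-08-13T10:00:03.789+0000 I COMMAND [conn14] command test.orders ... queryHash:9F8E7D6C ... 95ms
EOF

# Count log lines per query shape: most frequent hash first.
grep -o 'queryHash:[0-9A-F]\{8\}' /tmp/mongod_sample.log | sort | uniq -c | sort -rn
```

Identical query shapes collapse to one line each, so the worst offenders float to the top of the output.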

The queryHash will also be included in currentOp output and saved in system.profile profiling sample documents.

Autosplitting thread removed from mongos nodes

“Rather than mongos tracking chunk size and dictating to a shard when a chunk should be split, the primary node of the shard will now track the chunk size, providing more consistent splitting behavior.”

This will possibly surprise many users, but the responsibility of running the balancer – finding large chunks and splitting them – was originally given to a mongos node, rather than to one config server or some fixed shard node. Every mongos node ran an extra thread with the code for this necessary sharding maintenance; all would race to take the cluster’s distributed lock and be the one mongos node allowed to proceed.

Contention for the balancer lock was one problem. Another was that the sudden death of a mongos node could leave the balancer locked indefinitely (a “stale” lock). Both of these were resolved when the balancer logic was migrated to the primary config server in v3.4; debugging a balancer lock issue no longer involved looking through every mongos log.

However, auto-splitting remained in the mongos nodes even after 3.4, when in theory the shard is the most efficient place for it.

SERVER-34448 marks the end of a chain of tickets that finally removes the mongos nodes from these duties. Shards now find their own large chunks and, when they are found, split them and update the chunk ranges in the config database.

mongod --configExpand "rest,exec"

This is covered in the “4.2 upcoming” documentation. (That link may stop working in the near future; if so, search for “Externally Sourced Configuration File Values” when the 4.2 docs become official.)

There are three reasons this new feature is important to me.

  1. It adds the capacity to modularize your configuration. Put your network section in one file, your security bits in another, etc. This should keep MongoDB configuration future-compatible for quite a while, in my opinion.
  2. Pulling in dynamic values – single values, subsections, even the whole configuration – is now possible. This feature will make automation a lot easier, however you do it. Well, maybe not for every possible approach, but certainly for most of them. Being able to execute local scripts will be helpful for self-inspection (like finding the IP addresses that should be bound) and many, many ‘auto-magic’ tricks. Admittedly, this makes a new class of foot-plus-shotgun incidents possible too.
  3. The new --outputConfig command-line option. A downside to dynamically sourced values is risk. Even with static config files, there has always been the risk that one typo might, say, connect your development instances to production, or start using the wrong directory for data files and then hit a disk-full error. Externally sourced config values will make those kinds of mistakes easier. But 4.2 adds the safe --outputConfig option, which reflects the fully expanded YAML back on stdout and then exits, without doing anything beyond config parsing. It is also an easy way to quickly reprint your YAML config in linted form. I know this will be a timesaver when updating configuration files.
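As a minimal sketch of how such a modular, externally sourced config might look – the file path, port, and the choice of `hostname -i` here are my own illustrations, not anything the release mandates:

```shell
# Hypothetical network-only config fragment using an expansion directive:
# the bindIp value is produced at startup by running a local command.
cat > /tmp/mongod_net.conf <<'EOF'
net:
  port: 27017
  bindIp:
    __exec: hostname -i
    type: string
EOF

# Dry run: expand the external values and print the final YAML to stdout,
# then exit without starting the server (requires mongod 4.2+):
#   mongod --config /tmp/mongod_net.conf --configExpand "exec" --outputConfig
```

Running the dry run before a restart is exactly the typo-catching safety net the --outputConfig option is meant to provide.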

Improved WiredTiger data file repair

With a replica set, it’s OK, in my opinion, to use a disk storage format that is unrepairable in the event of disk corruption. I say this even knowing disk corruption will probably happen soon enough if you have lots of disks across your environment.

But at the same time, if restoring damaged files in the dbpath directory can be made easier than a full-on forensic exercise, then yes please!

List open cursors (SERVER-3090)

Tears of joy! At last, we can find all open cursors, not just those that were active at the moment currentOp() was run.

Discuss on HackerNews

Learn more about Percona Server for MongoDB
