A MongoDB Prototype With New Heterogeneous-Memory Storage Engine (HSE)

Introducing a new MongoDB Storage Engine

Q. What is the Heterogeneous-memory Storage Engine?

A key-value store library developed (and open-sourced) by Micron that will work with any normal storage but works especially well with emerging SSDs or NVDIMMs (or other Storage Class Memory) that have even faster NVM media in them (bleeding-edge NAND, or Optane / 3D XPoint).

Q. So it goes faster with faster NVM storage? Doesn’t everything?

No. The maximum potential of non-volatile memory storage devices isn’t realized when they are used like classic block devices, and this will apply even more to future NVM storage products.

This storage engine (or, more to the point, the “mpool” driver it uses) only accepts writes in blocks or append-only log streams, which end up being whole blocks too as soon as they’re committed. It will work with the NVMe Zoned Namespaces (ZNS) spec when SSDs supporting it come out. It was also run with an NVDIMM earlier in development (a year or more ago?), with the caveat that no NVDIMM test has been re-run recently.

Beyond the matter of speed, there’s also one of endurance. Bytes in this media, unlike on an HDD, cannot be rewritten in place: a page (say 4 KB to 16 KB) can only be written in one go, and has to be completely erased before being written to again. An application that modifies a page in n steps is inadvertently causing n full-page writes and erasures in the SSD’s flash. If this is your software’s typical write pattern, your SSD’s endurance will be reduced by a factor of n. From what I gather reading through database-related papers and presentations from the storage industry, the average n seems to be between 3 and 6.
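To make the endurance arithmetic concrete, here is a minimal Python sketch. The function names are made up for illustration, and the model is deliberately simplistic: it assumes every partial modification costs exactly one full-page write, as described above, and ignores anything the SSD controller might do to soften that.

```python
def flash_page_writes(logical_page_updates: int, steps_per_update: int) -> int:
    """Physical full-page writes when each logical page update arrives as
    `steps_per_update` separate partial modifications (the n in the text)."""
    if logical_page_updates < 0 or steps_per_update < 1:
        raise ValueError("need logical_page_updates >= 0 and steps_per_update >= 1")
    return logical_page_updates * steps_per_update


def endurance_fraction(steps_per_update: int) -> float:
    """Fraction of the drive's write endurance that goes to useful data
    under the same simplistic model."""
    if steps_per_update < 1:
        raise ValueError("steps_per_update must be >= 1")
    return 1.0 / steps_per_update


if __name__ == "__main__":
    # n = 1 is the ideal (whole-page writes only); 3 and 6 bracket the
    # rough range quoted in the text.
    for n in (1, 3, 6):
        writes = flash_page_writes(1_000_000, n)
        print(f"n={n}: {writes} page writes, "
              f"{endurance_fraction(n):.0%} of endurance spent on useful data")
```

A storage layer that only ever submits whole blocks is aiming for the n = 1 row.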

Q. Does this HSE storage engine beat WiredTiger in MongoDB?

Yes, especially when the write load is high. And so far this has been shown on SSDs that only support traditional block interfaces, without using ZNS SSDs or SCM. Please see the YCSB test case results.

But it was built a while ago, so the answer is No in a marketplace sense for now, because it is only available in the already EOL’ed 3.4 version of MongoDB. There are no blockers to making a v3.6 or v4.0+ compatible version according to the developers, but it hasn’t been kept in sync with MongoDB storage API changes since it was developed a year or two ago.

Latency

The best feature of HSE MongoDB isn’t the improved average latency / higher throughput for heavy write loads. It is the much better tail latency.

A checkpointing storage engine such as WiredTiger will have high latency during checkpoints if a large volume of updates/inserts is written between one checkpoint and the next. It’s like hitting a speed bump once per minute. This is not a WiredTiger bug: it’s as well-tuned as it can be by default, and affords further manual tuning as well. Periodic latency bumps are a property/symptom of any consistent data store that flushes completely at intervals rather than continuously.
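The difference between average and tail latency can be illustrated with a small Python sketch. Every number in it is synthetic and chosen only to make the arithmetic easy; none of it is a measurement of WiredTiger, HSE, or any real system, and the function names are hypothetical.

```python
import statistics
from typing import List


def simulate_latencies(total_ops: int, base_ms: float, stall_ms: float,
                       ops_between_stalls: int, stalled_ops: int) -> List[float]:
    """Per-operation latencies where the first `stalled_ops` operations of every
    `ops_between_stalls`-operation window take `stall_ms` instead of `base_ms`."""
    return [
        stall_ms if (i % ops_between_stalls) < stalled_ops else base_ms
        for i in range(total_ops)
    ]


def percentile(samples: List[float], pct: int) -> float:
    """Nearest-rank percentile, using integer arithmetic for the rank."""
    ordered = sorted(samples)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # ceil(pct * n / 100)
    return ordered[rank - 1]


if __name__ == "__main__":
    # "Bumpy" store: 2% of operations land in a periodic stall.
    bumpy = simulate_latencies(10_000, base_ms=1.0, stall_ms=50.0,
                               ops_between_stalls=1_000, stalled_ops=20)
    # "Steady" store with the same average latency and no stalls.
    steady = [statistics.mean(bumpy)] * len(bumpy)

    for name, samples in (("bumpy", bumpy), ("steady", steady)):
        print(name,
              "mean:", round(statistics.mean(samples), 2),
              "p50:", percentile(samples, 50),
              "p99:", percentile(samples, 99))
```

Both series have the same mean, but only the stalled one has a p99 far above its median, which is why an average-latency figure alone can hide the periodic bumps.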

When tail latency is your key SLA, HSE-using MongoDB would definitely be better than WiredTiger for you. (Probably better than RocksDB too, so long as the compaction is tuned.)

Q. New driver – More admin work?

Although you have to install an extra driver and initialize an SSD to be used by it, I think the answer is no: it would end up reducing admin work for DBAs who will have to scale up their DB in the coming years. And isn’t that the case for the majority of database deployments?

This storage engine will enable better vertical scaling by using the NVM storage that will outperform normal SSDs. That, in turn, will delay the day you have to start horizontal scaling (i.e. change to a sharded cluster). Or, if you already are sharded, it will reduce the number of shards needed.

Summary

Micron has created and open-sourced (and so far published to Red Hat repositories) a new driver and an associated key-value store library (“HSE”) that improves performance and endurance for non-volatile memory media types (including stock-standard SSDs).

The HSE library is an interesting project in its own right for key-value applications in general, but it was also wrapped as a MongoDB storage engine as a proof of concept. In throughput and average latency, this engine matches or exceeds WiredTiger performance depending on the load type. As a rough summary, it is several multiples faster for high write loads, and within plus or minus 10% in the low-write cases as far as I can see. The best point, though, is that WiredTiger checkpoint impacts can be avoided and hence latency is more consistent.

How to Try MongoDB With HSE

Overview

You can build from source for yourself, or just install from packages already made for RHEL 7.7 or 8.1.

But either way, before you can run the modified v3.4 MongoDB binaries the following prerequisites must be done.

  • A) Install mpool kernel module and util commands
  • B) Format/initialize one (or more) drives as a mpool
  • C) Install the HSE library, create an HSE KVDB on the mpool

Installing HSE and Its mpool Dependencies

The HSE project’s wiki includes install instructions for its prerequisites mpool-kmod, mpool and mpool-dev:

https://github.com/hse-project/hse/wiki/Install-from-Packages

❗Only supported in RHEL 7.7 or RHEL 8.1

E.g. 1: attempting to build from source on Ubuntu 18 hit the following make error in mpool-kmod:

akira:pvar_src$ cd mpool-kmod
akira:mpool-kmod$ make package
Makefile:168: *** invalid MPOOL_DISTRO (unknown unknown0 0 0 unsupported) . Stop.

E.g. 2: in RHEL 8.0 the mpool.ko module will fail to install, with an “Invalid parameters” error.

RHEL 8.1 was used in this document’s examples.

Installing mpool-kmod From Package

Download the rpm package from https://github.com/hse-project/mpool-kmod/releases. In this case it was mpool-kmod-1.7.0-r107.20200416-4.18.0-147.el8.x86_64.rpm.

❗ If this package fails to install the kernel module due to the “modprobe: ERROR: could not insert ‘mpool’: Permission denied” error (shown in example mis-installation above), that is a known bug caused by a conflict with SELinux on some but not all distributions; at the moment it seems to affect some of those found in AWS. Run the sudo command below as a workaround. Confirm the “mpool” module is loaded by checking for it in the output of lsmod. You will need to repeat this after each restart.

Installing mpool and mpool-dev

Download the rpm packages from https://github.com/hse-project/mpool/releases. In this case they were mpool-1.7.0-r106.20200416.el8.x86_64.rpm and mpool-devel-1.7.0-r106.20200416.el8.x86_64.rpm.

Create a mpool Device and Test It

https://github.com/hse-project/mpool/wiki

https://github.com/hse-project/mpool/wiki/Create-and-Destroy (The briefer quickstart suggestions in the HSE KV store documentation at https://github.com/hse-project/hse/wiki/Configure-Storage are also sufficient.)

Before proceeding: Confirm that /dev/mpoolctl exists – if it doesn’t then mpool-kmod was not installed successfully.
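The two installation checks mentioned so far (the “mpool” module appearing in lsmod, and /dev/mpoolctl existing) can be scripted. The following is a minimal Python sketch with hypothetical function names; it reads /proc/modules, which is the file lsmod itself formats, and assumes a Linux host.

```python
import os
from typing import Dict


def module_loaded(name: str, proc_modules_text: str) -> bool:
    """True if `name` is the first field of a line in /proc/modules-style text."""
    return any(line.split()[:1] == [name]
               for line in proc_modules_text.splitlines())


def mpool_prerequisites(proc_modules_path: str = "/proc/modules",
                        ctl_path: str = "/dev/mpoolctl") -> Dict[str, bool]:
    """Report whether the mpool kernel module is loaded and its control
    device node exists."""
    try:
        with open(proc_modules_path) as f:
            modules_text = f.read()
    except OSError:
        modules_text = ""  # e.g. not running on Linux
    return {
        "module_loaded": module_loaded("mpool", modules_text),
        "ctl_device_present": os.path.exists(ctl_path),
    }


if __name__ == "__main__":
    for check, ok in mpool_prerequisites().items():
        print(f"{check}: {'OK' if ok else 'MISSING'}")
```

If either check reports MISSING, go back to the mpool-kmod installation step before trying to create an mpool.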

This example shows a server with an as-of-yet unmounted, unformatted 1.7TB disk /dev/nvme0n1, which is the one that will be used by mpool. As it is only one for this test I’ve skipped putting it under LVM.

Execute this command with the “mpool” command-line tool to create a mpool device. The mpool device name “mydb” used here is chosen arbitrarily.

Installing hse and hse-devel

Download the rpm packages from https://github.com/hse-project/hse/releases. In this case they were hse-1.7.0-r193.20200420.el8.x86_64.rpm and hse-devel-1.7.0-r193.20200420.el8.x86_64.rpm.

Test That an HSE KVDB Can Be Created

https://github.com/hse-project/hse/wiki/Create-a-KVDB

❗ The KVDB shares its name with the mpool device. I.e. whatever name you gave in the “mpool create” command must be used again here. The syntax “kvdb create <name>” suggests you’re choosing the name, and can do so arbitrarily. But you’re only specifying the encompassing mpool device. (A better syntax IMO would be “hse kvdb create --mpool <name>”.)

There are more operations that can be done such as creating the KV stores within this KVDB, but for the purpose of installation confirmation, the above is enough.

Time to Rename: The examples above have created a mpool device and HSE KVDB called “mydb”. The following section for running MongoDB with the HSE MongoDB storage engine assumes it will be called “mongoData” instead. So now would be a good time to deactivate and rename this mpool if you want to follow the next section letter-for-letter.

MongoDB with HSE

https://github.com/hse-project/hse/wiki/MongoDB

Installing From Packages

See https://github.com/hse-project/hse/wiki/MongoDB#install-mongodb-with-hse-from-packages

Building hse-mongo From Source

There are instructions at https://github.com/hse-project/hse/wiki/MongoDB#compile-mongodb.

For those already familiar with building MongoDB I summarize it as being like this:

  • You build the “v3.4.17.1-hse-1.7.0” branch
  • libuuid-devel, lz4-devel, and openssl-devel are extra dependencies
  • --ssl will work in RHEL 7.7, but does not in RHEL 8.1 (for now at least).
  • The HSE Wiki instructions install scons as a normal executable, which you would get from a yum, dnf or pip package. I used the buildscripts/scons.py script already in the source code instead, by habit. I found I had to add “-D MONGO_VERSION=3.4.17” as an extra scons parameter to start the build.

Configuration of the mongod Node

https://github.com/hse-project/hse/wiki/MongoDB#new-mongodb-options

N.b. the storage.hse.mpoolName option will have to be the same as a mpool device you’ve already created, and an HSE KVDB will need to be created on it. If you haven’t already done that, do so before proceeding. If you want, it is possible to change the mpool name (see Managing KVDB).

See the “New MongoDB Options” link into the HSE Wiki above for an example of the arguments that must be set in the mongod.conf options file. You’ll probably be merging those into an existing configuration file template you use; beware that there are some comments that make it easy to miss the nested YAML levels. In particular, the “engine:” and “hse:” + “mpoolName:” lines are meant to be under the “storage:” section.
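A quick way to catch the nesting mistake is to check the parsed configuration programmatically. The Python sketch below is illustrative only: the function name is hypothetical, it takes an already-parsed dictionary (for example, what a YAML parser would return for mongod.conf), and the engine value "hse" in the sample is an assumption; the HSE Wiki remains the authority on the exact option values.

```python
from typing import Any, Dict, List


def hse_option_problems(conf: Dict[str, Any]) -> List[str]:
    """Return a list of nesting problems for the HSE-related options in a
    parsed mongod.conf; an empty list means the nesting looks right."""
    problems = []
    # These keys are only meaningful inside 'storage:' (or 'storage.hse:').
    for key in ("engine", "hse", "mpoolName"):
        if key in conf:
            problems.append(f"'{key}' is at the top level; it belongs under 'storage'")
    storage = conf.get("storage")
    if not isinstance(storage, dict):
        problems.append("no 'storage' section found")
        return problems
    if "engine" not in storage:
        problems.append("'storage.engine' is not set")
    hse = storage.get("hse")
    if not isinstance(hse, dict) or not hse.get("mpoolName"):
        problems.append("'storage.hse.mpoolName' is not set")
    return problems


if __name__ == "__main__":
    # Sample values: "mongoData" is the mpool name assumed in this section;
    # the engine name "hse" is an assumption for illustration.
    nested_ok = {"storage": {"engine": "hse", "hse": {"mpoolName": "mongoData"}}}
    flattened = {"storage": {}, "engine": "hse", "hse": {"mpoolName": "mongoData"}}
    print(hse_option_problems(nested_ok))
    print(hse_option_problems(flattened))
```

The second sample mimics what happens when the “engine:” and “hse:” lines lose their indentation while being merged into an existing template.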

Start the mongod Node

Launching

Common error warning: If the KVDB named in the storage.hse.mpoolName configuration option is missing, or the mpool name is wrong, the node will hit a fatal assertion and abort. In the mongod log file it will look like this:

If it starts OK the following will be printed to stdout as it begins:

The mongod log in this MongoDB 3.4.17 + HSE 1.7.0 build contains nothing special when it starts normally. As of this version (April 2020) the only evidence in the log of the HSE storage engine being used is in the “[initandlisten] options” line that reflects the configuration options.

Post-Launch, Regular MongoDB Administration

If Standalone node, or first node in new replica set

The first time you connect with the mongo shell there will be no authentication or authorization enabled. So simply use the “mongo” shell without any parameters except the host (and that can be empty if it’s localhost:27017).

If this node has replication enabled run rs.initiate() first.

It is of no concern to the HSE storage engine, but by habit this is when we create the first user in a new replica set or cluster, so let’s do that now.

Adding an HSE node to an existing replica set

This procedure is not HSE storage engine-specific, this is just a reminder of the standard MongoDB procedure.

If it is on a host:port that is new to the replica set, connect to the current primary and run rs.add(“…”) to include it. If the HSE node is being started in place of an existing WiredTiger (or MMAPv1) node then nothing should be done other than to start it; the other nodes will notify it and share the rs config so long as the replica set name (i.e. the replication.replSetName config value) is the same. It will replicate everything, including user authentication information, from the other nodes.

Check HSE Storage Engine in Effect

One way to confirm dynamically that a mongod is an HSE-using one is to look for the presence of a “hse” child object in the db.serverStatus() output.
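The same check can be scripted from a client program. The Python sketch below works on a serverStatus document that has already been fetched; with the pymongo driver, for instance, one could obtain it with MongoClient(...).admin.command("serverStatus"), but that part is left out here so the sketch runs without a server. The function name and the sample documents are made up, and the contents of the “hse” object are deliberately left empty rather than guessed.

```python
from typing import Any, Dict


def uses_hse(server_status: Dict[str, Any]) -> bool:
    """True if the serverStatus document has an 'hse' child object."""
    return isinstance(server_status.get("hse"), dict)


if __name__ == "__main__":
    # Made-up stand-ins for real serverStatus output; only the presence or
    # absence of the top-level "hse" key matters for this check.
    status_from_hse_node = {"host": "example-host:27017", "hse": {}}
    status_from_other_node = {"host": "example-host:27017", "wiredTiger": {}}

    print(uses_hse(status_from_hse_node))
    print(uses_hse(status_from_other_node))
```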
