Differentiating Between MongoDB Storage EnginesThe tremendous data growth of the last decade has affected almost all aspects of applications and application use. Since nearly all applications interact with a database at some point, this means databases needed to adapt to the change in usage conditions as well. Database technology has grown significantly in the last decade to meet the needs of constantly changing applications. Enterprises often need to scale, modify, or replace their databases in order to meet new business demands.
Within a database management system (DBMS), there are many levels that can affect performance, including the choice of your database storage engine. Surprisingly, many enterprises don’t know they have a choice of storage engines, or that specific storage engine types are architected to handle specific scenarios. Often the best option depends on what function the database in question is designed to fulfill.
With Percona’s acquisition of Tokutek, we’ve moved from a mostly-MySQL company to having several MongoDB-based software options available.
MongoDB is a cross-platform, NoSQL, document-oriented database. It doesn’t use the traditional table-based relational database structure, and instead employs JSON-type documents with dynamic schemas. The intention is making the integration of certain application data types easier and faster.
This blog (the first in a series) will briefly review some of the available options for a MongoDB database storage engine, and the pros and cons of each. Hopefully it will help database administrators, IT staff, and enterprises realize that when it comes to MongoDB, you aren’t limited to a single storage engine choice.
A database storage engine is the underlying software that a DBMS uses to create, read, update and delete data from a database. The storage engine should be thought of as a “bolt on” to the database (server daemon), which controls the database’s interaction with memory and storage subsystems. Thus, the storage engine is not actually the database, but a service that the database consumes for the storage and retrieval of information. Given that the storage engine is responsible for managing the information stored in the database, it greatly affects the overall performance of the database (or lack thereof, if the wrong engine is chosen).
Most storage engines are organized using one of the following structures: a Log-Structured Merge (LSM) tree, B-Tree or Fractal tree.
Fractal Trees and the LSM trees sound very similar. The main differentiating factor, however, is the manner in which they sort the data into the tree for efficient searches. LSM trees merge data into a tree from a series of logs as the logs fill up. Fractal Trees sort data into log-like structures (message buffers) along the proper data path in the tree.
That question is not a simple one. In order decide which engine to choose, it’s necessary to determine the core functionality provided in each engine. Core functionality can generally be aggregated into three areas:
Now that we’ve established a general framework, we’ll move onto discussing engines. For the first blog in this series, we’ll look at MMAPv1 (the default storage engine that comes with MongoDB up until the release 3.0).
Find it in: MongoDB or Percona builds
MMAPv1 is MongoDB’s original storage engine, and was the default engine in MongoDB 3.0 and earlier. It is a B-tree based system that offloads much of the functions of storage interaction and memory management to the operating system. MongoDB is based on memory mapped files.
The MMAP storage engine uses a process called “record allocation” to grab disk space for document storage. All records are contiguously located on disk, and when a document becomes larger than the allocated record, it must allocate a new record. New allocations require moving a document and updating all indexes that refer to the document, which takes more time than in-place updates and leads to storage fragmentation. Furthermore, MMAPv1 in it’s current iterations usually leads to high space utilization on your filesystem due to over-allocation of record space and it’s lack of support for compression.
As mentioned previously, a storage engine’s locking scheme is one of the most important factors in overall database performance. MMAPv1 has collection-level locking – meaning only one insert, update or delete operation can use a collection at a time. This type of locking scheme creates a very common scenario in concurrent workloads, where update/delete/insert operations are always waiting for the operation(s) in front of them to complete. Furthermore, oftentimes those operations are flowing in more quickly than they can be completed in serial fashion by the storage engine. To put it in context, imagine a giant supermarket on Sunday afternoon that only has one checkout line open: plenty of customers, but low throughput!
Given the storage engine choices brought about by the storage engine API in MongoDB 3.0, it is hard to imagine an application that demands the MMAPv1 storage engine for optimized performance. If you read between the lines, you could conclude that MongoDB, Inc. would agree given that the default engine was switched to WiredTiger in v3.2.
Most people don’t know that they have a choice when it comes to storage engines, and that the choice should be based on what the database workload will look like. Percona’s Vadim Tkachenko performed an excellent benchmark test comparing the performances of RocksDB, PerconaFT and WiredTiger to help specifically differentiate between these engines.
In Part Two of this blog series, we examine the ins and outs of MongoDB’s new default storage engine, WiredTiger.
Resources
RELATED POSTS