Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

Percona MongoDB 3.4 Bundle Release: Percona Server for MongoDB 3.4 Features Explored

February 23, 2017

Author

Share this Post:

This blog post is another in the series on the Percona Server for MongoDB 3.4 bundle release. This release includes Percona Server for MongoDB, Percona Monitoring and Management, and Percona Toolkit. In this post, we’ll look at the features included in Percona Server for MongoDB.

I apologize for the long blog, but there is a good deal of important information to cover. Not just about what new features exist, but also why that are so important. I have tried to break this down into clear areas for you to cover the most amount of data, while also linking to further reading on these topics.

The first and biggest new feature for many people is the addition of collation in MongoDB. Wikipedia says about collation:

Collation is the assembly of written information into a standard order. Many systems of collation are based on numerical order or alphabetical order, or extensions and combinations thereof. Collation is a fundamental element of most office filing systems, library catalogs, and reference books.

What this is saying is a collation is an ordering of characters for a given character set. Different languages order the alphabet differently or even have different base characters (such as Asian, Middle Eastern and other regions) that are not English-native. Collations are critical for multi-language support and sorting of non-English words for index ordering.

Sharding

General

All members of a cluster are aware of sharding (all members, sharding set name, etc.). Due to this, the sharding.clusterRole must be defined on all shard nodes, a new requirement.

Mongos processes MUST connect to 3.4 mongod instances (shard and config nodes). 3.2 and lower is not possible.

Config Servers

Balancer on Config Server PRIMARY

In MongoDB 3.4, the cluster balancer is moved from the mongos processes (any) to the config server PRIMARY member.

Moving to a config-server-based balancer has the following benefits:

Predictability: the balancer process is always the config server PRIMARY. Before 3.4, any mongos processes could become the balancer, often chosen at random. This made troubleshooting difficult.

Lighter “mongos” process: the mongos/shard router benefits from being as light and thin as possible. This removes some code and potential for breakage from “mongos.”

Efficiency: config servers have dedicated nodes with very low resource utilization and no direct client traffic, for the most part. Moving the balancer to the config server set moves usage away from critical “router” processes.

Reliability: balancing relies on fewer components. Now the balancer can operate on the “config” database metadata locally, without the chance of network interruptions breaking balancing.

Config servers are a more permanent member of a cluster, unlikely to scale up/down or often change, unlike “mongos” processes that may be located on app hosts, etc.

Config Server Replica Set Required

In MongoDB 3.4, the former “mirror” config server strategy (SCCC) is no longer supported. This means all sharded clusters must use a replica-set-based set of config servers.

Using a replica-set based config server set has the following benefits:

Adding and removing config servers is greatly simplified.

Config servers have oplogs (useful for investigations).

Simplicity/Consistency: removing mirrored/SCCC config servers simplifies the high-level and code-level architecture.

Chunk Migration / Balancing Example

(from docs.mongodb.com)

Parallel Migrations

Previous to MongoDB 3.4, the balancer could only perform a single chunk migration at any given time. When a chunk migrates, a “source” shard and a “destination” shard are chosen. The balancer coordinates moving the chunks from the source to the target. In a large cluster with many shards, this is inefficient because a migration only involves two shards and a cluster may contain 10s or 100s of shards.

In MongoDB 3.4, the balancer can now perform many chunk migrations at the same time in parallel — as long as they do not involve the same source and destination shards. This means that in clusters with more than two shards, many chunk migrations can now occur at the same time when they’re mutually exclusive to one another. The effective outcome is (Number of Shards / 2) -1 == number of max parallel migrations: or an increase in the speed of the migration process.

For example, if you have ten shards, then 10/2 = 5 and 5-1 = 4. So you can have four concurrent moveChunks or balancing actions.

Tags and Zone

Sharding Zones supersedes tag-aware sharding. There is mostly no changes to the functionality. This is mostly a naming change and some new helper functions.

New commands/shell-methods added:

addShardToZone / sh.addShardToZone().

removeShardFromZone / sh.removeShardFromZone().

updateZoneKeyRange / sh.updateZoneKeyRange() + sh.removeRangeFromZone().

You might recall MongoDB has for a long time supported the idea of shard and replication tags. They break into two main areas: hardware-aware tags and access pattern tags. The idea behind hardware-aware tags was that you could have one shard with slow disks, and as data ages, you have a process to move documents to a collection that lives on that shard (or tell specific ranges to live on that shard). Then your other shards could be faster (and multiples of them) to better handle the high-speed processing of current data.

The other is a case based more in replication, where you want to allow BI and other reporting systems access to your data without damaging your primary customer interactions. To do this, you could tag a node in a replica set to be {reporting: true}, and all reporting queries would use this tag to prevent affecting the same nodes the user-generated work would live on. Zones is this same idea, simplified into a better-understood term. For now, there is no major difference between these areas, but it could be something to look at more in the 3.6 and 3.8 MongoDB versions.

Replication

New “linearizable” Read Concern: reflects all successful writes issued with a “majority” and acknowledged before the start of the read operation.

Adjustable Catchup for Newly Elected Primary: the time limit for a newly elected primary to catch up with the other replica set members that might have more recent writes.

Write Concern Majority Journal Default replset-config option: determines the behavior of the { w: "majority" } write concern if the write concern does not explicitly specify the journal option j.

Initial-sync improvements:

Now the initial sync builds the indexes as the documents are copied.

Improvements to the retry logic make it more resilient to intermittent failures on the network.

Data Types

MongoDB 3.4 adds support for the decimal128 format with the new decimal data type. The decimal128 format supports numbers with up to 34 decimal digits (i.e., significant digits) and an exponent range of −6143 to +6144.

When performing comparisons among different numerical types, MongoDB conducts a comparison of the exact stored numerical values without first converting values to a common type.

Unlike the double data type, which only stores an approximation of the decimal values, the decimal data type stores the exact value. For example, a decimal NumberDecimal("9.99") has a precise value of 9.99, whereas a double 9.99 would have an approximate value of 9.9900000000000002131628….

To test for decimal type, use the $type operator with the literal “decimal” or 19

db.inventory.find( { price: { $type: "decimal" } } )

New Number Wrapper Object Type

db.inventory.insert( {_id: 1, item: "The Scream", price: NumberDecimal("9.99"), quantity: 4 } )

To use the new decimal data type with a MongoDB driver, an upgrade to a driver version that supports the feature is necessary.

Aggregation Changes

Stages

Recursive Search

MongoDB 3.4 introduces a stage to the aggregation pipeline that allows for recursive searches.

Stage	Description
$graphLookup	Performs a recursive search on a collection. To each output document, adds a new array field that contains the traversal results of the recursive search for that document.

Faceted Search

Faceted search allows for the categorization of documents into classifications. For example, given a collection of inventory documents, you might want to classify items by a single category (such as by the price range), or by multiple groups (such as by price range as well as separately by the departments).

3.4 introduces stages to the aggregation pipeline that allow for faceted search.

Stage	Description
$bucket	Categorizes or groups incoming documents into buckets that represent a range of values for a specified expression.
$bucketAuto	Categorizes or groups incoming documents into a specified number of buckets that constitute a range of values for a specified expression. MongoDB automatically determines the bucket boundaries.
$facet	Processes multiple pipelines on the input documents and outputs a document that contains the results of these pipelines. By specifying facet-related stages ($bucket, $bucketAuto, and$sortByCount) in these pipelines, $facet allows for multi-faceted search.
$sortByCount	Categorizes or groups incoming documents by a specified expression to compute the count for each group. Output documents are sorted in descending order by the count.

Also read: https://www.percona.com/blog/2016/12/13/mongodb-3-4-facet-aggregation-features-and-server-27395-mongod-crash/

Reshaping Documents

MongoDB 3.4 introduces stages to the aggregation pipeline that facilitate replacing documents as well as adding new fields.

Stage	Description
$addFields	Adds new fields to documents. The stage outputs documents that contain all existing fields from the input documents as well as the newly added fields.
$replaceRoot	Replaces a document with the specified document. You can specify a document embedded in the input document to promote the embedded document to the top level.

Count

MongoDB 3.4 introduces a new stage to the aggregation pipeline that facilitates counting document.

Stage	Description
$count	Returns a document that contains a count of the number of documents input to the stage.

Operators

Array Operators

Operator	Description
$in	Returns a boolean that indicates if a specified value is in an array.
$indexOfArray	Searches an array for an occurrence of a specified value and returns the array index (zero-based) of the first occurrence.
$range	Returns an array whose elements are a generated sequence of numbers.
$reverseArray	Returns an output array whose elements are those of the input array but in reverse order.
$reduce	Takes an array as input and applies an expression to each item in the array to return the final result of the expression.
$zip	Returns an output array where each element is itself an array, consisting of elements of the corresponding array index position from the input arrays.

Date Operators

Operator	Description
$isoDayOfWeek	Returns the ISO 8601-weekday number, ranging from 1 (for Monday) to 7 (for Sunday).
$isoWeek	Returns the ISO 8601 week number, which can range from 1 to 53. Week numbers start at 1with the week (Monday through Sunday) that contains the year’s first Thursday.
$isoWeekYear	Returns the ISO 8601 year number, where the year starts on the Monday of week 1 (ISO 8601) and ends with the Sundays of the last week (ISO 8601).

String Operators

Operator	Description
$indexOfBytes	Searches a string for an occurrence of a substring and returns the UTF-8 byte index (zero-based) of the first occurrence.
$indexOfCP	Searches a string for an occurrence of a substring and returns the UTF-8 code point index (zero-based) of the first occurrence.
$split	Splits a string by a specified delimiter into string components and returns an array of the string components.
$strLenBytes	Returns the number of UTF-8 bytes for a string.
$strLenCP	Returns the number of UTF-8 code points for a string.
$substrBytes	Returns the substring of a string. The substring starts with the character at the specified UTF-8 byte index (zero-based) in the string for the length specified.
$substrCP	Returns the substring of a string. The substring starts with the character at the specified UTF-8 code point index (zero-based) in the string for the length specified.

Others/Misc

Other new operators:

$switch: Evaluates, in sequential order, the case expressions of the specified branches to enter the first branch for which the case expression evaluates to “true”.

$collStats: Returns statistics regarding a collection or view.

$type: Returns a string which specifies the BSON Types of the argument.

$project: Adds support for field exclusion in the output document. Previously, you could only exclude the _id field in the stage.

Views

MongoDB 3.4 adds support for creating read-only views from existing collections or other views. To specify or define a view, MongoDB 3.4 introduces:

- theViewOn and pipeline options to the existing create command:
  - db.runCommand( { create: <view>, viewOn: <source>, pipeline: <pipeline> } )
    
    1
    
    db.runCommand( { create: <view>, viewOn: <source>, pipeline: <pipeline> } )
- or if specifying a default collation for the view:
  - db.runCommand( { create: <view>, viewOn: <source>, pipeline: <pipeline>, collation: <collation> } )
    
    1
    
    db.runCommand( { create: <view>, viewOn: <source>, pipeline: <pipeline>, collation: <collation> } )
- and a corresponding mongo shell helper db.createView():
  - db.createView(<view>, <source>, <pipeline>, <collation>)
    
    1
    
    db.createView(<view>, <source>, <pipeline>, <collation>)