Percona’s View on MongoDB’s 4.2 Release – The Good, the Bad, and the Ugly…

This is part 1 of 2: the keynote-marketed features.

Percona’s MongoDB Tech Lead Akira Kurogane takes a look at MongoDB’s 4.2 release.

Initial thoughts? Some great! Some not so compelling.

Distributed Transactions

Including distributed transactions is a great accomplishment, making MongoDB the first popular NoSQL distributed database to close this fundamental feature gap, previously the preserve of RDBMSs.

There is tight use of the WiredTiger storage engine API, an implementation of logical clocks, enhancements to the sharding logic, and many other changes besides. But, maybe most importantly, there is a big change in the cost dimension for consumers of the existing transaction-supporting distributed data products. Distributed processing is not an add-on feature in MongoDB, and the sharded transactions released this week are not an add-on, Enterprise-only feature either.

I wouldn’t be surprised if some customers of proprietary RDBMSs start calling their account executives to ask whether next year’s renewal license fees are going to be ramped down. They could say: “We’re thinking about migrating to a DB with better pricing, performance, availability, and flexibility, one that just overcame its last feature gap. I understand that migrations are expensive, but then again so are your annually-recurring, per-node database software licenses.”

To go back to the technical points, my favorite part of the MongoDB World 2019 Morning Keynotes was the clear demonstration of how distributed transactions appear in the oplog. You can see this at the 32:00 mark of the recording.
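For orientation, here is a minimal sketch of what a multi-document transaction looks like from the mongo shell in 4.2. It assumes a running replica set or sharded cluster; the database, collection, and document names are invented for illustration.

```javascript
// Move 100 units from alice to bob atomically, even if the two
// documents live on different shards.
const session = db.getMongo().startSession();
const accounts = session.getDatabase("bank").accounts;

session.startTransaction();
try {
  accounts.updateOne({ _id: "alice" }, { $inc: { balance: -100 } });
  accounts.updateOne({ _id: "bob" },   { $inc: { balance:  100 } });
  session.commitTransaction();   // both updates become visible together
} catch (e) {
  session.abortTransaction();    // neither update is applied
  throw e;
}
```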

Atlas

“[Atlas is] The best way to run MongoDB in the public cloud.”

I’m sure you’ll see that again. And again. And again. After all, repetition is for emphasis and the more you say it, the more likely it is that people will believe it…

But Atlas is the MongoDB paywall.

You pay. It is the wall.

You can look at it, but you can’t look behind it. Think of it as a car with pedals, a steering wheel, a dashboard (we love those dashboards!), comfortable seats, etc… but the hood of this car doesn’t open.

For many customers this may be great. However, the inability to get under the hood means you’re unable to tune, and thus unable to make your own assessments of how well it works for you.

Field-level encryption (FLE)

This is interesting – I was aware of recent advances in multiverse databases, and I remember wishing for them long ago, before any were implemented as public projects.

Although MongoDB’s FLE isn’t that sort of implementation, it achieves a similar business goal: a database-internalized safety catch that prevents one user’s document data from being revealed to another, without requiring separate database collections for each user. I deduce the implementation is a collection of keys (as small or as large as you might like) used to encrypt documents independently of other documents, with application rules matched by database object namespace and/or user id. I haven’t dug in yet, but undoubtedly the DBA’s burden is going to be key management. Sadly we weren’t given a preview of what that looks like, as we were with the other hands-on demos in the morning keynote.

(A) Third-party search engine integration

I think MongoDB has absolutely done the right thing here – by not trying to further enhance full-text search within the database server itself.

Speaking from experience as a search engine developer, I know the inner loop of a search engine’s algorithm is a very different business from a database’s. Having a pure search engine run in a different process, presumably on a separate server, is just good sense, for the following reasons.

Which algorithm you choose varies dramatically according to what you value in relevancy. Do popular search terms in the last 1, 6, and 24 hours get a boost? Do you need phrase detection? Automatic detection and suppression of keyword-stuffed content? Support for non-European languages? There are so many different sorts of search that the demand would overwhelm MongoDB’s core server development if they tried to take it all on.

Also, there would be a big impact on performance. Search index (re)building is compute-intensive, to put it mildly. If it were within the mongod process, database operation latency would be volatile while the search engine reindexes.

One interface

In my opinion, providing search queries through the same MongoDB driver interface is totally the right way to go, so kudos for that! Caveat: only as long as the syntax design is right – though it looked correct in the on-stage demonstration.

Arguably you could have the same thing right now if you just (mis)use one of the popular open-source search engines as your database, but the performance for doing typical database operations won’t be as high.
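If the on-stage demo is a guide, a search query might be expressed as just another aggregation stage. The sketch below is hypothetical: the stage name and its options are assumptions based on the demo, not a documented API, and the collection and field names are invented.

```javascript
// Hypothetical: full-text search expressed through the ordinary
// aggregation pipeline, composable with regular stages like $project.
db.movies.aggregate([
  { $searchBeta: { search: { query: "werewolves and vampires", path: "plot" } } },
  { $project: { title: 1, plot: 1 } }
]);
```

The appeal is composability: the search stage's output can feed straight into `$match`, `$lookup`, or any other stage without a second client round-trip to a separate search service.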

The downside

Lucene is the only search engine supported. MongoDB did not announce a generic interface to integrate with an external index-making service. So, owners of in-house search solutions who have excellent, irreplaceable relevancy will not be able to integrate them with MongoDB using this new feature. For them, goals such as consolidation of data feed, or getting combined database documents and search matches in a single query/request, will remain just dreams for now.

I also wonder how accessible and modifiable the configuration of the Lucene server will be. There was a claim that you will get the “full power of Lucene,” but that is immediately false if it is unconfigurable. And there are other pressing questions. When you change something in the Lucene configuration, the search indexes (and hence document ‘hits’) will typically change and be rebuilt over, say, hours. Is it a full downtime situation? Or is it more like the indexes are dropped and will reappear after a background index build? I look forward to getting clarification on this.

Server-side document updates

As Eliot Horowitz put it succinctly on Tuesday morning:

“There is one thing you haven’t been able to do [with a MongoDB update command] before though, and that is set the value of A to value of B + C.”

Thanks to long-running development this is now possible, both in the classic update command:
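A minimal sketch of the 4.2 pipeline-style update, using the field names from the quote above (the collection name is invented):

```javascript
// Set a = b + c on every document. The update document is now
// allowed to be an aggregation pipeline (here, a single $set stage).
db.coll.updateMany(
  {},                                            // match all documents
  [ { $set: { a: { $add: ["$b", "$c"] } } } ]    // pipeline-style update
);
```

Note the square brackets: passing an array instead of a plain update document is what switches the command into aggregation-expression mode.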

And also (much more impressively) in the aggregation pipeline through a new $merge stage.

Let’s dive straight into the examples. The first was given in the context of on-demand materialized views:
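(The original example isn't reproduced here; this is a representative sketch of a `$merge`-based materialized view, with invented collection and field names.)

```javascript
// Recompute per-customer sales totals and upsert them into a
// "monthlytotals" collection, which acts as the materialized view.
db.sales.aggregate([
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } },
  { $merge: {
      into: "monthlytotals",
      on: "_id",
      whenMatched: "replace",    // overwrite the stale summary document
      whenNotMatched: "insert"   // add rows for new customers
  } }
]);
```

Re-running the pipeline on demand refreshes the view incrementally, rather than rebuilding a whole output collection as `$out` does.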

The second is from the manual page linked above:
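(Again, not the manual's exact example; a sketch in the same spirit, with invented names, showing the more verbose form where `whenMatched` is itself a pipeline.)

```javascript
// Fold new quarterly purchase counts into an existing report. Instead
// of replacing matched documents, run a small pipeline over them;
// $$new refers to the incoming document from the aggregation.
db.purchaseorders.aggregate([
  { $group: { _id: "$quarter", purchased: { $sum: "$qty" } } },
  { $merge: {
      into: "quarterlyreport",
      on: "_id",
      whenMatched: [
        { $addFields: {
            purchased: { $add: ["$purchased", "$$new.purchased"] }
        } }
      ],
      whenNotMatched: "insert"
  } }
]);
```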

Okay, do you need to take a break after trying to parse that? Fair enough. I admit even the Soyuz T Control Panel is more soothing on the eyes than the above.

There are pros and cons. The primary con is that the aggregation pipeline syntax is very verbose. It would be unreasonable to expect people to create statements like the two examples above on the first try, or even on the third. And even when you do learn to create the commands you want, you won’t retain that fluency for long; you’ll be back at the manual pages every time.

The pro (at least for me) is that it makes it easier to picture how the server processes the command. With a language like SQL, which abstracts over implementation details, you know that you don’t know how the access to table/collection data is being made. It was MongoDB’s open source code and open JIRA ticket information in the early years that provided, for me, the sudden break away from that ‘learned helplessness’ as an RDBMS user.

Come back soon for the sequel post: “Diving into the small-print of MongoDB 4.2 features” (which I am much more excited about!).
