
Percona Back to Basics: MongoDB updates

February 25, 2016 | Posted In: MongoDB


MongoDB Updates

Welcome to the first in a new series of MongoDB blogs. These blogs will cover a selection of topics, including:

  • How-tos
  • New releases and new features
  • Getting back to the basics
  • Solutions from the field

In this first blog, we’ll discuss MongoDB updates. You can use the update method to update documents in a collection. MongoDB updates are well-covered in the MongoDB documentation, but there are some cases we should review for clarity, and to understand how and when to use them.

In this blog post, we’ll cover:

  • To $set or $inc 
  • What about the engine differences?
  • What is a move?
  • How can I limit updates when a customer wants to make massive changes and remain online?

$set vs $inc

The choice between $set and $inc is a throwback to the MMAPv1 storage engine, and today it is an optimization consideration rather than a rule. If we know we want to add 100 to some value in a document that is 200k in size, rewriting the entire document with $set could cost many times more disk IO than necessary. So how much more efficient is $inc? The manual says it is faster because it writes less, and that moves are more costly (we’ll cover moves in a moment), but it doesn’t give the technical logic behind that argument.

In MMAPv1, $set could update a value from 3200 to 3300 with no issue and would not trigger a move. However, anything that grows the document (adding an entry to an array, adding a subdocument, appending characters to a string, adding new fields, etc.) might cause a move. The larger issue at hand is that $set requires you to fetch the data first to be able to set it, while $inc lets you blindly increment the data. In practice, this might look something like:
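For example, a sketch in mongosh syntax (the inventory collection and counter field are hypothetical names, not from the original post):

```javascript
// With $set we must read the document first to compute the new value.
// A concurrent write landing between our read and our write can be clobbered.
var doc = db.inventory.findOne({ _id: 1 });
db.inventory.updateOne({ _id: 1 }, { $set: { counter: doc.counter + 100 } });

// With $inc we can increment blindly: no prior fetch is needed, and the
// increment composes correctly with concurrent increments.
db.inventory.updateOne({ _id: 1 }, { $inc: { counter: 100 } });
```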

Replacing the whole document might look like this:
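For comparison, a sketch of a whole-document replacement (mongosh syntax; the same hypothetical inventory collection and counter field):

```javascript
// Read the full document, change one value client-side, then write the
// entire document back. Every byte of the document travels twice.
var doc = db.inventory.findOne({ _id: 1 });
doc.counter = doc.counter + 100;
db.inventory.replaceOne({ _id: 1 }, doc);
```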

With regards to incrementing data, BSON is designed to advertise the length of each element up front, making it easy to skip over bytes you don’t need to read, parse, or consider. Because the cursor lands at a very specific offset, it can change a number in place: the new value takes the same storage size, so nothing else in the document needs to be shifted around. The important point is the number of seeks we need to make on a document. With BSON, if we want to update the 900th field, we make 900 “jumps” to get to the correct position. JSON, on the other hand, would need the whole document read into memory and each and every bracket parsed, which requires significantly more CPU.
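To make the skipping concrete, here is a minimal, hand-rolled sketch of walking a BSON buffer in JavaScript. The tiny hand-encoded document and the two element types handled are illustrative only; real BSON has many more types.

```javascript
// Hand-encoded BSON for the document { a: "xx", b: 7 }.
const doc = Buffer.from([
  22, 0, 0, 0,            // total document length (int32, little-endian)
  0x02, 0x61, 0x00,       // element type 0x02 (string), name "a"
  3, 0, 0, 0,             // string length, including trailing NUL
  0x78, 0x78, 0x00,       // "xx\0"
  0x10, 0x62, 0x00,       // element type 0x10 (int32), name "b"
  7, 0, 0, 0,             // value 7
  0x00                    // document terminator
]);

// Walk the top-level elements, skipping whole values without parsing them.
// Only the two element types used above are handled in this sketch.
function findInt32(buf, wanted) {
  let off = 4;                              // skip the document length
  while (buf[off] !== 0x00) {               // 0x00 terminates the document
    const type = buf[off++];
    let name = "";
    while (buf[off] !== 0x00) name += String.fromCharCode(buf[off++]);
    off++;                                  // skip the name's NUL
    if (type === 0x10) {                    // int32: fixed 4 bytes
      if (name === wanted) return buf.readInt32LE(off);
      off += 4;
    } else if (type === 0x02) {             // string: the length prefix tells
      off += 4 + buf.readInt32LE(off);      // us exactly how far to jump
    } else {
      throw new Error("element type not handled in this sketch");
    }
  }
  return undefined;
}
```

Note that the reader never inspects the string’s content to get past it; the length prefix alone tells it how far to jump.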

For BSON, the application must spend some CPU converting between BSON and native types, but this isn’t a deal breaker: CPU on the application tier is more scalable.

What about engines?

There are cases where $set could be optimal, especially if the storage engine uses a fast-update concept (also known as “read-less” updates). This means we can write blindly to the document space, making the changes we want; if the space needed matches what is available, we might even avoid a move or a restructure of the document. This is true in TokuMX, PerconaFT, and MMAPv1. In other engines, however, such as the copy-on-write and LSM-based designs of WiredTiger and RocksDB, this is impossible (more on LSMs in a moment, but inserts and updates work largely the same way there). The engine appends a new copy of the full record to the end of a file, which is very fast because it doesn’t need to search a free list for a slot of the right size.

The downside is that using $set to append a field, or $inc to increase a counter, is much more expensive as it executes a full document read and a complete document write. This is why the type of storage engine is critically important when explaining methods for updating documents and the expected overhead.

What is a move?

A move occurs when a document is using 64k but an update would make it 65k. Since the result is larger, the new document no longer fits in its existing location, which means that from a storage perspective the update becomes a read, an insert, and a delete. In some engines this might be fine (RocksDB, for example, simply records the delete as a tombstone to clean up later), but too many accumulated versions can force the engine to clean up when its history list gets too long. This forced overhead is one of the reasons LSM read operations can get bogged down while writes stay very fast.

Put another way, an LSM engine effectively pays a read cost in every case by default. For the memory-mapped engine, however, a move means the write lock has escalated, and the operation can be many times more expensive than a non-move update.

Limiting the effect of massive updates

Let’s assume we run a shopping cart system, with the following document structure in a collection of 100 million documents:
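A plausible shape for such a document (all field names and values here are assumptions):

```javascript
// Hypothetical version-1 user document: a single, flat address.
var userDocV1 = {
  _id: 1,
  user: "jdoe",
  email: "jdoe@example.com",
  address: { street: "123 Main St", city: "Austin", state: "TX", zip: "78701" },
  version: 1
};
```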

This has worked well for tracking, but now we want to support multiple addresses per user. We have a million users, and we want to force them all onto the new form, since keeping mixed document shapes around for a long time could be an issue. (There are cases and designs that let a client be intelligent and self-updating; however, that is out of the scope of this blog.)

The document now should be:
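Continuing with the same assumed field names:

```javascript
// Hypothetical version-2 user document: "addresses" is now an array holding
// the old address as its first member, and the version field is bumped so
// applications can tell the two shapes apart.
var userDocV2 = {
  _id: 1,
  user: "jdoe",
  email: "jdoe@example.com",
  addresses: [
    { street: "123 Main St", city: "Austin", state: "TX", zip: "78701" }
  ],
  version: 2
};
```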

There are a couple of reasons for this selection. You should NEVER reuse an indexed field with different data types. MongoDB can technically store both types in the same field, but the index orders entries by BSON type, so a query can return incomplete results and confuse users by not matching values you might think it would. In MongoDB, the string “123” is nothing like the number 123; depending on your query, you might not get all of the expected results. We also incremented the version to 2, so that an application that programmatically checks and fixes document versions knows whether a given document still needs to be migrated. That model does not work for inactive users, however, which is what matters in this example. This leaves us with two ways we could make our update.

Option 1:
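A sketch of what an explicit, version-checked migration of one document might look like (mongosh syntax; the users collection and field names are assumptions):

```javascript
// Option 1 (sketch): match on the schema version explicitly, so the update
// is exact, repeatable, and cannot touch already-migrated documents.
var doc = db.users.findOne({ version: 1 });
db.users.updateOne(
  { _id: doc._id, version: 1 },
  {
    $set: { addresses: [doc.address], version: 2 },
    $unset: { address: "" }
  }
);
```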

Option 2:
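An outcome-based variant (same assumed names): rather than checking versions, just fix whatever still has the old shape:

```javascript
// Option 2 (sketch): match on the observed outcome -- any document that
// still carries the old flat "address" field -- and rewrite it.
db.users.find({ address: { $exists: true } }).forEach(function (doc) {
  db.users.updateOne(
    { _id: doc._id },
    { $set: { addresses: [doc.address], version: 2 }, $unset: { address: "" } }
  );
});
```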

Option 1 is much more exact and defensive, while option 2 is keyed on the outcome. We would want to use option 1 for clarity and repeatability, but how do we ensure it doesn’t update all 100 million documents at once? The IO needed and the impact on the system would be far too expensive, and it could fill the oplog so much that a restore becomes impossible:
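A reconstruction of such a script (mongosh syntax): the helper names buildAddressDelta and updateWithPauses follow the discussion, but the exact logic, the connection-as-argument style, and the version-field matching are assumptions.

```javascript
// Build the delta for one document: wrap the old flat address in an
// "addresses" array, bump the version, and unset the old field.
function buildAddressDelta(doc) {
  return {
    $set: { addresses: [doc.address], version: 2 },
    $unset: { address: "" }
  };
}

// Walk the collection, pausing every batchSize documents so the oplog and
// disks get room to breathe. "conn" is the db handle (the global `db` in
// mongosh); ns is a "database.collection" namespace string.
function updateWithPauses(conn, ns, batchSize, pauseMs) {
  // 1. Split the namespace and set up the collection object for ease of use.
  var parts = ns.split(".");
  var coll = conn.getSiblingDB(parts[0]).getCollection(parts.slice(1).join("."));
  var updated = 0;
  while (true) {
    // 2. Do we still have documents to change? Quit when done.
    var doc = coll.findOne({ version: 1 });
    if (doc === null) break;
    // 3. One document at a time (a bulk op per batch would also work).
    coll.updateOne({ _id: doc._id, version: 1 }, buildAddressDelta(doc));
    updated++;
    // 4. Pause and report each time we hit a batchSize (% is modulo).
    if (updated % batchSize === 0) {
      print("updated " + updated + " docs, sleeping " + pauseMs + "ms");
      sleep(pauseMs);
    }
  }
  return updated;
}
```

In mongosh this might be run as `updateWithPauses(db, "shop.users", 1000, 500)` to pause half a second every thousand documents (the namespace and numbers are illustrative).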

In this example, the specific bit was “buildAddressDelta”, and the more generic part was “updateWithPauses”. A future improvement would be to make the “buildAddressDelta” become “buildDelta”, and pass it an array of deltas to apply. As you can see, the delta is adding the new array of addresses with the current as a member, updating the version, and unsetting the old fields – which should be pretty straightforward. Our focus here is more on the “updateWithPauses” script, which is doing a few things:

  1. Splitting and setting up an NS object for ease-of-use
  2. Finding out if we still have documents to change (and quitting when it’s done)
  3. Getting one document at a time and updating it (we could also build one bulk op per batch)
  4. Forcing a pause and reporting each time we hit a batchSize (% in JS means modulo)

It is possible to do more, but this approach is safe and slows down naturally by not batching, while still forcing a yield from time to time. You can also safely bail out of the shell to stop the process if it is impacting the system too much, and restart it later: it will find the documents it still needs to change just in time for each loop.

Conclusion

Hopefully, this blog has helped to demonstrate some of the finer points of MongoDB updates. The MongoDB documentation has comprehensive coverage of the processes, but don’t hesitate to ping us for specific questions!

 

David Murphy

David is the Practice Manager for MongoDB @ Percona. He joined Percona in October 2015; before that, he was deeply involved in both the MySQL and MongoDB database communities. His other passions include DevOps, tool building, and security.

2 Comments

  • How much does it cost in CPU/latency to jump to field 900 (or some other large number)? I read a paper by a vendor that has a faster BSON to fix this problem and wondered whether this is a big deal.

    WiredTiger, RocksDB and anything Toku are copy-on-write. There is no way to avoid a move, that is part of the engine’s algorithm. The only engine that can do in-place updates to storage is mmapv1.

    I wasn’t aware that Toku+Mongo has a read-free option for user updates (ignoring replication here), but I am not a Mongo expert. Toku has that option in the MySQL space.

    RocksDB for Mongo could expose read-free updates for some user updates via the merge operator. That would be an interesting feature.

    Non-unique secondary index maintenance after insert/update is read-free for RocksDB (Mongo & MySQL). It is not for a b-tree (mmapv1, WT). I don’t know about Toku.

  • Mark

    The CPU needed to walk a BSON structure is not huge, given the document is copied into memory before walking it. The size and field name sit next to each other in the BSON format, letting the reader jump N bytes to skip over the content of a field, sub-document, or array. It then checks whether it needs the next field, and repeats the process until it has satisfied all the projection items or has walked the entire document. The server side shows this same behavior with projections and non-covered indexes as a way to quickly get at the data. In both cases, using swap space will make things painful, but assuming that is not the case, field order is rarely something most people detect in testing. It could be an interesting benchmark, given we are moving to copy-on-write in MongoDB’s ecosystem. I wonder how valuable such a benchmark would be to the overall community?

    Regarding “read-free”: I use the term loosely, given it’s hard to be 100% read-free all the time. PerconaFT is normally read-free for updates and even secondary index maintenance. However, there are cases where an escalation is needed, for example to avoid possible write conflict exceptions (WCE) on a very rapidly updated document.
