MongoDB 4.0 is out, and it brings a lot of new features and improvements. In this article we’re going to focus on the major one, which is undoubtedly the support for multi-document ACID transactions. This novelty for a NoSQL database could be seen as a way to get closer to the relational world. Well, it’s not that, or maybe not just that. It’s a way to add a new, important, and often requested feature to the document-based model in order to address a wider range of use cases. The document model and its flexibility should remain the best way to start building an application on MongoDB. At this stage, transactions should be used only in specific cases, when you absolutely need them: for example, when your application requires consistency and atomicity across multiple documents. Transactions incur a greater performance cost than single-document writes, so the denormalized data model will continue to be optimal in many cases, and it helps to minimize the need for transactions.
Single writes are atomic by design: as long as you are able to embed documents in your collections you absolutely don’t need to use a transaction. Even so, transaction support is a very good and interesting feature that you can rely on in MongoDB from now on.
MongoDB 4.0 provides full ACID transaction support, but remember:
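ACID properties are well known in the world of relational databases, but let’s recap what the acronym means: Atomicity (a transaction is applied entirely or not at all), Consistency (a transaction moves the database from one valid state to another), Isolation (concurrent transactions do not see each other’s uncommitted changes), and Durability (committed changes survive failures).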
The support for transactions introduced some limitations:
Sessions were introduced in version 3.6, for example to support retryable writes, but they are also essential for transactions. In fact, every transaction is associated with an open session. Before starting a transaction, a session must be created. A transaction cannot be run outside a session.
At any given time you may have multiple running sessions in the system, but each session may run only a single transaction at a time. You can run transactions in parallel according to how many open sessions you have.
Three new commands were introduced for creating, committing, and aborting transactions:
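In the mongo shell these operations are exposed as the session methods startTransaction(), commitTransaction() and abortTransaction(). The following is only a minimal sketch of how the three calls usually fit together. The helper name runInTransaction and the stand-in session object are assumptions made for illustration, so that the snippet runs without a server; with a real deployment you would pass the object returned by db.getMongo().startSession().

```javascript
// Hypothetical helper (not part of MongoDB): run work(session) inside one
// transaction. `session` must expose startTransaction(), commitTransaction()
// and abortTransaction(), as the shell's Session object does.
function runInTransaction(session, work) {
  session.startTransaction();
  let result;
  try {
    result = work(session);
  } catch (err) {
    session.abortTransaction();   // discard every change made so far
    throw err;
  }
  session.commitTransaction();    // make all changes visible at once
  return result;
}

// Stand-in session (an assumption, not a MongoDB object): it only records
// which methods were called, so the sketch can run without a server.
function makeFakeSession() {
  const calls = [];
  return {
    calls,
    startTransaction: () => calls.push("start"),
    commitTransaction: () => calls.push("commit"),
    abortTransaction: () => calls.push("abort"),
  };
}

const okSession = makeFakeSession();
runInTransaction(okSession, () => "done");

const badSession = makeFakeSession();
try {
  runInTransaction(badSession, () => { throw new Error("boom"); });
} catch (e) {
  // the original error is rethrown after the abort
}
console.log(okSession.calls, badSession.calls);
```

The helper aborts only when the work function throws. Shell helpers that return an error object instead of throwing would need an explicit check inside the work function.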
Note: in the following examples, we use two different connections to create two sessions. We do this for the sake of simplicity, but remember that you can create multiple sessions even inside a single connection, assigning each session to a different variable.
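To test our first transaction, if you don’t have a replica set already configured, start a single mongod as a one-member replica set (transactions require a replica set, so a plain standalone server is not enough), like this: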
    #> mongod --dbpath /data/db --logpath /data/mongo.log --fork --replSet foo
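Connect with the mongo shell and, if the replica set has not been initiated yet, run rs.initiate() once so that the node becomes PRIMARY. Then create a new collection and insert some data.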
    foo:PRIMARY> use percona
    switched to db percona
    foo:PRIMARY> db.createCollection('people')
    {
        "ok" : 1,
        "operationTime" : Timestamp(1538483120, 1),
        "$clusterTime" : {
            "clusterTime" : Timestamp(1538483120, 1),
            "signature" : {
                "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                "keyId" : NumberLong(0)
            }
        }
    }
    foo:PRIMARY> db.people.insert([{_id:1, name:"Corrado"},{_id:2, name:"Peter"},{_id:3,name:"Heidi"}])
Create a session
    foo:PRIMARY> session = db.getMongo().startSession()
    session { "id" : UUID("dcfa7de5-527d-4b1c-a890-53c9a355920d") }
Start a transaction and insert some new documents
    foo:PRIMARY> session.startTransaction()
    foo:PRIMARY> session.getDatabase("percona").people.insert([{_id: 4 , name : "George"},{_id: 5, name: "Tom"}])
    WriteResult({ "nInserted" : 2 })
Now read the collection from inside and outside the session and see what happens
    foo:PRIMARY> session.getDatabase("percona").people.find()
    { "_id" : 1, "name" : "Corrado" }
    { "_id" : 2, "name" : "Peter" }
    { "_id" : 3, "name" : "Heidi" }
    { "_id" : 4, "name" : "George" }
    { "_id" : 5, "name" : "Tom" }

    foo:PRIMARY> db.people.find()
    { "_id" : 1, "name" : "Corrado" }
    { "_id" : 2, "name" : "Peter" }
    { "_id" : 3, "name" : "Heidi" }
As you might notice, since the transaction is not yet committed, you can see the modifications only from inside the session. You cannot see any of the modifications outside of the session, even in the same connection. If you try to open a new connection to the database, then you will not be able to see any of the modifications either.
Now, commit the transaction and see that you can now read the same data both inside and outside the session, as well as from any other connection.
    foo:PRIMARY> session.commitTransaction()

    foo:PRIMARY> session.getDatabase("percona").people.find()
    { "_id" : 1, "name" : "Corrado" }
    { "_id" : 2, "name" : "Peter" }
    { "_id" : 3, "name" : "Heidi" }
    { "_id" : 4, "name" : "George" }
    { "_id" : 5, "name" : "Tom" }

    foo:PRIMARY> db.people.find()
    { "_id" : 1, "name" : "Corrado" }
    { "_id" : 2, "name" : "Peter" }
    { "_id" : 3, "name" : "Heidi" }
    { "_id" : 4, "name" : "George" }
    { "_id" : 5, "name" : "Tom" }
When the transaction is committed, all the data are written consistently and durably in the database, just like any typical write. So, writing to the journal file and to the oplog takes place in the same way as it does for any single write that’s not inside a transaction. As long as the transaction is open, any modification is stored in memory.
Now let’s test the isolation between two concurrent transactions.
Open the first connection, create a session and start a transaction:
    //Connection #1
    foo:PRIMARY> var session1 = db.getMongo().startSession()
    foo:PRIMARY> session1.startTransaction()
Do the same on the second connection:
    //Connection #2
    foo:PRIMARY> var session2 = db.getMongo().startSession()
    foo:PRIMARY> session2.startTransaction()
On connection #1, update Heidi’s document by adding the gender field to it.
    //Connection #1
    foo:PRIMARY> session1.getDatabase("percona").people.update({_id:3},{$set:{ gender: "F" }})
    WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

    foo:PRIMARY> session1.getDatabase("percona").people.find()
    { "_id" : 1, "name" : "Corrado" }
    { "_id" : 2, "name" : "Peter" }
    { "_id" : 3, "name" : "Heidi", "gender" : "F" }
    { "_id" : 4, "name" : "George" }
    { "_id" : 5, "name" : "Tom" }
Update the same collection on connection #2 to add the same gender field to all the males:
    //Connection #2
    foo:PRIMARY> session2.getDatabase("percona").people.update({_id:{$in:[1,2,4,5]}},{$set:{ gender: "M" }},{multi: true})
    WriteResult({ "nMatched" : 4, "nUpserted" : 0, "nModified" : 4 })

    foo:PRIMARY> session2.getDatabase("percona").people.find()
    { "_id" : 1, "name" : "Corrado", "gender" : "M" }
    { "_id" : 2, "name" : "Peter", "gender" : "M" }
    { "_id" : 3, "name" : "Heidi" }
    { "_id" : 4, "name" : "George", "gender" : "M" }
    { "_id" : 5, "name" : "Tom", "gender" : "M" }
The two transactions are isolated: each one can see only the ongoing modifications that it has made itself.
Commit the transaction in connection #1:
    //Connection #1
    foo:PRIMARY> session1.commitTransaction()

    foo:PRIMARY> session1.getDatabase("percona").people.find()
    { "_id" : 1, "name" : "Corrado" }
    { "_id" : 2, "name" : "Peter" }
    { "_id" : 3, "name" : "Heidi", "gender" : "F" }
    { "_id" : 4, "name" : "George" }
    { "_id" : 5, "name" : "Tom" }
On connection #2, read the collection:
    //Connection #2
    foo:PRIMARY> session2.getDatabase("percona").people.find()
    { "_id" : 1, "name" : "Corrado", "gender" : "M" }
    { "_id" : 2, "name" : "Peter", "gender" : "M" }
    { "_id" : 3, "name" : "Heidi" }
    { "_id" : 4, "name" : "George", "gender" : "M" }
    { "_id" : 5, "name" : "Tom", "gender" : "M" }
As you can see, the second transaction still sees its own modifications and cannot see the already committed updates of the other transaction. This kind of isolation works much like the “REPEATABLE READ” level of MySQL and other relational databases.
Now commit the transaction in connection #2 and see the new values of the collection:
    //Connection #2
    foo:PRIMARY> session2.commitTransaction()

    foo:PRIMARY> session2.getDatabase("percona").people.find()
    { "_id" : 1, "name" : "Corrado", "gender" : "M" }
    { "_id" : 2, "name" : "Peter", "gender" : "M" }
    { "_id" : 3, "name" : "Heidi", "gender" : "F" }
    { "_id" : 4, "name" : "George", "gender" : "M" }
    { "_id" : 5, "name" : "Tom", "gender" : "M" }
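The visibility rules seen in this walkthrough can be mimicked with a toy in-memory model. The sketch below is not how MongoDB is implemented. It is a deliberately simplified illustration in which every transaction gets a private snapshot plus a buffer of its own writes, and it performs no conflict detection. The class name ToyStore and its methods are hypothetical.

```javascript
// Toy model of snapshot-style isolation (NOT MongoDB internals).
class ToyStore {
  constructor(entries) {
    this.committed = new Map(entries);   // data visible outside transactions
  }
  // A transaction gets a private copy of the committed data
  // plus a buffer holding its own uncommitted writes.
  begin() {
    return { snapshot: new Map(this.committed), writes: new Map() };
  }
  // Reads see the transaction's own writes first, then its snapshot.
  read(txn, id) {
    return txn.writes.has(id) ? txn.writes.get(id) : txn.snapshot.get(id);
  }
  write(txn, id, value) {
    txn.writes.set(id, value);
  }
  // Commit publishes the buffered writes to everyone else.
  commit(txn) {
    for (const [id, value] of txn.writes) this.committed.set(id, value);
  }
}

const store = new ToyStore([[1, "Corrado"], [3, "Heidi"]]);
const t1 = store.begin();
const t2 = store.begin();

store.write(t1, 3, "Heidi (F)");     // transaction 1 touches document 3
store.write(t2, 1, "Corrado (M)");   // transaction 2 touches document 1
store.commit(t1);

// Transaction 2 still reads document 3 from its own snapshot.
const seenByT2 = store.read(t2, 3);
store.commit(t2);
const afterBoth = [store.committed.get(1), store.committed.get(3)];
console.log(seenByT2, afterBoth);
```

As in the shell session, the second transaction keeps reading the old version of Heidi’s document until it ends, and both sets of changes are visible once both transactions have committed.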
When two (or more) concurrent transactions modify the same documents, we may have a conflict. MongoDB can detect a conflict immediately, even while the transactions are not yet committed. The first transaction to acquire the lock on a document will continue; the second one will receive a write conflict error and fail. The failed transaction can then be retried later.
Let’s see an example.
Create a new transaction in connection #1 to update Heidi’s document. We want to change the name to Luise.
    //Connection #1
    foo:PRIMARY> session.startTransaction()

    foo:PRIMARY> session.getDatabase("percona").people.update({name:"Heidi"},{$set:{name:"Luise"}})
    WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
Let’s try to modify the same document in a concurrent transaction in connection #2. Modify the name from Heidi to Marie in this case.
    //Connection #2
    foo:PRIMARY> session.startTransaction()

    foo:PRIMARY> session.getDatabase("percona").people.update({name:"Heidi"},{$set:{name:"Marie"}})
    WriteCommandError({
        "errorLabels" : [
            "TransientTransactionError"
        ],
        "operationTime" : Timestamp(1538495683, 1),
        "ok" : 0,
        "errmsg" : "WriteConflict",
        "code" : 112,
        "codeName" : "WriteConflict",
        "$clusterTime" : {
            "clusterTime" : Timestamp(1538495683, 1),
            "signature" : {
                "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                "keyId" : NumberLong(0)
            }
        }
    })
We received an error and the transaction failed. We can retry it later.
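The error above carries the TransientTransactionError label, which is what a retry loop can key on. Here is a minimal sketch of such a loop. The helper names isTransient and retryTransaction are hypothetical, and the sketch assumes that the function being retried throws an error object with an errorLabels array. A shell helper that returns the error object instead of throwing would need an explicit check and throw. The flakyTransaction stand-in replaces a real transaction so that the snippet runs without a server.

```javascript
// True when an error carries the TransientTransactionError label.
function isTransient(err) {
  return Boolean(err) && Array.isArray(err.errorLabels) &&
    err.errorLabels.includes("TransientTransactionError");
}

// Hypothetical helper: call attempt() until it succeeds, retrying only
// transient transaction errors, and at most maxAttempts times in total.
function retryTransaction(attempt, maxAttempts) {
  for (let tries = 1; ; tries++) {
    try {
      return attempt(tries);
    } catch (err) {
      if (!isTransient(err) || tries >= maxAttempts) throw err;
      // transient conflict: loop and run the whole transaction again
    }
  }
}

// Stand-in for a transaction that hits a write conflict on its first try.
function flakyTransaction(tries) {
  if (tries === 1) {
    const err = new Error("WriteConflict");
    err.code = 112;
    err.errorLabels = ["TransientTransactionError"];
    throw err;
  }
  return "committed on attempt " + tries;
}

const outcome = retryTransaction(flakyTransaction, 3);
console.log(outcome);
```

With a real session, each attempt would start a new transaction, repeat all of its operations, and commit; the failed attempt has to be aborted before the next one starts.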
Transaction support in MongoDB 4.0 is a very interesting new feature, but it isn’t fully mature yet and there are strong limitations at this stage: for example, a transaction cannot be larger than 16MB, and you cannot use transactions on sharded clusters. If you absolutely need a transaction in your application, use it. But don’t use transactions only because they are cool, since in some cases a proper data model based on embedding documents in collections and denormalizing your data could be the best solution. MongoDB isn’t by nature a relational database; as long as you are able to model your data keeping in mind that it’s a NoSQL database, you should avoid using transactions. In specific cases, or if you already have a database with strong “informal relations” between collections that you cannot change, you could choose to rely on transactions.
Image modified from original photo: by Annie Spratt on Unsplash