November 27, 2014

Designing one to many relations – MongoDB vs MySQL

We already discussed one to one relations in MongoDB, and the main conclusion was that you should design your collections according to the most frequent access pattern. With one to many relations, this is still valid, but other factors may come into play.

Let’s look at a simple problem: we are a shop and we want to store customers’ information as well as their orders. Each customer can make several orders, this is a one to many relation. With MySQL or any relational database system, we would create 2 tables:

(Like in the previous post, I’m omitting foreign keys for clarity)

In MongoDB, we can use the same design but of course as we cannot do joins, it would not always work well. For instance, if we want to know the name of the customer who bought the order with _id = 100, we would need 2 queries:

and then

While with MySQL, this is easily done in a single query:

 

A good way to solve this problem with MongoDB would be to embed orders into customers, such as:

And the query giving the name of the customer who bought the order with _id = 100 would be:

So far, so good. But here are a few questions about this design.

1. Would it still work if we needed to run queries on orders, for instance if we wanted to know the number of orders with status = 2?
Yes, this can be done with the aggregation framework with a query such as:

Of course the query would have been much easier to write and would be more efficient if we had embedded customers into orders (in an order2 collection for instance):

So as always you will have to make decisions to find the design that best fits with your most frequent access pattern. And you will have to accept that the others access patterns may be slow. This is very different from a normalized schema that will be equally good for nearly every access pattern.

Also note that embedding orders into customers does not duplicate data because each order is unique. But embedding customers into orders would create a lot of data duplication because if a customer has 100 orders, the customer’s detail would be repeated 100 times. This can create inconsistencies that the application code will have to handle correctly.

2. Does embedding scale? By that I mean what happens if a customer has hundreds of thousands of orders?
This is in my opinion the main limitation of this design. First a document in MongoDB is limited to 16MB, so embedding a lot of objects into a document may not even be possible. With customers and orders you are likely not to meet this problem, but if you want to build a directory of people per city, it would be a bad design to create a document per city and embed all the people’s information.

And then anyway even if you do not reach the physical limits of MongoDB, having very large documents is bad for performance. All operations on very big documents will take a long time, so you cannot expect good performance in this case. Your only choice is then to normalize your data, which will make your queries harder to write and less efficient.

Conclusion

In this article, we have seen several topics that you will have to keep in mind when designing one to many relations in MongoDB:

  • Denormalizing by embedding objects (like embedding orders into a customer) is a common desing pattern to deal with the lack of JOINs in MongoDB, and it applies well to this kind of relation.
  • Depending on the way you use embedding, it may create data duplication. It is of course better if you can avoid it.
  • Embedding works well when the one to many relation is actually a one to few relation. If the many is large, you may have to use a normalized schema for which the main drawback is that some queries will be difficult to write and/or very slow.

Therefore do not believe that because MongoDB is schemaless, you will not have to take care of your schema design!

Do you want to learn more on MongoDB? Come to my tutorial at PLUK in November!

About Stephane Combaudon

Stéphane joined Percona in July 2012, after working as a MySQL DBA for leading French companies such as Dailymotion and France Telecom.

In real life, he lives in Paris with his wife and their twin daughters. When not in front of a computer or not spending time with his family, he likes playing chess and hiking.

Comments

  1. Perlover says:

    > db.customer.find({_id:1},{name:1,_id:0}) # Would return { “name” : “Stephane” }
    I think should be:
    db.customer.find({_id:123},{name:1,_id:0}) # Would return { “name” : “Stephane” }

  2. Oleg says:

    Now can someone please explain how exactly all that is related to MySQL Performance?

  3. Philipp says:

    This article shows you scope of MongoDB applicability.
    If you have many-to-many relations in DB or you need to produce complex analytics, MongoDB is not convinient tool.

  4. Stephane Combaudon says:

    @Perlover: thank you, fixed!

    @Oleg: This post is not directly related to MySQL. But in many organizations, the people responsible for MySQL performance/administration/whatever are also responsible for the performance/administration/whatever of other software, for instance MongoDB. Knowing differences that are key but that are not obvious can avoid a lot of pain.

    @Philipp: I agree that many-to-many relations are not convenient to use with MongoDB. As for analytical queries, it depends. For instance, using map-reduce or the aggregation framework are powerful ways to execute analytical queries. You can also use Hadoop to analyze data stored in MongoDB.

  5. Yudho says:

    This post is useful..thanks

  6. Fermin says:

    I’m starting to learn MongoDB and I at one moment I was asking myself how to solve the “one to many” relationship design in MongoDB. While searching, I found many comments in another post like ” you are thinking relational “.
    Ok, I agree. There will be some cases like duplication of information won’t be a problem, like in this example. But the my issue is I don’t find some real life examples that don’t fit MongoDB

    For example, suppose you have the tables: ORDERS, that has an embedded DETAIL structure with the PRODUCTS:
    So for one thing or another, you need to change a product name (or another kind of information) that is already embedded in several orders.

    At the end, you are force to do a one-to-many relashionship in MongoDB so you can solve this simple problem, don’t you ?
    But every time I found some article/comment about this, it says that will be a performance fault in Mongo. It’s kind disappointment

Speak Your Mind

*