Buy Percona ServicesBuy Now!

MongoDB: Investigate Queries with explain() and Index Usage (part 2)

 | September 6, 2018 |  Posted In: Insight for Developers, MongoDB, open source databases, Percona Server for MongoDB

PREVIOUS POST
NEXT POST

MongoDB explain() method optimization toolThis is the second part of a two parts series. In MongoDB: index usage and MongoDB explain() we introduced the main index types supported by MongoDB, and how to create and use them. In this second article, we are going to see some examples on how to use explain() method to investigate queries. Do you need to optimize MongoDB queries? You’ll see how to use explain() to find out how your query uses indexes. Or, perhaps, that it doesn’t!

What is explain()

explain() is a method that you can apply to simple queries or to cursors to investigate the query execution plan. The execution plan is how MongoDB resolves a query. Looking at all the information returned by explain() we can see find out stuff like:

  • how many documents were scanned
  • how many documents were returned
  • which index was used
  • how long the query took to be executed
  • which alternative execution plans were evaluated

…and other useful information.

The aim of using explain() is to find out how to improve the query. For example, by creating missing indexes or by rewriting it in order to use existing indexes more correctly. If you are familiar with the EXPLAIN command in MySQL, the goals of MongoDB’s explain() method are exactly the same.

Explainable object

You can apply the explain() method to a query or a cursor in the following way, as per any other method:

However, the preferred way to investigate queries in the mongo shell is to create first the explainable object.

We can create an explainable object like this:

Once you have created the explainable object, then any kind of operation can be run against it to investigate a query or cursor execution plan. For example:

Restaurants test database

To see some examples we need a collection with some data.
For our purposes we can use the New York restaurants database. You can download this from the following url:

https://www.w3resource.com/mongodb-exercises/retaurants.zip

Unzip the archive, and import the JSON file into MongoDB:

This collection has 3772 documents, all the restaurants in New York City. Here is a document sample.

Explain() verbosity

The explain() method has three verbosity modes.

  • queryPlanner – this is the default mode. At this level, explain provides information about the winning plan, including the index used or if a collection scan is needed (COLLSCAN)
  • executionStats – this mode includes all the information provided by the queryPlanner, plus the statistics. Statistics include details such as the number of documents examined and returned, the execution time in milliseconds, and so on.
  • allPlansExecution – this mode includes all the information provided by the executionStats plus information about the discarded execution plans

We’ll see the explain() output in the following examples.

Example 1

It’s time to use the restaurants collection to run our first example: find out all the restaurants in the Manhattan borough.

Let’s create first the explainable object with the executionStats mode.

Then let’s investigate the query.

Here we can see the output of the explain(). First of all, we can clearly distinguish the “queryPlanner” and the “executionStats” modes. We won’t describe each one of the values, as some are really intuitive. Let’s have a look just at some of them:

queryPlanner.winningPlan.stage = “COLLSCAN”
This provides very important information about the winning plan: it means that MongoDB needs to do a collection scan. The query it’s not optimized because all the documents must be read.

queryPlanner.winningPlan.rejectedPlans = []
It’s empty. There aren’t rejected plans. When the query needs to be executed with COLLSCAN the only execution plan is the winning plan. We don’t have any indexes in the collection, apart the one on _id, so there aren’t other execution plans.

executionStats.nReturned = 1883
The number of documents returned is 1883, the number of restaurants located in Manhattan.

executionStats.totalDocsExamined = 3772
The number of documents examined is exactly the number of documents in the collection. This was expected because the query uses COLLSCAN

executionStats.executionTimeMillis = 1
The execution time of the query. It’s just 1 millisecond. This might seem good, but remember that this is the time needed to scan just 3772 documents, a very small test collection. Think about what this time could be in the case of a collection with millions of documents!

How can we improve the query?

In this case it’s simple. Let’s try to create a single field index on borough, the only condition we have in the find(). Then let’s try to explain the same query again.

Interpreting the output

Now, the output is completely different. Let’s have a look at some of the most relevant values.

queryPlanner.winningPlan.inputStage.stage = “IXSCAN”
This is very important. IXSCAN means that now MongoDB doesn’t need to do a collection scan but an index can be used to find the documents.

queryPlanner.winningPlan.inputStage.indexName = “borough_1”
The name of the index used. This is the default name of an index: the name of the field plus the _1 for ascending or _-1 for descending order.

queryPlanner.winningPlan.inputStage.direction = “forward”
MongoDB traverses the index in a forward direction.

executionStats.nRertuned = 1883
The number of documents returned. Obviously this is the same as before.

executionStats.totalKeysExamined = 1883
The number of keys examined in the index.

executionStats.totalDocsExamined  = 1883
Now the number of documents examined corresponds to the number of elements examined in the index.

We have optimized the query.

Example 2

Now we would like to examine a query to find out all the restaurants with Italian cuisine that received a grade score of greater than 50.

We have a COLLSCAN again. Let’s try to improve the query creating an index on the cuisine field.

The query has improved. The created index cuisine_1 is used, but still we have 325 documents examined with only 6 documents returned. Let’s see if we can do better by creating instead a compound index that uses both the fields in the condition: cuisine and grades.score.

Now, the winning plan uses the new compound index cuisine_1_grades.score_1 and we have only 6 documents examined. Please note also that now we have a rejected plan, the one that uses the single field index cuisine_1 previously created.

We have optimized the query.

Example 3

Let’s find out all the restaurants that don’t prepare any “American” cuisine in Brooklyn and achieved a grade of ‘A’. We want to see the  results ordered by cuisine, descending.

At this point you should be a little familiar with the explain() output, so in the next box we truncate it for the sake of simplicity, leaving only the relevant parts.

Looking at the winning plan, we see that we have IXSCAN on the index cuisine_1. The important thing to notice here is the choice of the index on cuisine field—it’s because we have used sort({cuisine:-1}). There is not a SORT stage because the documents are already extracted using the index, and so they are already sorted. Just notice, too, direction:”backward” —this is because we specified descending order in the query. If we try to execute a slightly different query, changing the sorting to name:1 instead of cuisine:-1 we’ll see a completely different winning plan.

In this case, we have fewer documents examined but since the cuisine_1 index cannot be used, a SORT stage is needed, and the index used to fetch the document is borough_1. While MongoDB has examined fewer documents, the execution time is worse because of the extra stage used to sort the documents.

Let’s return now to the original query. We can also notice that the number of documents examined is still too high (2517) compared to the documents returned (493). That’s not optimal. Let’s see if we can further improve the query by adding another compound index on (cuisine,borough,grades.grade).

Now, MongoDB uses the new index and does not need the extra sorting stage. The number of documents examined is the same as the number of documents returned. Also, the execution time is better.

We have optimized the query.

Example 4

This is the final example. Let’s find out the restaurants where the grades array contains a grade of ‘A’ and a score of 9 for a specific date.

A COLLSCAN again. In the query we notice that all the conditions refer to embedded fields in an array object. So let’s try to create a multikey index. Let’s just a create the index on the date field only and see what happens.

MongoDB uses the index and the winning plan is already good enough. You can try as an exercise to create a compound index including other embedded fields of the array, like {“grades.date”:1, “grades.grade”:1, “grades.score”:1} and see what happens. You will probably see that the index we have created only on date is good enough. Enlarging the compound index will generate only rejected plans. This is because the date field is the most selective.

Hint: when dealing with compound indexes, please remember that the order of the fields is important. The first field should be the most selective one, and the last one should be the less selective. Or, in cases where you don’t need to put lot of fields in the index, the most selective ones are probably all that you need to improve the queries.

Conclusion

That’s all folks. In this two part series we have seen the indexes available in MongoDB and how to use explain() to investigate queries and find out how to improve their performance. We don’t expect now that you’ll know everything, but we hope this will be a good a starting point for you to practice MongoDB query optimization using explain(). You’ll find plenty more information in the manual and on the internet about indexes and explain(), if you are motivated to find out more.

While you are here

If you found this article interesting, you might like to check out some of our other resources. For example, you might find this recorded webinar MongoDB Sharding 101 by my colleague Adamo Tonete to be useful. Or perhaps Tim Vaillancourt’s MongoDB backup and recovery field guide would provide some useful information. If you want to catch our webinars as they happen, please subscribe to receive notifications in good time.

PREVIOUS POST
NEXT POST
Corrado Pandiani

Prior to joining Percona as a Senior Consultant, Corrado spent more than 20 years in developing web sites and designing and administering MySQL. He is a MySQL enthusiast since version 3.23 and his skills are focused on performances and architectural design. He's also a trainer and a MongoDB consultant.

2 Comments

    • Hi Mark. Yes, it should be more clear. It takes some time to familiarize with the JSON output to investigate the execution plans. MySQL is simpler because it provides useful informations in a more concise format. MongoDB provides more details.

      As a general rule, you just need to have a look at the “stage” field inside the “winningPlan” subdocuments. If you see somewhere IXSCAN it’s good, because an index is used, if you see COLLSCAN it’s not.

      Maybe the presence of more “stage” fields in the is confusing. As you can see in some of the examples you can have more than 1 “stage” field on the “winningPlan” because there is an additional “inputStage” subdocument. The “inputStage” is a document that describes the child stage, which provides the documents or index keys to its parent. The field is present if the parent stage has only one child. In most of the cases the parent “stage” should be “FETCH”, In more complex queries, where you have for example OR conditions or where there is some kind of indexe intersection there could be more “inputStages”.
      In case of a COLLSCAN no inputStages are present and it’s very simple to understand what’s going on.

Leave a Reply