MongoDB Index Types and MongoDB explain() (part 1)

MongoDB index typesIn this two-part series, I’m going to explain explain. No, the repetition is not a pun or a typo. I’m talking about explain(), the most useful MongoDB feature to investigate a query. Here, I’ll introduce the index types available in MongoDB, their properties and how to create and use them. In the next article, we’ll see explain() in action on some real examples.

Using explain(), questions like these will find a proper answer:

  • Is the query using an index?
  • How is it using the index?
  • How many documents are scanned?
  • How many documents are returned?
  • For how many milliseconds does the query run?

And many others. The explain() command provides a lot of information.

If you are familiar with MySQL, then you probably know about the command EXPLAIN. In MongoDB, we have a similar feature, and the goal of using it is exactly the same: find out how we can improve a query in order to reduce the execution time as far as possible. Based on the information provided, we can decide to create a new index or to rewrite the query to take advantage of indexes that already exist. The same as you do when using MySQL.

Indexes supported by MongoDB

The concept of an index in MongoDB is the same as in relational databases. An index is generally a small structure—in comparison to the collection size—that provides a better way to access documents more quickly. Without an index, the only way that MongoDB has to retrieve the documents is to do a collection scan: reading sequentially all the documents in the collection. This is exactly the same as the full scan table in MySQL. Another similarity to MySQL is that the indexes in MongoDB have a structure based on the well known B-Tree. To understand explain() you’ll need to understand the index structures.

Let’s have an overview of the index types available in MongoDB, and their features and properties. We focus, in particular, on the single, compound and multikey index types. These are by far the most used and most useful in the majority of cases. We’ll present also the other types, but when using the explain(), in the second part of this article series, we’ll use only single field and compound indexes in the examples.

Single Field Indexes

This is an index built on a single field of the documents. The entries could be a single value, such as a string or a number, but also could be an embedded document.

By default, each collection has a single field index automatically created on the _id field, the primary key.

The index can be defined in ascending or descending order.

Let’s see some examples.

Assume we have a collection people containing the following type of document:

We can define, for example, a single field index on the age field.

In this case, we have created an ascending index. If we had wanted to create a descending index we would have used this syntax:

With this kind of index we can improve all the queries that find documents with a condition and the age field, like the following:

It’s interesting to highlight that the index can be used to improve even the sorting of the results. In this case, it doesn’t matter whether you have defined the index as ascending or descending. MongoDB can traverse the items in the index in both directions.

Both these queries use the index to retrieve all of the documents in the specified order.

The next query can use the index to retrieve the documents in the order specified by the sort.

Indexes on embedded documents

We can even define an index on an embedded document.

In this case, each item in the index is the embedded document as a whole. The index can be used when the condition in the query matches exactly the embedded document. This query can retrieve the document using the index:

but the next two examples are not able to use the index, and MongoDB will do a collection scan:

More useful is to create indexes on embedded fields rather than on embedded documents. For doing this we can use the dot notation.

We have created two separate ascending indexes on the name and surname embedded fields. Now, queries like the following can rely on the new indexes to be resolved.

Compound indexes

A compound index is an index on multiple fields. Using the same people collection we can create a compound index combining the city and age field.

In this case, we have created a compound index where the first entry is the value of the city field, the second is the value of the age field, and the third is the person.name. All the fields here are defined in ascending order.

Queries such as the following can benefit from the index:

The order we define the fields is important. In fact, only the left prefix of the index can be used to retrieve the documents. In the examples above the index can be used because the fields in the find() are always a left-prefix of the index definition. The following queries will not use the index because the field city is not specified in the query.  A collection scan is required to solve them.

The ascending or descending order is more important in the case of compound indexes. In fact, we need to pay attention when defining the order in the index because not all the queries can rely on it, in particular when using the sort() function.

Assume we have the compound index defined as follow:

The specified sort direction in the documents, when use sort(), must match the same pattern of the index definition or the inverse pattern. Any other pattern will not be supported by the index.

So, the following queries will use the index for sorting:

The following queries will not use the index: