Buy Percona ServicesBuy Now!

MongoDB: index usage and MongoDB explain() (part 1)

 | September 4, 2018 |  Posted In: Insight for DBAs, Insight for Developers, MongoDB, open source databases, Percona Server for MongoDB

PREVIOUS POST
NEXT POST

MongoDB index typesIn this two part series, I’m going to explain explain. No, the repetition is not a pun or a typo. I’m talking about explain(), the most useful MongoDB feature to investigate a query. Here, I’ll introduce the index types available in MongoDB, their properties and how to create and use them. In the next article we’ll see explain() in action on some real examples.

Using explain(), questions like these will find a proper answer:

  • Is the query using an index?
  • How is it using the index?
  • How many documents are scanned?
  • How many documents are returned?
  • For how many milliseconds does the query run?

And many others. The explain() command provides a lot of information.

If you are familiar with MySQL, then you probably know about the command EXPLAIN. In MongoDB we have a similar feature, and the goal of using it is exactly the same: find out how we can improve a query in order to reduce the execution time as far as possible. Based on the information provided, we can decide to create a new index or to rewrite the query to take advantage of indexes that already exist. The same as you do when using MySQL.

Indexes supported by MongoDB

The concept of an index in MongoDB is the same as in relational databases. An index is generally a small structure—in comparison to the collection size—that provides a better way to access documents more quickly. Without an index, the only way that MongoDB has to retrieve the documents is to do a collection scan: reading sequentially all the documents in the collection. This is exactly the same as the full scan table in MySQL. Another similarity to MySQL is that the indexes in MongoDB have a structure based on the well known B-Tree. To understand explain() you’ll need to understand the index structures.

Let’s have an overview of the index types available in MongoDB, and their features and properties. We focus, in particular, on the single, compound and multikey index types. These are by far the most used and most useful in the majority of cases. We’ll present also the other types, but when using the explain(), in the second part of this article series, we’ll use only single field and compound indexes in the examples.

Single Field Indexes

This is an index built on a single field of the documents. The entries could be a single value, such as a string or a number, but also could be an embedded document.

By default each collection has a single field index automatically created on the _id field, the primary key.

The index can be defined in ascending or descending order.

Let’s see some examples.

Assume we have a collection people containing the following type of document:

We can define, for example, a single field index on the age field.

In this case we have created an ascending index. If we had wanted to create a descending index we would have used this syntax:

With this kind of index we can improve all the queries that find documents with a condition an the age field, like the following:

It’s interesting to highlight that the index can be used to improve even the sorting of the results. In this case, it doesn’t matter whether you have defined the index as ascending or descending. MongoDB can traverse the items in the index in both directions.

Both these queries use the index to retrieve all of the documents in the specified order.

The next query can use the index to retrieve the documents in the order specified by the sort.

Indexes on embedded documents

We can even define an index on an embedded document.

In this case, each item in the index is the embedded document as a whole. The index can be used when the condition in the query matches exactly the embedded document. This query can retrieve the document using the index:

but the next two examples are not able to use the index, and MongoDB will do a collection scan:

More useful is to create indexes on embedded fields rather than on embedded documents. For doing this we can use the dot notation.

We have created two separate ascending indexes on the name and surname embedded fields. Now, queries like the following can rely on the new indexes to be resolved.

Compound indexes

A compound index is an index on multiple fields. Using the same people collection we can create a compound index combining the city and age field.

In this case, we have created a compound index where the first entry is the value of city field; the second is the value of the age field; and the third is the person.name. All the fields here are defined in ascending order.

Queries such as the following can benefit from the index:

The order we define the fields is important. In fact, only the left prefix of the index can be used to retrieve the documents. In the examples above the index can be used because the fields in the find() are always a left-prefix of the index definition. The following queries will not use the index because the field city is not specified in the query.  A collection scan is required to solve them.

The ascending or descending order is more important in the case of compound indexes. In fact we need to pay attention when defining the order in the index because not all the queries can rely on it, in particular when using the sort() function.

Assume we have the compound index defined as follow:

The specified sort direction in the documents, when use sort(), must match the same pattern of the index definition or the inverse pattern. Any other pattern will not be supported by the index.

So, the following queries will use the index for sorting:

The following queries will not use the index:

The following queries will use the index both for retreiving the documents and for sorting.

They use the left prefix and the pattern of the sort() stage matches exactly the index pattern or the inverse pattern.

Multikey indexes

This is the index type for arrays. When creating an index on an array, MongoDB will create an index entry for every element.

Let’s assume we have these documents:

The multikey index can be created as:

Queries such as these next examples will use the index:

Geospatial and text indexes

MongoDB supports specialized indexes also for geospatial data and for text searches. They are out of scope of the goal of this article, but it is worth mentioning them. We’ll look at these in a future post maybe.

Index options

TTL

The acronym stands for Time To Live. This is a special option that we can apply only to a single field index to permit the automatic deletion of documents after a certain time.

During index creation we can define an expiration time. After that time, all the documents that are older than the expiration time will be removed from the collection. This kind of feature is very useful when we are dealing with data that don’t need to persist in the database. A good example of this is session data.

Let’s see how to define the TTL option on the sessionlog collection.

In this case, MongoDB will drop the documents from the collection automatically once half an hour (1800 seconds) has passed since the value in lastUpdateTime field.

MongoDB runs a background process every 60 seconds to drop the expired documents. So, this means that the real deletion of a document can be applied with a short delay, not precisely after expireAfterSeconds.

There are some restrictions:

  • as already mentioned only single field indexes can have the TTL option
  • the _id single field index cannot support the TTL option
  • the indexed field must be a date type
  • a capped collection cannot have a TTL index

In a replica set, documents are deleted only in the primary node; the background process works only on the primary node, with deletions correctly replicated as per any other event.

Partial indexes

A partial index is an index that contains only a subset of the values based on a filter rule. They are useful in cases where:

  • the index size can be reduced
  • we want to index the most relevant and used values in the query conditions
  • we want to index the most selective values of a field

Here’s how to create a partial index using the clause partialFilterExpression.

We have created a compound index on city and person.surname but only for the documents with age less then 30.

In order for the partial index to be used the queries must contain a condition on the age field.

The following queries cannot use the partial index because the results would be incomplete. In these examples, MongoDB will use a collection scan:

Sparse indexes

Sparse indexes are a subset of partial indexes. A sparse index only contains elements for the documents that have the indexed field, even if it is null.

Since MongoDB is a schemaless database, the documents in a collection can have different fields, so an indexed field may not be present in some of them.

To create such an index use the sparse option:

In this case we are assuming there could be documents in the collection with the field city missing.

Regular indexes—without the sparse option—contain all the documents of the collection with a null value for the elements  whose documents don’t contain the indexed field.

Sparse indexes are based on the existence of a field in the documents, and are useful to reduce the size of the index. Since they are a subset of partial indexes, if you have to decide then you should choose partial indexes.

Defining a partial index with the following filter is the same of having a sparse index:

Unique indexes

MongoDB can create an index as unique. An index defined this way cannot contain duplicate entries.

To create such an index use the unique option.

Uniqueness can be defined for compound indexes too.

By default the index on _id is automatically created as unique.

You cannot create the index as unique if you already have documents in the collection with duplicates for the proposed index.

As we know, missing fields in the documents are inserted in the index with a null value, even in the case of a unique index creation. Consequently only one null value is permitted in the index because of the uniqueness restraint. Our suggestion is to try to avoid missing fields when creating a unique index.

A couple of useful commands

Here are a couple of useful commands when dealing with indexes.

List of the indexes in a collection

Use the getIndexes() method.

In this sample collection we have 4 indexes, including the _id.

Drop an existing index

Use the dropIndex() method passing the name of the index to be dropped.

Conclusions

In this first part of this two part series I’ve described the indexes available in MongoDB, and how you can create and use them. We’ve also discussed some of their main features. In the next part, we’ll focus on the explain() method, and using some examples we’ll show how we can investigate queries and, if possible, how we can optimize them.

You won’t have to wait long! Come back on Thursday to read about MongoDB explain().

While you are here

If you found this article interesting, you might like to check out some of our other resources. For example, you might find this recorded webinar MongoDB Sharding 101 by my colleague Adamo Tonete to be useful. Or perhaps Tim Vaillancourt’s MongoDB backup and recovery field guide would provide some useful information. If you want to catch our webinars as they happen, please subscribe to receive notifications in good time.

PREVIOUS POST
NEXT POST
Corrado Pandiani

Prior to joining Percona as a Senior Consultant, Corrado spent more than 20 years in developing web sites and designing and administering MySQL. He is a MySQL enthusiast since version 3.23 and his skills are focused on performances and architectural design. He's also a trainer and a MongoDB consultant.

2 Comments

  • Thanks Corrado for this article, but is quite for beginners. a more advanced would be very appreciated, going inside explain command and show how execution plans are using indexes…

    • Hi Alex, the second part of the article will be published this week on Thursday. In the second part you will see some real examples.

Leave a Reply