Want MongoDB Performance? You Will Need to Add and Remove Indexes!

MongoDB PerformanceGood intentions can sometimes end up with bad results.  Adding indexes boosts performance until it doesn’t. Avoid over-indexing.

The difference between your application being fast, responsive, and scaling properly is often dependent on how you use indexes in the database.  MongoDB is no different, its performance (and the overall performance of your application) is heavily dependent on getting the proper amount of indexes on the right things.   A simple index or two can speed up getting data from MongoDB a million-fold for million-records tables.  But at the same time having too many indexes on a large collection can lead to massive slowdowns in overall performance.  You need to get your indexes just right.

For this blog, we are going to talk about having too many indexes and help you find both duplicate and unused indexes.  If you are interested in finding out if you need additional indexes or if your query is using an index, I would suggest reading previous Percona articles on query tuning (Part 1 & Part 2 of that series).

So, indexes are very good for getting faster queries. How many indexes do I need to create on a collection? What are the best practices for the indexes? How do I find which indexes are being used or not?  What if I have duplicated indexes?

Common Performance Problems

After analyzing a lot of different MongoDB environments I can provide the following list summarizing the typical errors I have seen:

  • Not creating indexes at all, other than the primary key _id created by design.
    • I’m not joking – I have seen databases without any user-created indexes, which had owners surprised the server was overloaded and/or the queries were very slow.
  • Over-indexing the collection.
    • Some developers usually create a lot of indexes without a specific reason or just for testing a query. Then they forget to drop them.
    • In some cases, the size of all the indexes was larger than the data. This is not good; indexes should be as small as possible to be really effective.

I’m not considering the first case. I’m going to discuss instead the second one.

How Many Indexes you Need in a Collection

It depends – that’s the right answer. Basically, it depends on your application workload. You should consider the following rules when indexing a collection:

  • Create as many indexes as possible for your application.
  • Don’t create a lot of indexes.

What? These rules are stating the opposite thing! Well, we can summarize in just one simple rule:

  • You need to create all the indexes your application really needs for solving the most frequent queries. Not one more, not one less.

That’s it.

Pros and Cons of Indexing

The big advantage of the indexes is that they permit the queries, updates, and deletes to run as fast as possible if they are used. (Every update or delete also needs to do a lookup step first). More indexes in a collection can benefit several queries.

Unfortunately, the indexes require some extra work for MongoDB. Any time your run a write, all the indexes must be updated. The new values are stored or dropped into the B-Tree structure, some splitting or merging is needed, and this requires some time.

The main problem is that “more indexes you have in a collection, the slower all the writes will be”.

A very large collection with just 10 or 15 indexes can have a significant performance loss for the writes. Also, remember that indexes have to be copied into the WiredTiger cache. More indexes imply also more pressure for the memory cache. The pressure can then lead to more cache evictions and slowness.

A good example of this is when I was working with a customer a few weeks ago we found 12 extra indexes on a collection they did not need. The collection was around 80GB; the total index size was more than the data size. They had a relevant write load based on several frequent inserts and updates all the time. Cleaning these indexes increased their write queries execution time by 25-30 percent on average. The improvement observed for this real case won’t be the same quantitative amount in other cases, but for sure the fewer indexes you have the faster all the writes will be.

We need to find some kind of balancing: creating more indexes, but not that much.

How to Reduce Over-Indexing

Very easy to say: drop all the indexes you don’t need.

There are two things you can do to identify the indexes to get dropped:

  • Check for the duplicates.
  • Check for the unused indexes.

For dropping an index you need to run something like the following:

Find Duplicate Indexes

A duplicate index could be an index with the same exact definition as another index that already exists in the collection. Fortunately, MongoDB is able to check this and it is not permitted to create such an index.

Let’s do a test using a simple collection with no indexes.

MongoDB is then clever enough to avoid the creation of duplicate indexes. But what about the creation of an index that is the left-prefix of an existing index? Let’s test it.