This post explains how to perform a Rolling Index Build in a Kubernetes environment running Percona Operator for MongoDB.

Rolling Index Builds with Percona Operator for MongoDB

Why and when to perform a Rolling Index Build?

Building an index requires:

  • CPU and I/O resources
  • Database locks (even if brief)
  • Network bandwidth

If you have very tight SLAs or systems that are already operating close to their peak capacity, building an index the traditional way could lead to an outage.

A Rolling Index Build approach takes advantage of the replica set architecture by building the index on a single non-primary member at a time. This reduces the impact while maintaining the availability of the system.

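If you have been managing MongoDB for some time, you likely know how to perform a Rolling Index Build. On a traditional deployment, the workflow is: stop one secondary, restart it as a standalone on a different port, build the index, restart it as a replica set member and let it catch up, repeat for the remaining secondaries, and finally step down the primary and do the same there.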

However, things get more complicated in the Kubernetes world, as you cannot simply stop the mongod process in a pod. If you were to do so, the health check would fail, causing Kubernetes to spin up a new replacement pod immediately.

This prevents a straightforward approach to performing a rolling index build. Fortunately, Percona Operator for MongoDB addresses this challenge.

Step-by-step: Rolling Index Build

Let’s see the procedure for a 3-node replica set. In case you want to replicate this, here is the cr.yaml I’ve used after installing Percona Operator for MongoDB on my Kubernetes cluster:

You can deploy it in the current namespace by running:

1. Verify the topology

After the deployment is complete, find the MongoDB pods:

Pick one pod and start a shell against it:

Verify the topology to see the current primary and secondary members:
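As a rough illustration of what to look for in the topology, here is a minimal Python sketch that picks the primary and the secondaries out of a document shaped like the output of rs.status(). The sample document and its host names are hypothetical, and how you export the status from the shell is up to you.

```python
def summarize_topology(status):
    """Return (primary, secondaries) member names from an rs.status()-shaped dict."""
    primary = None
    secondaries = []
    for member in status.get("members", []):
        state = member.get("stateStr")
        if state == "PRIMARY":
            primary = member["name"]
        elif state == "SECONDARY":
            secondaries.append(member["name"])
    return primary, secondaries


# Hypothetical sample shaped like rs.status() output; host names are made up.
sample_status = {
    "set": "rs0",
    "members": [
        {"name": "testrs-rs0-0.example:27017", "stateStr": "PRIMARY", "health": 1},
        {"name": "testrs-rs0-1.example:27017", "stateStr": "SECONDARY", "health": 1},
        {"name": "testrs-rs0-2.example:27017", "stateStr": "SECONDARY", "health": 1},
    ],
}

primary, secondaries = summarize_topology(sample_status)
print("primary:", primary)
print("secondaries:", secondaries)
```

The secondaries returned here are the members to work on first; the primary is handled last.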

2. Stop one Secondary

In this case, we can take advantage of the Operator feature to avoid the restart-on-fail loop for Percona Server for MongoDB containers.

Let’s start by doing this on one of our secondary nodes, testrs-rs0-1:
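The Operator feature relies on a marker file named sleep-forever. The sketch below only assembles a kubectl command line that would create that file; it assumes the file belongs in /data/db and that the container is named mongod, so check both against the Operator documentation for your version before running anything.

```python
import shlex


def sleep_forever_command(pod, container="mongod", data_dir="/data/db"):
    """Build (but do not run) a kubectl command that creates the marker file.

    The container name and data directory are assumptions, not values
    taken from this post.
    """
    return [
        "kubectl", "exec", pod, "-c", container, "--",
        "touch", f"{data_dir}/sleep-forever",
    ]


command = sleep_forever_command("testrs-rs0-1")
print(shlex.join(command))
```

Returning the argument list instead of executing it keeps the sketch safe to run anywhere; pass the list to subprocess.run only once you have reviewed it.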

Now we can safely stop the mongod process:

This causes the pod to be restarted in “infinite sleep” mode, without starting mongod. We can connect to a different member and verify the status of the replica set:
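When checking the status from another member, the two things worth confirming are that a primary still exists and that a majority of members is healthy. A minimal sketch of that check, again over an rs.status()-shaped document with hypothetical host names:

```python
def replica_set_health(status):
    """Summarize which members are up and whether the set can still serve writes."""
    members = status["members"]
    healthy = [m["name"] for m in members if m.get("health") == 1]
    down = [m["name"] for m in members if m.get("health") != 1]
    return {
        "healthy": healthy,
        "down": down,
        "has_primary": any(m.get("stateStr") == "PRIMARY" for m in members),
        "has_majority": len(healthy) > len(members) // 2,
    }


# Hypothetical status after stopping mongod on testrs-rs0-1.
status_after_stop = {
    "members": [
        {"name": "testrs-rs0-0.example:27017", "stateStr": "PRIMARY", "health": 1},
        {"name": "testrs-rs0-1.example:27017", "stateStr": "(not reachable/healthy)", "health": 0},
        {"name": "testrs-rs0-2.example:27017", "stateStr": "SECONDARY", "health": 1},
    ],
}

report = replica_set_health(status_after_stop)
print(report)
```

With one of three members down, the set keeps its majority, which is why only one member should be taken out at a time.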

3. Start in standalone mode

We’ll be starting mongod manually, so let’s write down the configuration options from one of the remaining nodes:

Now we are ready to start our stopped node in standalone mode. Since our pod is in the “infinite sleep” mode, we need to build the TLS certificates. In “normal” mode, these steps happen automatically:

Now, based on the configuration options we wrote down, we can prepare the command to start mongod in standalone mode. Start by removing the --replSet parameter, changing the default port, and binding to localhost (for extra security). Optionally remove the --auth parameter for convenience. You should end up with something like this:
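The edits described above can be sketched as a small Python helper that rewrites an argument list. The input list below is a hypothetical example, not the actual options of an Operator-managed pod, and port 27018 is an arbitrary choice for the alternate port.

```python
def to_standalone_args(argv, port=27018, keep_auth=False):
    """Rewrite mongod arguments for a temporary standalone start.

    Drops --replSet, replaces the port and bind address, and optionally
    drops --auth. Handles both "--opt=value" and "--opt value" forms.
    """
    drop_with_value = {"--replSet", "--port", "--bind_ip"}
    drop_flags = {"--bind_ip_all"}
    if not keep_auth:
        drop_flags.add("--auth")

    result = []
    skip_next = False
    for arg in argv:
        if skip_next:
            skip_next = False
            continue
        name, has_equals, _ = arg.partition("=")
        if name in drop_with_value:
            # In the "--opt value" form the value is the next token.
            skip_next = not has_equals
            continue
        if name in drop_flags:
            continue
        result.append(arg)
    return result + [f"--port={port}", "--bind_ip=localhost"]


# Hypothetical options written down from a running member.
original = [
    "mongod", "--bind_ip_all", "--auth", "--dbpath=/data/db",
    "--port=27017", "--replSet=rs0",
]
standalone = to_standalone_args(original)
print(" ".join(standalone))
```

Any option the helper does not recognize is passed through unchanged, so review the result by hand before starting mongod with it.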

Run that command, and mongod will start. Logs will be printed to stdout, so we leave this shell session alone for now.

4. Build the Index

Start a new shell session, connect to the pod we are working with, and build the index:
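For illustration, the sketch below assembles a mongosh command line that calls createIndex against the standalone instance. The database, collection, and key names are hypothetical, and the port is assumed to be the alternate one chosen when starting mongod in standalone mode; add credentials if you kept authentication enabled.

```python
import json
import shlex


def create_index_command(database, collection, keys, port=27018):
    """Build (but do not run) a mongosh command that creates an index."""
    script = "db.getSiblingDB({db}).getCollection({coll}).createIndex({keys})".format(
        db=json.dumps(database),
        coll=json.dumps(collection),
        keys=json.dumps(keys),
    )
    return ["mongosh", "--port", str(port), "--quiet", "--eval", script]


# Hypothetical namespace and key pattern.
index_command = create_index_command("shop", "orders", {"customerId": 1, "createdAt": -1})
print(shlex.join(index_command))
```

The same index definition must be used on every member, so it helps to keep it in one place like this and reuse it for each node.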

When the index build finishes, shut down mongod from the shell:

5. Resume normal operation

After the shutdown is complete, we can delete the “sleep-forever” file to go back to “normal” pod behavior:

As soon as this file is removed, the mongod process on the pod will automatically start with the usual arguments. The node should eventually catch up by applying the oplog and return to the SECONDARY state.
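Before moving on to the next node, it is worth confirming that the member really has caught up. A minimal sketch, assuming a status document shaped like rs.status() with optimeDate values as Python datetimes; the 10-second lag threshold is an arbitrary assumption.

```python
from datetime import datetime, timedelta


def has_caught_up(status, member_name, max_lag=timedelta(seconds=10)):
    """True if the member is SECONDARY and within max_lag of the primary."""
    by_name = {m["name"]: m for m in status["members"]}
    member = by_name[member_name]
    primary = next(
        (m for m in status["members"] if m.get("stateStr") == "PRIMARY"), None
    )
    if primary is None or member.get("stateStr") != "SECONDARY":
        return False
    return primary["optimeDate"] - member["optimeDate"] <= max_lag


# Hypothetical status after the member rejoined the replica set.
status_after_resume = {
    "members": [
        {"name": "testrs-rs0-0", "stateStr": "PRIMARY",
         "optimeDate": datetime(2024, 1, 1, 12, 0, 30)},
        {"name": "testrs-rs0-1", "stateStr": "SECONDARY",
         "optimeDate": datetime(2024, 1, 1, 12, 0, 25)},
    ],
}

print(has_caught_up(status_after_resume, "testrs-rs0-1"))
```

A member that is still in a recovering or startup state, or that lags by more than the threshold, returns False and should be given more time.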

6. Repeat on other Secondaries

Now simply repeat steps one through five on each remaining secondary node. This process can be scripted, but it’s safer to proceed manually unless you’re very sure of your automation.

7. Execute on the Primary

After all secondaries have the new index, you can perform a controlled failover:

This step will have some impact, so ideally perform it during off-hours. Once a new primary is elected, all that remains is to repeat steps 1-5 on the former primary.
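The overall ordering can be summarized in a short Python sketch that turns a topology into a checklist: every secondary first, then the failover, then the former primary. The member names are the hypothetical pod names used in this walkthrough.

```python
NODE_STEPS = "stop mongod, start standalone, build index, resume, wait for SECONDARY"


def rolling_index_plan(primary, secondaries):
    """Return the ordered checklist for a rolling index build."""
    plan = [f"{member}: {NODE_STEPS}" for member in secondaries]
    plan.append(f"{primary}: step down and wait for a new primary")
    plan.append(f"{primary}: {NODE_STEPS}")
    return plan


plan = rolling_index_plan("testrs-rs0-0", ["testrs-rs0-1", "testrs-rs0-2"])
for number, step in enumerate(plan, start=1):
    print(f"{number}. {step}")
```

Keeping the plan as data makes it easy to tick off nodes one at a time when proceeding manually.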

Summary

When managing a production-grade MongoDB deployment, schema changes such as adding indexes must be carefully planned to avoid performance degradation or downtime. While MongoDB has improved the index build process over time, in some cases it is still not possible to create an index directly on a primary server without affecting the system.

With a rolling approach, you can safely add indexes across your MongoDB replica set with minimal disruption to production workloads, even with the added complexity of Kubernetes.

One caveat to keep in mind is that the oplog window has to be big enough. If your index takes two hours to build, you should have at least a two-hour oplog window (probably even a bit more to be safe).
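The oplog check is simple arithmetic, sketched below in Python. The 25% safety margin is an assumption chosen for illustration; pick a margin that matches how variable your build times and write rates are.

```python
def oplog_window_is_sufficient(build_hours, oplog_window_hours, safety_factor=1.25):
    """True if the oplog window covers the index build time plus a margin."""
    return oplog_window_hours >= build_hours * safety_factor


# A two-hour build needs at least 2 * 1.25 = 2.5 hours of oplog here.
print(oplog_window_is_sufficient(build_hours=2, oplog_window_hours=2))
print(oplog_window_is_sufficient(build_hours=2, oplog_window_hours=3))
```

If the check fails, the member may fall off the oplog while it is offline and need a full resync, so increase the oplog size before starting.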

 
