Implementing Distributed MongoDB Backups Without the Coordinator

January 14, 2020

Author

Andrew Pogrebnoi

MongoDB

Percona Software

Share this Post:

distributed mongodb backups no coordinator Prior to version 1.0, Percona Backup for MongoDB (PBM) had a separate coordinator daemon as a key part of its architecture. The coordinator was handling all communications with backup agents and the control program. It held a PBM configuration and backups list, made a decision on what agents should do a backup or restore, and ensured a consistent backup and restore points across all shards.

Having a dedicated single source of truth can simplify communications and decision making. But it brings extra complexity to the system in other ways, though (one more part of the system to maintain along with the communication protocol, etc.). And more importantly, it becomes a single point of failure. So we decided to abandon the coordinator in the 1.0 version.

Since v1.0 we’ve been using MongoDB by itself as a communication hub and the configuration and other backups-related data storage. It removes one extra layer of communication from the system (in our case it was gRPC) and provides us with a durable distributed storage. So we don’t have to maintain those things anymore. Cool!

Backups and Restores Without the Coordinator?

How can we conduct the backups and restores without the coordinator? Who is going to make decisions on which node in the replica set should run the backup? And who is going to ensure consistent backup points across all shards in the cluster?

Looks like we need a leader anyway. Or do we? Let’s have a closer look.

Actually, we can split this problem into two levels. First, we have to make decisions on which node among the replica set should be in charge of the operation (backup, restore, etc). And the second, which replica set’s lead is going to have an extra duty to ensure cluster-wide consistency in case of a sharded cluster.

The second one is easier. Since the sharded cluster has to have the one and only Config Server replica set anyway, we can agree that one of its members should always be in charge of cluster-wide operations.

So one problem solved, one more to go.

How to choose a deputy of a replica set? The kinda obvious answer is to run some leader election among the replica set nodes and let the leader decide further steps. We can use some software like Apache ZooKeeper for this, but it brings a heavy component and an external dependency into the system which we’d like to avoid. Or we can use some distributed consensus algorithms like Paxos or Raft. A leader can be elected beforehand and take responsibility for the operations (backup, restore, etc). In such a case, we have to properly be able to deal when the leader fails – detect it, reelect a new leader, etc. Or we can run the election process on each operation request. But it means extra routine and time spent before operation get started (which is really not a big deal taking into account the usual frequency of backup/restore operations and a few extra network roundtrips seem to be nothing compared with the backup/restore operation by itself).

But can we avoid a leader election at all? Yes, and here is what we do. When the operation command is issued by the backup control program, it gets delivered to each backup agent. Then the agent checks if the node it’s attached to is appropriate for the job (has an acceptable replication lag, secondary is preferred for the backups, etc.) and if so, the agent tries to acquire a lock for this operation. And if it happened to succeed it moves on and became in charge of the replica set for this job. All other agents which failed to get a lock just skip this job. Actually we kill two birds with one stone since we have to have some mechanism in place also to prevent two or more backups and/or restores running concurrently.

The last thing to consider is how we actually do locking. We use MongoDB unique indexes.

First of all, when PBM started on the new cluster it automatically creates internal collections. One of them is admin.pbmLock with the unique index constraint for the field “replest”. So later on, when agents acting on behalf of the replica set trying to get a lock, only one can succeed.

Below is a simplified code from PBM (we use the official MongoDB Go Driver).

Creating a new collection with the unique index:

err := mongoclient.Database("admin").RunCommand(<br>    context.Background(),<br>    bson.D{{"create", "pbmLock"}}, <br>).Err()<br>if err != nil {<br>    return err<br>}<br><br>// create index for Locks<br>c := mongoclient.Database("admin").Collection("pbmLock")<br>_, err = c.Indexes().CreateOne(<br>    context.Background(),<br>    mongo.IndexModel{<br>        Keys: bson.D{{"replset", 1}},<br>        Options: options.Index().<br>            SetUnique(true),<br>    },<br>)<br>if err != nil {<br>    return errors.Wrap(err, "ensure lock index")<br>}<br><br>Acquiring a lock:<br>// LockHeader describes the lock. This data will be serialised into the mongo document.<br>type LockHeader struct {<br>	Type       Command `bson:"type,omitempty"`<br>	Replset    string  `bson:"replset,omitempty"`<br>	Node       string  `bson:"node,omitempty"`<br>	BackupName string  `bson:"backup,omitempty"`<br>}<br><br>// Acquire tries to acquire a lock with the given header and returns true if it succeeds<br>func Acquire(lock LockHeader) (bool, error) {<br>	var err error<br>	l.Heartbeat, err = l.p.ClusterTime()<br>	if err != nil {<br>		return false, errors.Wrap(err, "read cluster time")<br>	}<br><br>	_, err = l.c.InsertOne(l.p.Context(), lock)<br>	if err != nil && !strings.Contains(err.Error(), "E11000 duplicate key error") {<br>		return false, errors.Wrap(err, "acquire lock")<br>	}<br><br>	// if there is no duplicate key error, we got the lock<br>	if err == nil {<br>		return true, nil<br>	}<br><br>	return false, nil<br>}<br>

1

err := mongoclient.Database("admin").RunCommand( context.Background(), bson.D{{"create", "pbmLock"}}, ).Err() if err != nil { return err } // create index for Locks c := mongoclient.Database("admin").Collection("pbmLock") _, err = c.Indexes().CreateOne( context.Background(), mongo.IndexModel{ Keys: bson.D{{"replset", 1}}, Options: options.Index(). SetUnique(true), }, ) if err != nil { return errors.Wrap(err, "ensure lock index") } Acquiring a lock: // LockHeader describes the lock. This data will be serialised into the mongo document. type LockHeader struct { Type Command `bson:"type,omitempty"` Replset string `bson:"replset,omitempty"` Node string `bson:"node,omitempty"` BackupName string `bson:"backup,omitempty"` } // Acquire tries to acquire a lock with the given header and returns true if it succeeds func Acquire(lock LockHeader) (bool, error) { var err error l.Heartbeat, err = l.p.ClusterTime() if err != nil { return false, errors.Wrap(err, "read cluster time") } _, err = l.c.InsertOne(l.p.Context(), lock) if err != nil && !strings.Contains(err.Error(), "E11000 duplicate key error") { return false, errors.Wrap(err, "acquire lock") } // if there is no duplicate key error, we got the lock if err == nil { return true, nil } return false, nil }

An alternative to the unique indexes for locking purposes could be transactions. But transactions were introduced in MongoDB 4.0 and we’re bound to support a still quite widely used 3.6 version.

In Summary

Solutions that solve complex problems in a simple but yet effective way is what we’re seeking. Reducing complexity simplifies support and development and leaves less room for bugs. And in spite of the sophisticated nature of distributed backups and coordination challenges we were facing, I believe we came up with a simple but effective solution.

Stay tuned for more on PBM internals, and you can start making consistent backups of your MongoDB cluster right now. Check out Percona Backup for MongoDB the Github for the same.