In 2025, the Percona Operator for MongoDB focused on the hardest parts of running MongoDB in Kubernetes: reliable backups and restores, clearer behavior during elections and restores, better observability at scale, and safer defaults as MongoDB 8.0 became mainstream. The year included real course corrections, such as addressing PBM connection leaks and being explicit about when not to upgrade. The result is an Operator that is more transparent about its guarantees and better suited for multi-cluster, multi-region, and compliance-driven environments.

For many teams, 2025 was not about learning Kubernetes or MongoDB for the first time. It was about running more clusters with fewer people, meeting stricter security and compliance expectations, and expecting routine operations to stay routine.

Community conversations reflected that shift. Questions were less about “how do I deploy” and more about:

  • “How do restores behave under pressure?”
  • “What happens during elections when nodes are drained?”
  • “How do I see what the operator is actually doing across many clusters?”

So let’s see how Percona addressed those and many other requests to deliver the best open source Kubernetes Operator for MongoDB on the market 😉

Backups and restores became more flexible and less stressful

Early in the year, with version 1.19.0 released in January, the Operator expanded its storage flexibility in two important ways. In addition to adding filesystem-based backups over NFS as a tech preview, it also introduced extensibility for PVC resizing to work with external autoscalers. Together, these changes addressed common realities in restricted or highly regulated environments, where S3-compatible object storage may not be available and where storage growth needs to be handled dynamically without manual intervention.
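
As a rough sketch, a filesystem-backed storage can be declared in the Custom Resource along these lines. The storage name and path below are placeholders, the NFS share itself still has to be mounted into the cluster pods, and field names should be verified against the 1.19.0 release documentation:

```yaml
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  name: my-cluster-name
spec:
  backup:
    enabled: true
    storages:
      nfs-backups:              # arbitrary storage name
        type: filesystem        # tech preview as of 1.19.0
        filesystem:
          path: /mnt/nfs/       # placeholder: path where the NFS share is mounted
```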

That same release also removed a long-standing limitation by allowing backups in unmanaged replica clusters, which simplified disaster recovery designs that rely on secondary or remote clusters.

In May, version 1.20.0 focused heavily on backup workflows. Point-in-time recovery was improved so restores could be performed from any configured storage without waiting for cluster reconfiguration. This reduced friction in environments that rotate or separate storage by purpose.
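
For example, a point-in-time restore is requested through a separate restore resource; the sketch below assumes the standard PerconaServerMongoDBRestore shape, with the cluster name, backup name, and timestamp as placeholders:

```yaml
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBRestore
metadata:
  name: restore-to-point-in-time
spec:
  clusterName: my-cluster-name   # placeholder cluster name
  backupName: backup1            # backup to restore from, on any configured storage
  pitr:
    type: date
    date: "2025-05-20 14:05:00"  # placeholder recovery target
```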

Incremental physical backups were also introduced around this time as a tech preview. The motivation was straightforward: smaller backups, faster completion, and better recovery time objectives for larger datasets. The boundaries were kept explicit, including the requirement for a base backup and a single storage location for the backup chain.
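
In practice, that chain is driven by the backup resource's type field. A minimal sketch, assuming the incremental-base / incremental naming from the 1.20.0 release notes, with the cluster and storage names as placeholders:

```yaml
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBBackup
metadata:
  name: incremental-backup-1
spec:
  clusterName: my-cluster-name
  storageName: s3-us-west        # the whole backup chain must live on one storage
  type: incremental              # the first backup in the chain uses incremental-base
```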

Across the year, restore behavior was refined based on real-world usage, especially around balancer handling and PBM integration. These changes made restores more predictable, even if they were not always visible as “new features.”

MongoDB 8.0 became easier to adopt with confidence

Support for Percona Server for MongoDB 8.0 arrived at the start of the year and matured steadily over subsequent releases. By October, MongoDB 8.0 became the default version for new clusters, reflecting growing confidence in its stability and readiness.

Along the way, the Operator adapted monitoring roles, backup logic, and restore behavior to match MongoDB 8.x expectations. One notable addition was persistent cluster-level MongoDB logging through the logcollector configuration. This made debugging and day-two operations significantly easier by ensuring logs survive pod restarts and are accessible at the cluster level rather than being tied to individual containers.
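
Enabling it is a small Custom Resource change. The sketch below assumes the logcollector section layout from recent releases, so verify field names against the docs for your Operator version; the resource sizing is a placeholder:

```yaml
spec:
  logcollector:
    enabled: true        # ship mongod logs to a cluster-level log collector sidecar
    resources:
      requests:
        cpu: 200m        # placeholder sizing
        memory: 100M
```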

The net result was not just “support for a new version,” but a clearer path to adopting it without rewriting operational playbooks.

Multi-cluster and multi-region operations felt more intentional

As more teams ran MongoDB clusters across namespaces, regions, or even multiple Kubernetes clusters, operational clarity became more important than raw functionality.

Mid-year improvements made it easier to give clusters meaningful names in monitoring, so PMM dashboards stayed readable even in complex environments. This small change reduced confusion during incidents, where identifying the right cluster quickly matters.
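
The sketch below assumes the customClusterName field added to the pmm section around that time; the server address and cluster label are placeholders:

```yaml
spec:
  pmm:
    enabled: true
    serverHost: monitoring-service       # your PMM server address
    customClusterName: payments-eu-west  # label shown on PMM dashboards
```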

Later in the year, concurrent reconciliation was introduced so a single Operator instance could manage multiple clusters more efficiently. Instead of updates queueing behind one another, reconciliation could be tuned to match the scale of the environment.

CRDs also gained clearer version labeling, making it easier to verify that a given CRD definition is consistent with the Operator version running in the cluster. This helped teams avoid subtle mismatches when newer Operator versions introduced updated or expanded CRD schemas, particularly during upgrades and audits.

Replica set behavior got calmer and more predictable

Several improvements throughout the year focused on reducing surprises around elections and topology.

Early on, the Operator added support for manually adjusting replica set member priority, giving teams more control during maintenance or planned failovers.
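
A hedged sketch of what that looks like, assuming the replsetOverrides mechanism and using placeholder pod names; priorities follow normal MongoDB semantics, so members with higher values are preferred in elections:

```yaml
spec:
  replsets:
    - name: rs0
      size: 3
      replsetOverrides:
        my-cluster-name-rs0-0:
          priority: 3    # prefer this member as primary
        my-cluster-name-rs0-2:
          priority: 1    # make this member a less likely primary
```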

Later in the year, hidden nodes became available. These nodes hold full copies of data without serving client traffic, making them useful for backups or reporting workloads. 
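
The exact schema is best taken from the release notes; purely as an illustrative assumption, the layout below mirrors the Operator's existing nonvoting section:

```yaml
spec:
  replsets:
    - name: rs0
      size: 3
      hidden:            # hypothetical layout, mirroring the nonvoting section
        enabled: true
        size: 1          # one hidden member for backups or reporting workloads
```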

Together, these changes helped align the Operator more closely with how MongoDB is actually operated in production environments.

Security and access management became simpler

Security-related improvements in 2025 focused on reducing manual work rather than adding complexity.

Automatic password generation for custom MongoDB users allowed teams to declare users directly in the Custom Resource and let the Operator handle secret creation safely.
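
Declaratively, that can look roughly like this; the user, database, and role names are placeholders, and the key point is that omitting an explicit password reference lets the Operator generate and store one:

```yaml
spec:
  users:
    - name: app-user         # placeholder user name
      db: admin
      roles:
        - name: readWrite
          db: myapp          # placeholder application database
      # no passwordSecretRef: the Operator generates a password
      # and stores it in a Secret for you
```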

Support for IAM roles for service accounts reduced the need to manage long-lived credentials for cloud storage access, aligning better with modern cloud security practices.
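
On EKS, for instance, this pairs an annotated ServiceAccount with an S3 storage that carries no static credentials. The ServiceAccount name, role ARN, bucket, and storage name below are placeholders:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: psmdb-backup-sa    # placeholder: the ServiceAccount used by the backup pods
  annotations:
    # IRSA: pods assume this IAM role instead of using stored access keys
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/psmdb-backup
---
# In the PerconaServerMongoDB resource:
spec:
  backup:
    storages:
      s3-us-west:
        type: s3
        s3:
          bucket: my-backup-bucket
          region: us-west-2
          # no credentialsSecret: access comes from the IAM role above
```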

These changes quietly removed a lot of custom scripting around the Operator.

What we learned as a community

A few themes came up again and again, and most of the work in 2025 followed directly from those lessons.

  • Restore correctness matters more than restore speed
  • Clear boundaries are better than hidden automation
  • Observability needs to scale with the number of clusters, not just the size of one
  • Making tradeoffs explicit builds more trust than pretending they do not exist

What’s next

2025 was about making the Percona Operator for MongoDB feel less surprising and more dependable. Looking ahead, the priorities remain deliberately practical and focused on production realities.

Backup and restore workflows will continue to be hardened, including support for backups using PVC snapshots. This opens the door to faster and more storage-native recovery paths, especially in environments where snapshot-based workflows are already standard.

Storage automation will advance further with automatic PVC resizing, reducing manual intervention as datasets grow and making it easier to pair the Operator with external autoscalers.

Credential management is another area of active investment. Integrating Vault for system user credential management will help teams standardize secrets handling and align MongoDB operations with broader security and compliance practices.

Restore workflows will also become more flexible with planned support for replica set remapping during restores. This will make it easier to recover into different topologies, regions, or cluster layouts without requiring post-restore reconfiguration.

Across all of this, the guiding goal remains the same. Make MongoDB on Kubernetes easier to operate at scale, easier to recover under pressure, and easier to trust when things go wrong.
