If you have ever tuned a MongoDB cluster that passed every synthetic benchmark with flying colors, only to choke the moment real user traffic hit, you are not alone.

For years, database administrators and developers have relied on a standard suite of tools to test MongoDB performance (YCSB, Sysbench, POCDriver, and mgodatagen, to name a few). While effective for measuring raw hardware throughput, these tools often fail to answer the most critical question: “How will this database handle my specific application load?”

In this post, we’ll compare these standard suites against a new challenger, Percona Load Generator For MongoDB Clusters (PLGM), to see which tool offers the most value for modern engineering teams.

The “Old Guard”: Synthetic Benchmarking Tools

These tools are excellent for comparing one server instance against another (e.g., “Is AWS m5.large faster than Azure D4s?”), but they often fall short on realism.

| Tool | Primary Purpose | Strengths | Limitations | Best Used When |
|---|---|---|---|---|
| YCSB | NoSQL benchmarking | Industry standard; widely adopted; ideal for vendor and hardware comparisons | Highly synthetic data; no realistic document structures or index selectivity; primary-key CRUD only | Comparing raw performance across vendors or hardware |
| Sysbench | System stress testing | Excellent at exposing CPU and disk I/O limits | Steep learning curve; Lua scripting required; limited use of MongoDB’s document model | Finding infrastructure bottlenecks |
| POCDriver | Basic workload generation | Simple CLI; quick to start generating load | Limited configurability; poor support for multi-stage application workflows | Generating background load or quick demos |
| mgodatagen | Data seeding | Maintains relational integrity; supports derived fields, sharding, and index creation | Static dataset only; no workload simulation | Creating realistic initial datasets before testing |

The Challenger: plgm

Enter plgm. Unlike the tools above, which focus on server performance or static data generation, plgm focuses on realism. It was built on the premise that a benchmark is useless if the data and the behavior don’t look like your application. Instead of blasting random keys at the database, plgm allows you to define custom schemas and query patterns that strictly mirror your actual application.

The plgm Advantage

1. Real Data, Not Random Junk

plgm integrates with gofakeit to generate realistic data instead of filling your database with random strings.

  • Need a user profile with a nested array of 3 distinct addresses?
  • Need valid email addresses, UUIDs, or realistic dates?

plgm handles this natively. This means your indexes and compression ratios will behave exactly as they do in production. You can literally provide the exact collection definitions and query patterns your application uses, and plgm will execute that precise workload.
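As an illustration, a collection definition for that kind of user profile could look something like the sketch below. Note that the exact key names and generator tags here are assumptions for illustration, not plgm’s documented schema; consult the repository documentation for the real format.

```json
{
  "collection": "users",
  "fields": {
    "user_id": "uuid",
    "email": "email",
    "created_at": "date",
    "addresses": {
      "type": "array",
      "count": 3,
      "of": { "street": "street", "city": "city", "zip": "zip" }
    }
  }
}
```

Because the generated documents carry realistic field lengths and value distributions, index sizes and compression ratios end up far closer to production than they would with uniform random strings.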

2. Native Aggregation Support

Most benchmarks only test simple “find by ID” queries, but real MongoDB applications run heavy aggregation pipelines alongside those lookups. plgm lets you define everything from the simplest find to the most complex pipeline (with $match, $group, $lookup, etc.) in a simple JSON format. You can finally stress-test that analytical dashboard query before it takes down your production cluster.
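To make that concrete, a hypothetical aggregation definition might look like the following. The $match/$group/$lookup stages are standard MongoDB pipeline syntax; the surrounding wrapper keys ("collection", "type", "pipeline") are illustrative assumptions, so check the plgm documentation for the exact query file format.

```json
{
  "collection": "orders",
  "type": "aggregate",
  "pipeline": [
    { "$match": { "status": "<string>" } },
    { "$group": { "_id": "$customer_id", "total": { "$sum": "$amount" } } },
    { "$lookup": {
        "from": "customers",
        "localField": "_id",
        "foreignField": "_id",
        "as": "customer"
    } }
  ]
}
```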

3. “Configuration as Code” for Workloads

Instead of learning Lua (Sysbench) or complex Java classes (YCSB), plgm uses simple JSON files to define your workload.

  • collections.json: Define your document structure.
  • queries.json: Define your mix of finds, updates, deletes, and aggregates.

You can look at your application logs, copy the slow queries into queries.json, and instantly reproduce that exact load in your staging environment. Simply replace the specific values with type placeholders (<int>, <string>, etc.), and plgm will work its magic, automatically generating randomized, type-safe values for every execution.
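For example, suppose your logs show a slow query filtering on a literal age and city. Swapping the literals for placeholders might look like this (the wrapper keys are assumptions for illustration; the placeholder idea is the documented feature):

```json
{
  "collection": "users",
  "type": "find",
  "filter": { "age": "<int>", "city": "<string>" }
}
```

On every execution, plgm substitutes a fresh randomized value of the right type, so the query shape and index usage stay identical to production while the values vary.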

4. High-Performance Go Architecture

Written purely in Go, plgm utilizes Goroutines to spawn thousands of concurrent workers with minimal memory usage. It automatically detects your CPU cores to maximize throughput, ensuring the bottleneck is the database, not the benchmark tool.

Zero-Dependency Installation & DevOps Ready

One of the biggest pain points with legacy benchmarking tools is the setup. YCSB requires a Java Runtime Environment (JRE) and complex Maven setups. Python-based tools require virtual environments and often struggle with driver version conflicts.

plgm is different.

Because it is written in Go, it compiles down to a single, static binary. There are no dependencies to install. You don’t need Python, Java, or Ruby on your machine.

Step 1: Download

You simply download the appropriate binary for your operating system and run it. Navigate to the Releases section of our repository, select the version that best fits your use case, then extract, configure, and run the application.

Step 2: Configure

Instead of long command-line arguments, plgm uses a clean, easy-to-configure config.yaml file (environment variables are also supported).

Set your Connection 

Open config.yaml and set your MongoDB URI.
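The connection setting might look something like the snippet below. The key name here is an assumption for illustration; the URI itself follows the standard MongoDB connection string format.

```yaml
# config.yaml (key name is illustrative; see the plgm docs for the exact schema)
mongodb_uri: "mongodb://appuser:secret@mongos-host:27017/?replicaSet=rs0"
```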

Define Your Reality (Optional) 

If you want to simulate your specific application, simply edit the configuration and point it to your own JSON definitions.

Fine-Tune Your Workload (Optional) 

Additional optimization and configuration can be performed through config.yaml. The tool also supports environment variables, enabling quick configuration changes between workload runs. This allows you to version-control your benchmark configuration alongside your application code, ensuring your performance tests always match your current schema. Some of the available options include:

  • Configuring default workloads
  • Defining multiple workloads
  • Providing your custom collection definitions and query patterns
  • Concurrency control
  • Workload duration 
  • Optionally seeding collections with data
  • Control over operation types and their distribution
    • You can specify the percentage of each operation type, for example:
      • find_percent: 55
      • update_percent: 20
      • delete_percent: 10
      • insert_percent: 10
      • aggregate_percent: 5
  • And more
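Put together, a workload section could look roughly like this. The operation-percentage keys are the ones named above; the other key names (duration, concurrency, file paths) are assumptions for illustration, so verify them against the documentation.

```yaml
# Illustrative workload configuration; only the *_percent keys
# are taken from the options listed above.
workload:
  duration: 5m
  concurrency: 64
  seed_data: true
  collections_file: collections.json
  queries_file: queries.json
  find_percent: 55
  update_percent: 20
  delete_percent: 10
  insert_percent: 10
  aggregate_percent: 5
```

Because this file lives next to your application code, a schema change and its matching benchmark change can land in the same commit.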

Additional capabilities are available; you can find the full documentation in our Git repository, Percona Load Generator For MongoDB Clusters (PLGM), with more features currently in development.

Step 3: Using PLGM

Once you have configured plgm to your requirements, you can run it and observe the output.

Native Docker & Kubernetes Support

Modern infrastructure lives in containers, and so can plgm. We provide a Docker workflow and sample Kubernetes Job manifests, so instead of running a benchmark from your laptop, you can deploy plgm as a pod inside your Kubernetes cluster. This eliminates network bottlenecks and tests the database’s true throughput limits.
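A Kubernetes Job for this could be sketched as follows. The image name and environment variable are assumptions for illustration; the manifest structure itself is standard batch/v1 Job syntax.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: plgm-benchmark
spec:
  backoffLimit: 0          # a failed benchmark run should not be retried
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: plgm
          image: percona/plgm:latest   # image name is an assumption
          env:
            - name: MONGODB_URI        # env var name is an assumption
              valueFrom:
                secretKeyRef:
                  name: mongo-credentials
                  key: uri
```

Running the load from inside the cluster means the benchmark traffic travels the same network path your application pods use, so the numbers reflect the database rather than your laptop’s uplink.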

Head-to-Head Comparison

| Feature | YCSB | Sysbench | POCDriver | mgodatagen | plgm |
|---|---|---|---|---|---|
| Primary Use Case | Hardware comparison | CPU/Disk stress | Quick load gen | Smart data seeding | App simulation |
| Data Realism | Low (random strings) | Low | Medium | High (relational) | High (custom BSON) |
| Complex Queries | No (PK only) | Difficult (Lua) | Limited | No (inserts only) | Native support (agg) |
| Configuration | Command line | Lua scripts | Command line | JSON | JSON / YAML |
| Workload Logic | None | Scriptable | None | None | Custom templates |

Verdict: Which Tool Should You Choose?

| If Your Goal Is… | Choose This Tool | Why |
|---|---|---|
| Compare vendors or hardware | YCSB | Standardized, widely recognized benchmark |
| Stress-test CPU or storage | Sysbench | Pushes infrastructure to its limits |
| Generate quick background load | POCDriver | Minimal setup and fast execution |
| Seed a realistic dataset | mgodatagen | Preserves relationships and schema integrity |
| Benchmark real application behavior | plgm | Mirrors production traffic, schema, and query patterns |

If you care about how your application truly interacts with the database, and whether your queries perform reliably under pressure, synthetic benchmarks are not enough. You need a workload simulator that reflects production reality.

Get started today with plgm and test your database the way your application actually uses it.
