If you have ever tuned a MongoDB cluster that passed every synthetic benchmark with flying colors, only to choke the moment real user traffic hit, you are not alone.
For years, database administrators and developers have relied on a standard suite of tools to test MongoDB performance (YCSB, Sysbench, POCDriver and mgodatagen – just to name a few). While effective for measuring raw hardware throughput, these tools often fail to answer the most critical question: “How will this database handle my specific application load?”
In this post, we’ll compare these standard suites against a new challenger, Percona Load Generator For MongoDB Clusters (PLGM), to see which tool offers the most value for modern engineering teams.
The “Old Guard”: Synthetic Benchmarking Tools
These tools are excellent for comparing one server instance against another (e.g., “Is AWS m5.large faster than Azure D4s?”), but they often fall short on realism.
| Tool | Primary Purpose | Strengths | Limitations | Best Used When |
|------|-----------------|-----------|-------------|----------------|
| YCSB | NoSQL benchmarking | Industry standard; widely adopted; ideal for vendor and hardware comparisons | Highly synthetic data; no realistic document structures or index selectivity; primary-key CRUD only | Comparing raw performance across vendors or hardware |
| Sysbench | System stress testing | Excellent at exposing CPU and disk I/O limits | Steep learning curve; Lua scripting required; limited use of MongoDB’s document model | Finding infrastructure bottlenecks |
| POCDriver | Basic workload generation | Simple CLI; quick to start generating load | Limited configurability; poor support for multi-stage application workflows | Generating background load or quick demos |
| mgodatagen | Data seeding | Maintains relational integrity; supports derived fields, sharding, and index creation | Static dataset only; no workload simulation | Creating realistic initial datasets before testing |
The Challenger: plgm
Enter plgm. Unlike the tools above, which focus on server performance or static data generation, plgm focuses on realism. It was built on the premise that a benchmark is useless if the data and the behavior don’t look like your application. Instead of blasting random keys at the database, plgm allows you to define custom schemas and query patterns that strictly mirror your actual application.
The plgm Advantage
1. Real Data, Not Random Junk
plgm integrates with gofakeit to generate realistic data as opposed to filling your database with random strings.
- Need a user profile with a nested array of 3 distinct addresses?
- Need valid email addresses, UUIDs, or realistic dates?
plgm handles this natively. This means your indexes and compression ratios will behave exactly as they do in production. You can literally provide the exact collection definitions and query patterns your application uses, and plgm will execute that precise workload.
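To make that concrete, here is a minimal sketch of what such a collection definition might look like. The key names and structure below are illustrative assumptions for this post, not plgm’s documented schema; consult the repository for the exact format.

```json
{
  "_comment": "Illustrative sketch only; key names are assumptions, not plgm's documented schema",
  "collection": "users",
  "fields": {
    "user_id": "uuid",
    "email": "email",
    "created_at": "date",
    "addresses": {
      "type": "array",
      "count": 3,
      "items": {
        "street": "street",
        "city": "city",
        "zip": "zip"
      }
    }
  }
}
```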
2. Native Aggregation Support
Most benchmarks only test simple “Find by ID” queries. But real MongoDB apps run heavy aggregation pipelines among other queries. plgm allows you to define everything from the simplest query to the most complex pipeline (with $match, $group, $lookup, etc.) in a simple JSON format. You can finally stress-test that analytical dashboard query before it takes down your production cluster.
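The pipeline below is plain MongoDB aggregation syntax; the surrounding keys (name, type, collection) are assumptions about how an entry in queries.json might be wrapped, so check the repository for the exact layout.

```json
{
  "_comment": "Pipeline is standard MongoDB syntax; the wrapper keys are assumptions for illustration",
  "name": "shipped_order_totals",
  "type": "aggregate",
  "collection": "orders",
  "pipeline": [
    { "$match": { "status": "shipped" } },
    { "$group": { "_id": "$customer_id", "total_amount": { "$sum": "$amount" } } },
    { "$lookup": { "from": "customers", "localField": "_id", "foreignField": "_id", "as": "customer" } }
  ]
}
```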
3. “Configuration as Code” for Workloads
Instead of learning Lua (Sysbench) or complex Java classes (YCSB), plgm uses simple JSON files to define your workload.
- Collections.json: Define your document structure.
- Queries.json: Define your mix of Finds, Updates, Deletes and Aggregates.
You can look at your application logs, copy the slow queries into queries.json, and instantly reproduce that exact load in your staging environment. Simply replace the specific values with type placeholders (<int>, <string>, and so on), and plgm will work its magic, automatically generating randomized, type-safe values for every execution.
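As a rough sketch (the <int> and <string> placeholders come straight from this post, while the surrounding keys are assumptions), a slow query lifted from your logs might end up in queries.json like this:

```json
{
  "_comment": "Sketch only; filter placeholders are from this post, wrapper keys are assumptions",
  "type": "find",
  "collection": "users",
  "filter": {
    "country": "<string>",
    "age": { "$gte": "<int>" }
  }
}
```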
4. High-Performance Go Architecture
Written in pure Go, plgm uses goroutines to spawn thousands of concurrent workers with minimal memory usage. It automatically detects your CPU cores to maximize throughput, ensuring the bottleneck is the database, not the benchmark tool.
Zero-Dependency Installation & DevOps Ready
One of the biggest pain points with legacy benchmarking tools is the setup. YCSB requires a Java Runtime Environment (JRE) and complex Maven setups. Python-based tools require virtual environments and often struggle with driver version conflicts.
plgm is different.
Because it is written in Go, it compiles down to a single, static binary. There are no dependencies to install. You don’t need Python, Java, or Ruby on your machine.
Step 1: Download
You simply download the appropriate binary for your operating system and run it. Navigate to the Releases section of our repository, select the version that best fits your use case, then extract, configure, and run the application.
```bash
# 1. Extract the binary
tar -xzvf plgm-linux-amd64.tar.gz
```
Step 2: Configure
Instead of long command-line arguments, plgm uses a clean, easy-to-configure config.yaml file (environment variables are also supported).
Set your Connection
Open config.yaml and set your MongoDB URI:
```yaml
uri: "mongodb://localhost:27017"
```
Define Your Reality (Optional)
If you want to simulate your specific application, simply edit the configuration and point to your own JSON definitions:
```yaml
collections_path: "./my_app_schema.json"
queries_path: "./my_app_queries.json"
```
Fine-tune your workload (Optional)
Additional optimization and configuration can be performed through config.yaml. The tool also supports environment variables, enabling quick configuration changes between workload runs. This allows you to version-control your benchmark configuration alongside your application code, ensuring your performance tests always match your current schema. Some of the available options include (see the example configuration after this list):
- Configuring default workloads
- Defining multiple workloads
- Providing your custom collection definitions and query patterns
- Concurrency control
- Workload duration
- Optional seeding of collections with data
- Control over operation types and their distribution; you can specify the percentage of each operation type, for example:
  - find_percent: 55
  - update_percent: 20
  - delete_percent: 10
  - insert_percent: 10
  - aggregate_percent: 5
- And more.
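Here is a hedged sketch of a config.yaml combining the options mentioned in this post. Only the keys shown above (uri, collections_path, queries_path, and the *_percent values) come from the post; whether they nest under a named workload, and the exact key names for concurrency, duration, and seeding, are assumptions to verify against the documentation.

```yaml
# Sketch only: keys below are the ones shown in this post; exact nesting
# (e.g., under a named workload) and concurrency/duration key names are
# assumptions to check against the plgm documentation.
uri: "mongodb://localhost:27017"

collections_path: "./my_app_schema.json"
queries_path: "./my_app_queries.json"

# Operation mix; the percentages should add up to 100
find_percent: 55
update_percent: 20
delete_percent: 10
insert_percent: 10
aggregate_percent: 5
```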
Additional capabilities are available; you can find the full documentation in our Git repo, Percona Load Generator For MongoDB Clusters (PLGM), and more features are currently in development.
Step 3: Using PLGM
Once you have configured plgm to your requirements, you can run it and observe the output.

Native Docker & Kubernetes Support
Modern infrastructure lives in containers, and so can plgm. We provide a Docker workflow and sample Kubernetes Job manifests, so instead of running a benchmark from your laptop, you can deploy plgm as a pod inside your Kubernetes cluster. This removes the network bottleneck between your workstation and the cluster and tests the database’s true throughput limits.
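For illustration, a Kubernetes Job wrapping plgm could look roughly like the sketch below. The image tag, the environment variable name, and the connection string are placeholders invented for this example; treat the sample manifests in the repository as the authoritative reference.

```yaml
# Rough sketch of a Job running plgm inside the cluster.
# The image tag and env var name are hypothetical placeholders.
apiVersion: batch/v1
kind: Job
metadata:
  name: plgm-benchmark
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: plgm
          image: percona/plgm:latest              # hypothetical image
          env:
            - name: MONGODB_URI                   # hypothetical variable name
              value: "mongodb://mongodb.default.svc.cluster.local:27017"
```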
Head-to-Head Comparison
| Feature | YCSB | Sysbench | POCDriver | mgodatagen | plgm |
|---------|------|----------|-----------|------------|------|
| Primary Use Case | Hardware comparison | CPU/Disk Stress | Quick Load Gen | Smart Data Seeding | App Simulation |
| Data Realism | Low (Random strings) | Low | Medium | High (Relational) | High (Custom BSON) |
| Complex Queries | No (PK only) | Difficult (Lua) | Limited | No (Inserts only) | Native Support (Agg) |
| Configuration | Command Line | Lua Scripts | Command Line | JSON | JSON / YAML |
| Workload Logic | None | Scriptable | None | None | Custom Templates |
Verdict: Which Tool Should You Choose?
| If Your Goal Is… | Choose This Tool | Why |
|------------------|------------------|-----|
| Compare vendors or hardware | YCSB | Standardized, widely recognized benchmark |
| Stress-test CPU or storage | Sysbench | Pushes infrastructure to its limits |
| Generate quick background load | POCDriver | Minimal setup and fast execution |
| Seed a realistic dataset | mgodatagen | Preserves relationships and schema integrity |
| Benchmark real application behavior | plgm | Mirrors production traffic, schema, and query patterns |
If you care about how your application code truly interacts with the database, and whether your queries perform reliably under pressure, synthetic benchmarks are not enough. You need a workload simulator that reflects production reality.
Get started today with plgm and test your database the way your application actually uses it.