If you have ever tuned a MongoDB cluster that passed every synthetic benchmark with flying colors, only to choke the moment real user traffic hit, you are not alone.

For years, database administrators and developers have relied on a standard suite of tools to test MongoDB performance (YCSB, Sysbench, POCDriver, and mgodatagen, to name a few). While effective for measuring raw hardware throughput, these tools often fail to answer the most critical question: “How will this database handle my specific application load?”

In this post, we’ll compare these standard suites against a new challenger, Percona Load Generator For MongoDB Clusters (PLGM), to see which tool offers the most value for modern engineering teams.

The “Old Guard”: Synthetic Benchmarking Tools

These tools are excellent for comparing one server instance against another (e.g., “Is AWS m5.large faster than Azure D4s?”), but they often fall short on realism.

| Tool | Primary Purpose | Strengths | Limitations | Best Used When |
|---|---|---|---|---|
| YCSB | NoSQL benchmarking | Industry standard; widely adopted; ideal for vendor and hardware comparisons | Highly synthetic data; no realistic document structures or index selectivity; primary-key CRUD only | Comparing raw performance across vendors or hardware |
| Sysbench | System stress testing | Excellent at exposing CPU and disk I/O limits | Steep learning curve; Lua scripting required; limited use of MongoDB’s document model | Finding infrastructure bottlenecks |
| POCDriver | Basic workload generation | Simple CLI; quick to start generating load | Limited configurability; poor support for multi-stage application workflows | Generating background load or quick demos |
| mgodatagen | Data seeding | Maintains relational integrity; supports derived fields, sharding, and index creation | Static dataset only; no workload simulation | Creating realistic initial datasets before testing |

The Challenger: plgm

Enter plgm. Unlike the tools above, which focus on server performance or static data generation, plgm focuses on realism. It was built on the premise that a benchmark is useless if the data and the behavior don’t look like your application. Instead of blasting random keys at the database, plgm allows you to define custom schemas and query patterns that strictly mirror your actual application.

The plgm Advantage

1. Real Data, Not Random Junk

plgm integrates with gofakeit to generate realistic data instead of filling your database with random strings.

  • Need a user profile with a nested array of 3 distinct addresses?
  • Need valid email addresses, UUIDs, or realistic dates?

plgm handles this natively. This means your indexes and compression ratios will behave exactly as they do in production. You can literally provide the exact collection definitions and query patterns your application uses, and plgm will execute that precise workload.
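As an illustration, a collection definition for that kind of user profile could look something like the sketch below. Note that the exact key names and generator tags here are assumptions for illustration, not plgm’s documented schema; consult the repository documentation for the real format.

```json
{
  "collection": "users",
  "fields": {
    "user_id": "uuid",
    "email": "email",
    "created_at": "date",
    "addresses": {
      "type": "array",
      "count": 3,
      "of": { "street": "street", "city": "city", "zip": "zip" }
    }
  }
}
```

Because the generated documents carry realistic field lengths and value distributions, index sizes and compression ratios end up far closer to production than they would with uniform random strings.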

2. Native Aggregation Support

Most benchmarks only test simple “find by ID” queries, but real MongoDB applications run heavy aggregation pipelines alongside those lookups. plgm lets you define everything from the simplest find to the most complex pipeline (with $match, $group, $lookup, etc.) in a simple JSON format. You can finally stress-test that analytical dashboard query before it takes down your production cluster.
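To make that concrete, a hypothetical aggregation definition might look like the following. The $match/$group/$lookup stages are standard MongoDB pipeline syntax; the surrounding wrapper keys ("collection", "type", "pipeline") are illustrative assumptions, so check the plgm documentation for the exact query file format.

```json
{
  "collection": "orders",
  "type": "aggregate",
  "pipeline": [
    { "$match": { "status": "<string>" } },
    { "$group": { "_id": "$customer_id", "total": { "$sum": "$amount" } } },
    { "$lookup": {
        "from": "customers",
        "localField": "_id",
        "foreignField": "_id",
        "as": "customer"
    } }
  ]
}
```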

3. “Configuration as Code” for Workloads

Instead of learning Lua (Sysbench) or complex Java classes (YCSB), plgm uses simple JSON files to define your workload.

  • collections.json: Define your document structure.
  • queries.json: Define your mix of finds, updates, deletes, and aggregates.

You can look at your application logs, copy the slow queries into queries.json, and instantly reproduce that exact load in your staging environment. Simply replace the specific values with type placeholders (<int>, <string>, etc.), and plgm will work its magic, automatically generating randomized, type-safe values for every execution.
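For example, suppose your logs show a slow query filtering on a literal age and city. Swapping the literals for placeholders might look like this (the wrapper keys are assumptions for illustration; the placeholder idea is the documented feature):

```json
{
  "collection": "users",
  "type": "find",
  "filter": { "age": "<int>", "city": "<string>" }
}
```

On every execution, plgm substitutes a fresh randomized value of the right type, so the query shape and index usage stay identical to production while the values vary.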

4. High-Performance Go Architecture

Written purely in Go, plgm utilizes Goroutines to spawn thousands of concurrent workers with minimal memory usage. It automatically detects your CPU cores to maximize throughput, ensuring the bottleneck is the database, not the benchmark tool.

Zero-Dependency Installation & DevOps Ready

One of the biggest pain points with legacy benchmarking tools is the setup. YCSB requires a Java Runtime Environment (JRE) and complex Maven setups. Python-based tools require virtual environments and often struggle with driver version conflicts.

plgm is different.

Because it is written in Go, it compiles down to a single, static binary. There are no dependencies to install. You don’t need Python, Java, or Ruby on your machine.

Step 1: Download

You simply download the appropriate binary for your operating system and run it. Navigate to the Releases section of our repository, select the version that best fits your use case, then extract, configure, and run the application.

Step 2: Configure

Instead of long command-line arguments, plgm uses a clean, easy-to-configure config.yaml file (environment variables are also supported).

Set your Connection 

Open config.yaml and set your MongoDB URI.
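The connection setting might look something like the snippet below. The key name here is an assumption for illustration; the URI itself follows the standard MongoDB connection string format.

```yaml
# config.yaml (key name is illustrative; see the plgm docs for the exact schema)
mongodb_uri: "mongodb://appuser:secret@mongos-host:27017/?replicaSet=rs0"
```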

Define Your Reality (Optional) 

If you want to simulate your specific application, simply edit the configuration and point it to your own JSON definitions.

Fine-Tune Your Workload (Optional) 

Additional optimization and configuration can be performed through config.yaml. The tool also supports environment variables, enabling quick configuration changes between workload runs. This allows you to version-control your benchmark configuration alongside your application code, ensuring your performance tests always match your current schema. Some of the available options include:

  • Configuring default workloads
  • Defining multiple workloads
  • Providing your custom collection definitions and query patterns
  • Concurrency control
  • Workload duration 
  • Optionally seeding collections with data
  • Control over operation types and their distribution
    • You can specify the percentage of each operation type, for example:
      • find_percent: 55
      • update_percent: 20
      • delete_percent: 10
      • insert_percent: 10
      • aggregate_percent: 5
  • And more
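Put together, a workload section could look roughly like this. The operation-percentage keys are the ones named above; the other key names (duration, concurrency, file paths) are assumptions for illustration, so verify them against the documentation.

```yaml
# Illustrative workload configuration; only the *_percent keys
# are taken from the options listed above.
workload:
  duration: 5m
  concurrency: 64
  seed_data: true
  collections_file: collections.json
  queries_file: queries.json
  find_percent: 55
  update_percent: 20
  delete_percent: 10
  insert_percent: 10
  aggregate_percent: 5
```

Because this file lives next to your application code, a schema change and its matching benchmark change can land in the same commit.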

Additional capabilities are available; you can find the full documentation in our Git repository, Percona Load Generator For MongoDB Clusters (PLGM), with more features currently in development.

Step 3: Using PLGM

Once you have configured plgm to your requirements, you can run it and observe the output.

Native Docker & Kubernetes Support

Modern infrastructure lives in containers, and so can plgm. We provide a Docker workflow and sample Kubernetes Job manifests, so instead of running a benchmark from your laptop, you can deploy plgm as a pod inside your Kubernetes cluster. This eliminates network bottlenecks and tests the database’s true throughput limits.
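A Kubernetes Job for this could be sketched as follows. The image name and environment variable are assumptions for illustration; the manifest structure itself is standard batch/v1 Job syntax.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: plgm-benchmark
spec:
  backoffLimit: 0          # a failed benchmark run should not be retried
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: plgm
          image: percona/plgm:latest   # image name is an assumption
          env:
            - name: MONGODB_URI        # env var name is an assumption
              valueFrom:
                secretKeyRef:
                  name: mongo-credentials
                  key: uri
```

Running the load from inside the cluster means the benchmark traffic travels the same network path your application pods use, so the numbers reflect the database rather than your laptop’s uplink.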

Head-to-Head Comparison

| Feature | YCSB | Sysbench | POCDriver | mgodatagen | plgm |
|---|---|---|---|---|---|
| Primary Use Case | Hardware comparison | CPU/Disk stress | Quick load gen | Smart data seeding | App simulation |
| Data Realism | Low (random strings) | Low | Medium | High (relational) | High (custom BSON) |
| Complex Queries | No (PK only) | Difficult (Lua) | Limited | No (inserts only) | Native support (agg) |
| Configuration | Command line | Lua scripts | Command line | JSON | JSON / YAML |
| Workload Logic | None | Scriptable | None | None | Custom templates |

Verdict: Which Tool Should You Choose?

| If Your Goal Is… | Choose This Tool | Why |
|---|---|---|
| Compare vendors or hardware | YCSB | Standardized, widely recognized benchmark |
| Stress-test CPU or storage | Sysbench | Pushes infrastructure to its limits |
| Generate quick background load | POCDriver | Minimal setup and fast execution |
| Seed a realistic dataset | mgodatagen | Preserves relationships and schema integrity |
| Benchmark real application behavior | plgm | Mirrors production traffic, schema, and query patterns |

If you care about how your application truly interacts with the database, and whether your queries perform reliably under pressure, synthetic benchmarks are not enough. You need a workload simulator that reflects production reality.

Get started today with plgm and test your database the way your application actually uses it.
