Pstress: Database Concurrency and Crash Recovery Testing Tool

Pstress PerconaDatabases are complicated software made to handle the concurrent load while making specific guarantees about data consistency and availability. There are many scenarios which should be tested that can only happen under concurrent conditions.

Pstress is a probability-based open-source database testing tool designed to run in concurrency and to test if the database can recover when something goes wrong. It generates random transactions based on options provided by the user. With the right set of options, users can test features, regression, and crash recovery. It can create the concurrent load on a cluster or on a single server.

The tool is currently in beta, but it has already become very important within the testing pipeline for Percona. Pstress is widely used by Percona’s QA team during each phase of testing. It has identified some critical bugs in Oracle MySQL Community Edition, Percona Server for MySQL, Percona XtraDB Cluster, and Percona XtraBackup. Pstress is developed on top of the Percona Pquery framework which is a testing tool that picks random SQL statements from a file and executes them in multiple threads concurrently against the target environment. More information about Pquery can be found in the last section of this document.

Initially, the tool was named Pquery 3.0 but since its function and purpose diverged from that of Pquery we have renamed it to Pstress to more accurately convey its use case.

GitHub repository for Pstress

Key Features of Pstress

  • Generate different varieties of load based on the options provided by the user. For example, the tool can generate a single-threaded test having only inserts in one table or it can generate huge transactions involving JSON, GIS, partitions, foreign keys, generated column, et cetera in multiple threads.
  • Kill or shutdown the running server  or a node and let it recover and check 
  • In case any issue is identified it creates a bug directory having all relevant information. 
  • It extracts important information from the bug directory and tries to compare it with existing bugs.
  • Provides an easy mechanism to reproduce the problem

It can be used to test any database which supports a SQL interface. Currently, it is being developed to test MySQL and the plan is to extend the tool to test PostgreSQL as well.

How it Works

Pstress runs in multiple steps. In the first step, it creates metadata required to run the test based on the options provided by the user. Below you can find an example execution command line.

The above command creates 20 tables which have 10 columns and 20 indexes and other random attributes like compression, encryption, tablespaces based on seed 2020. If a seed is some other number, the table structures would differ. Options like “–no-blob –no-virtual” means it will not have blob and virtual columns and “–encryption=keyring” it will use keyring-based encryption.

It generates random SQLs in proportion to the probability of each option. It picks the command line option or the default probability for each option. Currently, there are around 100 options to generate random SQL. Once we have support for JSON, GIS, Partition tables there will be more options. 

The above command executes SQLs in the proportion of 10 add indexes, 10 drop indexes, 800 inserts, 500 updates, and the default value for other SQL in each session.  It helps to test the scenario where the add/drop indexes are happening concurrently with DMLs.

After a certain time, it stops the step, saves the data directory and restarts the server by varying some server variables and then continues the load. 

While the test is in progress the tool keeps checking the heartbeat of the server, error logs, and other checksums to identify any bug and in case Pstress finds any error it will create a bug directory. It saves stack traces, error logs, and other relevant information for you to analyze and reproduce the problem. 

At the end of all of the steps, it tags each bug’s directory with known and unknown issues and creates a single report to be analyzed by the user.

Why We Have Multiple Steps

  • Pstress can be used for a crash recovery test since we kill the server or node multiple times when the server is under load and let it recover.
  • These multiple steps act as a snapshot of the server, and if any problem is identified, Pstress can rerun from the last known good point to reproduce the problem.

Different Use Cases for Pstress

It can be used as a regression, feature, or crash recovery testing tool

Regression Testing

Each type of SQL statement has some default probability value which can be picked in a transaction. So when we combine all of these SQL statements we get a pretty good load covering the different features of the database in a concurrent mode. The tool has successfully found some issues that can only happen under concurrent conditions.

The plan is to have a configuration file containing combinations of different probabilities to test features like JSON, GIS, partitions, et cetera. Based on the seed value Pstress would pick these combinations to catch regression bugs.

Feature Testing

Pstress can be used to stress areas of a database that features can impact. Users have to pick the relevant SQL and set their probability high. 

In the next column, you can find a simple example to generate load to test the feature “Instant Add Column” introduced in the latest release series of MySQL, MySQL 8.0.  

Within this set of options –only-cl-ddl represents “only command line DDL”. 

The tool will execute concurrent add and drop columns in multiple sessions along with DML. Also if a column is added successfully in one session then another session will start generating DML using that column.

Crash Recovery Testing

At the end of each step, Pstress kills the server when the database is under load and lets it recover. It also changes some of the server variables before restarting the server and does a sanity check before moving ahead.  If Pstress identifies an issue, it saves all the contents of the data directory and other relevant information in the bug’s directory.

Modules in Pstress

Pstress consists of a driver script and a workload. 

Driver Script

The driver script is written in BASH as a shell script. Below are the main features of the driver script.

  1. start/shutdown nodes of the database server.
  2. Execute the workload.
  3. Checks the status of the server by verifying the heartbeat of the server, error logs generated by the server, etc.
  4.  If Pstress identifies an issue, it saves the data directory and other relevant information in the bug’s directory.
  5. It extracts stack traces and other information from the bug’s directory and tries to tag to known and unknown issues.
  6. Monitors the workload and kills the workload if it is stuck.
  7. At the end of the test, it generates report details about the test; such as the number of errors that occurred, known issues found, and the percentage of transactions that were successful.

Workload

The workload is a multi-threaded program that executes transactions. Each thread has its own loop where randomly some transaction is executed. This transaction can be DML/DDL against a table or setting some global variables. 

Below is the workflow for a workload

  1. Create metadata for tables based on the option provided by the user.
  2. Start workload in multiple threads.
  3. In the first step, it creates default tables and loads initial data into those tables.
  4. Picks some random table and issues transactions against it.
  5. Updates its own metadata after a DDL, so other threads can start using the new definition of the table. 
  6. At the end of the run, saves the metadata in a JSON file which can be used for the next step.
  7. If an error is found in any thread, that complete step is marked as a failure.

Types of Transactions

Pstress supports a huge variety of SQL statements and transactions which are randomly generated based on the seed value. 

  1. INSERT/REPLACE statement which can involve int, varchar, blob, GIS data, JSON, partitions, and foreign keys.
  2. SELECT statements with some columns, all columns, and a where clause, operators such as <, =, >, like, between, et cetera.
  3. UPDATE and DELETE statements with some columns, all columns, in the clause, where clause, and operators such as <,=>, like, between, et cetera.
  4. OPTIMIZE, ANALYZE, TRUNCATE or other table-level operations
  5. Add or drop columns, add or drop indexes, rename columns, rename indexes, and add JSON or GIS indexes.
  6. JOIN on multiple tables. 
  7. Executing special syntax SQL statements. The user provides a grammar file. Pstress will randomly fit some metadata and execute them as transactions. 
  8. Add/Drop/Create  redo/undo at runtime.

The target is to add the majority of transactions and statements supported by MySQL and PostgreSQL.

Design

The driver script is written in BASH shell script and workload is written in C++. Workload has nodes, tables, columns, and index objects. 

  • Nodes are instantiated when the database is running in a clustered mode.  
  • Tables have multiple derived objects such as temporary tables, partition tables, etc. 
  • Columns are also derived into GIS, JSON, generated, blob, varchar, int, etc. 
  • Indexes also have derived objects like virtual indexes, JSON indexes, GIS indexes, etc.
  • Tables can also have interrelationships as well for foreign keys.

There are also tablespace objects, undo objects, and redo objects.  Data structures are used to hold global variables and workload uses mutexes and locks to enable the modification of metadata in multiple threads.

At the end of each step, objects are written to a JSON file and then reconstructed from the file after restarting the step

Success Stories with MySQL

Within the development phase, Pstress has been successful in finding many bugs in Oracle MySQL Community Edition, Percona Server for MySQL, Percona XtraDB Cluster, and Percona XtraBackup.  As of the time of writing, there are more than 50 bugs filed against these products’ bug trackers. A few examples are linked below.

Bugs in Oracle MySQL Community Edition

https://bugs.mysql.com/bug.php?id=98564

https://bugs.mysql.com/bug.php?id=98530

Bugs in Percona Server for MySQL

https://jira.percona.com/browse/PS-6815

https://jira.percona.com/browse/PS-5924

Bugs in Percona XtraBackup

https://jira.percona.com/browse/PXB-1974

https://jira.percona.com/browse/PXB-1972

Bugs in Percona XtraDB Cluster

https://jira.percona.com/browse/PXC-2629

https://jira.percona.com/browse/PXC-2949

Comparison with Existing Tools

Some well-known existing tools in the MySQL community are RQG (Random Query Generator), SysBench, and Pquery.

RQG has the concept of a grammar file. It stresses the database by executing the grammar files. So, you must combine lots of grammar files to create a good load which is often not a straight-forward process.

Also when adding new database features, existing grammar files are not helpful because they do not account for the new feature and must be modified.  In essence, you must edit all grammar each time a new database feature is added to have a good test-case. This is not easy and can be a cumbersome process.

SysBench is more of a benchmark tool than a testing tool. It takes a lot of effort to combine tests to generate load which can be used to test database concurrency and test some specific database features.

Pstress has some cool features like crash recovery, bug tagging, and reports.  These features are part of the focus of testing database concurrency and its side-effects, which make it a better tool for this purpose.

Pquery and Pstress

Pstress is developed on the top of the Pquery testing tool. Pquery executes random SQL from a file. Pstress generates SQL based on the option provided by the user.

In Pstress, table metadata gets refreshed whenever a DDL happens which makes the probability of executing successful SQL statements much higher.  For example, if a new column is added in one session then other sessions start using that column to generate SQL.

Pquery picks random queries from a file and executes those queries against the server. Some SQL statements within a transaction must occur in a specific order to succeed, if they are executed out of order it will incur a failure. For example, attempting to execute an INSERT on a table before executing “CREATE TABLE”.  

Pquery by itself does not generate enough load to create stress on various modules within the database code to find stress-induced bugs in features.  Pstress is focused on concurrency and can be used to find and identify stress-induced bugs.

Limitations of Pstress

Pstress can’t be used to check the correctness of a database. Pstress doesn’t check that a query that was successful is returning correct data. There are a few checks, however, such as ensuring the parent-child relationship is not violated in foreign keys and that partition tables don’t have data from other partitions. These checks are very limited. Because of the concurrent nature of Pstress, there is no mechanism to check if it is returning correct query results because by the time we are able to perform a check other threads have updated something.  This presents an effective race condition if we were to attempt validating correctness. 

One possible way to solve this is to use multi-master clustering to check data correctness but it may not be a true validation because the additional master is another database instance.  We plan to try this method and see what the results are, as we already support working with clusters such as Percona XtraDB Cluster. 

Pstress primarily relies on database asserts, messages in error logs, checksum, and a few other things to find issues in the database.  This means that if bugs silently present themselves in the database, the tool will be unable to find and identify them. While the tool is good at finding interesting bugs, it’s still critically important that the database code includes proper logging and assertions.

Sometimes it can be difficult to reproduce the bugs found because of their highly concurrent nature. The concept of multiple steps helps because we save snapshots of the database and reproduce the problem in a small time-frame.  However, this is not always guaranteed. If a user has some hints about the source of the issue they can stress that part of the database and increase the probability of reproducing the bug.

Features in the Development Stage

JSON, GIS, Partition, and Fk are in the development stage. 

ACKNOWLEDGMENTS

Thanks to Percona for giving me enough time so I can work on this tool. Thanks to the Percona QA team for contributions to Pstress.

Share this post

Comment (1)

  • Morgan Tocker (@morgo) Reply

    > Pstress can’t be used to check the correctness of a database. Pstress doesn’t check that a query that was successful is returning correct data.

    This might be a bit of a tangent, but there has been some recent research on how to effectively do this using Pivoted Query Synthesis (PQS). Basically it involves picking a row in the result set that should be there, and then generating a bunch of random expressions which should yield true for that row. I think it’s really interesting work.

    I’d link directly, but my experience says will send me to blog spam 🙂

    April 15, 2020 at 10:49 am

Leave a Reply