Databases are complicated software made to handle the concurrent load while making specific guarantees about data consistency and availability. There are many scenarios which should be tested that can only happen under concurrent conditions.
Pstress is a probability-based open-source database testing tool designed to run in concurrency and to test if the database can recover when something goes wrong. It generates random transactions based on options provided by the user. With the right set of options, users can test features, regression, and crash recovery. It can create the concurrent load on a cluster or on a single server.
The tool is currently in beta, but it has already become very important within the testing pipeline for Percona. Pstress is widely used by Percona’s QA team during each phase of testing. It has identified some critical bugs in Oracle MySQL Community Edition, Percona Server for MySQL, Percona XtraDB Cluster, and Percona XtraBackup. Pstress is developed on top of the Percona Pquery framework which is a testing tool that picks random SQL statements from a file and executes them in multiple threads concurrently against the target environment. More information about Pquery can be found in the last section of this document.
Initially, the tool was named Pquery 3.0 but since its function and purpose diverged from that of Pquery we have renamed it to Pstress to more accurately convey its use case.
It can be used to test any database which supports a SQL interface. Currently, it is being developed to test MySQL and the plan is to extend the tool to test PostgreSQL as well.
Pstress runs in multiple steps. In the first step, it creates metadata required to run the test based on the options provided by the user. Below you can find an example execution command line.
|
1 |
<b>./pstress-ms --tables 20 --columns 10 --encryption=keyring --indexes 20 --no-blob --no-virtual –seed 2020</b> |
The above command creates 20 tables which have 10 columns and 20 indexes and other random attributes like compression, encryption, tablespaces based on seed 2020. If a seed is some other number, the table structures would differ. Options like “–no-blob –no-virtual” means it will not have blob and virtual columns and “–encryption=keyring” it will use keyring-based encryption.
It generates random SQLs in proportion to the probability of each option. It picks the command line option or the default probability for each option. Currently, there are around 100 options to generate random SQL. Once we have support for JSON, GIS, Partition tables there will be more options.
|
1 |
<b>--add-index 10 --drop-index 10 --rename-table 0 --insert 800 --update 500</b> |
The above command executes SQLs in the proportion of 10 add indexes, 10 drop indexes, 800 inserts, 500 updates, and the default value for other SQL in each session. It helps to test the scenario where the add/drop indexes are happening concurrently with DMLs.
After a certain time, it stops the step, saves the data directory and restarts the server by varying some server variables and then continues the load.
While the test is in progress the tool keeps checking the heartbeat of the server, error logs, and other checksums to identify any bug and in case Pstress finds any error it will create a bug directory. It saves stack traces, error logs, and other relevant information for you to analyze and reproduce the problem.
At the end of all of the steps, it tags each bug’s directory with known and unknown issues and creates a single report to be analyzed by the user.
It can be used as a regression, feature, or crash recovery testing tool
Each type of SQL statement has some default probability value which can be picked in a transaction. So when we combine all of these SQL statements we get a pretty good load covering the different features of the database in a concurrent mode. The tool has successfully found some issues that can only happen under concurrent conditions.
The plan is to have a configuration file containing combinations of different probabilities to test features like JSON, GIS, partitions, et cetera. Based on the seed value Pstress would pick these combinations to catch regression bugs.
Pstress can be used to stress areas of a database that features can impact. Users have to pick the relevant SQL and set their probability high.
In the next column, you can find a simple example to generate load to test the feature “Instant Add Column” introduced in the latest release series of MySQL, MySQL 8.0.
|
1 |
<b>--add-column 20 --drop-column --only-cl-ddl</b> |
Within this set of options –only-cl-ddl represents “only command line DDL”.
The tool will execute concurrent add and drop columns in multiple sessions along with DML. Also if a column is added successfully in one session then another session will start generating DML using that column.
At the end of each step, Pstress kills the server when the database is under load and lets it recover. It also changes some of the server variables before restarting the server and does a sanity check before moving ahead. If Pstress identifies an issue, it saves all the contents of the data directory and other relevant information in the bug’s directory.
Pstress consists of a driver script and a workload.
The driver script is written in BASH as a shell script. Below are the main features of the driver script.
The workload is a multi-threaded program that executes transactions. Each thread has its own loop where randomly some transaction is executed. This transaction can be DML/DDL against a table or setting some global variables.
Below is the workflow for a workload
Pstress supports a huge variety of SQL statements and transactions which are randomly generated based on the seed value.
The target is to add the majority of transactions and statements supported by MySQL and PostgreSQL.
The driver script is written in BASH shell script and workload is written in C++. Workload has nodes, tables, columns, and index objects.
There are also tablespace objects, undo objects, and redo objects. Data structures are used to hold global variables and workload uses mutexes and locks to enable the modification of metadata in multiple threads.
At the end of each step, objects are written to a JSON file and then reconstructed from the file after restarting the step.
Within the development phase, Pstress has been successful in finding many bugs in Oracle MySQL Community Edition, Percona Server for MySQL, Percona XtraDB Cluster, and Percona XtraBackup. As of the time of writing, there are more than 50 bugs filed against these products’ bug trackers. A few examples are linked below.
Bugs in Oracle MySQL Community Edition
https://bugs.mysql.com/bug.php?id=98564
https://bugs.mysql.com/bug.php?id=98530
Bugs in Percona Server for MySQL
https://jira.percona.com/browse/PS-6815
https://jira.percona.com/browse/PS-5924
Bugs in Percona XtraBackup
https://jira.percona.com/browse/PXB-1974
https://jira.percona.com/browse/PXB-1972
Bugs in Percona XtraDB Cluster
https://jira.percona.com/browse/PXC-2629
https://jira.percona.com/browse/PXC-2949
Some well-known existing tools in the MySQL community are RQG (Random Query Generator), SysBench, and Pquery.
RQG has the concept of a grammar file. It stresses the database by executing the grammar files. So, you must combine lots of grammar files to create a good load which is often not a straight-forward process.
Also when adding new database features, existing grammar files are not helpful because they do not account for the new feature and must be modified. In essence, you must edit all grammar each time a new database feature is added to have a good test-case. This is not easy and can be a cumbersome process.
SysBench is more of a benchmark tool than a testing tool. It takes a lot of effort to combine tests to generate load which can be used to test database concurrency and test some specific database features.
Pstress has some cool features like crash recovery, bug tagging, and reports. These features are part of the focus of testing database concurrency and its side-effects, which make it a better tool for this purpose.
Pstress is developed on the top of the Pquery testing tool. Pquery executes random SQL from a file. Pstress generates SQL based on the option provided by the user.
In Pstress, table metadata gets refreshed whenever a DDL happens which makes the probability of executing successful SQL statements much higher. For example, if a new column is added in one session then other sessions start using that column to generate SQL.
Pquery picks random queries from a file and executes those queries against the server. Some SQL statements within a transaction must occur in a specific order to succeed, if they are executed out of order it will incur a failure. For example, attempting to execute an INSERT on a table before executing “CREATE TABLE”.
Pquery by itself does not generate enough load to create stress on various modules within the database code to find stress-induced bugs in features. Pstress is focused on concurrency and can be used to find and identify stress-induced bugs.
Pstress can’t be used to check the correctness of a database. Pstress doesn’t check that a query that was successful is returning correct data. There are a few checks, however, such as ensuring the parent-child relationship is not violated in foreign keys and that partition tables don’t have data from other partitions. These checks are very limited. Because of the concurrent nature of Pstress, there is no mechanism to check if it is returning correct query results because by the time we are able to perform a check other threads have updated something. This presents an effective race condition if we were to attempt validating correctness.
One possible way to solve this is to use multi-master clustering to check data correctness but it may not be a true validation because the additional master is another database instance. We plan to try this method and see what the results are, as we already support working with clusters such as Percona XtraDB Cluster.
Pstress primarily relies on database asserts, messages in error logs, checksum, and a few other things to find issues in the database. This means that if bugs silently present themselves in the database, the tool will be unable to find and identify them. While the tool is good at finding interesting bugs, it’s still critically important that the database code includes proper logging and assertions.
Sometimes it can be difficult to reproduce the bugs found because of their highly concurrent nature. The concept of multiple steps helps because we save snapshots of the database and reproduce the problem in a small time-frame. However, this is not always guaranteed. If a user has some hints about the source of the issue they can stress that part of the database and increase the probability of reproducing the bug.
JSON, GIS, Partition, and Fk are in the development stage.
ACKNOWLEDGMENTS
Thanks to Percona for giving me enough time so I can work on this tool. Thanks to the Percona QA team for contributions to Pstress.
Resources
RELATED POSTS