The future of MySQL quality assurance: Introducing pquery

Being a QA Engineer, how would you feel if you had access to a framework which can generate 80+ crashes – a mix of hitting developer introduced assertions (situations that should not happen), and serious unforeseen binary crashes – for the world’s most popular open source database software – each and ever hour? What if you could do this running on a medium spec machine – even a laptop?

The seniors amongst you may object “But… generating a crash or assertion is one thing – creating a repeatable testcase for the same is quite another.”

Introducing pquery, mtr_to_sql, reducer.sh (the pquery-enabled version), and more:

80+ coredumps per hour. Fully automatic testcase creation. Near-100% testcase reproducibility. C++ core. 15 Seconds run time per trial. Up to 20-25k lines of SQL executed per trial. CLI testcases. Compatible with sporadic issues. High-end automation of many aspects.

It all started when we realized how slow RQG really is. The Random Query Generator by Philip Stoev is a fantastic tool, and it has been greatly expanded over the years, but though it is a Swiss army knife in what you can do with it, when it comes to speed it is not the fastest. The Perl backend – combined with much random-ness – has made the system slow. At least when compared with the ultra-fast execution of C++ code.

I discussed things with Alexey Bychko, and before long pquery was born. The C++ core code of pquery is Alexey’s creation. It easily executes 20k random lines of SQL in 15-20 seconds, with full logging (including errors) turned on. Though the tool looks fairly simple when reviewing the code, it is incredibly powerful.

Now, one thing which people being introduced to QA for MySQL (or any other large software with many features/functions/options etc.) have to grasp is “random spread testing”. If your random spread (and the amount of samples) is large enough (read: ‘sufficient’), it is relatively easy to get a good “overall quality” estimate of your software by doing a few runs (taking usually around 1-3 days – or longer if various options are being tested).

So,we now had speed (pquery) and near-perfect testcase simplification/reduction & reproducibility (the new pquery adaption of reducer.sh) – but we needed one thing more: SQL which would cover… well… every area of mysqld. A fast framework without matching grammars is not worth much…

Converting the grammars from RQG seemed like a mammoth task – and we would really just be re-writing RQG in C. And creating a new SQL generator was an almost hopeless venture (I gave it a short try) – given the huge variety and complexity when constructing SQL statements.

I took the cheeky road. And it paid off. mtr_to_sql.sh was born.

The MTR testcases included (and delivered) with the MySQL server download contain individual tests for nearly every possible SQL syntax possible, including ones that – ought to – error out (but not crash). Not only that, there are specific MTR testcases for each feature, not to mention the many MTR testcases that were added to cover bug regression testing. Where there is smoke…

107.5K of high-quality SQL. Covering every possible functionality and buggy area out there. Free.

Let the testing begin!

I was quite dumbfounded when (after further tuning and scripting) we started seeing 50+, then 80+ cores per hour. Sig11’s (crashes), Sig6’s (asserts), server hangs, character set issues, error log errors and so on. Many crashes and issues in optimized code. Fun.

Our best weeks yet?

All of the above can be done on commodity hardware, running a single server, running single-threaded SQL (single client) and with no special mysqld options activated.

Compare this to RQG. Even with combinations.pl running hundreds if not thousands of mysqld — option combinations, and with nearly-everything-is-random-sql, it still comes nowhere near even one tenth of that number/speed. And this is even when using high-end hardware, 8 simultaneous mysqld servers, up to 20-25 client threads and at times special grammar-mix tools like MaxiGen etc.

In preparation for the Twitter week mentioned above we started running 4-5 simultaneous pquery run’s (5x mysqld, still all single threaded; a single client per mysqld) in different shell screen sessions, controlled by cron jobs.

A whole set of automation scripts were quickly added to handle the huge influx in bugs (you can get all for free (GPLv2) at $bzr branch lp:percona-qa – see pquery*.sh files), and now you can quickly review a list of issues pquery discovered. For writing this article, I started a run and in it’s first hour it found exactly 85 crashes. Here is a report from around ~2h;

For these (standard by now) pquery runs, we use pquery-run.sh. It starts a server, executes and monitors the pquery binary, and then checks on the outcome: