EmergencyEMERGENCY? Get 24/7 Help Now!

The future of MySQL quality assurance: Introducing pquery

 | February 4, 2015 |  Posted In: Insight for DBAs, Insight for Developers, MySQL, Quality Assurance

PREVIOUS POST
NEXT POST

Being a QA Engineer, how would you feel if you had access to a framework which can generate 80+ crashes – a mix of hitting developer introduced assertions (situations that should not happen), and serious unforeseen binary crashes – for the world’s most popular open source database software – each and ever hour? What if you could do this running on a medium spec machine – even a laptop?

The seniors amongst you may object “But… generating a crash or assertion is one thing – creating a repeatable testcase for the same is quite another.”

Introducing pquery, mtr_to_sql, reducer.sh (the pquery-enabled version), and more:

80+ coredumps per hour. Fully automatic testcase creation. Near-100% testcase reproducibility. C++ core. 15 Seconds run time per trial. Up to 20-25k lines of SQL executed per trial. CLI testcases. Compatible with sporadic issues. High-end automation of many aspects.

It all started when we realized how slow RQG really is. The Random Query Generator by Philip Stoev is a fantastic tool, and it has been greatly expanded over the years, but though it is a Swiss army knife in what you can do with it, when it comes to speed it is not the fastest. The Perl backend – combined with much random-ness – has made the system slow. At least when compared with the ultra-fast execution of C++ code.

I discussed things with Alexey Bychko, and before long pquery was born. The C++ core code of pquery is Alexey’s creation. It easily executes 20k random lines of SQL in 15-20 seconds, with full logging (including errors) turned on. Though the tool looks fairly simple when reviewing the code, it is incredibly powerful.

Now, one thing which people being introduced to QA for MySQL (or any other large software with many features/functions/options etc.) have to grasp is “random spread testing”. If your random spread (and the amount of samples) is large enough (read: ‘sufficient’), it is relatively easy to get a good “overall quality” estimate of your software by doing a few runs (taking usually around 1-3 days – or longer if various options are being tested).

So,we now had speed (pquery) and near-perfect testcase simplification/reduction & reproducibility (the new pquery adaption of reducer.sh) – but we needed one thing more: SQL which would cover… well… every area of mysqld. A fast framework without matching grammars is not worth much…

Converting the grammars from RQG seemed like a mammoth task – and we would really just be re-writing RQG in C. And creating a new SQL generator was an almost hopeless venture (I gave it a short try) – given the huge variety and complexity when constructing SQL statements.

I took the cheeky road. And it paid off. mtr_to_sql.sh was born.

The MTR testcases included (and delivered) with the MySQL server download contain individual tests for nearly every possible SQL syntax possible, including ones that – ought to – error out (but not crash). Not only that, there are specific MTR testcases for each feature, not to mention the many MTR testcases that were added to cover bug regression testing. Where there is smoke…

107.5K of high-quality SQL. Covering every possible functionality and buggy area out there. Free.

Let the testing begin!

I was quite dumbfounded when (after further tuning and scripting) we started seeing 50+, then 80+ cores per hour. Sig11’s (crashes), Sig6’s (asserts), server hangs, character set issues, error log errors and so on. Many crashes and issues in optimized code. Fun.

Our best weeks yet?

All of the above can be done on commodity hardware, running a single server, running single-threaded SQL (single client) and with no special mysqld options activated.

Compare this to RQG. Even with combinations.pl running hundreds if not thousands of mysqld — option combinations, and with nearly-everything-is-random-sql, it still comes nowhere near even one tenth of that number/speed. And this is even when using high-end hardware, 8 simultaneous mysqld servers, up to 20-25 client threads and at times special grammar-mix tools like MaxiGen etc.

In preparation for the Twitter week mentioned above we started running 4-5 simultaneous pquery run’s (5x mysqld, still all single threaded; a single client per mysqld) in different shell screen sessions, controlled by cron jobs.

A whole set of automation scripts were quickly added to handle the huge influx in bugs (you can get all for free (GPLv2) at $bzr branch lp:percona-qa – see pquery*.sh files), and now you can quickly review a list of issues pquery discovered. For writing this article, I started a run and in it’s first hour it found exactly 85 crashes. Here is a report from around ~2h;

For these (standard by now) pquery runs, we use pquery-run.sh. It starts a server, executes and monitors the pquery binary, and then checks on the outcome:

(The MYSAFE settings are some generic settings specifically suited for QA testing – kindly provided by Shane Bester)

Within the scripts many QA-goodies are already built-in: automated gdb query extraction from the core and the error log (each added 3x to the end of the sql trace to ensure maximum reproducibility), unique issue classification using bug-relevant strings, /dev/shm execution for optimal speed, etc. – it is all based/build on many years of mysqld QA experience.

If you can’t wait to spin off some I-crashed-mysqld (it’s easy you know…) bugs of your own, download the tools at lp:percona-qa ($bzr branch lp:percona-qa) and checkout the various pquery* scripts. Some shown in use above.

And, when you’re ready for slightly more serious feature testing – add whatever statements your feature uses to the sql file (or you may even replace it), plus the relevant mysqld options modifications to pquery-run.sh (i.e. set the $MYEXTRA string). You can also use sql-interleave.sh/sql-interleave-switch.sh to interleave new sql into the main sql file(s). This script is available in the pquery/ directory of the percona-qa branch mentioned above. Soon you will see fireworks.

Enjoy the show!

PREVIOUS POST
NEXT POST
Roel Van de Paar

Roel leads Percona's QA team. Before coming to Percona, he contributed significantly to the QA infrastructure at Oracle. Roel has a varied background in IT, backed up by many industry leading certifications. He also enjoys time with God, his wife and 5 children, or heading into nature.

10 Comments

  • You know one fun MySQL crash I saw once? It was remarkably (if brain-damagedly) simple: A field definition declared NOT NULL DEFAULT NULL, in MySQL 5.6.

    …Yes, I know, the originator of this definition was seriously unclear on the concept of NOT NULL. But it shouldn’t have crashed MySQL.

  • I am getting following error:
    /usr/lib64/libperconaserverclient.so.18: version `libperconaserverclient_18′ not found (required by /home/ec2-user/percona-qa/pquery/pquery)

    I am trying to run:

    ${PQUERY_BIN} –infile=${RUNDIR}/${TRIAL}/${TRIAL}.sql –database=test –threads=${THREADS} –queries_per_thread=${QUERIES_PER_THREAD} –logdir=${RUNDIR}/${TRIAL} –log_all_queries –log_failed_queries –user=root –addr=some_ip –port=3306 >${RUNDIR}/${TRIAL}/pquery.log 2>&1 &

    (libperconaserverclient.so.18 and libmysqlclient.so are present in /usr/lib64)

  • @Mark Cool! Re: concurrency – no, not atm. Interesting idea/question. As an alternative, running pquery –threads=x (where x>1 and likely best >100) would likely exhaustively random-cover any concurrency-related triggers if the test runs long enough. Even an hour per trial in this case seems plenty (and you will have likely hit many assertions before the hour is over). Alternatively or additionally, you could further limit the SQL to just the “interesting parts” so that all areas that ought to be covered definitely are. Multi-threaded testcase reduction is possible, but not straightforward, –threads=1 cases (and there are plenty) are easier to reduce. See pquery-prep-red.sh, then pquery-clean-known.sh and finally pquery-results.sh, then execute reducer{trialnr}.sh, as generated by pquery-prep.sh (and assuming all the time that pquery-run.sh was used to do the original run with x trials). Also, past experience shows that many (maybe as high as 85-90%) of multi-threaded (i.e. –threads=x) testcases are (or in some limited cases “can be made to be”) single-thread (i.e. client side – mysqld always having it’s background threads) reproducible.

    @Shivam – easy fix; $ export LD_LIBRARY_PATH=${BASEDIR}/lib where ${BASEDIR} is your mysqld’s basedir (or your specific location). Depending on whether your are testing upstream mysqld or Percona Server, you would want to swap between using the pquery (Percona) or pquery-ms (MySQL) binary. Alternatively, see compile information in pquery subdirectory to self-compile pquery, potentially with libs included! As a side note, I may recommend using export LD_PRELOAD=/usr/lib64/libjemalloc.so.1 to use jemalloc (depending on what flavor of mysqld you are testing). Also, you may want to use pquery-run.sh instead – this wrapper script around pquery allows automated pquery runs with all bells and whisles included ($ cd ~; bzr branch lp:percona-qa; vi ~/percona-qa/pquery-run.sh). See reply to Mark above on further steps after pquery-run.sh. In pquery-run.sh you need to set some settings like basedir, pquery or pquery-ms binary etc. See in-script for details.

  • Thanks for the help. It worked.

    I am creating a wrapper script like pquery-run.sh. Can you elaborate on how to identify *different* bugs/segmentation faults by using pquery with same main.sql file?

    So far I have been able to hit the same segmentation fault again and again with following command:
    ./pquery-ms –infile=./main.sql –database=test –threads=2 –queries_per_thread=10000 –log_all_queries –log_failed_queries –user=root –port=3306 –addr=127.0.0.1

    I am unclear about your statement: “80+ coredumps per hour”. Aren’t they for the same issue?

  • @Shivam – Great you got it working! The 80 coredumps per hour? These are many different issues. Offcourse there are duplicates, and it looks like you are hitting a particularly nasty one, but in general you should see a spread. As a good example, see the ~/percona-qa/pquery-results.sh run above, taken after about 2 hours of running; 20 different issues with a bunch of duplicates thereof. The duplicates are handy in a way, as they produce[d] different testcases for the same assertion/crash.

    Back to the segfault you keep hitting again and again. Those are ones that we call “qa blockers” – and we tag them on our launchpad with “qablock”. There are several things that you can do with these; 1) ask developers to fix them (or hire Percona to do it for you), or 2) workaround the issue by disabling the particular sql needed to trigger this bug in the sql file you use. This is quite easy to do. Firstly, you need to find out what queries are triggering your issue. Two actions; a) check one of the produced error logs near the end for the “Query: …” statement and b) use gdb to extract all queries that where running at the time of the assertion/crash (we already have scripts for this… see extract_query.gdb and this (including last comment – the variable name changed). Finally, remove or remark these queries in the sql file (and log a bug for the issue seen so it will get fixed in time).

    But, for many of these sort of things “we’ve been there, and done that” and all this is already highly automated in percona-qa (except manually disabling qablock sql in the sql files, this is easy/quick to do). Here’s how I suggest going about things; 1) use pquery-run.sh (it uses /dev/shm for nice fast processing), 2) go to the workdir produced by it (as set by you in the $WORKDIR variable in pquery-run.sh) and run ~/percona-qa/pquery-prep-red.sh – this will produce nice testcase reducer scripts ready-to-run … it will auto-extract the query from the error log *and* from the coredump and add these to the end of the pquery sql trace (in the numbered subdirectories) – btw, you could get the to-remark queries from here if that makes sense. 3) Step 3 is optional depending on what you are running: If you are using Percona Server or upstream MySQL, simply run ~/percona-qa/pquery-clean-known.sh. 4) Run ~/percona-qa/pquery-results.sh – see the example in the main text above – based on unique bug text strings, it will ‘classify’ each issue and give you a nice output as to the newly found issues. Don’t be suprised to see only a handful if you are using Percona Server or upstream MySQL and have run the cleaner script pquery-clean-known.sh, as we have logged literally hundreds of bugs over the last months already. Yet, you’re running dual-threaded so you may see new fireworks :). 5) Run the individual reducer scripts. These will reduce your testcase from 30k+ lines to a few hundred, to <5-25, often within the space of an hour/a few hours. Reducer is a powerfull tool. Yet, as you are running two threads, a bit more work may be needed. See how you go (and let us know!). Also, you may like to checkout ~/percona-qa/reproducing_and_simplification.txt which has lots of goodies on how to get testcases to "work" in the first place. Our reproducibility is very high, ~100%, with the occasional "looser" here or there, but in those cases we still have the coredump/stacks from the original trial run and so it's sometimes easy for devs to do a code analysis and find/fix the bug that way.

    I hope this answers the questions you had. Looks like a blog post on pquery and it's framework's use would be a good idea too. In short; it is all there already, ready to use. If you have some cool updates, feel free to contribute too. Anyone is free to push to the tree, but please 1) test your changes, 2) be prepared to have changes revoked in case it breaks our integration etc. 🙂

    Enjoy!

  • All, please note we have moved percona-qa to GitHub:
    https://github.com/Percona-QA/percona-qa

    To clone it, use:
    $ sudo yum install git
    $ cd ~
    $ git clone https://github.com/Percona-QA/percona-qa.git

    reducer.sh was also put directly into this repository (and it is maintained there), so *no* need anymore to separately fetch lp:randgen.

Leave a Reply