October 20, 2014

How to find bugs in MySQL

Finding bugs in MySQL is not only fun, it’s also something I have been doing for the last four years of my life.

Whether you want to become the next Shane Bester (who is generally considered the most skilled MySQL bug hunter worldwide), or just want to prove you can outsmart some of the world’s best programmers, finding bugs in MySQL is a skill no longer reserved for top QA engineers armed with loads of scripts, expensive flash storage and top-range server hardware. Of course, for professionals that’s still the way to go, but now anyone with an average laptop and a standard HDD can have a lot of fun trying to find that elusive crash…

If you follow this post carefully, you may well be able to find a nice crashing bug (or two) running RQG (an excellent database QA tool). The acronym “RQG” stands for “Random Query Generator,” also known as “randgen.” Linux is the preferred testing OS, but if you are using Windows as your main OS, I would recommend getting VirtualBox and running a Linux guest in a suitably sized (i.e. large) VM.

If you’re not just after finding any bug out there (“bug hunting”), you can tune the RQG grammars (files that define what sort of SQL RQG executes) to more or less match your “issue area.” For example, if you are always running into a situation where the server crashes on a DELETE query (as seen at the end of the mysqld error log for example), you would want an SQL grammar that definitely has a variety of DELETE queries in it. These queries should be closely matched with the actual crashing query – crashes usually happen due to exactly the same, or similar statements with the same clauses, conditions etc.
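As a rough illustration of what a DELETE-heavy grammar could look like, the sketch below writes a small grammar file from the shell. The function name, the rule names and the statements themselves are made up for this example and do not come from any grammar shipped with RQG; `_table`, `_field` and `_digit` are RQG placeholders that expand to a random table, column and digit. In practice you would shape the clauses after your own crashing query.

```shell
# Write a tiny, DELETE-heavy RQG grammar to the given path.
# Rule names and statements are illustrative only.
write_delete_grammar() {
    out_file="$1"
    cat > "$out_file" <<'EOF'
query:
    delete | delete | delete | select ;

delete:
    DELETE FROM _table WHERE _field < _digit |
    DELETE FROM _table WHERE _field IS NULL LIMIT _digit |
    DELETE FROM _table ORDER BY _field LIMIT _digit ;

select:
    SELECT _field FROM _table WHERE _field > _digit ;
EOF
}

# Usage: write_delete_grammar /tmp/delete_heavy.yy
```

Listing `delete` three times in the `query` rule is a simple way to weight the run towards DELETE statements while still mixing in the occasional SELECT.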

But, taking a step back, to get started with RQG, you can either use the setup_server.sh script in percona-qa (more on how to obtain percona-qa from Launchpad using Bazaar below), or you can manually install the required packages on your Linux machine with yum:

$ sudo yum install kernel-devel wget patch make cmake automake autoconf libtool bzr gtest zlib-static \
    gcc gcc-c++ ncurses-devel libaio libaio-devel bison valgrind perl-DBD-mysql cpan zlib-devel \
    bzip2 valgrind-devel svn pam-devel openssl openssl-devel screen strace sysbench

(Note that I have included a few extra items needed for Percona Server & a few items that are handy like screen and sysbench for example. Note also that you can change yum to apt-get, though this may require a few package name changes – search the web for info if you want to use apt-get.)
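If you want a quick sanity check after installing, something like the following sketch can tell you which command-line tools are still missing. The function name is my own, and note that it can only check for commands on your PATH; development packages such as ncurses-devel or libaio-devel do not install a command and need to be verified through your package manager instead.

```shell
# Print the name of every command from the argument list that is not on PATH.
# Returns non-zero if at least one command is missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Usage: missing_tools bzr perl gcc make cmake bison valgrind screen
```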

After these modules are installed, you can:

1. Pull the tree from Launchpad:
$ bzr branch lp:randgen

2. Make sure the many dependencies (Perl modules etc.) are installed. Follow:
https://github.com/RQG/RQG-Documentation/wiki/RandomQueryGeneratorQuickStart#wiki-Prerequisites

3. Do a quick test to see if RQG is working fine:
https://github.com/RQG/RQG-Documentation/wiki/RandomQueryGeneratorQuickStart#wiki-Running_your_first_test

If all that worked, then congratulations, you now have simple RQG runs working. Let’s now look into how RQG is structured.

You can think of RQG “execution” as a hierarchical tree: combinations.pl starts runall.pl, which starts gentest.pl, which may start gendata.pl. You don’t have to use combinations.pl (or even runall.pl, as you will have learned by following the ‘running your first test’ example above: mysqld can be started manually and gentest.pl can then be used to test the already started server), but runall.pl and especially combinations.pl add more power to our testing, as we will see soon.

The Perl scripts (.pl) in this tree are:

- combinations.pl, which generates the many different ways (read ‘trials’) in which RQG is started (using runall.pl) with various options to mysqld etc.
- runall.pl, which starts/stops mysqld, does various high-level checking etc. (in other words, ‘a basic RQG run’)
- gentest.pl, which is the executor component (in other words, ‘the actual basic RQG test’)
- gendata.pl, which sets up the data (tables + data)

If you know the performance testing tool SysBench, you may compare gentest.pl with an actual SysBench run and gendata.pl with a “sysbench prepare.”
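To make the lowest level of that tree concrete, here is a small sketch that prints (rather than runs) a gentest.pl command line for a server you started yourself. The option names are the ones used in the RQG quick start linked above; the host, the root user, the test database and the function name are assumptions of mine, so adjust them to your setup.

```shell
# Print (not run) a gentest.pl command line for an already running server.
# Arguments: port, grammar file (.yy), data definition file (.zz).
# Host 127.0.0.1, user root and database test are assumptions.
gentest_command() {
    port="$1"
    grammar="$2"
    gendata="$3"
    printf 'perl gentest.pl --dsn=dbi:mysql:host=127.0.0.1:port=%s:user=root:database=test --gendata=%s --grammar=%s\n' \
        "$port" "$gendata" "$grammar"
}

# Usage, from inside the randgen directory:
#   gentest_command 3306 conf/examples/example.yy conf/examples/example.zz
```

Printing the command first makes it easy to review the DSN before anything touches the server; once it looks right, paste it into your shell.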

To get into real bug hunting territory, we will use combinations.pl to do an extensive random testing run against a server. You never know what you may find. A small warning: before you log your newly discovered bug, make sure that it has not already been logged on bugs.mysql.com (for MySQL Server bugs) or on bugs.launchpad.net/percona-server (for Percona Server bugs).

In this example, we will be testing Percona Server, as the combinations.pl (.cc) grammar we use is optimized for Percona Server. If you would like to test the MySQL server, you can build your own grammars, or use one of the many grammars available in RQG (though there are not many combinations.pl grammars in RQG yet; there are plenty of less powerful runall.pl grammars, however). For a MySQL-compatible combinations.pl grammar which tests the optimizer, see randgen/conf/optimizer/starfish.cc – a grammar which I developed whilst working for Oracle.

Another very extensive grammar set, usable with Percona Server 5.6 (we call this our ‘base grammar’ as it tests many features developed for Percona Server), can be found in randgen/conf/percona_qa/5.6/*. To get started, edit and then use 5.6.sh, the startup script (in it, set WORKDIR and RQG_DIR), and 5.6.cc, the combinations file (in it, change the path names for the optimized, debug and Valgrind compiled servers to match your system). More on this below.

An earlier and more limited version of this base grammar can be found in your randgen tree; go to randgen/conf/percona_qa/ and review the files there. This more limited base grammar can be used for testing any version of Percona Server, or you can follow along and use the 5.6 grammar mentioned above and test Percona Server 5.6 – the same basic steps (detailed below) apply.

In the percona_qa directory, the percona_qa.sh script is the start script (like 5.6.sh), the percona_qa.yy file contains the SQL (like 5.6.yy etc.), the .zz file contains the data definitions, and finally the .cc file is a combinations.pl setup which “combines” various options from various blocks. Combinations.pl has great bug-hunting power.

Side note: you can read more about how the option blocks work at:
https://github.com/RQG/RQG-Documentation/wiki/RandomQueryGeneratorCombinations
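To get a feel for why combining option blocks is so powerful, consider this toy sketch. It is only the idea, not how combinations.pl is implemented or how a .cc file is written: it takes two blocks of options and prints every pairing of one option from each. The function name is hypothetical and the options in the usage line are just placeholders.

```shell
# Print every pairing of one option from the first block with one from the second.
# Each block is passed as a newline-separated string.
combine_blocks() {
    block_a="$1"
    block_b="$2"
    printf '%s\n' "$block_a" | while IFS= read -r a; do
        printf '%s\n' "$block_b" | while IFS= read -r b; do
            echo "$a $b"
        done
    done
}

# Usage:
#   combine_blocks "$(printf '%s\n' '--threads=1' '--threads=25')" \
#                  "$(printf '%s\n' '--duration=60' '--duration=300' '--duration=900')"
```

Two options in one block and three in another already give six distinct trials; with a handful of blocks the number of possible trials grows very quickly, which is where the bug-hunting power comes from.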

All you need to do to get an exhaustive test run started is edit some options (assuming you have Percona Server installed on your test machine already) and start the percona_qa.sh script:

1. Edit the “percona_qa.sh” script and set the WORKDIR and RQG_DIR variables. (Regarding RQG_DIR: the script will normally assume that randgen is stored under WORKDIR, but you can change RQG_DIR to point to your randgen download path instead, for example RQG_DIR=/randgen.)

2. Edit the “percona_qa.cc” file and point it to the location of your server in the --basedir= setting (i.e. replace “/Percona-Server” with “/path_to_your_Percona_Server_installation”).

For the moment, you can just use a standard Percona Server installation, and remove the Valgrind line directly under the one we just edited (use “dd” in vim), but once you get a bit more professional, compiling from source (“building”) is the way to go.

The reason for building yourself is that if you use a debug compiled server (i.e. execute ./build/build-binary.sh --debug in a Percona Server source download) or a Valgrind instrumented compiled server (i.e. execute ./build/build-binary.sh --debug --valgrind in a Percona Server source download) you will find more bugs (the debug server contains more developer debug asserts etc.).

Note you can use the “build_percona.sh” in the percona-qa Launchpad project (more on this below) to quickly build an optimized, debug and Valgrind server from source. build_mysql.sh does the same for MySQL server.

3. Now you’re ready to go; execute ./percona_qa.sh and watch the screen carefully. You’ll likely immediately see some STATUS_ENVIRONMENT_FAILURE runs. This is quite common and means you have made a little error somewhere along the way. Stop the run (ctrl+z, then kill -9 all relevant pids, then execute “fg”). Now edit the files as needed (check all the logs, starting with the failed trials’ ‘trial<no>.log’, etc.). Then start the run again. If your machine is used for testing only (i.e. no production programs running), you can use the following quick command to kill all relevant running mysqld, perl and Valgrind processes:

4. Once your run is going, leave it going for a few hours, or a few days (we regularly test with runs that go for 2-5 days or more), and then start checking logs (trial<nr>.log is the one you want to study first. Use “:$” in vim to jump to the end of the file, or “:1” to jump back to the first line).

5. Once you get a bit more professional, use the percona-qa scripts (bzr branch lp:percona-qa) to quickly handle trials of interest. You may want to initially check out rqg_results.sh, analyze_crash.sh, startup.sh, build_percona.sh, delete_single_trial.sh and keep_single_trial.sh – simply execute them without parameters to get an initial idea of how to use them. These scripts greatly reduce the effort required when analyzing multiple trials.

6. Finally, for those of you needing to reduce long SQL testcases (from any source) quickly, see reducer.sh in randgen/util/reducer/reducer.sh – it’s a multi-threaded high-performance simplification script I developed whilst working at Oracle. They kindly open sourced it some time ago. You may then also want to check out parse_general_log.pl in the percona-qa scripts listed in point 5 above. This script parses a general log created by mysqld (--general_log option) into a ready-to-use SQL trace.
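If you prefer not to edit by hand, steps 2 and 4 above can be roughly sketched as two small shell helpers. Both function names are hypothetical. The first assumes the .cc file still contains the literal “/Percona-Server” placeholder described in step 2; the second assumes the trial logs are named trial<number>.log and sit together in one directory. For anything beyond a first look, the percona-qa scripts from step 5 are the better tool.

```shell
# Replace the placeholder server path in a combinations (.cc) file.
# Assumes the file contains the literal text "/Percona-Server" (see step 2).
# Every line holding the placeholder is changed, so review the result.
set_basedir() {
    cc_file="$1"
    server_path="$2"
    # "|" is the sed delimiter so that slashes in the path need no escaping;
    # a path containing "|" or "&" would need extra care.
    sed "s|/Percona-Server|$server_path|g" "$cc_file" > "$cc_file.tmp" &&
        mv "$cc_file.tmp" "$cc_file"
}

# List the trial logs in a directory that mention a given status string.
trials_with_status() {
    log_dir="$1"
    status_text="$2"
    grep -l "$status_text" "$log_dir"/trial*.log 2>/dev/null
}

# Usage:
#   set_basedir percona_qa.cc /path_to_your_Percona_Server_installation
#   trials_with_status /path/to/run/logs STATUS_ENVIRONMENT_FAILURE
```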

If you happen to find a bug, share the joy! If you happen to run into issues, post your questions below so others who run into the same can find answers quickly. Also feel free to share any tips you find while playing around with this.

Enjoy!

About Roel Van de Paar

Roel leads Percona's QA team. Before coming to Percona, he contributed significantly to the QA infrastructure at Oracle. Roel has a varied background in IT, backed up by many industry leading certifications. He also enjoys time with God, his wife and 4 children, or heading into nature.
