How To Test Your Upgrades – pt-upgradeJervin Real
Upgrades are usually one of the biggest part of any database infrastructure maintenance. Even with enough planning something else can go bad after sending your production application to the version you’ve upgraded to. Let’s look at how one Percona Toolkit tool, pt-upgrade can help you identify what to expect and test your upgrades better which is one important step when upgrading your servers.
This tool lets you test your SELECT queries against multiple MySQL servers and reports on how each type of query performs from every server you let it test on. It can report differences with results like row counts, checksums, column count, query execution time including execution errors and warnings.
So how do you exactly test your queries on servers of multiple versions.
- First, ideally, each server must have identical set of data. One approach and most recommended is by having a slave upgraded to your target version and let it replicate until such time you are ready to test. Another is populating spare servers from backup and testing with them. With the former, you can easily verify that data is consistent across the master and the slave with pt-table-checksum.
- Capture your production queries, we recommend either slow query log (use long_query_time = 0) or tcpdump (other types supported by pt-query-digest’s –type option is also possible, see below).
- Test away! Some example scenarios below.
Using slow query log:
# You can also run the slow log through pt-query-digest like the commands below
# to limit the number of queries per fingerprint to test with.
pt-upgrade --fingerprints --run-time=1h mysqld-slow.log h=127.0.0.1,P=5091 h=127.0.0.1,P=5517
Using sources supported by pt-query-digest –type:
# tcpdump - You can replace --type tcpdump with the specific type and
# path to the appropriate type of file you wish to input as source of queries
# --print simply returns the queries in slow log format and --sample
# limits this to 100 queries only
# --no-report simply skips the overhead pt-query-digest had to compute
# values for reports
pt-query-digest --print --no-report --sample 100 --type tcpdump /path/to/tcpdump.out \
| pt-upgrade --fingerprints --run-time=1h h=mysql50 h=mysql51 h=mysql55
Few essential notes when using pt-upgrade:
- Make sure to cover all your query types from your log (slow log or tcpdump). If you have 5GB of log collected within 2hrs, it does not make sense to test all 5GB if your runtime will not cover all fingerprints. Passing the queries to pt-query-digest and using a sample of 100 like above should be enough sample per fingerprint and reduces the runtime (–runt-time) of the tests.
- Using a separate database and table with –temp-database and –temp-table is recommended, this ensures separation of the test table from your production databases and tables.
- Use a separate account for testing – one that has read (i.e. SELECT) to all objects within the database server and has all privileges to the specified –temp-database from #1. pt-upgrade is a read-only tool, however I can only emphasize you should try to use an account with only the necessary privileges.
- It is also important to note to run pt-upgrade itself on a 3rd server if the servers you are testing against are on two different machines. This helps account for the network latency and provide more accurate results.
- Do not test against production servers, 1) because it can incur extra load and 2) the existing production load can affect your results. Testing on EC2 (+EBS) is no exception, although you can easily spin up instances for testing, the unpredictable nature of the cloud can also affect your results.
- Lastly, although if you are upgrading your hardware alongside your MySQL version, you should also consider running these tests on similar hardware and OS configuration. Sometimes, a slight difference in these factors can skew your results greatly.
Here is a simple report output from pt-upgrade. As you can see, there are no errors or warnings and only differences with query times.
# Query 7: ID 0x76323E2525BA457C at byte 0 _______________________________
# Found 441 differences in 510 samples:
# checksums 0
# column counts 0
# column types 0
# query times 441
# row counts 0
# warning counts 0
# warning levels 0
# warnings 0
# 127.0.0.1:50910 127.0.0.1:55170
# Errors 0 0
# Warnings 0 0
# sum 442ms 506ms
# min 391us 480us
# max 4ms 2ms
# avg 867us 992us
# pct_95 1ms 1ms
# stddev 234us 210us
# median 881us 1ms
# sum 510 510
# min 1 1
# max 1 1
# avg 1 1
# pct_95 1 1
# stddev 0 0
# median 1 1
# select * from categories where parent = ? and site = ?
Looking at the sample above, there are microsecond differences between the query times. The most important is the average (avg) which tells in general how each fingerprint performs on the test servers. About 10% variation is tolerable and can be accounted for other MySQL overhead like network latency.
pt-upgrade provides valuable insight on how your production queries will behave on your target version, but it is only a part of the process. There is still no alternative to doing your research for incompatibilities, changes and bug fixes or new bugs introduced that can potentially break your application.