PERFORMANCE_SCHEMA vs Slow Query Log
Peter Zaitsev
A couple of weeks ago, shortly after Vadim wrote about Percona Cloud Tools and using Slow Query Log to capture the data, Mark Leith asked why don’t we just use Performance Schema instead? This is an interesting question and I think it deserves its own blog post to talk about.
First, I would say the main reason for using the Slow Query Log is compatibility. The basic slow query log with microsecond query time precision is available starting in MySQL 5.1, while the events_statements_summary_by_digest table was only added in MySQL 5.6, which has been out for about a year now but is still far from market domination. This is especially visible if you look at the low-end market – users who just run some web applications on whatever MySQL version their hosting provider installed for them. If you look at WordPress users, for example, you will see MySQL 5.6 at just 1.3% as of today. As MySQL 5.6 takes a larger share of the market over time, we surely should add support for Performance Schema based query sampling to Percona Cloud Tools.
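If you are not sure what a given server supports, you can check for the digest table directly – a quick sketch:

```sql
-- Check the server version and whether the digest table exists
-- (it only appears in MySQL 5.6 and later):
SELECT @@version;
SELECT COUNT(*) AS has_digest_table
  FROM information_schema.tables
 WHERE table_schema = 'performance_schema'
   AND table_name   = 'events_statements_summary_by_digest';
```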
The second reason is the amount of data available. The Performance Schema digest table provides a fair amount of data, including some that is not available in the Percona Server slow query log:
mysql> select * from events_statements_summary_by_digest where digest_text like "%sbtest%" \G
*************************** 1. row ***************************
   DIGEST_TEXT: SELECT c FROM sbtest WHERE id = ?
    FIRST_SEEN: 2014-02-10 17:39:39
     LAST_SEEN: 2014-02-10 17:40:39
1 row in set (0.00 sec)
(Most of the row – the count, timer and rows sent/examined counters – is trimmed from this sample.)
BTW – note that in the full row, Rows Sent is not equal to Rows Examined, while in reality they should be exactly the same for this benchmark. That is the approximate accounting of Performance Schema in action.
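To turn the digest table into a quick report, you can rank digests by total time spent; a sketch using the standard MySQL 5.6 columns (timers are in picoseconds):

```sql
-- Top 10 query patterns by total execution time; divide SUM_TIMER_WAIT
-- by 1e12 to convert picoseconds to seconds:
SELECT DIGEST_TEXT,
       COUNT_STAR              AS calls,
       SUM_TIMER_WAIT / 1e12   AS total_secs,
       SUM_ROWS_SENT,
       SUM_ROWS_EXAMINED
  FROM performance_schema.events_statements_summary_by_digest
 ORDER BY SUM_TIMER_WAIT DESC
 LIMIT 10;
```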
Now compare it to a sample for the same query from the Percona Server slow query log:
# Time: 140210 17:33:34.837517
# User@Host: root[root] @ localhost  Id: 95
# Schema: sbtest Last_errno: 0 Killed: 0
# Query_time: 0.001000 Lock_time: 0.000054 Rows_sent: 1 Rows_examined: 1 Rows_affected: 0
# Bytes_sent: 74 Tmp_tables: 0 Tmp_disk_tables: 0 Tmp_table_sizes: 0
# InnoDB_trx_id: 90E34CF
# QC_Hit: No Full_scan: No Full_join: No Tmp_table: No Tmp_table_on_disk: No
# Filesort: No Filesort_on_disk: No Merge_passes: 0
# InnoDB_IO_r_ops: 0 InnoDB_IO_r_bytes: 0 InnoDB_IO_r_wait: 0.000000
# InnoDB_rec_lock_wait: 0.000000 InnoDB_queue_wait: 0.000000
# InnoDB_pages_distinct: 1
# Log_slow_rate_type: query Log_slow_rate_limit: 100
SELECT c from sbtest where id=387872;
What I believe is most valuable here is the information about InnoDB I/O, which instantly allows us to isolate the query patterns that are disk-I/O bound, as well as the Bytes Sent counter, which allows us to see which queries are responsible for generating high volumes of network traffic.
I wish Performance Schema were enhanced to return data in something like JSON, where for each digest the top waits are accounted for, as in reality these can differ from query to query. Some queries might be waiting on I/O, others on locks, and yet others could be bound by specific mutexes. Having exact information about what limits the performance of each query type would be a gem.
The third reason is placeholders. Note the digest above is “SELECT c FROM sbtest WHERE id = ?”, which is not very convenient – I can’t even run EXPLAIN on such a query to see what could be the reason for its slowness. The log contains exact queries, so we are able to show exact queries in reports (pt-query-digest and Percona Cloud Tools), or you can opt to see only query digests if you do not want to expose the values for privacy/security reasons. Picking the constant from a query with a bad plan usually works very well to check out the worst-case scenario.
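Because the log keeps literal values, a logged query can be fed straight into EXPLAIN – for example, using the id value from the log sample above:

```sql
-- The exact query from the slow log, runnable as-is to inspect the plan:
EXPLAIN SELECT c FROM sbtest WHERE id = 387872;
```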
This might look like a very simple problem – why can’t you just pick some id value and reconstruct the query? But for more complicated queries with multiple conditions it is virtually impossible to reconstruct a realistic query.
Now, in theory you can look up an actual query from events_statements_history_long and join the data together; however, this does not really work at high query rates, as it is very likely that rare queries will not have a sample available in the history table.
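In sketch form, that lookup might be attempted as a best-effort join – keeping in mind that events_statements_history_long is a small ring buffer, so on a busy server many digests will simply have no surviving sample:

```sql
-- For each digest, try to recover one literal sample query, if any
-- still remains in the history ring buffer:
SELECT s.DIGEST_TEXT,
       MAX(h.SQL_TEXT) AS sample_query  -- an arbitrary surviving sample, or NULL
  FROM performance_schema.events_statements_summary_by_digest s
  LEFT JOIN performance_schema.events_statements_history_long h
    ON h.DIGEST = s.DIGEST
 GROUP BY s.DIGEST, s.DIGEST_TEXT;
```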
The fourth reason is support for prepared statements. Enable prepared statements and you will not see the actual query in the digest. This may or may not be an issue for your application, but it further limits the usability of this feature. I can’t count on simply looking at events_statements_summary_by_digest to always find which queries are responsible for the majority of the load.
The fifth reason is performance – or actually, it is not much of a reason. I really think Performance Schema overhead is reasonable for most workloads. In a simple-query benchmark I ran:
sysbench --test=oltp --oltp-test-mode=simple --num-threads=16 --max-requests=0 --max-time=60 --mysql-user=root --oltp-table-size=1000000 run
On my old server I got some 20.2K QPS with Performance Schema disabled and 19.4K QPS with Performance Schema enabled, which is an overhead of less than 5%.
For most workloads paying 5% to have insight about what is happening with the system is a very fair trade.
The slow query log overhead can actually be much larger. The moderate level of detail, "microtime", resulted in 15.1K QPS, and log_slow_verbosity="full" takes this down to 11.6K QPS, an overhead of over 40%. Note that I designed this test as a worst-case scenario; for more complicated queries the overhead is likely to be less (while the Performance Schema overhead can stay the same or even increase, depending on what the queries are doing).
Some people set long_query_time to some non-zero value to reduce the amount of queries logged. This is a bad idea because the workload logged will be very different from your real one – chances are the majority of your load comes from simple, quick queries, which will be badly underrepresented when only the outliers are logged.
A much better idea is to enable sampling, which is available in the latest version of Percona Server – this way only one out of every so many queries will be logged:
mysql> set global log_slow_rate_limit=100;
Query OK, 0 rows affected (0.00 sec)
mysql> set global log_slow_rate_type="query";
Query OK, 0 rows affected (0.00 sec)
This will get one out of every 100 queries randomly logged, which should give you a good idea of your workload without such skew. It works well unless you have some rare and very complicated queries that impact your workload disproportionately and which you can’t ignore for performance analysis. To deal with this situation we added the slow_query_log_always_write_time option to Percona Server, which allows you to always log such queries even if they would not be selected because of sampling.
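Combined, the sampling settings and the always-write threshold might look like this (slow_query_log_always_write_time takes a time in seconds; the 1-second threshold below is just an example value):

```sql
-- Log 1 of every 100 queries, chosen at random per query...
SET GLOBAL log_slow_rate_limit = 100;
SET GLOBAL log_slow_rate_type = 'query';
-- ...but always log any query slower than 1 second, regardless of sampling
-- (slow_query_log_always_write_time is Percona Server specific):
SET GLOBAL slow_query_log_always_write_time = 1;
```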
Enabling 1/100 sampling for this workload with the full level of detail, I get 19.8K QPS, an overhead of less than 2% – even less than Performance Schema – and sampling 1/1000 queries gets the overhead down to about 1%. So with the slow query log I can choose between accuracy and overhead.
I wish Performance Schema offered something similar – instead of figuring out which probes and statistics tables I need (not all of them are enabled by default), I could simply choose to get sampled data and trade accuracy against overhead, rather than missing the data altogether.
Summary: There is a great amount of data in Performance Schema in MySQL 5.6 and Percona Server 5.6, though there are a number of reasons why you might not want to discard the old and proven tools based on the slow query log just yet.
P.S. If you’re interested in seeing Query Performance Analyses in action, please join Vadim’s webinar on Wednesday, Feb. 12. If you miss it, check out the recording at that same link.