I very much like the fact that MySQL allows you to embed comments into SQL statements. These comments are extremely convenient, because they are written into MySQL log files as part of the query. This includes the general log, the binary log and the slow query log. Maatkit includes tools which interact with these logs, including mk-query-digest. This tool, in particular, has a very nice option called –embedded-attributes which can process data embedded in query comments.
The support for embedded attributes makes some cool tricks possible. Peter and I co-presented a talk at this past MySQL Conference and Expo. In this talk I presented my Instrumentation-for-PHP class as a demonstration instrumentation application for the LAMP stack. Instrumentation-for-php collects information about the PHP process such as wall time, cpu time, mysql query times, etc, and automatically places this information into the Apache environment. It also includes support for “augmenting” SQL queries with additional information such as the PHP source line and user session information by embedding this information in a specially formatted SQL comment. This information can then be cross-referenced between MySQL logs and Apache logs.
-- File: myfile.php Line: 25 Function: myFunc SessionID: a1b2c3d4e5f6a7b8c9d0aef UserID: swany78
Notice that the query ends with a comment, and that this comment includes important information, such as the PHP session ID, and the application user name. This allows mk-query digest to discover these attributes and use them for aggregation. You could also filter on the embedded information to display only queries from a particular file, particular session or particular user for example.
The –embedded-attributes option takes two parts. The first part is a Perl regular expression which identifies which comment will be treated as the one with attributes. The second part is another expression used to split the comment into attribute key/value pairs.
--embedded-attributes <strong>'^-- [^\n]+'</strong>,<strong>'(\w+): ([^ ]+)'</strong>
The first part: ‘^– [^\n]+’ matches everything starting with — and ending with newline.
The second part includes two expressions, each enclosed by parenthesis so that they will be captured. The first matching expression:(\w+): matches one or more words followed by colon and a space. The second match expression:([^ ]+)captures everything up to the first space.
A problem with this approach is that MySQL treats identical queries with different comments as different queries. This essentially renders the query cache inoperative when this approach is taken. A cache that can’t return any results is worse than not having a cache at all. The cache won’t return any result if each query has some sort of unique identifier in it. Evicting queries will become very frequent when every query is unique and the query cache will use additional resources for no effective benefit.
Luckily the problem is mostly* fixed in Percona Server 11.
I wrote a simple test script to demonstrate the difference. The test loops twice over a set of query patterns. Each pattern is executed 1000 times in each iteration of the loop. The queries differ only by comments. In the best case scenario where comments are stripped we would expect 1000 misses and then all hits from this test. You can enable query cache stripping in Percona Server at runtime:
mysql> set global query_cache_strip_comments='ON';
Query OK, 0 rows affected (0.00 sec)
|random at start||2000||0||0||2000|
|random at middle||2000||0||0||2000|
|static at start||2000||0||1000||1000|
As you can see, the first loop of the queries (that is, those with no comment at all) get 1000 misses. Everything beyond that is a hit when stripping is enabled. Contrast this with the results where stripping is not enabled. There are many more misses due to the random values in the query comments. Also notice the query with static comments. There are 1000 hits and 1000 misses because MySQL treats identical queries with identical comments as the same. When comments are stripped every one of these queries is answered by the cache.
If you are benefiting from the query cache, but would like better instrumentation in your queries, consider using Percona Server and turning on the –strip-query-cache-comments feature.
* Percona Server 11.1 still has a problem with # and — comments at the start of a query(before SELECT), and in some cases, with varying white-space. These issues will be rectified in the next release. I’ve updated instrumentation-for-php to place the comment at the end of the query to avoid these problems.