Faster fingerprints and Go packages for MySQLDaniel Nichter
I’m happy to announce Go packages for MySQL. Particularly exciting is a new query fingerprint function which is very fast and efficient, but I’ll talk about that later. First, go-mysql is currently three simple Go packages for parsing and aggregating MySQL slow logs. If you’ve been following Percona development, you’ve no doubt heard of Percona Cloud Tools (PCT), a somewhat new performance management web service for MySQL.
One tool in PCT is “Query Analytics” which continuously analyzes query metrics from the slow log. The slow log provides the most metrics and therefore the most performance insight into MySQL. percona-agent, the open-source agent for PCT, uses go-mysql to parse and analyze the slow log, so the code has both heavy formal testing and heavy real-world testing. If you’re working with Go, MySQL, and MySQL slow logs, we invite you to try go-mysql.
Last October we implemented a completely new query fingerprint function. (See “Fingerprints” in the pt-query-digest doc for a background on query fingerprints.) Since mydumpslow, the very first slow log parser circa 2000, fingerprints have been accomplished with regular expressions. This approach is normally fine, but percona-agent needs to be faster and more efficient than normal to reduce the cost of observation. Regex patterns are like little state machines. One regex can be very fast, but several are required to produce a good fingerprint. Therefore, the regex approach requires processing the same query several times to produce a fingerprint. Even worse: a regex can backtrack which means a single logical pass through the query can result in several physical passes. In short: regular expressions are a quick and easy solution, but they are very inefficient.
Several years ago, a former colleague suggested a different approach: a single pass, purpose-built, character-level state machine. The resulting code is rather complicated, but the resulting performance is a tremendous improvement: 3-5x faster in informal benchmarks on my machine, and it handles more edge cases. In simplest terms: the new fingerprint function does more with less, which makes percona-agent and Query Analytics better.