Analyzing WordPress MySQL queries with Query Analytics

This blog, MySQLPerformanceBlog.com, is powered by WordPress, but we never really looked into what kind of queries to MySQL are used by WordPress. So for couple months we ran a Query Analytics (part of Percona Cloud Tools) agent there, and now it is interesting to take a look on queries. Query Analytics uses reports produced by pt-query-digest, but it is quite different as it allows to see trends and dynamics of particular query, in contrast to pt-query-digest, which is just one static report.

Why looking into queries important? I gave an intro in my previous post from this series.

So Query Analytics give the report on the top queries. How to detect which query is “bad”?
One of metrics I am typically looking into is ratio of “Rows examined” to “Rows sent”. In OLTP workload
I expect “Rows sent” to be close to “Rows examined”, because otherwise it means that a query handles a lot of rows (“examined”) which are not used in final result set (“sent”), and it means wasted CPU cycles and even unnecessary IOs if rows are not in memory.

Looking on WordPress queries it does not take long to find one:
query1

This one actually looks quite bad… It examines up to ~186000 rows to return 0 or in the best case 1 row.
The full query text is (and this is available in Query Analytics, you do not need to dig through logs to find it):

SELECT comment_ID FROM wp_comments WHERE comment_post_ID = '154' AND comment_parent = '0' AND comment_approved != 'trash' AND ( comment_author = 'poloralphlauren.redhillrecords' OR comment_author_email = 'spam@gmail.com' ) AND comment_content = 'Probabilities are in case you are like the ma spam jorityof people nowadays, you're f lululemonaddictoutletcanadaSale.angelasauceda ighting tooth and nail just looking to keep up together with your existence. Acquiring organized can help you win the fight. Appear to the ugg factors in just abo spam ut every of your spaces (desk, workplace, living room, bed' LIMIT 1;

We can see how execution time of this query changes overtime
query_1_time

and also how many rows it examines for the last month
query_max_rows

It is clearly an upward trend, and obviously the query does not scale well as there more and more data.
I find these trending graphs very useful and they are available in Query Analytics as we continuously digest and analyze queries. We can see that only for the last month amount of rows this query examines increased from ~130K to ~180K.

So, the obvious question is how to optimize this query?

We look into the explain plan

and SHOW CREATE TABLE

Obviously WordPress did not design this schema to handle 180000 comments to a single post.
There are several ways to fix it, I will take the easiest way and change the key
KEY comment_post_ID (comment_post_ID)
to
KEY comment_post_ID (comment_post_ID,comment_content(300))

and it changes execution plan to

From 186000 rows to 910 rows – that’s quite improvement!

How does it affect execution time? Let’s query run for a while and see again in our trending graph:

query_1_time_after

The drop from ~600ms to ~34ms

and for Rows examined:
query_1_after

The 2nd query is also not to hard to find, and it is again on wp_comments table
query_2

The query examines up to 16K rows, sending only 123 in the best case.

Query text is (this one is from different instance of WordPress, so the table structure is different)

SELECT comment_post_ID FROM wp_comments WHERE LCASE(comment_author_email) = 'spam@gmail.com' AND comment_subscribe='Y' AND comment_approved = '1' GROUP BY comment_post_ID

and EXPLAIN for this particular one