EmergencyEMERGENCY? Get 24/7 Help Now!

Using MySQL 5.7 Generated Columns to Increase Query Performance

 | January 29, 2018 |  Posted In: Insight for DBAs, MySQL

PREVIOUS POST
NEXT POST

MySQL Generated ColumnsIn this blog post, we’ll look at ways you can use MySQL 5.7 generated columns (or virtual columns) to improve query performance.

Introduction

About two years ago I published a blog post about Generated (Virtual) Columns in MySQL 5.7. Since then, it’s been one of my favorite features in the MySQL 5.7 release. The reason is simple: with the help of virtual columns, we can create fine-grained indexes that can significantly increase query performance. I’m going to show you some tricks that can potentially fix slow reporting queries with GROUP BY and ORDER BY.

The Problem

Recently I was working with a customer who was struggling with this query:

The query was running for more than an hour and used all space in the tmp directory (with sort files).

The table looked like this:

We found out the query was not using an index on the timestamp field (“ts”):

The reason for that is simple: the number of rows matching the filter condition was too large for an index scan to be efficient (or at least the optimizer thinks that):

Total number of rows: 21998514. The query needs to scan 36% of the total rows (7948800 / 21998514).

In this case, we have a number of approaches:

  1. Create a combined index on timestamp column + group by fields
  2. Create a covered index (including fields that are selected)
  3. Create an index on just GROUP BY fields
  4. Create an index for loose index scan

However, if we look closer at the “GROUP BY” part of the query, we quickly realize that none of those solutions will work. Here is our GROUP BY part:

There are two problems here:

  1. It is using a calculating field, so MySQL can’t just scan the index on verb + url. It needs to first concat two fields, and then group on the concatenated string. That means that the index won’t be used.
  2. The URL is declared as “varchar(3000) COLLATE utf8mb4_unicode_ci NOT NULL” and can’t be indexed in full (even with innodb_large_prefix=1  option, which is the default as we have utf8 enabled). We can only do a partial index, which won’t be helpful for GROUP BY optimization.

Here, I’m trying to add a full index on the URL with innodb_large_prefix=1:

Well, changing the “GROUP BY CONCAT(verb, ‘ – ‘, replace(url,’.xml’,”))” to  “GROUP BY verb, url” could help (assuming that we somehow trim the field definition from  varchar(3000) to something smaller, which may or may not be possible). However, it will change the results as it will not remove the .xml extension from the URL field.

The Solution

The good news is that in MySQL 5.7 we have virtual columns. So we can create a virtual column on top of “CONCAT(verb, ‘ – ‘, replace(url,’.xml’,”))”. The best part: we do not have to perform a GROUP BY with the full string (potentially > 3000 bytes). We can use an MD5 hash (or longer hashes, i.e., sha1/sha2) for the purposes of the GROUP BY.

Here is the solution:

So what we did here is:

  1. Declared the virtual column with type varbinary(16)
  2. Created a virtual column on CONCAT(verb, ‘ – ‘, replace(url,’.xml’,”), and used an MD5 hash on top plus an unhex to convert 32 hex bytes to 16 binary bytes
  3. Created and index on top of the virtual column

Now we can change the query and GROUP BY verb_url_hash column:

MySQL 5.7 has a strict mode enabled by default, which we can change for that query only.

Now the explain plan looks much better:

MySQL will avoid any sorting, which is much faster. It will still have to eventually scan all the table in the order of the index. The response time is significantly better: ~38 seconds as opposed to > an hour.

Covered Index

Now we can attempt to do a covered index, which will be quite large:

We had to add a “verb” and “url”, so beforehand I had to remove the COLLATE utf8mb4_unicode_ci from the table definition. Now explain shows that we’re using the index:

The response time dropped to ~12 seconds! However, the index size is significantly larger compared to just verb_url_hash (16 bytes per record).

Conclusion

MySQL 5.7 generated columns provide a valuable way to improve query performance. If you have an interesting case, please share in the comments.

PREVIOUS POST
NEXT POST
Alexander Rubin

Alexander joined Percona in 2013. Alexander worked with MySQL since 2000 as DBA and Application Developer. Before joining Percona he was doing MySQL consulting as a principal consultant for over 7 years (started with MySQL AB in 2006, then Sun Microsystems and then Oracle). He helped many customers design large, scalable and highly available MySQL systems and optimize MySQL performance. Alexander also helped customers design Big Data stores with Apache Hadoop and related technologies.

One Comment

  • The “GROUP BY verb, url” will not change the results, you will still SELECT CONCAT( verb, ‘-‘, URL), and will solve the wrong grouping done using the hash índex.
    At least, MySQL finally has some reasonables defaults, it’s a pitty that DBAs insist in going back to the wrong ones…

Leave a Reply