
MyRocks Engine: Things to Know Before You Start

January 31, 2018 - 11:48pm

Percona recently released Percona Server for MySQL with MyRocks as GA. You can see how Facebook explains the wins they see in production with MyRocks. Now, if you use the Percona repositories, you can simply install the MyRocks plugin and enable it with ps-admin --enable-rocksdb.
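
As a quick smoke test after enabling the plugin, something like the following should work (a minimal sketch; the table name is just an illustration):

SHOW ENGINES;  -- ROCKSDB should now appear in the engine list

CREATE TABLE rocks_smoke_test (
  id INT NOT NULL AUTO_INCREMENT,
  payload VARCHAR(255),
  PRIMARY KEY (id)
) ENGINE=ROCKSDB;

INSERT INTO rocks_smoke_test (payload) VALUES ('hello myrocks');
SELECT * FROM rocks_smoke_test;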

There are some major and minor differences when comparing it to typical InnoDB deployments, and I want to highlight them here. The first important difference is that MyRocks (based on RocksDB) uses a Log-Structured Merge (LSM) tree data structure, not a B+ tree like InnoDB.

You can learn more about the LSM engine in my article for DZone. The summary is that an LSM data structure is good for write-intensive workloads, at the expense of reads, which might slow down (both point reads and especially range reads); full table scans might be too heavy for the engine. This is important to keep in mind when designing applications for MyRocks. MyRocks is not an enhanced InnoDB, nor a one-size-fits-all replacement for InnoDB. It has its own pros and cons, just like InnoDB. You need to decide which engine to use based on your application's data access patterns.

What other differences should you be aware of?
  • Let’s look at the directory layout. Right now, all tables and all databases are stored in a hidden .rocksdb directory inside mysqldir. The name and location can be changed, but still all tables from all databases are stored in just a series of .sst files. There is no per-table / per-database separation.
  • By default in Percona Server for MySQL, MyRocks uses LZ4 compression for all tables. You can change the compression settings via the rocksdb_default_cf_options server variable. By default it is set to compression=kLZ4Compression;bottommost_compression=kLZ4Compression. We chose LZ4 compression as it provides an acceptable compression level with very little CPU overhead. Other possible compression methods are Zlib and ZSTD, or no compression at all. You can learn more about compression ratio vs. speed in Peter's and my post. To compare the data size of a MyRocks table loaded with traffic statistics from my homebrew router, I used the following table created for the pmacct collector:
    CREATE TABLE `acct_v9` (
      `tag` int(4) unsigned NOT NULL,
      `class_id` char(16) NOT NULL,
      `class` varchar(255) DEFAULT NULL,
      `mac_src` char(17) NOT NULL,
      `mac_dst` char(17) NOT NULL,
      `vlan` int(2) unsigned NOT NULL,
      `as_src` int(4) unsigned NOT NULL,
      `as_dst` int(4) unsigned NOT NULL,
      `ip_src` char(15) NOT NULL,
      `ip_dst` char(15) NOT NULL,
      `port_src` int(2) unsigned NOT NULL,
      `port_dst` int(2) unsigned NOT NULL,
      `tcp_flags` int(4) unsigned NOT NULL,
      `ip_proto` char(6) NOT NULL,
      `tos` int(4) unsigned NOT NULL,
      `packets` int(10) unsigned NOT NULL,
      `bytes` bigint(20) unsigned NOT NULL,
      `flows` int(10) unsigned NOT NULL,
      `stamp_inserted` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
      `id` int(11) NOT NULL AUTO_INCREMENT,
      PRIMARY KEY (`id`)
    ) ENGINE=ROCKSDB AUTO_INCREMENT=20127562
    As you can see, there are about 20 million records in this table. MyRocks (with default LZ4 compression) uses 828MB, while InnoDB (uncompressed) uses 3760MB.
  • You can find very verbose information about your RocksDB instance in the LOG file located in the .rocksdb directory. Check this file for diagnostics. You can also try the SHOW ENGINE ROCKSDB STATUS command, but it is even more cryptic than SHOW ENGINE INNODB STATUS, and it takes time to parse and understand it.
  • Keep in mind that at this time MyRocks supports only the READ-COMMITTED and SERIALIZABLE isolation levels. There is no REPEATABLE-READ isolation level and no gap locking like in InnoDB. In theory, RocksDB should support a SNAPSHOT isolation level. However, there is no notion of SNAPSHOT isolation in MySQL, so we have not implemented the special syntax to support this level. Please let us know if you would be interested in this.
  • For bulk loads, you may face problems trying to load large amounts of data into MyRocks (and unfortunately this might be the very first operation you perform when you start playing with MyRocks, as you try LOAD DATA, INSERT INTO myrocks_table SELECT * FROM innodb_table or ALTER TABLE innodb_table ENGINE=ROCKSDB). If your table is big enough and you do not have enough memory, RocksDB crashes. As a workaround, set rocksdb_bulk_load=1 for the session where you load the data (see the sketch after this list). See more on this page: https://github.com/facebook/mysql-5.6/wiki/data-loading.
  • The block cache in MyRocks is somewhat similar to innodb_buffer_pool_size, but for MyRocks it is mainly beneficial for reads. You may want to tune the rocksdb_block_cache_size setting. Also keep in mind that MyRocks uses buffered reads by default; in this case the OS cache contains cached compressed data and the RocksDB block cache contains uncompressed data. You may keep this setup to have two levels of cache, or you can disable buffering and force the block cache to use direct reads with rocksdb_use_direct_reads=ON.
  • The nature of LSM trees requires that when a level becomes full, there is a merge process that pushes compacted data to the next level. This process can be quite intensive and affect user queries. It is possible to tune it to be less intensive.
  • Right now there is no hot backup solution like Percona XtraBackup for MyRocks tables (we are looking into this). At this time you can use mysqldump for logical backups, or use filesystem-level snapshots such as LVM or ZFS.
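
To make the bulk-load workaround above concrete, here is a minimal session-level sketch (innodb_table is a hypothetical table being converted; per the data-loading wiki page linked above, data should ideally arrive in primary key order):

SET SESSION rocksdb_bulk_load=1;
-- Any of LOAD DATA, INSERT ... SELECT or an engine conversion applies here;
-- this example converts a hypothetical InnoDB table to MyRocks.
ALTER TABLE innodb_table ENGINE=ROCKSDB;
SET SESSION rocksdb_bulk_load=0;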

You can find more MyRocks specifics and limitations in our docs at https://www.percona.com/doc/percona-server/LATEST/myrocks/limitations.html.

We are looking for feedback on your MyRocks experience!

Percona Monitoring and Management 1.7.0 (PMM) Is Now Available

January 31, 2018 - 2:52pm

Percona announces the release of Percona Monitoring and Management 1.7.0. PMM is a free and open-source platform for managing and monitoring MySQL and MongoDB performance. You can run PMM in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.

This release features improved support for external services, which enables a PMM Server to store and display metrics for any available Prometheus exporter. For example, you could deploy the postgres_exporter and use PMM's external services feature to store PostgreSQL metrics in PMM. You'll immediately see these new metrics in the Advanced Data Exploration dashboard. Then you could leverage the many pre-built PostgreSQL dashboards available on Grafana.com and, with a minimal amount of edits, have a working PostgreSQL dashboard in PMM! Watch for an upcoming blog post with a walk-through of this newly unlocked functionality.

New Percona Monitoring and Management 1.7.0 Features
  • PMM-1949: New dashboard: MySQL Amazon Aurora Metrics.

Improvements
  • PMM-1712: Improve external exporters to let you easily add data monitoring from an arbitrary Prometheus exporter you have running on your host.
  • PMM-1510: Rename the swap in and swap out labels to be more specific and clearly show the direction of data flow. The new labels are Swap In (Reads) and Swap Out (Writes), respectively.
  • PMM-1966: Remove Grafana from the list of exporters on the dashboard to eliminate confusion.
  • PMM-1974: Add the mongodb_up graph to the Exporter Status dashboard. The new graph is added to maintain consistency of the information about exporters, and is based on the new metrics implemented in PMM-1586.
Bug fixes
  • PMM-1967: Inconsistent formulas in Prometheus dashboards.
  • PMM-1986: Signing out with HTTP auth enabled leaves the browser signed in.

Aurora Hash Join Optimization (with a Gentle Reminder on Lab Features)

January 31, 2018 - 1:49pm

The Aurora hash join feature for relational databases has been around for a while now. But unlike MySQL's Block Nested Loop algorithm, an Aurora hash join only caters to a specific set of use cases. When used properly by the optimizer, it can provide great benefits with certain workloads. Below is a brief example of a quick win.

This new feature is available in Aurora version 1.16 under lab mode. Because this is a lab feature, it's important to test your queries before upgrading, especially if you are looking to scale up to the new R4 instances before the Super Bowl, to avoid hitting the problem I discuss below.

When lab mode is enabled and hash_join  is ON, you can verify the optimizer feature from the optimizer_switch variable:

mysql> SELECT @@aurora_version, @@aurora_lab_mode, @@optimizer_switch \G
*************************** 1. row ***************************
  @@aurora_version: 1.16
 @@aurora_lab_mode: 1
@@optimizer_switch: index_merge=on,...,hash_join=on,hash_join_cost_based=on

Hash joins work well when joining large result sets because, unlike a block nested loop for the same query, the optimizer scans the larger table and probes it against the hashed smaller table instead of the other way around. Consider the tables and query below:

+----------+----------+
| tbl      | rows     |
+----------+----------+
| branches |    55143 |
| users    |   103949 |
| history  | 27168887 |
+----------+----------+

EXPLAIN SELECT SQL_NO_CACHE COUNT(*)
FROM branches b
INNER JOIN users u ON (b.u_id = u.u_id)
INNER JOIN history h ON (u.u_id = h.u_id);

With hash joins enabled, we can see from the Extra column in the EXPLAIN output how it builds the join conditions:

mysql> EXPLAIN
    -> SELECT SQL_NO_CACHE COUNT(*)
    -> FROM branches b
    -> INNER JOIN users u ON (b.u_id = u.u_id)
    -> INNER JOIN history h ON (u.u_id = h.u_id);
+----+-------------+-------+-------+---------------+---------+---------+------+----------+----------------------------------------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows     | Extra                                                      |
+----+-------------+-------+-------+---------------+---------+---------+------+----------+----------------------------------------------------------+
|  1 | SIMPLE      | u     | index | PRIMARY       | PRIMARY | 4       | NULL |   103342 | Using index                                                |
|  1 | SIMPLE      | h     | ALL   | NULL          | NULL    | NULL    | NULL | 24619023 | Using join buffer (Hash Join Outer table h)                |
|  1 | SIMPLE      | b     | index | user_id       | user_id | 4       | NULL |    54129 | Using index; Using join buffer (Hash Join Inner table b)   |
+----+-------------+-------+-------+---------------+---------+---------+------+----------+----------------------------------------------------------+

Without hash joins, it's an almost straightforward Cartesian product of all three tables:

mysql> SET optimizer_switch='hash_join=off';
Query OK, 0 rows affected (0.02 sec)

mysql> EXPLAIN
    -> SELECT SQL_NO_CACHE COUNT(*)
    -> FROM branches b
    -> INNER JOIN users u ON (b.u_id = u.u_id)
    -> INNER JOIN history h ON (u.u_id = h.u_id);
+----+-------------+-------+--------+---------------+---------+---------+----------------+----------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref            | rows     | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+----------------+----------+-------------+
|  1 | SIMPLE      | h     | ALL    | NULL          | NULL    | NULL    | NULL           | 24619023 | NULL        |
|  1 | SIMPLE      | u     | eq_ref | PRIMARY       | PRIMARY | 4       | percona.h.u_id |        1 | Using index |
|  1 | SIMPLE      | b     | ref    | user_id       | user_id | 4       | percona.h.u_id |        7 | Using index |
+----+-------------+-------+--------+---------------+---------+---------+----------------+----------+-------------+

Now, the execution times, first with hash joins enabled and then with them turned off:

mysql> SELECT SQL_NO_CACHE COUNT(*)
    -> FROM branches b
    -> INNER JOIN users u ON (b.u_id = u.u_id)
    -> INNER JOIN history h ON (u.u_id = h.u_id);
+-----------+
| COUNT(*)  |
+-----------+
| 128815553 |
+-----------+
1 row in set (1 min 6.95 sec)

mysql> SET optimizer_switch='hash_join=off';
Query OK, 0 rows affected (0.01 sec)

mysql> SELECT SQL_NO_CACHE COUNT(*)
    -> FROM branches b
    -> INNER JOIN users u ON (b.u_id = u.u_id)
    -> INNER JOIN history h ON (u.u_id = h.u_id);
+-----------+
| COUNT(*)  |
+-----------+
| 128815553 |
+-----------+
1 row in set (2 min 28.27 sec)

Clearly with this optimization enabled, we have more than a 50% gain from the example query.

Now, while this type of query might be rare, most of us know to avoid really large JOINs, as they are not scalable. But at some point you find queries that can take advantage of the feature. Below is an excerpt from an actual production query I recently worked on. It shows the good execution plan versus the one using hash joins.

These EXPLAIN outputs differ in only one row: without a hash join, the optimizer uses an index and the query executes normally. With hash joins enabled, the optimizer thought it was better to use one instead:

...
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
         type: eq_ref
possible_keys: PRIMARY,r_type_id_ix,r_id_r_type_id_dt_ix
          key: PRIMARY
      key_len: 4
          ref: db.x.p_id
         rows: 1
        Extra: Using where
...

...
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
         type: index
possible_keys: PRIMARY,r_type_id_ix,r_id_r_type_id_dt_ix
          key: r_id_r_type_id_dt_ix
      key_len: 18
          ref: NULL
         rows: 715568233
        Extra: Using where; Using index; Using join buffer (Hash Join Inner table t)
...

Needless to say, it caused problems. Unfortunately, a bug exists in Aurora 1.16 where hash joins cannot be selectively turned off (they are enabled by default) from the parameter group. If you try, you get the error "Error saving: Invalid parameter value: hash_join=off for: optimizer_switch". The only way to disable the feature is to turn off lab_mode, which requires an instance restart. An alternative is to simply issue SET optimizer_switch='hash_join=off'; from the application, especially if you rely on some of the other lab mode features in Aurora.
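
If you take the application route, a minimal per-session sketch would be the following (the verification query is just an illustration):

SET SESSION optimizer_switch='hash_join=off';

-- Returns 1 once the switch has been applied in this session
SELECT @@optimizer_switch LIKE '%hash_join=off%' AS hash_join_disabled;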

To summarize, the new hash join feature is a great addition. But as it’s a lab feature, be careful when upgrading!

Percona Server for MySQL 5.5.59-38.11 is Now Available

January 30, 2018 - 9:49am

Percona announces the release of Percona Server for MySQL 5.5.59-38.11 on January 30, 2018. Based on MySQL 5.5.59, including all the bug fixes in it, Percona Server for MySQL 5.5.59-38.11 is now the current stable release in the 5.5 series.

Percona Server for MySQL is open-source and free. Downloads are available here and from the Percona Software Repositories.

Bugs Fixed:
  • With innodb_large_prefix set to 1, the Blackhole storage engine was incompatible with InnoDB table definitions, so adding new indexes would cause replication errors on the slave. Bug fixed #1126 (upstream #53588).
  • A GCC 7 warning fix introduced a regression in Percona Server 5.5.58-38.10 that led to a wrong SQL query being built to access the remote server when the Federated storage engine was used. Bug fixed #1134.
  • Percona Server 5.5 embedded server builds were broken. Bug fixed #2893.
  • Percona Server now uses Travis CI for additional tests. Bug fixed #3777.

Other bugs fixed: #257 and #2415.

This release contains fixes for the following CVE issues: CVE-2018-2562, CVE-2018-2622, CVE-2018-2640, CVE-2018-2665, CVE-2018-2668.

Find the release notes for Percona Server for MySQL 5.5.59-38.11 in our online documentation. Report bugs on the Jira bug tracker.

MySQL Query Performance: Not Just Indexes

January 30, 2018 - 7:06am

In this blog post, I’ll look at whether optimizing indexing is always the key to improving MySQL query performance (spoiler, it isn’t).

As we look at MySQL query performance, our first concern is often whether a query is using the right indexes to retrieve the data. This is based on the assumption that finding the data is the most expensive operation – and the one you should focus on for MySQL query optimization. However, this is not always the case.

Let’s look at this query for illustration:

mysql> show create table tbl \G
*************************** 1. row ***************************
       Table: tbl
Create Table: CREATE TABLE `tbl` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT '0',
  `g` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=2340933 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> explain select g,count(*) c from tbl where k<1000000 group by g having c>7 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: tbl
   partitions: NULL
         type: ALL
possible_keys: k_1
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 998490
     filtered: 50.00
        Extra: Using where; Using temporary; Using filesort
1 row in set, 1 warning (0.00 sec)

mysql> select g,count(*) c from tbl where k<1000000 group by g having c>7;
+--------+----+
| g      | c  |
+--------+----+
|  28846 |  8 |
| 139660 |  8 |
| 153286 |  8 |
...
| 934984 |  8 |
+--------+----+
22 rows in set (6.80 sec)

Looking at this query, many might assume the main problem is that it is doing a full table scan. One could then wonder, "Why does the MySQL optimizer not use index (k)?" (It is because the condition on k is not selective enough, by the way.) This thought might cause someone to force use of the index, and get even worse performance:

mysql> select g,count(*) c from tbl force index(k) where k<1000000 group by g having c>7;
+--------+----+
| g      | c  |
+--------+----+
|  28846 |  8 |
| 139660 |  8 |
...
| 934984 |  8 |
+--------+----+
22 rows in set (9.37 sec)

Or someone might extend the index on (k) to (k,g) to make it a covering index for this query. This won't improve performance either:

mysql> alter table tbl drop key k_1, add key(k,g);
Query OK, 0 rows affected (5.35 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> explain select g,count(*) c from tbl where k<1000000 group by g having c>7 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: tbl
   partitions: NULL
         type: range
possible_keys: k
          key: k
      key_len: 4
          ref: NULL
         rows: 499245
     filtered: 100.00
        Extra: Using where; Using index; Using temporary; Using filesort
1 row in set, 1 warning (0.00 sec)

mysql> select g,count(*) c from tbl where k<1000000 group by g having c>7;
+--------+----+
| g      | c  |
+--------+----+
|  28846 |  8 |
| 139660 |  8 |
...
| 915436 |  8 |
| 934984 |  8 |
+--------+----+
22 rows in set (6.80 sec)

This wasted effort is all due to focusing on the wrong thing: figuring out how we can find all the rows that match k<1000000 as quickly as possible. That is not the problem in this case. In fact, a query that touches all the same columns but doesn't use GROUP BY runs 10 times faster:

mysql> select sum(g) from tbl where k<1000000;
+--------------+
| sum(g)       |
+--------------+
| 500383719481 |
+--------------+
1 row in set (0.68 sec)

For this particular query, whether or not it uses the index for lookups should not be the main question. Instead, we should look at how to optimize GROUP BY, which is responsible for some 90% of the query response time.

In my next blog post, I will write about four ways to execute the MySQL GROUP BY operation to provide further help on optimizing these queries.

Using MySQL 5.7 Generated Columns to Increase Query Performance

January 29, 2018 - 6:25am

In this blog post, we’ll look at ways you can use MySQL 5.7 generated columns (or virtual columns) to improve query performance.

Introduction

About two years ago I published a blog post about Generated (Virtual) Columns in MySQL 5.7. Since then, it’s been one of my favorite features in the MySQL 5.7 release. The reason is simple: with the help of virtual columns, we can create fine-grained indexes that can significantly increase query performance. I’m going to show you some tricks that can potentially fix slow reporting queries with GROUP BY and ORDER BY.

The Problem

Recently I was working with a customer who was struggling with this query:

SELECT CONCAT(verb, ' - ', replace(url,'.xml','')) AS 'API Call',
       COUNT(*) as 'No. of API Calls',
       AVG(ExecutionTime) as 'Avg. Execution Time',
       COUNT(distinct AccountId) as 'No. Of Accounts',
       COUNT(distinct ParentAccountId) as 'No. Of Parents'
FROM ApiLog
WHERE ts between '2017-10-01 00:00:00' and '2017-12-31 23:59:59'
GROUP BY CONCAT(verb, ' - ', replace(url,'.xml',''))
HAVING COUNT(*) >= 1;

The query ran for more than an hour and used up all the space in the tmp directory (filling it with sort files).

The table looked like this:

CREATE TABLE `ApiLog` (
  `Id` int(11) NOT NULL AUTO_INCREMENT,
  `ts` timestamp DEFAULT CURRENT_TIMESTAMP,
  `ServerName` varchar(50) NOT NULL default '',
  `ServerIP` varchar(50) NOT NULL default '',
  `ClientIP` varchar(50) NOT NULL default '',
  `ExecutionTime` int(11) NOT NULL default 0,
  `URL` varchar(3000) COLLATE utf8mb4_unicode_ci NOT NULL,
  `Verb` varchar(16) NOT NULL,
  `AccountId` int(11) NOT NULL,
  `ParentAccountId` int(11) NOT NULL,
  `QueryString` varchar(3000) NOT NULL,
  `Request` text NOT NULL,
  `RequestHeaders` varchar(2000) NOT NULL,
  `Response` text NOT NULL,
  `ResponseHeaders` varchar(2000) NOT NULL,
  `ResponseCode` varchar(4000) NOT NULL,
  ... // other fields removed for simplicity
  PRIMARY KEY (`Id`),
  KEY `index_timestamp` (`ts`),
  ... // other indexes removed for simplicity
) ENGINE=InnoDB;

We found out the query was not using an index on the timestamp field (“ts”):

mysql> explain SELECT CONCAT(verb, ' - ', replace(url,'.xml','')) AS 'API Call',
       COUNT(*) as 'No. of API Calls', avg(ExecutionTime) as 'Avg. Execution Time',
       count(distinct AccountId) as 'No. Of Accounts',
       count(distinct ParentAccountId) as 'No. Of Parents'
       FROM ApiLog
       WHERE ts between '2017-10-01 00:00:00' and '2017-12-31 23:59:59'
       GROUP BY CONCAT(verb, ' - ', replace(url,'.xml',''))
       HAVING COUNT(*) >= 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ApiLog
   partitions: NULL
         type: ALL
possible_keys: ts
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 22255292
     filtered: 50.00
        Extra: Using where; Using filesort
1 row in set, 1 warning (0.00 sec)

The reason for that is simple: the number of rows matching the filter condition was too large for an index scan to be efficient (or at least the optimizer thought so):

mysql> select count(*) from ApiLog WHERE ts between '2017-10-01 00:00:00' and '2017-12-31 23:59:59';
+----------+
| count(*) |
+----------+
|  7948800 |
+----------+
1 row in set (2.68 sec)

Total number of rows: 21998514. The query needs to scan 36% of the total rows (7948800 / 21998514).

In this case, we have a number of approaches:

  1. Create a combined index on timestamp column + group by fields
  2. Create a covered index (including fields that are selected)
  3. Create an index on just GROUP BY fields
  4. Create an index for loose index scan

However, if we look closer at the “GROUP BY” part of the query, we quickly realize that none of those solutions will work. Here is our GROUP BY part:

GROUP BY CONCAT(verb, ' - ', replace(url,'.xml',''))

There are two problems here:

  1. It is using a calculated field, so MySQL can't just scan the index on verb + url. It needs to first concatenate the two fields and then group on the concatenated string. That means the index won't be used.
  2. The URL is declared as "varchar(3000) COLLATE utf8mb4_unicode_ci NOT NULL" and can't be indexed in full (even with the innodb_large_prefix=1 option, which is the default, as we have utf8 enabled). We can only create a partial index, which won't help GROUP BY optimization.

Here, I’m trying to add a full index on the URL with innodb_large_prefix=1:

mysql> alter table ApiLog add key verb_url(verb, url); ERROR 1071 (42000): Specified key was too long; max key length is 3072 bytes

Well, changing "GROUP BY CONCAT(verb, ' - ', replace(url,'.xml',''))" to "GROUP BY verb, url" could help (assuming we somehow trim the field definition from varchar(3000) to something smaller, which may or may not be possible). However, it would change the results, as it would not remove the .xml extension from the URL field.

The Solution

The good news is that in MySQL 5.7 we have virtual columns, so we can create a virtual column on top of CONCAT(verb, ' - ', replace(url,'.xml','')). The best part: we do not have to perform the GROUP BY on the full string (potentially > 3000 bytes). We can use an MD5 hash (or a longer hash, e.g., SHA-1/SHA-2) for the purposes of the GROUP BY.

Here is the solution:

alter table ApiLog add verb_url_hash varbinary(16)
    GENERATED ALWAYS AS (unhex(md5(CONCAT(verb, ' - ', replace(url,'.xml',''))))) VIRTUAL;
alter table ApiLog add key (verb_url_hash);

So what we did here is:

  1. Declared the virtual column with type varbinary(16)
  2. Created a virtual column on CONCAT(verb, ' - ', replace(url,'.xml','')), and used an MD5 hash on top, plus UNHEX to convert 32 hex characters to 16 binary bytes
  3. Created an index on top of the virtual column (a quick sanity check is sketched below)
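
As a quick sanity check (a hedged example, not part of the original workflow), you can verify that the stored hash matches the expression it was generated from:

SELECT verb_url_hash = unhex(md5(CONCAT(verb, ' - ', replace(url,'.xml','')))) AS hash_matches
FROM ApiLog
LIMIT 1;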

Now we can change the query and GROUP BY verb_url_hash column:

mysql> explain SELECT CONCAT(verb, ' - ', replace(url,'.xml','')) AS 'API Call',
       COUNT(*) as 'No. of API Calls', avg(ExecutionTime) as 'Avg. Execution Time',
       count(distinct AccountId) as 'No. Of Accounts',
       count(distinct ParentAccountId) as 'No. Of Parents'
       FROM ApiLog
       WHERE ts between '2017-10-01 00:00:00' and '2017-12-31 23:59:59'
       GROUP BY verb_url_hash
       HAVING COUNT(*) >= 1;
ERROR 1055 (42000): Expression #1 of SELECT list is not in GROUP BY clause and contains
nonaggregated column 'ApiLog.ApiLog.Verb' which is not functionally dependent on columns in
GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

MySQL 5.7 enables the ONLY_FULL_GROUP_BY SQL mode by default; we can disable it for this session only.

Now the explain plan looks much better:

mysql> select @@sql_mode;
+--------------------------------------------------------------------------------------------------------------------------------------------+
| @@sql_mode                                                                                                                                   |
+--------------------------------------------------------------------------------------------------------------------------------------------+
| ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION   |
+--------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> set sql_mode='STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION';
Query OK, 0 rows affected (0.00 sec)

mysql> explain SELECT CONCAT(verb, ' - ', replace(url,'.xml','')) AS 'API Call',
       COUNT(*) as 'No. of API Calls', avg(ExecutionTime) as 'Avg. Execution Time',
       count(distinct AccountId) as 'No. Of Accounts',
       count(distinct ParentAccountId) as 'No. Of Parents'
       FROM ApiLog
       WHERE ts between '2017-10-01 00:00:00' and '2017-12-31 23:59:59'
       GROUP BY verb_url_hash
       HAVING COUNT(*) >= 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ApiLog
   partitions: NULL
         type: index
possible_keys: ts,verb_url_hash
          key: verb_url_hash
      key_len: 19
          ref: NULL
         rows: 22008891
     filtered: 50.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

MySQL will avoid any sorting, which is much faster. It will still have to eventually scan the whole table in the order of the index. The response time is significantly better: ~38 seconds as opposed to more than an hour.

Covered Index

Now we can attempt to do a covered index, which will be quite large:

mysql> alter table ApiLog add key covered_index (`verb_url_hash`,`ts`,`ExecutionTime`,`AccountId`,`ParentAccountId`, verb, url);
Query OK, 0 rows affected (1 min 29.71 sec)
Records: 0  Duplicates: 0  Warnings: 0

We had to add "verb" and "url", so beforehand I had to remove the COLLATE utf8mb4_unicode_ci from the table definition. Now EXPLAIN shows that we're using the index:

mysql> explain SELECT CONCAT(verb, ' - ', replace(url,'.xml','')) AS 'API Call',
       COUNT(*) as 'No. of API Calls', AVG(ExecutionTime) as 'Avg. Execution Time',
       COUNT(distinct AccountId) as 'No. Of Accounts',
       COUNT(distinct ParentAccountId) as 'No. Of Parents'
       FROM ApiLog
       WHERE ts between '2017-10-01 00:00:00' and '2017-12-31 23:59:59'
       GROUP BY verb_url_hash
       HAVING COUNT(*) >= 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ApiLog
   partitions: NULL
         type: index
possible_keys: ts,verb_url_hash,covered_index
          key: covered_index
      key_len: 3057
          ref: NULL
         rows: 22382136
     filtered: 50.00
        Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)

The response time dropped to ~12 seconds! However, the index size is significantly larger compared to just verb_url_hash (16 bytes per record).

Conclusion

MySQL 5.7 generated columns provide a valuable way to improve query performance. If you have an interesting case, please share in the comments.
