As part of our support services, we do a lot of query optimization. This is where most performance gains come from. Here’s an example of the work we do.
A few days ago a customer came to us with the following table:
CREATE TABLE `infamous_table` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `member_id` int(11) NOT NULL DEFAULT '0',
  `email` varchar(200) NOT NULL DEFAULT '',
  `msg_type` varchar(255) NOT NULL DEFAULT '',
  `t2send` int(11) NOT NULL DEFAULT '0',
  `flag` char(1) NOT NULL DEFAULT '',
  `sent` varchar(100) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `f` (`flag`),
  KEY `email` (`email`),
  KEY `msg_type` (`msg_type`(5)),
  KEY `t_msg` (`t2send`,`msg_type`(5))
) ENGINE=InnoDB DEFAULT CHARSET=latin1
And a query that looked like this:
SELECT COUNT(*)
FROM `infamous_table`
WHERE `t2send` > 1234
AND `msg_type` LIKE 'prefix%';
The table had an index, t_msg, that wasn't helping at all: the EXPLAIN for our 1,000,000-row test table looked like this:
           id: 1
  select_type: SIMPLE
        table: infamous_table
         type: range
possible_keys: t_msg
          key: t_msg
      key_len: 4
          ref: NULL
         rows: 107478
        Extra: Using where
You can see the index used is the one that was expected: “t_msg”. But the key_len is 4. This indicates that the INT part was used while the msg_type(5) part was ignored, which resulted in examining 100k+ rows. If you have MySQL 5.6, you can see it more clearly with EXPLAIN FORMAT=JSON under used_key_parts:
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "table": {
      "table_name": "infamous_table",
      "access_type": "range",
      "possible_keys": [
        "t_msg"
      ],
      "key": "t_msg",
      "used_key_parts": [
        "t2send"
      ],
      "key_length": "4",
      "rows": 107478,
      "filtered": 100,
      "index_condition": "(`test`.`infamous_table`.`t2send` > 1234)",
      "attached_condition": "(`test`.`infamous_table`.`msg_type` like 'prefix%')"
    }
  }
}
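For reference, JSON output like the above comes from running the same query through EXPLAIN FORMAT=JSON in the mysql client; the \G terminator simply gives the vertical layout shown here:

EXPLAIN FORMAT=JSON
SELECT COUNT(*)
FROM `infamous_table`
WHERE `t2send` > 1234
AND `msg_type` LIKE 'prefix%'\G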
The customer had multi-valued strings like “PREFIX:INT:OTHER-STRING” stored in the msg_type column, and that made it impossible to convert it to an ENUM or similar field type that would have allowed changing the LIKE to an equality.
So the solution was rather simple: just as with point and range conditions over numeric values, you must define the index with the ranged field as the rightmost part. This means the correct index would have looked like msg_type(5),t2send.
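The post doesn't show the exact DDL, but adding the new index (named better_multicolumn_index in the EXPLAIN output below) would look roughly like this:

ALTER TABLE `infamous_table`
  ADD KEY `better_multicolumn_index` (`msg_type`(5), `t2send`);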
The EXPLAIN for the new index provided the customer with some happiness:
           id: 1
  select_type: SIMPLE
        table: infamous_table
         type: range
possible_keys: t_msg,better_multicolumn_index
          key: better_multicolumn_index
      key_len: 11
          ref: NULL
         rows: 4716
        Extra: Using where
You can see the key_len is now what we would have expected: four bytes for the INT and another seven bytes for the VARCHAR (five for our chosen prefix plus two for the prefix length). More importantly, the rows count decreased by a factor of roughly 22.
To avoid downtime, we used pt-online-schema-change on the customer's environment to apply the ALTER. This made it an easy and painless solution, and the query effectively executed in under 1/20 of the original time!
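An invocation along these lines would apply that change online. This is only a sketch: the exact DSN, credentials, and safety options depend on the environment, and the test schema name is taken from the EXPLAIN output above:

pt-online-schema-change \
  --alter "ADD KEY better_multicolumn_index (msg_type(5), t2send)" \
  D=test,t=infamous_table --execute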
So, all fine and dandy? Well, almost. We did a further test, and the query looked like this:
SELECT COUNT(*)
FROM `infamous_table`
WHERE `t2send` > 1234
AND `msg_type` LIKE 'abc%';
So where's the difference? The length of the string used in the LIKE condition is shorter than the prefix length we chose for the VARCHAR part of the index (the customer intended to look up strings of only three characters, so we needed to check this). This query also scanned 100k rows, and EXPLAIN showed the key_len was 4, meaning the VARCHAR part was being ignored once again.
This means the index prefix needed to be shorter. We ALTERed the table and made the prefix four characters long, counting on the fact that the multi-valued strings used “:” to separate the values, and we suggested the customer include the colon in the look-up string for the shortest strings. In this case, 'abc%' becomes 'abc:%' (which is also four characters long).
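A sketch of that change and the adjusted look-up; the exact statements aren't shown in the post, and keeping the same index name is our assumption:

ALTER TABLE `infamous_table`
  DROP KEY `better_multicolumn_index`,
  ADD KEY `better_multicolumn_index` (`msg_type`(4), `t2send`);

SELECT COUNT(*)
FROM `infamous_table`
WHERE `t2send` > 1234
AND `msg_type` LIKE 'abc:%';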
As a final optimization, we suggested dropping old indexes that were covered by the new better_multicolumn_index and that were most likely created by the customer while testing optimizations.
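The post doesn't list which indexes those were; purely as an illustration, the single-column prefix index on msg_type is the kind of index that becomes redundant once the multi-column index leads with the same column:

ALTER TABLE `infamous_table` DROP KEY `msg_type`;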
Conclusion
Just like in point-and-range queries, the right order for a multi-column index is to put the ranged part last. Equally important, the string prefix in the index must not be longer than the shortest string you intend to look up. Just remember that you can't make the prefix too short, either, or you'll lose selectivity and the query will end up scanning rows unnecessarily.