JOIN Performance & Charsets

We have written before about the importance of using numeric types as keys, but maybe you’ve inherited a schema that you can’t change or have chosen string types as keys for a specific reason. Either way, the character sets used on joined columns can have a significant impact on the performance of your queries.

Take the following example, using the InnoDB storage engine:

CREATE TABLE `t1` (
`char_id` char(6) NOT NULL,
`v` varchar(128) NOT NULL,
PRIMARY KEY (`char_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `t2` (
`id` int unsigned NOT NULL auto_increment,
`char_id` char(6) NOT NULL,
`v` varchar(128) NOT NULL,
PRIMARY KEY (`id`),
KEY (`char_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

CREATE TABLE `t1` (

`char_id` char(6) NOT NULL,

`v` varchar(128) NOT NULL,

PRIMARY KEY (`char_id`)

) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `t2` (

`id` int unsigned NOT NULL auto_increment,

`char_id` char(6) NOT NULL,

`v` varchar(128) NOT NULL,

PRIMARY KEY (`id`),

KEY (`char_id`)

) ENGINE=InnoDB DEFAULT CHARSET=latin1;

t1 has been populated with 100,000 records, while t2 has 400,000. The data set fit easily in memory.

mysql> EXPLAIN EXTENDED SELECT SQL_NO_CACHE COUNT(t1.char_id) 
      >  FROM t1 
      >  JOIN t2 USING (char_id)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: index
possible_keys: NULL
          key: char_id
      key_len: 6
          ref: NULL
         rows: 394424
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 18
          ref: func
         rows: 1
        Extra: Using where; Using index
2 rows in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: select sql_no_cache count(`test`.`t1`.`char_id`) AS `COUNT(t1.char_id)` from `test`.`t1` join `test`.`t2` where (`test`.`t1`.`char_id` = convert(`test`.`t2`.`char_id` using utf8))
1 row in set (0.00 sec)

mysql> EXPLAIN EXTENDED SELECT SQL_NO_CACHE COUNT(t1.char_id)

> FROM t1

> JOIN t2 USING (char_id)\G

*************************** 1. row ***************************

id: 1

select_type: SIMPLE

table: t2

type: index

possible_keys: NULL

key: char_id

key_len: 6

ref: NULL

rows: 394424

Extra: Using index

*************************** 2. row ***************************

id: 1

select_type: SIMPLE

table: t1

type: eq_ref

possible_keys: PRIMARY

key: PRIMARY

key_len: 18

ref: func

rows: 1

Extra: Using where; Using index

2 rows in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS\G

*************************** 1. row ***************************

Level: Note

Code: 1003

Message: select sql_no_cache count(`test`.`t1`.`char_id`) AS `COUNT(t1.char_id)` from `test`.`t1` join `test`.`t2` where (`test`.`t1`.`char_id` = convert(`test`.`t2`.`char_id` using utf8))

1 row in set (0.00 sec)

Notice the differences in key_len and the explicit call to CONVERT() in the WHERE clause. This is a result of the joined columns being of different character sets.

The above query took an average of 4.33 seconds to execute with t1 as utf8 and t2 as latin1. Converting both tables to utf8 resulted in an average execution time of 3.12 seconds and had the following EXPLAIN:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: index
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 18
          ref: NULL
         rows: 99414
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: ref
possible_keys: char_id
          key: char_id
      key_len: 18
          ref: test.t1.char_id
         rows: 1
        Extra: Using index
2 rows in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: select sql_no_cache count(`test`.`t1`.`char_id`) AS `COUNT(t1.char_id)` from `test`.`t1` join `test`.`t2` where (`test`.`t2`.`char_id` = `test`.`t1`.`char_id`)
1 row in set (0.00 sec)

*************************** 1. row ***************************

id: 1

select_type: SIMPLE

table: t1

type: index

possible_keys: PRIMARY

key: PRIMARY

key_len: 18

ref: NULL

rows: 99414

Extra: Using index

*************************** 2. row ***************************

id: 1

select_type: SIMPLE

table: t2

type: ref

possible_keys: char_id

key: char_id

key_len: 18

ref: test.t1.char_id

rows: 1

Extra: Using index

2 rows in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS\G

*************************** 1. row ***************************

Level: Note

Code: 1003

Message: select sql_no_cache count(`test`.`t1`.`char_id`) AS `COUNT(t1.char_id)` from `test`.`t1` join `test`.`t2` where (`test`.`t2`.`char_id` = `test`.`t1`.`char_id`)

1 row in set (0.00 sec)

Notice here how there is no CONVERT() required and the key_len on both tables match. Just this simple change resulted in more than a 25% improvement in average execution time, from 4.33 to 3.12 seconds.

This test was performed with MySQL 5.0.67, FreeBSD 7, on a box with 2GB RAM.

2 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Jatin Mehta

16 years ago

I have one problem with join.

I need to reduce query time to it optimized level. Details are as under

QUERY:
SELECT SQL_CALC_FOUND_ROWS p.*, FLOOR(p.prodratingtotal/p.prodnumratings) AS prodavgrating, 0 AS prodgroupdiscount, pi.* , (IF(p.prodname=’gold’, 10000, 0) + IF(p.prodcode=’gold’, 10000, 0) + ((MATCH (ps.prodname) AGAINST (‘gold’)) * 10) + MATCH (ps.prodname,ps.prodcode,ps.proddesc,ps.prodsearchkeywords) AGAINST (‘gold’)) AS score FROM products p LEFT JOIN product_images pi ON (p.productid = pi.imageprodid AND pi.imageisthumb = 1) INNER JOIN product_search ps ON p.productid = ps.productid WHERE p.prodvisible = 1 AND (ps.prodcode = ‘gold’ OR TRUE) AND (MATCH (ps.prodname,ps.prodcode,ps.proddesc,ps.prodsearchkeywords) AGAINST (‘gold’)) ORDER BY score DESC LIMIT 20
EXECUTION TIME: 2.5000+ seconds
TABLES DATA:
products: 31,000 records
product_images: 92,000 records
product_search: 57,000 records

EXPLAIN COMMAND WITH ABOVE QUERY:
1 SIMPLE ps fulltext prodname prodname 0 1 Using where; Using temporary; Using filesort
1 SIMPLE p eq_ref PRIMARY,i_products_rating_vis,i_products_added_vis,i_products_sortorder_vis PRIMARY 4 shoppingcart_5521.ps.productid 1 Using where
1 SIMPLE pi ref i_product_images_imageprodid i_product_images_imageprodid 5 shoppingcart_5521.p.productid,const 1

MLBR

15 years ago

Any insight into this CHARSET perfomance issue related to ENUMs?

http://bugs.mysql.com/bug.php?id=55606

MySQL 5.7
Support

Compare Percona to Leading Database Solutions

Software
Downloads

Valkey Contribution

Product Documentation

Resource Hub

Why Percona for MongoDB?

Why Percona for PostgreSQL?

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

JOIN Performance & Charsets

Related Blog Articles

RECOMMENDED ARTICLES

MOST POPULAR ARTICLES

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL Performance Tuning: Maximizing Database Efficiency and Speed

The Ultimate Guide to Open Source Databases

MySQL 5.7 Support

Compare Percona to Leading Database Solutions

Software Downloads

Valkey Contribution

Product Documentation

Resource Hub

Why Percona for MongoDB?

Why Percona for PostgreSQL?

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

JOIN Performance & Charsets

About the Author

Share This Post!

Stay up to date with the Percona Blog

Related Blog Articles

RECOMMENDED ARTICLES

MOST POPULAR ARTICLES

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL Performance Tuning: Maximizing Database Efficiency and Speed

The Ultimate Guide to Open Source Databases

MySQL 5.7
Support

Software
Downloads