In many Search/Browse applications you would see main (fact) table which contains search fields and dimension tables which contain more information about facts and which need to be joined to get query result.
If you’re executing count(*) queries for such result sets MySQL will perform the join even if you use LEFT JOIN so it is not needed which slows down things considerably. In similar way MySQL generates full rows while executing queries with limit before throwing them away which makes queries with high offset values very expensive.
To get better performance you can “Help” MySQL and remove JOIN for count(*) and do JOIN after limiting result set for retrieval queries.
Lets look at following simple example with one dimension table. In real life you will usually have several of these so performance improvements can be even higher.
|
1 |
<br><br>CREATE TABLE `fact` (<br> `i` int(10) unsigned NOT NULL,<br> `val` int(10) unsigned NOT NULL,<br> KEY `i` (`i`,`val`)<br>) <br><br>CREATE TABLE `dim` (<br> `id` int(10) unsigned NOT NULL auto_increment,<br> `pad` varchar(100) NOT NULL,<br> PRIMARY KEY (`id`)<br>)<br><br>mysql> select count(*) from dim;<br>+----------+<br>| count(*) |<br>+----------+<br>| 30720 |<br>+----------+<br>1 row in set (0.00 sec)<br><br>mysql> select count(*) from fact;<br>+----------+<br>| count(*) |<br>+----------+<br>| 7340032 |<br>+----------+<br>1 row in set (0.00 sec)<br><br>mysql> select count(*) from fact where i<10000;<br>+----------+<br>| count(*) |<br>+----------+<br>| 733444 |<br>+----------+<br>1 row in set (0.44 sec)<br><br><br>mysql> select count(*) from fact<br> left join dim on val=id<br> where i<10000;<br>+----------+<br>| count(*) |<br>+----------+<br>| 733444 |<br>+----------+<br>1 row in set (2.15 sec)<br><br><br>mysql> select i,pad from fact<br> left join dim on val=id<br> where i<10000 limit 500000,10;<br>+------+------------------------------------------+<br>| i | pad |<br>+------+------------------------------------------+<br>| 6811 | 06bfea523be29a6070488ee66e874dffa170de76 |<br>| 6811 | 3baf40c2d76998270f8954bedda386b5021e0624 |<br>| 6811 | 35ad5c3a9d0763acc305992327864bed1af34167 |<br>| 6811 | 81de98a3ef74ddc0fa4f7c95a27e3dbebca8df0d |<br>| 6811 | 11cde5d0bd8ffe1eda86b39d05a58c525e8fac8f |<br>| 6811 | 25c474b380388c23b1de730c4255612e1233e14e |<br>| 6811 | 1d32b5ba28a513097fc88f3efd91155b2697aeec |<br>| 6811 | bdc9a39cdfafda26fc2f48a48abd3bc5f051a4ea |<br>| 6811 | d2e6cb9ca5aa9dd2bc3d033de45579a76ccdafdf |<br>| 6811 | 0130c708083d77377255bd8f5e0daa15fbb24212 |<br>+------+------------------------------------------+<br>10 rows in set (3.88 sec)<br><br>mysql> select i,pad from (<br> select i,val from fact<br> where i<10000 limit 500000,10<br> ) res<br> left join dim on val=id;<br>+------+------------------------------------------+<br>| i | pad |<br>+------+------------------------------------------+<br>| 6811 | 06bfea523be29a6070488ee66e874dffa170de76 |<br>| 6811 | 3baf40c2d76998270f8954bedda386b5021e0624 |<br>| 6811 | 35ad5c3a9d0763acc305992327864bed1af34167 |<br>| 6811 | 81de98a3ef74ddc0fa4f7c95a27e3dbebca8df0d |<br>| 6811 | 11cde5d0bd8ffe1eda86b39d05a58c525e8fac8f |<br>| 6811 | 25c474b380388c23b1de730c4255612e1233e14e |<br>| 6811 | 1d32b5ba28a513097fc88f3efd91155b2697aeec |<br>| 6811 | bdc9a39cdfafda26fc2f48a48abd3bc5f051a4ea |<br>| 6811 | d2e6cb9ca5aa9dd2bc3d033de45579a76ccdafdf |<br>| 6811 | 0130c708083d77377255bd8f5e0daa15fbb24212 |<br>+------+------------------------------------------+<br>10 rows in set (0.30 sec)<br><br> |
So as you can see using this trick we get 5 times speed up for count(*) query and 12 times. This is of course for extremely high offset value but it is also for example with only one dimension which fully fits in memory. for IO bound workload performance difference can be much higher than that.
You may also notice one more trick I’m using here – fact table has covered index on which has val column in it this allow to get join query a bit more optimal.
So right now performance gain may be worth the trick, in the future I hope MySQL Optimizer will be improved so it does these transformations automatically.
Resources
RELATED POSTS