Charset and Collation Settings Impact on MySQL Performance

PREVIOUS POST
NEXT POST

Following my post MySQL 8 is not always faster than MySQL 5.7, this time I decided to test very simple read-only CPU intensive workloads, when all data fits memory. In this workload there is NO IO operations, only memory and CPU operations.

My Testing Setup

Environment specification

  • Release | Ubuntu 18.04 LTS (bionic)
  • Kernel | 4.15.0-20-generic
  • Processors | physical = 2, cores = 28, virtual = 56, hyperthreading = yes
  • Models | 56xIntel(R) Xeon(R) Gold 5120 CPU @ 2.20GHz<
  • Memory Total | 376.6G
  • Provider | packet.net x2.xlarge.x86 instance

I will test two workloads, sysbench oltp_read_only and oltp_point_select varying amount of threads

sysbench oltp_read_only --mysql-ssl=off --report-interval=1 --time=300 --threads=$i --tables=10 --table-size=10000000 --mysql-user=root run

sysbench oltp_point_select --mysql-ssl=off --report-interval=1 --time=300 --threads=$i --tables=10 --table-size=10000000 --mysql-user=root run

The results for OLTP read-only (latin1 character set):

MySQL 5.7.25 MySQL 8.0.15
threads throughput throughput throughput ratio
1 1241.18 1114.4 1.11
4 4578.18 4106.69 1.11
16 15763.64 14303.54 1.10
24 21384.57 19472.89 1.10
32 25081.17 22897.04 1.10
48 32363.27 29600.26 1.09
64 39629.09 35585.88 1.11
128 38448.23 34718.42 1.11
256 36306.44 32798.12 1.11

The results for point_select (latin1 character set):

point select MySQL 5.7.25 MySQL 8.0.15
threads throughput throughput throughput ratio
1 31672.52 28344.25 1.12
4 110650.7 98296.46 1.13
16 390165.41 347026.49 1.12
24 534454.55 474024.56 1.13
32 620402.74 554524.73 1.12
48 806367.3 718350.87 1.12
64 1120586.03 972366.59 1.15
128 1108638.47 960015.17 1.15
256 1038166.63 891470.11 1.16

We can see that in the OLTP read-only workload, MySQL 8.0.15 is slower by 10%, and for the point_select workload MySQL 8.0.15 is slower by 12-16%.

Although the difference is not necessarily significant, this is enough to reveal that MySQL 8.0.15 does not perform as well as MySQL 5.7.25 in the variety of workloads that I am testing.

However, it appears that the dynamic of the results will change if we use the utf8mb4 character set instead of latin1.

Let’s compare MySQL 5.7.25 latin1 vs utf8mb4, as utf8mb4 is now default CHARSET in MySQL 8.0

But before we do that let’s take look also at COLLATION.

MySQL 5.7.25 uses a default collation utf8mb4_general_ci, However, I read that to use proper sorting and comparison for Eastern European languages, you may want to use the utf8mb4_unicode_ci collation. For MySQL 8.0.5 the default collation is

So let’s compare each version latin1 vs utf8mb4 (with default collation). First 5.7:

Threads utf8mb4_general_ci latin1 latin1 ratio
4 2957.99 4578.18 1.55
24 13792.55 21384.57 1.55
64 24516.99 39629.09 1.62
128 23977.07 38448.23 1.60

So here we can see that utf8mb4 in MySQL 5.7 is really much slower than latin1 (by 55-60%)

And the same for MySQL 8.0.15

MySQL 8.0 defaultcollations

Threads utf8mb4_0900_ai_ci (default) latin1 latin1 ratio
4 3968.88 4106.69 1.03
24 18446.19 19472.89 1.06
64 32776.35 35585.88 1.09
128 31301.75 34718.42 1.11

For MySQL 8.0 the hit from utf8mb4 is much lower (up to 11%)

Now let’s compare all collations for utf8mb4

For MySQL 5.7

MySQL 5.7 utf8mb4

utf8mb4_general_ci (default) utf8mb4_bin utf8mb4_unicode_ci utf8mb4_unicode_520_ci
4 2957.99 3328.8 2157.61 1942.78
24 13792.55 15857.29 9989.96 9095.17
64 24516.99 28125.16 16207.26 14768.64
128 23977.07 27410.94 15970.6 14560.6

If you plan to use utf8mb4_unicode_ci, you will get an even further performance hit (comparing to utf8mb4_general_ci )

And for MySQL 8.0.15

MySQL 8.0 utf8mb4

utf8mb4_general_ci utf8mb4_bin utf8mb4_unicode_ci utf8mb4_0900_ai_ci (default)
4 3461.8 3628.01 3363.7 3968.88
24 16327.45 17136.16 15740.83 18446.19
64 28960.62 30390.29 27242.72 32776.35
128 27967.25 29256.89 26489.83 31301.75

So now let’s compare MySQL 8.0 vs MySQL 5.7 in utf8mb4 with default collations:

mysql 8 and 5.7 default collation

MySQL 8.0 utf8mb4_0900_ai_ci MySQL 5.7 utf8mb4_general_ci MySQL 8.0 ratio
4 3968.88 2957.99 1.34
24 18446.19 13792.55 1.34
64 32776.35 24516.99 1.34
128 31301.75 23977.07 1.31

So there we are. In this case, MySQL 8.0 is actually better than MySQL 5.7 by 34%

Conclusions

There are several observations to make:

  • MySQL 5.7 outperforms MySQL 8.0 in latin1 charset
  • MySQL 8.0 outperforms MySQL 5.7 by a wide margin if we use utf8mb4 charset
  • Be aware that utf8mb4  is now default MySQL 8.0, while MySQL 5.7 has latin1 by default
  • When running comparison between MySQL 8.0 vs MySQL 5.7 be aware what charset you are using, as it may affect the comparison a lot.
PREVIOUS POST
NEXT POST

Share this post

Comments (6)

  • Camille Huot (@CamilleHuot) Reply

    Interesting results. I guess that the performance drop comes from additional convertions, do you confirm?
    So what is the right setup if I need latin1 on mysql8?

    March 1, 2019 at 2:39 am
    • vadimtk Reply

      Camille,

      You can set latin1 in MySQL 8 in different ways.

      1. Global , in my.cnf
      character_set_server=latin1
      collation_server=latin1_swedish_ci

      2. Per table.
      CREATE TABLE t (c CHAR(20) CHARACTER SET latin1
      3. Per column
      col1 VARCHAR(5) CHARACTER SET latin1

      March 1, 2019 at 12:51 pm
  • Karl Reply

    This is something I’ve thought a lot about recently when designing tables.

    Since we can specify charset and collation per column, does it not make a lot of sense to be careful to set latin1 as the charset for varchar columns that are used either for joins or being indexed? This is of course only possible if we know that the column is ANSI only which it often is for many things such as guids and hashes.

    We have a system with a lot of indexes and joins being done on varchars. That’s just the nature of what’s being done and can’t really be helped by normalization, at least not with a performance benefit. It’s obvious that indexes are a lot larger with utf8mb4 meaning that (I assume) a lot less data can be kept in RAM. It’s interesting that MySQL chose this default, is it really that big a problem that people can’t store emoticons into their tables by default compared to performance in high usage scenarios?

    It would be very interesting to see benchmarks on the impact of index size, performance with data size larger than RAM and join performance on latin1 vs utf8mb4.

    March 2, 2019 at 12:54 pm
    • Vadim Tkachenko Reply

      Karl,

      Using a string column to join tables is rarely a good idea in general.
      What kind of schema you have in mind to join on characters columns?

      March 2, 2019 at 8:17 pm
      • Karl Reply

        Ok I take it back, I don’t actually join on varchars. However almost all our lookups are on secondary indexes on varchar(64) columns. That’s just the way it has to be because this data comes from external sources.

        But the lookup speed / data set larger than RAM issue with indexes on varchar columns with different charsets is still something that would be very interesting to hear more about!

        March 3, 2019 at 3:35 am
  • Gaetano Giunta Reply

    It would be interesting to know if there is any difference in performance, for MySQL 5.7, between the ‘historical’ 3 byte utf8_general_ci and the ‘modern’ utf8mb4.
    This could be f.e. a driving factor in deciding about upgrading exiting databases (which have no stringent need to support characters outside the bmp at the moment)

    April 2, 2019 at 2:31 pm

Leave a Reply