October 24, 2014

MySQL 5.6 vs MySQL 5.5 and the Star Schema Benchmark

So far most of the benchmarks posted about MySQL 5.6 use the sysbench OLTP workload.  I wanted to test a set of queries which, unlike sysbench, utilize joins.  I also wanted an easily reproducible data set which is richer than the simple sysbench table.  The Star Schema Benchmark (SSB) seems ideal for this.

I wasn’t going to focus on the performance of individual queries in this post, but instead intended to focus only on the overall response time for answering all of the queries in the benchmark. I got some strange results, however, which showed MySQL 5.6.10 to be much slower than MySQL 5.5.30 even with only a single connection. I felt these results warranted deeper investigation, so I did some research and detailed my findings here.

Just a few notes:
I tested two scenarios: a buffer pool much smaller than the data set (the default size of 128MB, which is about 1/8th of the data) and a 4G buffer pool, which is larger than the data. Very little tuning was done; the goal was to see how MySQL 5.6 performs out-of-the-box compared to 5.5.30 with default settings. The non-default settings that were tried to dig deeper into the performance differences are documented in the post.

This blog post is not a definitive conclusion about innodb_old_blocks_pct or innodb_old_blocks_time. It does highlight how a data set much larger than the buffer pool may perform worse with innodb_old_blocks_time=1000, but as I said this needs further investigation. Several particular points need to be followed up on, including testing innodb_old_blocks_time=1000 on MySQL 5.5.30 and testing multiple buffer pools on MySQL 5.5.30. Finally, MySQL 5.6.10 has many additional tuning options (MRR, BKA, ICP, etc.) which must be investigated before coming to further conclusions. These will be the topic of further blog posts.

Benchmark Details:

The SSB employs a data generator which produces data for a star schema.  Star schemas are commonly used for analytics because it is extremely easy to construct queries against them.  It is also very easy to define an OLAP cube over a star schema, so star schemas are popular with tools like Mondrian and for data mining.  I wrote an earlier blog post which describes the differences between the major schema types.

  • I used the SSB data set at scale factor 1.  Scale factor 1 results in 587MB of raw data, mostly in one table (lineorder).
  • Each of the 13 queries was executed serially in a single connection.
  • I modified the queries to use ANSI JOIN syntax.  No other changes to the queries were made.

Test Environment

  • The MySQL versions used at the time of this post were 5.5.30 and 5.6.10, both of which were GA when this was written.
    • I compiled both servers from source (cmake -gui .; make; make install)
    • The only changes from the defaults were that both servers were compiled without the PERFORMANCE_SCHEMA, and each uses unique basedir and datadir paths
  • I tested three configurations:
    • Config 1: Default config for MySQL 5.5 and MySQL 5.6, no tuning at all
    • Config 2: MySQL 5.6 with all default settings except innodb_old_blocks_time=0
    • Config 3: MySQL 5.5 and 5.6 with a 4G buffer pool instead of the default 128M

Rationale:

  • Since O_DIRECT is not used by default, the file system cache gives better read performance after the first run (though not as good as a warm buffer pool)
  • Thus, the results marked COLD are the results after a server reboot, when the FS cache is cold
  • The remaining results are from runs without a server restart.  For the default size BP, this means the FS cache is warm.  For the 4G BP, the BP itself is completely warm.
    • The idea here is to test the situation when the buffer pool is smaller than data and the IO is slow (when the  FS cache is cold, IO to slow IO subsystem happens)
    • Repeated runs test a buffer pool which is smaller than the data but underlying IO is fast (a warm FS cache reduces IO cost significantly)
    • And finally, testing with a 4G buffer pool shows how the system performs when the data fits completely into the buffer pool (no IO on repeat runs)

Test Server:

    • Intel Core i7-970 @ 3.20GHz.  12 logical cores (six physical cores).
    • 12GB memory
    • 4 disk 7200RPM RAID 10 array with 512MB write-back cache

Star Schema Benchmark – Scale Factor 1 – MySQL 5.5 vs 5.6
response times are in seconds (lower is better)

| Version                           | Buffer | Cold   | Run1   | Run2   | Run3   |
|-----------------------------------|--------|--------|--------|--------|--------|
| 5.5.30                            | 128M   | 361.49 | 189.29 | 189.34 | 189.40 |
| 5.6.10                            | 128M   | 362.31 | 324.25 | 320.74 | 318.84 |
| 5.6.10 (innodb_old_blocks_time=0) | 128M   | 349.24 | 178.80 | 178.55 | 179.07 |
| 5.5.30                            | 4G     | 200.87 |  20.53 |  20.36 |  20.35 |
| 5.6.10                            | 4G     | 195.33 |  14.41 |  14.45 |  14.61 |

I started by running the benchmark against MySQL 5.5.30.  It took 361.49 seconds to complete all 13 queries.  I then repeated the run three more times.  The speed is very consistent, just a few tenths of a second off per run.  I then rebooted the machine and fired up 5.6.10.   I ran the test, and to my surprise MySQL 5.6.10 did not get much faster during the repeat runs, compared to the initial cold run.  I stopped the MySQL 5.6 server, rebooted and verified again.  Same issue.  This was very different from MySQL 5.5.30, which performs significantly better on the repeat warm runs.

Just to be sure it wasn’t a disk problem, I pointed MySQL 5.6.10 at the MySQL 5.5.30 data directory.  The speed was essentially the same.  Further investigation determined that there was a lower buffer pool hit ratio during the MySQL 5.6 runs, and that MySQL 5.6.10 was doing more IO as a consequence.  To confirm that this was indeed the problem, I decided to compare performance with a buffer pool much larger than the data size, so I configured the server with a 4GB buffer pool.  I tested both versions, and as you can see above, MySQL 5.6 outperformed MySQL 5.5.30 with the big buffer pool.

Why is MySQL 5.6.10 with default settings significantly slower than MySQL 5.5.30 in repeat runs?

I thought about the differences in the defaults between MySQL 5.5 and MySQL 5.6 and innodb_old_blocks_time immediately came to mind.  The InnoDB plugin introduced innodb_old_blocks_time to help control the behavior of the new split LRU mechanism which was implemented in the plugin.  In the original InnoDB, the LRU was implemented as a classic LRU which is subject to “pollution” by full table scans.  In the classic LRU, a full table scan pushes out important hot pages from the buffer pool often for an infrequent scan, like a backup or report.  In an OLTP system this can have very negative performance consequences.

The plugin attempts to fix this problem by splitting the LRU into hot and cold sections.  When a page is first read into the buffer pool, it is placed at the head of the cold section of the LRU, where it begins to age naturally.  If the page is touched again while on the cold portion, it is moved to the head of the hot portion.

This sounds good in theory, but in practice it is problematic.  What usually happens is that a full table scan accesses the table by primary key, which forces the storage engine to touch the same page numerous times in rapid succession.  This invariably moves the page onto the hot area, defeating the split.  To prevent this from happening, another variable, innodb_old_blocks_time, was introduced.

Innodb_old_blocks_time controls how long a page must remain on the cold portion of the LRU before it is eligible to be moved to the hot portion.  In MySQL 5.5 and earlier, innodb_old_blocks_time defaults to 0 (zero), which means that pages move rapidly from the cold portion to the hot portion, because they must stay on the cold LRU for zero milliseconds before being able to move to the hot list.  In MySQL 5.6 the default value of innodb_old_blocks_time was changed to 1000 (one second).  The location at which a page is initially placed into the LRU is defined by innodb_old_blocks_pct.  The default value in both versions is 37, which is roughly 3/8 of the buffer pool.
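Both variables are dynamic, so the MySQL 5.5 behavior can be reproduced on 5.6 at runtime without a restart (this is exactly the change used in Config 2 above):

```sql
-- Check the current LRU tuning values
SHOW GLOBAL VARIABLES LIKE 'innodb_old_blocks%';

-- Revert to the MySQL 5.5 default: no minimum residency in the
-- cold portion before a page can be promoted to the hot list
SET GLOBAL innodb_old_blocks_time = 0;

-- Restore the MySQL 5.6 default (one second)
SET GLOBAL innodb_old_blocks_time = 1000;
```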

For this workload with a small buffer pool (the buffer pool is smaller than the working set), having innodb_old_blocks_time=1000 appears to cause a major performance regression.  The new setting changes which pages end up staying in the buffer pool and which are aged out.

Digging into why innodb_old_blocks_time changes the performance

Each “flight” of queries represents a set of drill-down queries to find an anomaly.  I am going to focus on the first query, which uses only one join.  Since that is practical for a query with only one join, I’ve tested the performance of the query with the join in both directions.
Explain for query Q1.1:
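For reference, the SSB specification defines Q1.1 as a single-join aggregate; rewritten with ANSI JOIN syntax it looks roughly like this (dim_date is a stand-in name for the date dimension here; the generator's table and column names vary between SSB ports):

```sql
-- SSB Q1.1, rewritten from comma-join to ANSI JOIN syntax.
-- Filter constants follow the standard SSB specification.
SELECT SUM(lo_extendedprice * lo_discount) AS revenue
  FROM lineorder
  JOIN dim_date ON lo_orderdate = d_datekey
 WHERE d_year = 1993
   AND lo_discount BETWEEN 1 AND 3
   AND lo_quantity < 25;
```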

After running the query, see how many pages were read from disk versus how many page requests there were:
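The counters of interest are Innodb_buffer_pool_reads (reads that had to go to disk) versus Innodb_buffer_pool_read_requests (logical reads):

```sql
-- Logical page requests vs. pages actually read from disk.
-- A poor hit ratio shows up as Innodb_buffer_pool_reads growing
-- quickly relative to Innodb_buffer_pool_read_requests.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
```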

Now compare the difference with innodb_old_blocks_time=0:

Here are the innodb_buffer_pool_stats for the two settings side by side:
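On MySQL 5.6 these per-pool statistics, including the counts of pages promoted to the hot portion, can be pulled from INFORMATION_SCHEMA (this table does not exist in 5.5, where SHOW ENGINE INNODB STATUS is the closest equivalent):

```sql
-- MySQL 5.6+: per-buffer-pool LRU statistics.
-- PAGES_MADE_YOUNG / PAGES_NOT_MADE_YOUNG show how often pages were
-- (or were not) promoted from the cold portion to the hot portion.
SELECT pool_id,
       pool_size,
       pages_made_young,
       pages_not_made_young,
       number_pages_read,
       hit_rate
  FROM information_schema.INNODB_BUFFER_POOL_STATS;
```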

As promised, here are the results from joining the tables in the other direction
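The optimizer normally chooses the join order itself; to pin down each direction for the comparison, STRAIGHT_JOIN can be used (dim_date is again a stand-in name for the date dimension):

```sql
-- Force fact -> dimension join order
SELECT SUM(lo_extendedprice * lo_discount) AS revenue
  FROM lineorder
  STRAIGHT_JOIN dim_date ON lo_orderdate = d_datekey
 WHERE d_year = 1993 AND lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25;

-- Force dimension -> fact join order
SELECT SUM(lo_extendedprice * lo_discount) AS revenue
  FROM dim_date
  STRAIGHT_JOIN lineorder ON lo_orderdate = d_datekey
 WHERE d_year = 1993 AND lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25;
```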

And with innodb_old_blocks_time=0:

Finally, I collected SHOW PROFILES information for the faster join direction (fact -> dimension)
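The profiling data comes from the session-level profiling facility, which is available in both 5.5 and 5.6 (though deprecated as of 5.6.7):

```sql
-- Enable per-query profiling for this session
SET profiling = 1;

-- ... run the query under test ...

-- List recent statements with their total durations
SHOW PROFILES;

-- Per-stage breakdown of a specific statement from the list
SHOW PROFILE FOR QUERY 1;
```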

 Here are my modified versions of the queries (just to use ANSI JOIN syntax):
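The rewrite was purely mechanical: each comma join with its condition in the WHERE clause became an explicit JOIN with an ON clause. A sketch of the pattern, using real SSB column names on the supplier dimension:

```sql
-- Original SSB style: comma join, join condition in WHERE
SELECT SUM(lo_revenue) AS revenue
  FROM lineorder, supplier
 WHERE lo_suppkey = s_suppkey
   AND s_region = 'AMERICA';

-- Rewritten: ANSI JOIN syntax, join condition moved to ON
SELECT SUM(lo_revenue) AS revenue
  FROM lineorder
  JOIN supplier ON lo_suppkey = s_suppkey
 WHERE s_region = 'AMERICA';
```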

And the schema:
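For orientation, here is an abbreviated sketch of the two tables used by Q1.1 (column lists trimmed; the SSB generator defines 17 columns on lineorder, and the date dimension's name varies by port, so dim_date is illustrative):

```sql
-- Abbreviated sketch of the SSB schema: fact table plus one dimension
CREATE TABLE dim_date (
  d_datekey INT NOT NULL PRIMARY KEY,  -- e.g. 19930101
  d_year    SMALLINT NOT NULL,
  d_month   VARCHAR(9) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE lineorder (
  lo_orderkey      INT NOT NULL,
  lo_linenumber    TINYINT NOT NULL,
  lo_orderdate     INT NOT NULL,        -- joins to d_datekey
  lo_quantity      TINYINT NOT NULL,
  lo_extendedprice INT NOT NULL,
  lo_discount      TINYINT NOT NULL,
  PRIMARY KEY (lo_orderkey, lo_linenumber)
) ENGINE=InnoDB;
```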

About Justin Swanhart

Justin is a Principal Support Engineer on the support team. In the past, he was a trainer at Percona and a consultant. Justin also created and maintains Shard-Query, a middleware tool for sharding and parallel query execution and Flexviews, a tool for materialized views for MySQL. Prior to working at Percona Justin consulted for Proven Scaling, was a backend engineer at Yahoo! and a database administrator at Smule and Gazillion games.

Comments

  1. khan says:

    Thanks Justin, I do not see a conclusion section. So do you suggest the innodb_old_blocks_time default value should be zero?

  2. “This blog post is not a definitive conclusion about innodb_old_blocks_pct or innodb_old_blocks_time. It does highlight how a data set much larger than the buffer pool may perform worse with innodb_old_blocks_time=1000, but as I said this needs further investigation.”

  3. Khan,

    This setting (like many others) is workload dependent. I think the default of 1000 is a better choice than 0 for many, but you had better test it out to see what works best for you.

    Note this variable is something you can change online without restarting the server, which makes it easier to play with in production

  4. James Day says:

    Khan, almost all settings have defaults that work reasonably but can be tuned for specific workloads.

    Justin is using a star schema sort of workload and that’s different from OLTP. In the OLTP case we expect occasional queries that do table scans but generally it’s best to prevent them from flushing the pages used by most queries. Analytical processing workloads, for which the star schema tends to be used, don’t necessarily have that property and may benefit, depending on workload, from a smaller or bigger innodb_old_blocks_pct and from innodb_old_blocks_time = 0.

    If Justin wanted to he could show settings that work better for this job and worse for pure OLTP on the same hardware. It’s routine tuning for what’s happening with a specific server. Not good or bad settings, just different.

    With the previous default it was useful to show benchmarks illustrating the value of setting innodb_old_blocks_pct to a non-zero value. Now we’ve changed the default it’s useful to show the ones where 0 can be better. Just to help people to know that defaults can be tuned and when it might be a good idea.

    Defaults and general recommendations are good but can’t ever replace testing on specific combinations to find out what works best.

    Views are my own, for an official Oracle view consult a PR person.

    James Day, MySQL Senior Principal Support Engineer, Oracle

  5. Justin Swanhart says:

    The queries aren’t applicable to only star schemas. It is likely that a reporting workload on 5.6.10 on any type of schema could be slower than on 5.5.30 if the working set is larger than the buffer pool, even with very fast IO (the FS cache is the fastest IO possible). It seems to me (just my humble opinion) that MySQL should be usable for more than sysbench! So people have to know that this setting could have a negative impact and should test it on their workload.

    You can’t just suggest that everybody upgrade to 5.6.10 and it will be all roses and faster and better! Some people from Oracle have been suggesting just that.

    >If Justin wanted to he could show settings that work better for this job and worse for pure OLTP on the same hardware.
    >It’s routine tuning for what’s happening with a specific server. Not good or bad settings, just different.

    Actually it isn’t clear if I can, because I haven’t tested the other settings. I tested with basic out of the box settings and saw an immediate very large performance difference on the SSB. It might be that enabling new optimizer features will help, or it may not. I don’t have the data to make such predictions. So as I said, I’ll be doing more testing.

    If it isn’t obvious, I had to spend a LOT of time this week figuring out why 5.6.10 was so much slower out of the box, then documenting it and taking the time to post about it. I want to test other settings and see how they work, but I can’t put that all in a single post. Thus, this post is mainly about the huge out-of-the-box difference I saw with innodb_old_blocks_time.

    I did not expect my performance to drop so significantly when testing 5.6.10 and people testing it on their workloads may not be aware of all the defaults changes and which ones might result in significant (and possibly negative) performance differences. Most of the defaults changes are good for all (or nearly all) workloads, but this one seems good only for OLTP and even then, maybe not in all cases (once again, needs more testing). The only testing I’ve seen on innodb_old_blocks_time is “sysbench + mysqldump” and even then, the tests were only done when the data fits in the buffer pool. There are many workloads that are significantly different from the synthetic sysbench test and many databases have working sets larger than the buffer pool.

    Note:
    It would be very nice if InnoDB had multiple buffer pools and supported placing tables into named BPs, or if it supported pinning tables into the BP. But it doesn’t do those things. In either case, the dimension tables could be placed (or pinned) into a specific buffer pool and it would eliminate the problem.

  6. Justin Swanhart says:

    In this particular case, it might be beneficial to set up an event which does an FTS on the dimension tables frequently. They are small and this will keep them in the cache.
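    A minimal sketch of such an event (this assumes the event scheduler is enabled and uses the hypothetical dim_date dimension name for illustration):

    ```sql
    -- Requires: SET GLOBAL event_scheduler = ON;
    -- Touch the small dimension table every minute so its pages keep
    -- being re-referenced and stay resident in the buffer pool.
    CREATE EVENT keep_dims_warm
      ON SCHEDULE EVERY 1 MINUTE
      DO
        SELECT COUNT(*) INTO @discard FROM dim_date;
    ```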

  7. James Day says:

    I agree that it’s good for people to know that this setting can have a negative effect for some workloads. I think it’s a good thing that you’re writing about one of those cases and I appreciate you doing it.

    James Day
