September 17, 2014

A closer look at Percona Server 5.6

Yesterday we announced the GA release of Percona Server 5.6, the latest release of our enhanced, drop-in replacement for MySQL. Percona Server 5.6 is the best free MySQL alternative for demanding applications. Our third major release, Percona Server 5.6 offers all the improvements found in MySQL 5.6 Community Edition plus scalability, availability, backup, and security features, some of which are found only in MySQL 5.6 Enterprise Edition.

Percona Server 5.6 comes with:

General performance improvements
MySQL 5.6 by itself comes with a great list of performance fixes; however, we discovered that their focus was on small datasets that fit in memory. In other words, mostly on CPU-bound workloads.

In our research we found that in I/O-bound cases there is still room for improvement, so we took action.

For performance improvements and testing I would like to give credit to Percona engineers Laurynas Biveinis and Alexey Stroganov.

Diagnostics via TABLE/INDEX/USER STATISTICS and the slow query log
Even with MySQL 5.6's rich PERFORMANCE_SCHEMA information, we decided to keep our own diagnostics. Why? Because they are very easy to use. Check Domas' post on this topic.

Integration with Percona XtraBackup: “Real” incremental backups and Archive logs backups
Percona Server comes with the following backup features:

  • Changed page tracking, AKA “real” incremental backups. Using this feature avoids full data scans for incremental backups; information on changed pages is kept in bitmap files.
  • Archive logs. An alternative way to perform incremental backups, by copying InnoDB transactional logs and applying them to the backup.

These features are unique to Percona Server, and, in combination with Percona XtraBackup, they give users greater flexibility in their backup schemes.
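As an illustration, here is roughly how changed page tracking fits into an XtraBackup workflow. This is a hedged sketch: the backup paths and the timestamped directory name are made up, and the exact option names should be verified against the Percona XtraBackup documentation.

```shell
# In my.cnf, enable changed page tracking (a Percona Server setting),
# then restart the server:
#   [mysqld]
#   innodb_track_changed_pages = ON

# Take a full backup first (backup directories are illustrative):
innobackupex /data/backups

# Later, take an incremental backup. With bitmap files available,
# XtraBackup reads the changed-page bitmaps instead of scanning
# every data file for pages changed since the base backup:
innobackupex --incremental \
  --incremental-basedir=/data/backups/2014-09-17_10-00-00 \
  /data/backups/inc
```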

Statement timeouts
This feature was ported from Twitter's fork of MySQL and allows users to control the execution time of statements.
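A quick sketch of how this looks in practice, assuming the variable name max_statement_time (in milliseconds) as ported from the Twitter fork; check the Percona Server documentation for the exact name and units:

```sql
-- Limit statements in this session to 1 second of execution time:
SET SESSION max_statement_time = 1000;

-- A statement exceeding the limit is interrupted with an error:
SELECT COUNT(*) FROM some_big_table;

-- Setting the variable to 0 disables the limit again:
SET SESSION max_statement_time = 0;
```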

You are welcome to review what has changed in Percona Server 5.6 in our summary, compare with previous Percona Server releases, or compare with MySQL 5.6.

How do we do QA of Percona Server
For QA testing I want to give credit to Roel Van De Paar and his Random Query Generator extensions specific to new Percona Server features. By using RQG together with a new option-combinatorics approach (expect a blog post on this soon), we are confident in the quality of Percona Server.
Along the way we also found, reported, and fixed quite a large number of upstream MySQL bugs.

Performance results
And of course I want to share benchmark results. What kind of performance gain can we expect from all the improvements explained above?

For the tests I used the sysbench OLTP read-write workload with a Pareto distribution. The dataset is 32 tables of 10 million rows each, about 77GB of data in total. Our interest is an intensive I/O-bound workload, so the buffer pool size is 25GB and we ran the load with 250 user threads.
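For reference, a sysbench 0.5 invocation matching this setup might look roughly like the following. This is a sketch: the Lua script path, credentials, and exact option names depend on your sysbench installation.

```shell
sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua \
  --oltp-tables-count=32 --oltp-table-size=10000000 \
  --rand-type=pareto \
  --num-threads=250 \
  --max-time=1800 --max-requests=0 \
  --report-interval=1 \
  --mysql-user=sbtest --mysql-password=sbtest \
  run
```

The --report-interval=1 option gives the 1-second resolution used for the graphs below.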

For hardware I used a Cisco UCS 250 server with two Intel Xeon X5670 CPUs, Ubuntu 12.04.3 LTS as the OS, and very high-end PCIe SSD storage (capable of 100,000 IOPS in random 16KB writes).

So let’s compare Percona Server 5.6 and MySQL 5.6 on this workload. The graphs show a timeline for a 30-minute run at 1-second resolution.
Throughput (more is better):
[throughput graph]
95% response time (less is better):
[response time graph]

We find that Percona Server 5.6 provides 2x better performance (in both throughput and response time) with much less variance.

This is achieved by decreasing internal contention in InnoDB and by prioritizing the page cleaner thread and free list refill.

Now let me share configuration files for this run:

And some comments on it:

1. Please note that we are using fully durable settings:
innodb_flush_log_at_trx_commit = 1
innodb_doublewrite=1
innodb_checksum_algorithm = crc32

This is different from the results provided by MySQL, where, to get better numbers, they disable data protection.

2. innodb_checksum_algorithm = crc32. The new hardware-accelerated CRC32 checksums provide much better performance, and we recommend using them whenever possible. (Please note this takes effect only for newly created databases, not for databases created in previous versions of the server.)

3. Percona Server-specific settings:
innodb_sched_priority_cleaner=39 – gives the highest priority to the page cleaner thread
innodb_log_block_size=4096 – uses a 4096-byte block size for InnoDB logs
innodb_adaptive_hash_index_partitions=65 – enables partitioning of the adaptive hash index, which is otherwise quite often a contention point

And some variables that are enabled by default in Percona Server:

innodb_foreground_preflush=exponential_backoff
innodb_empty_free_list_algorithm=backoff
innodb_cleaner_lsn_age_factor=high_checkpoint
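Since the full configuration file is not reproduced here, the settings discussed in this section would, taken together, look roughly like this in my.cnf. This is a partial sketch, not the complete configuration used for the run.

```ini
[mysqld]
# Fully durable settings
innodb_flush_log_at_trx_commit = 1
innodb_doublewrite = 1
innodb_checksum_algorithm = crc32

# Sized for this benchmark
innodb_buffer_pool_size = 25G

# Percona Server-specific settings
innodb_sched_priority_cleaner = 39
innodb_log_block_size = 4096
innodb_adaptive_hash_index_partitions = 64  # the post says 65; the server caps this at 64 (see comments)

# Percona Server defaults, shown explicitly
innodb_foreground_preflush = exponential_backoff
innodb_empty_free_list_algorithm = backoff
innodb_cleaner_lsn_age_factor = high_checkpoint
```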

As a disclaimer I should mention that I expect the difference between Percona Server 5.6 and MySQL 5.6 performance to grow even wider with a larger dataset, more memory, and faster storage.

For a small dataset which fits into memory on low-end servers, Percona Server 5.6 performance will be identical to MySQL 5.6 performance. Credit where credit is due: MySQL did a great job optimizing InnoDB for small datasets in memory.

These characteristics of Percona Server 5.6 are actually very important. With our server you can scale your workload with a simple hardware upgrade: by increasing CPU speed, adding memory, or upgrading your storage, you get better performance with Percona Server.

You are welcome to try Percona Server 5.6 yourself and give us your feedback!

What is in the future?
We are not stopping here. Expect new improvements and more features.

  • More InnoDB performance improvements. What we have done so far is only the tip of the iceberg and has opened the door for further research
  • TokuDB support. We will ship TokuDB in one of the next Percona Server 5.6 releases
  • Per-query variables. We will add a new scope (in addition to global and session) for MySQL variables: per query. You will be able to change a variable for one specific query only
  • Percona XtraDB Cluster 5.6, based on Percona Server 5.6 with all of its improvements, will be available in a few months

Should you need any assistance in planning your Percona Server 5.6 upgrade or migration for your company, we’re here to help. Percona has seasoned support and consulting professionals around the world who are ready for your call. Contact us today.

About Vadim Tkachenko

Vadim leads Percona's development group, which produces Percona Cloud Tools, Percona Server, Percona XtraDB Cluster, and Percona XtraBackup. He is an expert in solid-state storage, and has helped many hardware and software providers succeed in the MySQL market.

Comments

  1. Dash Bitla says:

    Great improvements overall, thank you.
    Since the cluster is not yet released: with this 5.6 release, what version of XtraDB Cluster can be used, and will it work without any issues? Or do we need to wait for the Cluster 5.6 release?

    Does standard replication work?


  3. Great Results Vadim!

    I see there are still some micro stalls in Percona Server 5.6, with a small number of samples showing 0 transactions per second. If this is not a data-capture glitch, there is still some work to be done :)

  4. Why was this needed given that the buffer pool is sharded in 5.6 (innodb_buffer_pool_instances)?
    >>Ported good old buffer mutex split from Percona Server 5.5 to Percona Server 5.6. This helps to decrease a contention on buffer pool even further.

  5. Justin Swanhart says:

    By increasing the overall number of mutexes per buffer pool, contention in any instance is reduced and parallelism is increased. For example, if you have an insert-heavy workload, the adaptive hash index mutex may be very hot in every instance. By separating this mutex from other mutexes, reads may see increased performance. This is particularly true in PS because the adaptive hash index can be partitioned and each INDEX_ID goes to a partition. So mutex contention is reduced that much further.

  6. Justin Swanhart says:

    Actually, maybe the AHI was a bad example. The biggest advantage is that the flush list and free list and LRU can all be manipulated independently in many cases in each BP instance. This likely has implications for purge and page cleaner threads in particular.

    BUT – I haven’t profiled PS 5.6 so this is speculation at some level.

  7. Patryk Pomykalski says:

    Per query variables – cool stuff coming :)

  8. Really impressive work and very welcome news.

  9. Bravo, the stability of the Percona Server 5.6 numbers is very impressive.

  10. Andy says:

    This looks great.

    Can you also benchmark this vs MariaDB (and possibly MySQL 5.7)? It’d be interesting to see how they stack up.

  11. Laurynas Biveinis says:

    Peter -

    You are right re. further work remaining :)

  12. Laurynas Biveinis says:

    Mark, Justin -

    On buffer pool mutex split vs instances. I believe the split comes
    before the instances for I/O-bound cases (and instances are perfectly
    fine for CPU-bound cases, not much need for split there). One day I
    hope to blog about it with some numbers, but here’s a shorter
    hand-wavy version:

    1) Instances are not used uniformly, split LRU lists mean there is no
    global LRU list anymore. Which should mean (here’s the part where
    I don’t have numbers), that hotter instances are evicting or
    flushing too hot pages, in the end causing more read I/Os.

    2) Instances are not used uniformly, doesn’t matter if we have 15 out
    of 16 instances with full free lists, if the last instance is
    empty and we need a page from it, too bad, flushing/evicting
    time. We see this all the time.

    3) If the above is correct, then the fewer the instances in I/O-bound
    workloads the better. The split mutex should allow us to reduce their
    number.

    4) Splitting the mutex clarifies what’s going on and is a prerequisite
    for further work. Right now we don’t have a hot buffer pool mutex
    anymore, we have hot LRU and free list mutexes. They are rarely
    held together. All the free list mutex contention is in
    buf_LRU_get_free_only() taking its pages and in
    buf_LRU_block_free_non_file_page() providing its pages, directly
    leading to our priority free list refill work. With the unsplit
    mutex it’d have been harder to discover this. We could have
    implemented a similar fix with the unsplit mutex too, but I
    suspect that it would be much less effective, because instead of a
    very short free list mutex lock in
    buf_LRU_block_free_non_file_page(), we’d have a much longer buffer
    pool mutex lock in its caller.

    5) Now we also know what’s going on with the LRU list mutex. About
    half of its contention is in buf_page_make_young() called from
    buf_page_make_young_if_needed() in I/O-bound workloads. (probably
    more in CPU-bound workloads, but there my guess is that it should be
    clearly visible with the unsplit mutex too and easily fixable with
    just raising the number of instances). This gives some direction
    for our future work, and also for specific workloads might suggest
    innodb_old_blocks_pct/innodb_old_blocks_time tuning.

    6) Mikael Ronstrom’s post from 2010
    http://mikaelronstrom.blogspot.com/2010/09/multiple-buffer-pools-in-mysql-55.html
    suggests that instances are better than the split because the
    number of places where multiple mutexes must be held is high.
    There are such places, yes, but I cannot think of any on the hot
    code paths. As discussed above, LRU and free list mutexes are
    contended separately. Of course the picture is different if your
    workload is doing compressed page flush list relocations all the
    time. Compressed pages also remind me that we have addressed all
    the quality concerns we had with our mutex split patch, thanks to
    Roel.

  13. Peter,

    Yes, we still have drops to zero in Percona Server. The reason is a bug in upstream: the “server is idle” detection is broken, and at some random time InnoDB decides that the server is idle, then tries to flush all dirty pages, effectively blocking all activity.

    Also I would like to point out that we are able to detect these stalls because we now use 1-second resolution in the benchmark report. Previously, when we used 10-second resolution, these stalls were smoothed out by averaging. I think 1 second should become a standard for benchmarks.

  14. Jan says:

    Hi.

    The throughput graph looks really nice. What tool did you use to generate it?

  15. Ulf Wendel says:

    Looking forward to per query settings. Something, I’d love to see in a different context.

  16. Laurynas,

    I wonder about uniformity. If we have a buffer pool of, say, 16GB, that is 1M pages; split into 8 buffer pools, we’re looking at 125K pages per buffer pool or so. Unless there is a poor hash function in use, you should have a pretty uniform split of accessed pages across the buffer pools. Are you saying you do not see it?

    Now there are a few things which can happen outside of the general hashing issue. For example, you can have some page which is a hotspot (i.e., some hot row in a counter table); the other thing is that the processing of these buffers may not be uniform across the threads responsible for flushing, cleaning, etc.

    My point is to ensure there is not some mapping bug there, if we see more difference between buffer pools than the math would tell us to expect.

  17. Peter,

    On uniformity,
    As I understand it, the issue is not in the page distribution – that actually works fine.
    The issue is in how pages are flushed.

    Say you have 500 dirty pages in each of 15 buffer pool instances.
    InnoDB asks to flush 750 pages.
    In this case: from buffer pool instance 1, 500 pages are flushed;
    from buffer pool instance 2, 250 pages are flushed;
    from the rest of the buffer pool instances, 0 pages are flushed.

    As you can see, after some period of work, the distribution of dirty pages becomes very non-uniform.

  18. Laurynas Biveinis says:

    Peter -

    That’s correct: we do not see uniformity, and again according to Mikael’s original post http://mikaelronstrom.blogspot.com/2010/09/multiple-buffer-pools-in-mysql-55.html, we were not supposed to see it: “First the accesses to the buffer pools is in no way evenly spread out. This is not surprising given that e.g. the root page of an index is a very frequently accessed page.”

    But Alexey Stroganov and I spent some time looking into it as well. The current hash function is as simple as it can get. We tried a couple of different hash functions, together with some other tweaks such as separate undo tablespaces to spread out space IDs some more, but we never reached uniformity.

    Now that I think about it, innodb-file-per-table=0 perhaps could make things more uniform :)

  19. Laurynas Biveinis says:

    Vadim -

    Flushing is spread across the instances uniformly both in Oracle and in our page cleaner. In your example it would ask each instance to flush 500/15 pages. Also, when we added page cleaner timeouts, we added it so the current pass over all instances still completes.

  20. AFAIK the page cleaner thread tries to do this once per second from each buffer pool instance
    * flush io_capacity / buffer_pool_instances dirty pages
    * flush lru_scan_depth dirty pages from the tail of the LRU

    Note that one rate is per instance, the other is not. That has confused me in the past.

    A simple change for Percona is to let the page cleaner thread try to run more than once per second, and then configure it to do less work per iteration.

  21. Laurynas Biveinis says:

    Mark -

    Done.

  22. Laurynas Biveinis says:

    I should be more specific. Our cleaner thread does not attempt to sleep the remaining time until 1 second but rather adapts the sleep time according to total free list length and checkpoint age.

  23. Mark says:

    Nice article. I’m also curious about the performance difference between Percona 5.5 and 5.6.

  24. Vadim,

    Yes I expect something like what you describe can be happening which is essentially the “bug” which prevents uniform handling of multiple buffer pools

  25. PaulC says:

    Curious: the documentation states that innodb_adaptive_hash_index_partitions has a value range of 1-32 on 32-bit and 1-64 on 64-bit systems. What does 65 do?

  26. PaulC,

    It is a typo, it should be 64, thanks.

  27. PaulC says:

    Looks like the engine (5.5.33-rel31.1) does the right thing anyway:

    [Warning] option ‘innodb-adaptive-hash-index-partitions’: unsigned value 65 adjusted to 64.

    Given the value range here and guessing at what you’re doing in the code is it safe to assume that it’s best to pick powers of 2 as opposed to prime numbers or does it not really matter?

  28. Laurynas Biveinis says:

    PaulC -

    Does not really matter. The value range is historical, there is no inherent max value limitation anymore. The AHI partition assignment is index id mod number of AHI partitions.

  29. On a related note, Stewart Smith will be hosting a webinar on Oct. 16 highlighting the differences between Percona Server 5.6, Percona Server 5.5, and MySQL 5.6. More info and registration here:

    http://www.percona.com/resources/mysql-webinars/percona-server-56-enterprise-grade-mysql-outstanding-performance.

    It will be recorded and available at that same url afterward.

  30. chad ambrosius says:

    Jan-

    I think Vadim used R (http://cran.r-project.org/index.html) and ggplot2 (http://docs.ggplot2.org/) to make the plots. I highly recommend both! He has also posted about using R and ggplot2 for making his plots on this site before (http://www.mysqlperformanceblog.com/2012/02/23/some-fun-with-r-visualization/).

  31. Thomas Lekai says:

    Just curious as to the types of disks you are using that can withstand the innodb_io_capacity setting of 25K? Great stuff otherwise . . .

  32. Hrvoje Matijakovic says:

    @Daniël

    Link has been fixed, thanks for reporting this.

  33. sanjay singh says:

    I am executing the same query in two places: on the local system it takes less time (10,000,000 rows in 1.35 sec), but the same query against a remote database takes more than 5 minutes 35 seconds, so the query takes more I/O cycles. The query uses a subquery and a join.
    Can anyone help solve this problem?
