Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

MySQL 5.5.8 and Percona Server: being adaptive

December 20, 2010

Author

Vadim Tkachenko

Benchmarks

MySQL

Share this Post:

As we can see, MySQL 5.5.8 comes with great improvements and scalability fixes. Adding up allÂ the new features, you haveÂ a great release. However, there is one area I want to touch on in this post. At Percona, weÂ consider itÂ important not only to have the best peak performance, butÂ also stable and predictable performance. I refer you to Peter’s post, Performance Optimization and Six Sigma.

In Percona Server (and actually even before that, in percona-patches builds for 5.0), we added adaptive checkpoint algorithms, and later the InnoDB-plugin included an implementation ofÂ Â “adaptive flushing”. This post shows theÂ differences between them and MySQL.

The postÂ alsoÂ answersÂ the question of whether we are going to have releases of Percona Server/XtraDB based on the MySQL 5.5 code line. The answer: Yes, we are. My benchmarks here are based on Percona Server 5.5.7. (You can get the source code from lp:~percona-dev/percona-server/5.5.7 , but it is very beta quality at the moment.)

For this post, I made tpcc-runs on our Dell PowerEdge R900 box, using RAID10 over 8 disks and a FusionIO 320GB MLC card.

First,Â the results for tpcc-mysql, 500w (around 50GB of data) on RAID10. I used innodb_buffer_pool_size=24G, innodb_log_file_size=2000M (innodb_log_files_in_group=2), and innodb_flush_log_at_trx_commit=2. Also, innodb_adaptive_flushing (ON) / innodb_adaptive_checkpoint (estimate) were the default values.

The raw results, full config files, and scripts areÂ in ourÂ Benchmarks Wiki.

The graphical result belowÂ shows the throughput on the server over 8 hours. (Yes, 8 hours, to show MySQL performanceÂ over aÂ long time period. It is not a short, 5-minute exercise.)

Although it takes a decent time for the Percona Server results to stabilize, for MySQL 5.5.8 we have regular dipsÂ (3 times per hour) from 24900 NOTPM to 17700 NOTPM (dips of around 30%).

Next, the second run on the FusionIO card. There I should say that we were not able to get stable results with the existing adaptive_checkpoint or adaptive_flushing algorithms. So, Yasufumi invested a lot of research timeÂ and came up with the new innodb_adaptive_checkpoint=”keep_average” method. This method requires settingÂ innodb_flush_neighbor_pages=0 , to disable flushing of neighborhood pages (not available in MySQL 5.5.8). The problem with flushing neighborhood pages is that it makes an exact calculation of how many pages were handled impossible. The flushing neighborhoods feature wasÂ created as an optimization for hard drives, since InnoDB tries to combine writing as many pagesÂ as possible into a single sequential write, which means that a single I/O may have a size of 32K, 64K, 96K, …, etc. And again, that makes a prediction of how many I/O operationsÂ thereÂ are impossible. Furthermore, this optimization is not needed for flash devices, like FusionIO or Virident cards.

An additional optimization we have for SSDs is big log files. For this run, I used innodb_log_file_size=4G (innodb_log_files_in_group=2) for Percona Server. That gave 8GB in total sizeÂ for log files (MySQL 5.5.8 has aÂ 4GB limit). In additional to increasing log_size we added option innodb_log_block_size which allows to change IO block size for logs files. Default is 512 bytes, in test with FusionIO I use 4096 bytes, to align IO with internal FusionIO size.

You can see that MySQL 5.5.8 has periodic drops here, too. TheÂ margin between Percona Server and MySQL is about 2500-2800 NOTPM (~15% difference).

MySQL 5.5.8 now has featuresÂ related to havingÂ several buffer pool instancesÂ that are supposed to fix the buffer poolÂ scalability issue. Let’s see how MySQL performance changes for the last workload if we set innodb_buffer_pool_instances=8 or 16.

As you see, having several buffer pools makes the dips deeper and longer. It seems that for Percona Server the best choice is innodb_buffer_pool_instances=1, as we implemented buffer pool scalability in a different way.

UPDATE
By request from commenter I put also results with different innodb_io_capacity for MySQL 5.5.8. It is 500 ( which I used in benchmarks above), 4000 and 20000.

As you see there is no improvements from bigger innodb_io_capacity, and it also concurs with my previous experience, that with bigger io_capacity you rather getting worse results.

For reference, here is the config file used for benchmarks on FusionIO:

[client]
socket=/var/lib/mysql/mysql.sock
[mysqld]
core
basedir=/usr/local/mysql
user=root
socket=/var/lib/mysql/mysql.sock
skip-grant-tables
server_id=1
local_infile=1
datadir=/mnt/fio320
innodb_buffer_pool_size=24G
innodb_data_file_path=ibdata1:10M:autoextend
innodb_file_per_table=1
innodb_flush_log_at_trx_commit=2
innodb_log_buffer_size=8M
innodb_log_files_in_group=2
innodb_log_file_size=4G
innodb_log_block_size=4096
innodb_thread_concurrency=0
innodb_flush_method = O_DIRECT
innodb_read_ahead = none
innodb_flush_neighbor_pages = 0
innodb_write_io_threads=8
innodb_read_io_threads=8
innodb_io_capacity=500
max_connections=3000
query_cache_size=0
skip-name-resolve
table_cache=10000
[mysql]
socket=/tmp/mysql.sock

[client]

socket=/var/lib/mysql/mysql.sock

[mysqld]

core

basedir=/usr/local/mysql

user=root

socket=/var/lib/mysql/mysql.sock

skip-grant-tables

server_id=1

local_infile=1

datadir=/mnt/fio320

innodb_buffer_pool_size=24G

innodb_data_file_path=ibdata1:10M:autoextend

innodb_file_per_table=1

innodb_flush_log_at_trx_commit=2

innodb_log_buffer_size=8M

innodb_log_files_in_group=2

innodb_log_file_size=4G

innodb_log_block_size=4096

innodb_thread_concurrency=0

innodb_flush_method = O_DIRECT

innodb_read_ahead = none

innodb_flush_neighbor_pages = 0

innodb_write_io_threads=8

innodb_read_io_threads=8

innodb_io_capacity=500

max_connections=3000

query_cache_size=0

skip-name-resolve

table_cache=10000

[mysql]

socket=/tmp/mysql.sock

(post edited by Fred Linhoss)

0 0 votes

Article Rating

25 Comments

Oldest

Newest Most Voted

Andy

15 years ago

A few questions:

1) Compare the 1st graph with the 2nd graph, you got around 25000 NOTPM on hard drives but less than 20000 NOTPM on FusionIO. I’d expect the results for FusionIO to be much higher than that of HDD, but that’s not the case here. Why is that?

2) the settings you used for FusionIO such as innodb_flush_neighbor_pages=0, would that also work for SATA SSD such as Intel X25?

3) You used innodb_io_capacity=500 for FusionIO. That implies an IO capacity of 500 IOPS, right? But FusionIO can do much higher than that. Why didn’t you use a higher value of innodb_io_capacity?

Dimitri

15 years ago

Hi Vadim,

few comments so far:

* any reason why innodb_io_capacity=500 ?.. – if I remember well, within XtraDB you’re not limiting flushing by I/O capacity, while 5.5.8 will respect I/O capacity limit and not flush more pages/sec than this limit.. (we discussed about during your previous tests on 5.5.4)

So, may you replay your tests with, say, innodb_io_capacity=20000 ? (I have no idea how intensive your workload on flushing ;-))

* Regarding tests with Buffer Pool instances: did you monitor on which mutexes a contention become more hot??.. – I’m very curious because on my tests I’ve obtained a completely opposite results.. – 16 instances are giving more than 50% better performance..

* Then with a REDO log block size it’s a good point – 4K is the most optimal I/O size for SSD! – however, I have a doubt it makes a huge difference while innodb_flush_log_at_trx_commit=2 is used.. So, to prove it you should present XtraDB results comparing innodb_log_block_size=4096 and 512, may you present them here?..

Very interesting results anyway – as they are involving many questions of all kind 😉

Rgds,
-Dimitri

Author

Vadim Tkachenko

15 years ago

Andy,

1) for FusionIO I use 1000W, and for hard drive 500W, that is FusionIO case has 2x more data.
2) innodb_flush_neighbor_pages=0 should have benefit for Intel SSD also
3) innodb_io_capacity is tricky parameter, it is not directly corresponds to IOS, in my benchmarks it did not show much benefit to go over 500, but as drawback you can easily saturate your IO system. But Dimitri has good point, I may try big value for MySQL 5.5.8.

Author

Vadim Tkachenko

15 years ago

Dimitri,

Ok, I should try MySQL 5.5 with big innodb_io_capacity. I just remember it did not do any good for Percona Server, but may be different.

I do not have stats for mutexes, sorry.

Regarding innodb_log_block_size and innodb_flush_log_at_trx_commit=2, I think log_sys->mutex, it still in hot list, and innodb_log_block_size=4096 allows to decrease time spent under mutex.

inaam

15 years ago

Vadim,

Thanks for the detailed post. Couple of comments/observations:
* I agree that adaptive_flushing needs more work. One of the things that we did in 5.5 is to make it more accurate. As you know master thread does the adaptive_flushing check every second in a loop. In InnoDB plugin for 5.1 if master thread ends up doing some flushing work in one iteration it won’t sleep and would do the next iteration right away. That means it can do multiple batches in one second. In 5.5 we tidied it up a little and forced master thread to sleep for ‘1 – time spent in flusing’ seconds. What that means is that adaptive_flushing will do a maximum of one innodb_io_capacity batch in one second. Seeing the pattern above and the capabilities of the system you are benchmarking on I’d suggest setting innodb_io_capacity to something more realistic than 500. I’d say something like 4000 won’t be a bad idea. adaptive_flushing is heavily dependent on innodb_io_capacity i.e.: the batch size is capped by this value.
* Assuming your workload to be much like dbt2, I don’t think purge should be a concern. But to keep master thread free of getting stuck in purge maybe it is a good idea to use innodb_purge_threads = 1.

Author

Vadim Tkachenko

15 years ago

Inaam,

Thanks, I accept your suggestions.

I did try innodb_purge_threads = 1, but I did not see much effect from it under this workload.
I am re-running tests with innodb_io_capacity=4000 and 20000 for FusionIO case and I will update post when I have it.

Andy

15 years ago

You set the log block size to 4K to align with FusionIO size. What about non-log IO such as IO for table and index data? Is it necessary to set block size for them too?

Author

Vadim Tkachenko

15 years ago

Andy,

Table and index data IO is 16K and it is not problem, as it is just 4 internal IO for FusionIO.
What is however is more problematic is system tablespace and doublewrite area, which can be
sequential writes ( I observed single IO like 200-250KB). It may be good idea to put ibdata1 ( without
tables and indexes) into different location, i.e. RAID10. And that’s why we have feature in Percona Server to set innodb_doublewrite_file= , to set doublewrite in different from ibdata1 location.

Andy

15 years ago

Vadim,

I thought all the doublewrite data is stored in ibdata1. Are you saying you can set it to be stored somewhere else? What is the name of the doublewrite file?

So tables and indexes should be stored in SSD, ibdata1 in RAID 10, and doublewrite file somewhere else other than ibdata1’s RAID 10? Can you clarify?

How should I use innodb_doublewrite_file=? Can you give an example?

Thanks.

Author

Vadim Tkachenko

15 years ago

Andy,

doublewrite is stored in ibdata1. so if you decided to put ibdata1 on RAID10 you do not need additional actions.

innodb_doublewrite_file= is available only in recent Percona Server releases.
you should use it like innodb_doublewrite_file=/data/raid10/ib_dbwl.ibd ,
in this case double write area will be located outside of ibdata1.

Mark Callaghan

15 years ago

Vadim — is this a problem in Percona? That is, is it possible for threads handling user connections to request page writes when the async checkpoint limit has been reached? And if it does submit write requests, does it block for them to finish? That is the case for official InnoDB which means that foreground threads block on IO to be done by background threads and can add lots of pauses.

http://bugs.mysql.com/bug.php?id=55004

Author

Vadim Tkachenko

15 years ago

Dimitri, Inaam,

I updated post with result with different innodb_io_capacity.
I do not see much improvements with 4000, and it getting rather bad with 20000.

Dimitri

15 years ago

Vadim,

Thanks for update!

TPS drops are looking smaller with innodb_io_capacity=20000, but peak performance is lower due higher writes I suppose.. – and what will be a picture within 8 hours? (I was mainly curious on your 500w test, where during first 4 hours you have TPS drops on XtraDB too..)

BTW, how many concurrent user sessions were used during your test?
and how many CPU/cores do you have on your server?

Where may I get the percona-5.5.7 “tarball” to test it on my server?.. (I saw you’ve mentioned a LP trunk and the code is “very beta”, so I supposing it’s changing most of the time and if you may provide a source or binary tarball used within your tests it’ll simplify many things)

Thank you!

Rgds,
-Dimitri

Hrvoje

15 years ago

I would be very interested to see it compared with 5.1 version of Percona

Vadim

15 years ago

Dimitri,

it is 16 cores system and I used 16 users.
I may run 16 hours tests, but it is all time consuming 🙂

I propose you to wait our patches available for 5.5.8, we will provide .tar.gz for that version.

Admin

Peter Zaitsev

15 years ago

Vadim,

Thanks for publishing results. Do you think it is possible to do something so Percona Server stabilizes quickly ? Does our Buffer Pool preload helps here ? I mean if I would use this option would I end up at almost stabilized state straight away or it is different self balancing stuff which is going on

Re innodb_io_capacity I think this option is “evil” to begin with. It would be much better to have database to balance to use capacity of storage it has available (probably self tuning based on IO performance it is getting) as there are many ways IO storage can vary a lot during run time. Think shared SAN storage, LVM snapshots or cloud for example.

Liang Xie

15 years ago

Hi Vadim,

Thanks for sharing this amazing testing.Just one minor question: where can i get the above “tcpp-mysql” tool ?

Best Regards,
-Liang

Author

Vadim Tkachenko

15 years ago

Liang,

tpcc-mysql is available from launchpad
https://launchpad.net/perconatools

Author

Vadim Tkachenko

15 years ago

Mark,

It should not be problem in Percona, we should have it resolved in our adaptive_checkpoint methods.

Seun Osewa

15 years ago

Any particular reason you used different data for the HDD and FusionIO tests? It’s frustrating because you robbed us of an opportunity to compare the performance of the two types of storage.

Dimitri

15 years ago

Vadim,

it’s pity you’re presenting results for the code I’m not able to test :-))
I’ve get a look on LP – there are only patches, without any info what to do and how to apply them..
so, yes, I may only wait for your 5.5.8 tar.gz..

Regarding the tpcc-mysql: there is no README either how to run it :-))
I may suppose that I have to compile tpcc_load and tpcc_start, then load the data, and then run a test…
May you please, just give me 3 lines:
– how I create db schema
– how do I load the data (exactly params)
– how do I run the test (exactly params)

(when I think that you told me that dbSTRESS is hard to setup… ;-))

And the last: may you finally present the result comparing 512 and 4K bytes in redo log block size?.. – you just told it improves performance on SSD but without giving any numbers..

Thank you!

Rgds,
-Dimitri

Author

Vadim Tkachenko

15 years ago

Seun,

I’ve done about 100 different combinations, I basically can’t publish them all. The reader will be lost and it also requires a lot of time to prepare.

I published what I think is most interesting ones.
If you want to see direct comparison FusionIO vs HDD, you can see some data there
http://www.mysqlperformanceblog.com/2010/04/08/fast-ssd-or-more-memory/
and there
http://www.mysqlperformanceblog.com/2009/05/01/raid-vs-ssd-vs-fusionio/

Author

Vadim Tkachenko

15 years ago

Dimitri,

We have Makefile in our directory with patches, and make does all dirty job for applying patches.

tpcc-mysql is just intuitively easy to use 😉

but point is taken, I added README, please check.

I may publish innodb_log_block_size later.

Dhiraj

15 years ago

Hi Vandim,
I have two questions regarding MySql 5.5.8:
1.) Previous versions of MySql had a size restriction of 4GB per table. So does MySql 5.5.8 has any such restriction?

2.) How many Schema’s can we prepare in MySql 5.5.8?

3.) Is there any limit for number of Columns a table can contain in MySql 5.5.8 ?

Regards,
Dhiraj