November 28, 2014

Which adaptive should we use?

As you may know, InnoDB has two limits on unflushed modified blocks in the buffer pool. One comes from the physical size of the buffer pool itself. The other is the age of a block’s oldest unflushed modification, which is bounded by the capacity of the transaction log files.

Under a heavy update workload, the modified ages of many blocks tend to cluster together. To reduce the maximum modified age, InnoDB must then flush many of those blocks in a short time if they have not been flushed before, and the resulting flushing storm seriously hurts performance.
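The log-capacity limit can be pictured with a small sketch. This is illustrative only (not InnoDB source code): a dirty block’s “modified age” is how far the current LSN has advanced past the LSN at which the block was first modified, and once the oldest modified age nears the redo log capacity, InnoDB is forced into a burst of flushing so the log can be reused. The sizes and the 87.5% threshold are assumptions taken from the configuration and tables later in this post.

```python
LOG_FILE_SIZE = 512 * 1024 * 1024      # innodb_log_file_size (assumed)
LOG_FILES_IN_GROUP = 2                 # innodb_log_files_in_group (assumed)
LOG_CAPACITY = LOG_FILE_SIZE * LOG_FILES_IN_GROUP

def oldest_modified_age(current_lsn, oldest_dirty_lsn):
    """Log bytes by which the oldest unflushed change lags the log head."""
    return current_lsn - oldest_dirty_lsn

def must_sync_flush(current_lsn, oldest_dirty_lsn, limit_pct=87.5):
    """True once the oldest modified age crosses the storm threshold."""
    age = oldest_modified_age(current_lsn, oldest_dirty_lsn)
    return age >= LOG_CAPACITY * limit_pct / 100.0
```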

We proposed the “adaptive_checkpoint” option, which performs constant flushing to avoid such a flushing storm. And now the newest InnoDB Plugin, 1.0.4, includes a similar native option: “adaptive_flushing”.

Let’s compare the adaptive flushing options in this post.

HOW THEY WORK

< adaptive_checkpoint=reflex (older method)>

oldest modified age (% of max)    behavior
0% ~ 50%                          nothing
50% ~ 75%                         constant flushing (weak)
75% ~ 87.5%                       constant flushing (strong)
87.5% ~                           (flushing storm)
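The reflex policy maps directly to a threshold lookup. A minimal sketch, with the thresholds copied from the table above (the function itself is mine, for illustration, and is not the actual adaptive_checkpoint implementation):

```python
def reflex_behavior(oldest_modified_age_pct):
    """Map oldest modified age (% of max) to the reflex flushing mode."""
    if oldest_modified_age_pct < 50:
        return "nothing"
    elif oldest_modified_age_pct < 75:
        return "constant flushing (weak)"
    elif oldest_modified_age_pct < 87.5:
        return "constant flushing (strong)"
    else:
        return "flushing storm"
```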

< adaptive_checkpoint=estimate (newer method)>

oldest modified age (% of max)    behavior
0% ~ 50%                          nothing
50% ~ 87.5%                       estimated flushing, as below *
87.5% ~                           (flushing storm)

* the number of blocks to flush is estimated based on:

  • how many modified and unflushed blocks
  • progress speed of the transaction log
  • the average modified age of the blocks to flush
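One illustrative reading of how those three inputs can combine (my assumption, not the actual patch): dividing the number of dirty blocks by their average modified age approximates how many blocks fall due per byte of log progress, and scaling by the log’s recent speed gives a batch size that keeps flushing in step with log growth.

```python
def estimate_flush_batch(n_dirty_blocks, avg_modified_age,
                         log_bytes_per_sec, interval_sec=1.0):
    """Blocks to flush this interval to keep pace with log growth."""
    if avg_modified_age <= 0:
        return 0
    # Roughly how many blocks become due per byte of log progress.
    blocks_per_log_byte = n_dirty_blocks / avg_modified_age
    return int(blocks_per_log_byte * log_bytes_per_sec * interval_sec)
```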

< adaptive_flushing (default “true” in 1.0.4)>
Its behavior is not based on the oldest-modified-age percentage.

* the number of blocks to flush is based on:

  • how many modified and unflushed blocks
  • progress speed of the transaction log
  • transaction log capacity

(and the amount is adjusted to account for other flushing activity)

And it does not exceed the equivalent of the “constant flushing (strong)” level of “adaptive_checkpoint=reflex”.
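A hedged sketch of how those inputs and the cap could combine: a rate proportional to the number of dirty blocks and the log’s progress speed, normalized by log capacity, and never exceeding the “constant flushing (strong)” rate. Both the formula and the use of innodb_io_capacity as the strong-flush equivalent are my assumptions for illustration, not the plugin’s actual code.

```python
IO_CAPACITY = 4000  # innodb_io_capacity, assumed "strong" flush cap (pages/sec)

def adaptive_flush_rate(n_dirty_blocks, log_bytes_per_sec, log_capacity):
    """Target pages/sec for adaptive_flushing, capped at the strong rate."""
    rate = n_dirty_blocks * log_bytes_per_sec / log_capacity
    return min(int(rate), IO_CAPACITY)
```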

RESULTS

TPC-C-like workload (100 WH; 16 sessions at full load)

innodb_buffer_pool_size = 16G
innodb_max_dirty_pages_pct = 90

innodb_log_file_size = 512M
innodb_log_files_in_group = 2

innodb_io_capacity = 4000
innodb_read_io_threads = 8
innodb_write_io_threads = 8
innodb_flush_method = O_DIRECT

innodb_thread_concurrency = 0
innodb_ibuf_active_contract = 1

<none>
innodb_adaptive_flushing = false
innodb_adaptive_checkpoint = none

<flushing>
innodb_adaptive_flushing = true

<reflex>
innodb_adaptive_flushing = false
innodb_adaptive_checkpoint = reflex

<estimate>
innodb_adaptive_flushing = false
innodb_adaptive_checkpoint = estimate

case1: “innodb_doublewrite = false”

[0~1800 sec.]

[chart: no_doublewrite-0-1800]

[1200~1800 sec.]

[chart: no_doublewrite-1200-1800]

averages (1200~1800 sec.)
none:     6868.92
flushing: 6655.92
reflex:   6481
estimate: 6575.88

case2: “innodb_doublewrite = true”

[0~1800 sec.]

[chart: doublewrite-0-1800]

[1200~1800 sec.]

[chart: doublewrite-1200-1800]

averages (1200~1800 sec.)
none:     6569.48
flushing: 5090.12
reflex:   6871.9
estimate: 6609.9

CONSIDERATIONS

The new “adaptive_flushing” seems to be tuned for “innodb_doublewrite = false” only.
(It causes too much flushing when “innodb_doublewrite = true”.)

“innodb_adaptive_checkpoint = reflex” and “adaptive_flushing” require tuning innodb_io_capacity properly.
(These results are based on the proper value “innodb_io_capacity = 4000”.)

“innodb_adaptive_checkpoint = estimate” does not depend on innodb_io_capacity, and it seems to give a “softer” result than the other methods.

So,

The “adaptive_flushing” option seems good when we use “innodb_doublewrite = false”.

“innodb_adaptive_checkpoint = estimate” may be the safe choice for “innodb_doublewrite = true” for now.

I should adjust “adaptive_flushing” for “innodb_doublewrite = true”. :-)


IMPROVEMENTS (16 Sep.)

I have tried to improve the performance, but I couldn’t get adaptive_flushing to perform well with the doublewrite buffer… So, instead, I tuned the estimate method further.

case1: “innodb_doublewrite = false”

[0~1800 sec.]

[chart: no_doublewrite-0-1800_2]

case2: “innodb_doublewrite = true”

[0~1800 sec.]

[chart: doublewrite-0-1800_2]

The new “innodb_adaptive_checkpoint = estimate” seems more stable than before. The next question is: “Why is estimate + doublewrite better than the other settings without doublewrite?” :-)


ADDITIONAL TEST WITH A DIFFERENT I/O BALANCE

I also tested another configuration to check the behavior under a different I/O-bound balance.

<changes>
innodb_buffer_pool_size = 2G   (1/8)
innodb_log_file_size = 128M    (1/4)

case1: “innodb_doublewrite = false”

[0~1200 sec.]

[chart: no_doublewrite_2-0-1200]

case2: “innodb_doublewrite = true”

[0~1200 sec.]

[chart: doublewrite_2-0-1200]

“innodb_adaptive_checkpoint = estimate” seems stable, but it appears to flush too much in its initial phase. The adaptive_flushing option seems softer in this case.

Hmmm….

We should choose the suitable adaptive_xxx option for each workload for now :-)

About Yasufumi Kinoshita

Yasufumi is a former Percona employee.
Yasufumi is one of the foremost InnoDB experts in the world, and has been researching and improving InnoDB internals for years. He is the original author of the first InnoDB scalability enhancements and has developed many improvements to InnoDB's internal algorithms such as flushing, locking, and recovery.

Comments

  1. Dimitri says:

    Hi Yasufumi,

    Thanks for sharing it! A very interesting analysis of the problem!
    Could you share more details about:
    – was your workload’s data set kept in memory or not?
    – are the write queries only UPDATEs, or also INSERTs and DELETEs?

    Also, did you try running more concurrent users? What would change with 32 sessions?..

    Thank you!

    Rgds,
    -Dimitri

  2. Yasufumi says:

    Hi Dimitri,

    “TPC-C like” means almost the same SQL set as DBT-2 (maybe), and it also contains INSERTs and DELETEs.

    The 100WH database is around 10GB in files, so all of the data will be in the buffer pool in the end.
    But the test starts just after InnoDB startup, so each page must actually be read once from disk.
    (because we use O_DIRECT)

    Then, about more sessions, I will try later.

    Thank you,
    Yasufumi

  3. Dimitri says:

    Hi Yasufumi,

    So, if your workload also uses INSERT & DELETE, I’m curious whether you’ve monitored your history list length, and how high it grows or whether it stays stable..

    At the same time, since your “adaptive” code path is still reached within the master thread, I suppose the load has not yet reached a critical level (that’s why I’m asking about a test with more sessions)..

    Also, if I’m not wrong, the O_DIRECT option is only applied to the redo files, no?.. All the others should still be buffered.. – or do you perhaps mean filesystem mount options?..

    Thank you!

    Rgds,
    -Dimitri

  4. Yasufumi says:

    Dimitri,

    I know the 16GB buffer pool has enough margin to test 10WH TPC-C for at least 30 min in these cases…
    “Critical”? Do you also mean a smaller buffer pool, like 5GB or 1GB?
    I will try that as well.

    And you are wrong about O_DIRECT in InnoDB….

    http://dev.mysql.com/doc/refman/5.0/en/innodb-parameters.html#sysvar_innodb_flush_method

    What you describe is a PostgreSQL-only feature….
    Do you prefer PostgreSQL to MySQL??? :-)

    (To me, PostgreSQL is one of the RDBMSs with the strangest performance characteristics. I don’t think you should tune MySQL/InnoDB based on PostgreSQL tuning knowledge…)

    Thank you

  5. Dimitri says:

    Hi Yasufumi,

    by “critical” I mean a higher load :-) rather more user sessions or higher transaction rates, to see at which moment you’ll start to loop in the purge loop :-) (all the issues I’ve observed recently with dbSTRESS were also reproduced with DBT-2, so I’m curious about your case ;-))..

    And what about History Length?..

    O_DIRECT – I’ve missed this point :-) (I kept in mind that MySQL still calls fsync() – and you don’t need it when you’re doing true Direct I/O)..

    BTW, it may not be a bad idea to split it into 2 options: one for the redo log and another for the data files; the I/O nature differs between them, so it makes sense..

    (And I’m not here to discuss PG :-)) but I respect it, it’s just different.. And for a long time it out-performed MySQL, so if we have some things to learn, we should not miss them :-))

    Thank you!

    Rgds,
    -Dimitri

  6. Yasufumi says:

    Hi Dimitri,

    OK. I understand. :-)
    But this topic is about flushing. I think purging is the next problem (though both of them are currently done by the master thread…). I will test it soon (including the effect of your purge_thread). Please wait to discuss it. :-)

    Regards,
    Yasufumi

  7. Dimitri says:

    Hi Yasufumi,

    from my point of view both topics go together :-) but well..

    another interesting point on flushing: I’ve observed that when I have a wider randomization of the data set (and data are being read from disk all the time (mainly random-read I/O-bound)), the more optimal flushing strategy in that situation becomes flushing by setting a dirty-pages limit rather than “adaptive” (especially when the doublewrite buffer is used (http://dimitrik.free.fr/blog/archives/09-01-2009_09-30-2009.html#93))
    Did you observe the same thing?..

    Rgds,
    -Dimitri

  8. Todd says:

    In all of your tests, is it a good assumption that you are not using innodb_file_per_table and that you are using xfs for the file system? Is it a multi-spindle local RAID or a SAN? I’d like to compare your hardware used for testing to hardware we have available so we can estimate what kind of performance we should see.

  9. Yasufumi says:

    Dimitri,

    Sorry, what is the point of that post?
    The effect of the storage’s write buffer? (twice the write I/O per second: writes to the doublewrite buffer are buffered)

    Todd,

    I use innodb_file_per_table=true and ext3.
    And I think it is a local RAID, but I don’t know the exact details of the server,
    because I can only log in remotely…

    Thank you.

  10. Yasufumi says:

    Dimitri,

    I have checked the history list length and your purge_thread.
    The history list length never exceeded 20000 during any TPC-C test on this server.
    And purge_thread gives a 3~4% improvement in throughput, and it seems to purge aggressively (the list does not exceed 1000).

    About 32 threads: the server has 16 cores, and each benchmark thread runs at full power.
    So the equilibrium throughput is almost no different from 16 threads.

    Thank you.

  11. Dimitri says:

    Hi Yasufumi,

    20000 in history length is really small.. Is your test kit available somewhere so I can download and test it?..

    purge_thread is not for improving performance but rather for performance stability (though it’s nice to see the improvement too)

    Any chance you might try the dbSTRESS workload too? – I’d be very curious about your observations ;-)

    Thank you!

    Rgds,
    -Dimitri

  12. Yasufumi says:

    Hi Dimitri,

    Our test kit is
    https://code.launchpad.net/~percona-dev/perconatools/tpcc-mysql

    OK, I will try dbSTRESS if I get the chance.

    Thank you!

  13. Ratheesh says:

    Yasufumi,

    I thought innodb_flush_method=O_DIRECT would override the doublewrite buffer? (I am confusing doublewrite with double buffering.)

    If that is true, what is the implication of the test case with adaptive_flushing and innodb_doublewrite=true?


    Ratheesh

  14. Alfie John says:

    I’m trying to find more info on innodb_ibuf_active_contract, because it’s not in the current 5.6 RC. I’ve found the 5.1 documentation for innodb_ibuf_active_contract, but per the 5.5.8 changelog it was renamed to innodb_ibuf_active_merge, which also doesn’t have any documentation.

    Is there anywhere you could point me to for a full list of options for each server version? Something like MySQL’s Server System Variables page would be nice, because it outlines all variables for the current version, which version new variables were introduced in, and in which version deprecated variables were obsoleted.

  15. Hrvoje Matijakovic says:

    Hi Alfie,

    List of variables introduced in Percona Server 5.5 and 5.6 can be seen here:
    http://www.percona.com/doc/percona-server/5.5/ps-variables.html
    http://www.percona.com/doc/percona-server/5.6/ps-variables.html
