September 20, 2014

Adaptive checkpointing

Do you know that there are two limits about dirty (modified but not flushed to disk) blocks of InnoDB buffer pool? One is the limit of “amount”. The other is the limit of “age”.

– limit of “amount” –

As you know, buffer pool of InnoDB works as write-back cache of its datafiles. If the buffer pool is filled by dirty blocks, InnoDB cannot allocate new blocks without flushing the dirty blocks and the performance would get worse. This is the limit of dirty block “amount”. We can avoid this limit by setting ‘innodb_max_dirty_pages_pct’ smaller or setting the larger buffer pool size. We might be never at a loss about the limit.

The another limit we should understand is limit of dirty block “age”.

– limit of “age” –

As you know again, because InnoDB write the modifies of datafile to transaction log file synchronously, InnoDB is allowed to treat its buffer pool as write-back cache. The transaction log from the last checkpoint assures the latest-committed consistent data. So, the transaction log must contain the all transaction log from the last checkpoint.

Basically, InnoDB makes the checkpoint passively (it is called “fuzzy checkpoint”). InnoDB makes the point of time when the oldest dirty block occurred as the new checkpoint.

In short,
the dirty block older than the oldest transaction log is never allowed to exist.

This is the limit of dirty block “age”.

The max age is determined by the total size of the transaction log files, because InnoDB uses the transaction log circulately. But, the larger transaction log file may not solve the problem of dirty block “age”…

The main essence of the “age” limit problem might be clustered distribution of “age”. If there are huge number of similar aged dirty blocks and their age nears the max age, InnoDB flushes the dirty blocks with its best, but the oldest “age” of dirty blocks might not change and might reach the critical limit. Then InnoDB will pause until all of the old dirty blocks flushed…

I will show you the examples.
The workload is TPC-C like and the graphs show the change of throughput, write IO and checkpoint age with time.

<Normal>

The first is the case of normal (not patched) InnoDB.

Normal case

The intense write IO occur and InnoDB stalls when the checkpoint age reaches the red broken line. After that, InnoDB stalls periodically.

The steady flushing may be short, and there may be clustered distributed “age”. The clustered “age” may causes the clustered flushing and the next clustered “age”…

<innodb_max_dirty_pages_pct(native parameter) & innodb_io_capacity(patched parameter)>

The next graph is the case when I tried to restrain the checkpoint age growing by controlling steady flushing with only innodb_max_dirty_pages_pct and innodb_io_capacity.

Tuned by dirty_pages_pct

There are no sudden stalls, but many write IO and regression of the average throughput. The optimum setting may be very difficult by this method, because innodb_max_dirty_pages_pct and innodb_io_capacity are independent to the checkpoint age.

<innodb_adaptive_checkpoint(new patched parameter) & innodb_io_capacity>

The last is the case of using innodb_adaptive_checkpoint instead of innodb_max_dirty_pages_pct.

Tuned by adaptive_checkpoint

During the checkpoint age excesses the light-blue broken line, the steady flushing occurs. If the checkpoint age reaches the next orange line, the flushing becomes more strong. This is the effect of innodb_adaptive_checkpoint. And its strength can be controlled by innodb_io_capacity. (The optimum setting may not be difficult as the 2nd case.)

At first, the checkpoint age touch the red line once and the regression occurs. However after that, the periodical regression waves seem to be attenuated quickly. The clustered “age” may be flatted. There are no the regression of the average throughput.

innodb_adaptive_checkpoint will solve or soften such problems of dirty block “age” limit
with easier setting.

In addition, innodb_io_patches.patch also adds
max checkpoint age“, “modified age” and “checkpoint age
to SHOW INNODB STATUS and we can check them easily.

About Yasufumi Kinoshita

Yasufumi is a former Percona employee.
Yasufumi is one of the foremost InnoDB experts in the world, and has been researching and improving InnoDB internals for years. He is the original author of the first InnoDB scalability enhancements and has developed many improvements to InnoDB's internal algorithms such as flushing, locking, and recovery.

Comments

  1. peter says:

    Great result Yasufumi !

    I guess even if average TPM is not that much different, affect 95 percentile response time is significant… and in practice the periodic checkpointing stalls are surely not acceptable for real applications.

    There is still probably a bit potential for improvement for system to self adapt and try to even out writes so we do not get into 20% TPM dips/write spikes.

  2. John Masters says:

    How do you collect the information for the graphs above from mysql? Also, is there a tool to plot the data to visualize the performance bottlenecks like you show above? Appreciate your help in this regards as we are running into strange issues with Mysql where in the same query with (sql_no_cache) gets the results in 50seconds one time and it takes more than 600secs the other. I see no other processes running using “show processlist”. So I am assuming something to do with mysql system level bottlenecks and would like some tools to help diagnose

    Thanks

  3. Yasufumi says:

    peter,

    Yes, I know we never get such a regular results from real applications. But cautious people might do their benchmarks before running their system in practice. At that time, they may face such a strange behavior of MySQL/InnoDB which they cannot explain, and may abandon to use MySQL from the worry about the unexplained. (It is from my experience, I have met such situations before and some people also have consulted to me about this behavior in their benchmarks.)

    I think this problem is difficult to explain to everyone clearly, because of its complicated reason peculiar to RDBMS (and my ability of description…). But I also think it is very important to proof “The problem can be explained!” for MySQL guys in the world (, even if they cannot understand clearly sorry for my poor description…).

  4. Yasufumi says:

    John,

    We can check the checkpoint age even if we use normal MySQL.
    From the output of SHOW INNODB STATUS,
    ===========================

    LOG

    Log sequence number 0 1059494372 <—
    Log flushed up to 0 1059494372
    Last checkpoint at 0 1055251010 <—
    ===========================
    we can calculate the checkpoint age

    (0 – 0) * 2^32 + (1059494372 – 1055251010)
    = 4243362

    And max chekcpoint age may be 70% or 80% of

    innodb_log_file_size * innodb_log_files_in_group

    Thank you.

  5. This is great. Can you provide more details on what ‘adaptive checkpointing’ does or reference a description elsewhere?

  6. Yasufumi,

    Is innodb_adaptive_checkpoint a setting that can be changed online, or does it require restart? Thanks

    Davy

  7. Yasufumi says:

    Davy,
    innodb_adaptive_checkpoint is online parameter.

    Mark,
    Sorry, I don’t know what I should reference, though it may exist elsewhere…
    Could you tell me what do you want to know?

  8. http://mysqlha.blogspot.com/ says:

    Yes, I would like to know what makes it adaptive? The original code flushes at most a fixed number of blocks per time interval as requested by a background IO thread. Have you added something to make the number of blocks to flush dynamic?

  9. Mike says:

    What are the units of the values reported for “max checkpoint age”, “modified age” and “checkpoint age” ?

Speak Your Mind

*