Group commit and real fsync

Over the past few months I’ve seen a few cases of customers upgrading to MySQL 5.0 and hitting serious performance slowdowns, up to 10 times in certain cases. What surprised them most was that the problem was hardware and even OS specific: it could show up with one OS version but not with another. Even more interesting, performance may be dramatically affected by the --log-bin setting, which usually has just a couple of percent overhead. So what is going on?

Actually we’re looking at two issues here which interact in a funny way:

  • Group commit is broken in MySQL 5.0 if binary logging is enabled (as it enables XA)
  • Certain OS/hardware configurations still fake fsync(), delivering great performance at the cost of not being ACID

The first one can be tracked by this bug. In a nutshell, the problem is that a new feature, XA, was implemented in MySQL 5.0, and it did not work with the former group commit code. New group commit code, however, was never implemented. XA allows different transactional storage engines to be kept in sync with each other and with the binary log. XA is enabled whenever the binary log is enabled, which is why this issue is triggered by enabling the binary log. If the binary log is disabled, so is XA, and the old group commit code works just fine.
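To see why losing group commit can hurt this much, here is a toy model in Python. It is only a sketch under loud assumptions: every commit needs exactly one fsync(), and with group commit all transactions waiting at the same moment share a single fsync(). The function name and the figure of 150 fsyncs per second are made up for illustration and are not taken from MySQL.

```python
def commits_per_second(fsyncs_per_sec, concurrent_txns, group_commit):
    """Rough upper bound on commit throughput.

    Assumptions (hypothetical, for illustration only):
      - each commit costs exactly one fsync()
      - with group commit, all concurrently waiting transactions
        are made durable by one shared fsync()
    """
    if group_commit:
        return fsyncs_per_sec * concurrent_txns
    return fsyncs_per_sec


# 150 fsyncs/sec is an assumed value inside the 80-200 range typical
# for a drive without a battery-backed cache.
for threads in (1, 10, 50):
    with_gc = commits_per_second(150, threads, group_commit=True)
    without_gc = commits_per_second(150, threads, group_commit=False)
    print(f"{threads:>2} threads: {with_gc} commits/sec with group commit, "
          f"{without_gc} without")
```

In this model the gap grows with the number of concurrent committers, which fits a slowdown that only shows up under concurrent load on storage with a real fsync().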

The second one is interesting. We would actually hear many more people screaming about this problem if the OS were honest with us. Happily for us, many OS/hardware pairs still lie about fsync(). The fsync() call is supposed to place data on the disk securely, which, unless you have a battery-backed cache, gives you only 80-200 sequential fsync() calls per second, depending on your hard drive speed. With a fake fsync() the data is only written to the drive’s memory and so can be lost if the power goes down. However, it gives a great performance improvement, and you might see 1000+ fsync() calls per second. So if your OS is not giving you a real fsync() you might not notice this bug. The performance degradation will still happen, but it will be much smaller, especially with large transactions.
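One way to sanity-check these numbers is a simple rotational bound. The sketch below assumes that rewriting the same spot on a spinning disk durably has to wait for the platter to come around again, so it can happen at most once per rotation. That model, the helper names and the slack factor are my assumptions for illustration, not something measured here.

```python
def rotations_per_second(rpm):
    """Assumed upper bound on durable rewrites of one sector per second."""
    return rpm / 60.0


def looks_cached(measured_fsyncs_per_sec, rpm, slack=1.5):
    """Heuristic: a rate well above the rotational bound suggests that
    some write cache is acknowledging the fsync() calls.

    `slack` is an arbitrary safety margin, not a calibrated value.
    """
    return measured_fsyncs_per_sec > slack * rotations_per_second(rpm)


for rpm in (5400, 7200, 10000):
    bound = rotations_per_second(rpm)
    print(f"{rpm} RPM: at most about {bound:.0f} real fsyncs/sec")

# 1000 fsyncs/sec is the "1000+" figure mentioned in the text.
print(looks_cached(1000, 7200))
```

A rate far above the bound does not say whether the cache is safe: it could be a battery-backed controller cache or a volatile drive cache.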

So how can you solve the problem?

  • Disable the binary log. This could be an option, for example, for slaves which do not need point-in-time recovery.
  • Check whether your OS is doing a real fsync. You should know this anyway if you care about your data safety. This can be done, for example, with SysBench: sysbench --test=fileio --file-fsync-freq=1 --file-num=1 --file-total-size=16384 --file-test-mode=rndwr. This will write and fsync the same page, and you can see how many requests/sec it achieves. You might also want to check diskTest from this page, which does some extra tests for fsync() correctness.
  • Install RAID with a battery-backed cache. This gives about the same effect as a fake fsync(), but you can make it secure (make sure, however, that your drives are not caching data by themselves). The good thing is that RAID controllers with battery-backed cache are becoming really inexpensive.
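If SysBench is not at hand, the same kind of check can be roughed out in a few lines of Python. This is a minimal sketch and not a replacement for SysBench or diskTest: it rewrites one 16 KB page and calls fsync() after every write, then reports the rate. The function name and defaults are hypothetical. Point `directory` at the filesystem that holds your data, since a RAM-backed temporary directory would give meaningless numbers.

```python
import os
import tempfile
import time


def fsyncs_per_second(directory=".", writes=100, page_size=16384):
    """Rewrite the same page `writes` times, fsync() after each write,
    and return the observed number of write+fsync cycles per second."""
    page = b"x" * page_size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(writes):
            os.lseek(fd, 0, os.SEEK_SET)  # always overwrite the same page
            os.write(fd, page)
            os.fsync(fd)                  # ask the OS to make it durable
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    return writes / elapsed


if __name__ == "__main__":
    rate = fsyncs_per_second()
    print(f"{rate:.0f} write+fsync cycles per second")
```

A result in the low hundreds or below on a single spinning drive is what a real fsync() should look like. A result in the thousands means something is caching the writes, and you want to be sure that the cache is battery backed.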

You probably also want to know whether this bug is going to be fixed. I’m not an authority on this question, but as Heikki describes it as a fundamental task, I’m not sure it will be done in 5.0. It would be good if it is done in 5.1.


Comments (9)

  • Jay Pipes

    Hi Peter!

    Wonderful blog entries, as usual! Hope your move is going OK. Do you have any insider updates on the group commit bug? I checked the worklog and found very little going on. Do you know if Innobase has found the root cause of the issue?

    Cheers, and talk with you soon!


    May 23, 2006 at 8:48 am
  • peter


    There is no mystery in this problem. It is reported as a bug, as you can see, and Heikki needs to fix it. The problem, however, is that it is not a tiny little bug; it is a feature which was not implemented in 5.0. New code simply had to be written to make group commit work with XA, but that was never done. So now it is kind of dangerous to fix it in 5.0.

    May 23, 2006 at 10:11 am
  • Nilnandan Joshi


    I want to optimise ORDER BY in MySQL. How can I do that? Please help me.
    When I run a query with GROUP BY but without ORDER BY it takes 0.0417 seconds, but when I add ORDER BY it takes 12. Is there any alternative to ORDER BY I can use?


    Nilnandan Joshi

    December 20, 2006 at 11:57 pm
  • Okulov Vitaliy

    My benchmark with sysbench:

    sysbench v0.4.8: multi-threaded system evaluation benchmark

    Running the test with following options:
    Number of threads: 1

    Extra file open flags: 0
    1 files, 16Kb each
    16Kb total file size
    Block size 16Kb
    Number of random requests for random IO: 10000
    Read/Write ratio for combined random IO test: 1.50
    Periodic FSYNC enabled, calling fsync() each 1 requests.
    Calling fsync() at the end of test, Enabled.
    Using synchronous I/O mode
    Doing random write test
    Threads started!

    Operations performed: 0 Read, 10000 Write, 10000 Other = 20000 Total
    Read 0b Written 156.25Mb Total transferred 156.25Mb (83.458Mb/sec)
    5341.28 Requests/sec executed

    Test execution summary:
    total time: 1.8722s
    total number of events: 10000
    total time taken by event execution: 0.1558
    per-request statistics:
    min: 0.0000s
    avg: 0.0000s
    max: 0.0000s
    approx. 95 percentile: 0.0000s

    Threads fairness:
    events (avg/stddev): 10000.0000/0.00
    execution time (avg/stddev): 0.1558/0.00

    Hardware: 2×5130 2.3 Ghz, 6 Gb, 237 RAID 10 Adaptec 2130SLP BBU.

    July 3, 2008 at 1:02 am
