Beware: ext3 and sync-binlog do not play well togetherPeter Zaitsev
One of our customers reported strange problem with MySQL having extremely poor performance when sync-binlog=1 is enabled, even though the system with RAID and BBU were expected to have much better performance.
The problem could be repeated with SysBench as follows:
./sysbench --num-threads=2 --test=oltp --oltp-test-mode=complex --oltp-table-size=100000 --oltp-distinct-ranges=0 --oltp-order-ranges=0 --oltp-sum-ranges=0 --oltp-simple-ranges=0 --oltp-point-selects=0 --oltp-range-size=0 --mysql-table-engine=innodb --mysql-user=root --max-requests=0 --max-time=60 --mysql-db=test run
On Dell R900 with CentOS 5.2 and ext3 filesystem we get 1060 transactions/sec with single thread and sync_binlog=1 while with 2 threads it drops to 610 transactions/sec Because of synchronous serialized writes to binlog we’re not expecting significant improvements in transaction rate (as the load is binlog fsync bound) but we surely do not expect it to drop ether.
To ensure this is not RAID controller related we repeated the run on the in memory block device and got very similar results – 2350 transactions/sec with 2 threads with sync-binlog=0 and just 750 transactions/sec with sync-binlog=1
We tried running data=writeback and data=journal just to make sure but results were basically the same.
Using XFS instead of EXT3 gives expected results – we get 2350 transactions/sec with sync_binlog=1 and 2550 transactions/sec with sync-binlog=0 which is about 10% overhead.
EXT2 filesystem also behaves similar to XFS so it seems to be EXT3 specific performance issue. We do not know if it is CentOS/RHEL specific and if it is fixed in the recent kernels. If somebody can run the same test with different kernels it would be interesting to know results.
You also may wonder how could this problem be unnoticed for a while ? First it only applies to the binary log. Innodb transactional log does not have the same problem on EXT3. I expect this is because Innodb log is pre-allocated so there is not need to modify meta data with each write while binary log is written as growing file. Another possible reason is small sizes of writes which are being fsync’ed with binary log. Though these are just my speculations – we did not investigate it in details.
Another reason is – it only happens with high transaction commit rate. If you’re just running couple of hundreds of durable commits a second you may not notice this problem.
Third – sync-binlog=1 is an option great for safety but it is surely not the most commonly used one. The web applications which often have highest transaction rates typically do not have so strong durability requirements to run this option.
As the call for action – I would surely like someone to see if EXT3 can be fixed in this regard as it is by far the most common file system for Linux. It is also worth to investigate if something can be done on MySQL side – may be opening binlog with O_DSYNC flag if sync_binlog=1 instead of using fsync will help ? Or may be binlog pre-allocation would be good solution.