
Pretending to fix broken group commit

February 2, 2009 | Posted In: Percona Software


The problem of broken group commit has been discussed many times. The bug was reported 3.5 years ago and is still not fixed in MySQL 5.0/5.1 (and most likely will not be fixed in MySQL 5.1). The hard truth is that this bug is very difficult (if possible at all) to fix properly. In short, if you enable replication (log-bin) on a server without a BBU (battery backup unit), your InnoDB write performance under concurrent load drops significantly.
We have also written about it before; see “Group commit and real fsync” and “Group commit and XA”.

The problem is that InnoDB tries to keep the same order of transactions in the binary log and in its transaction log, and it acquires a mutex to serialize writes to both logs. We basically propose to break this serialization: in XtraDB release3 (to be announced soon; you can take the current version for testing from Launchpad) we introduce a --innodb-unsafe-group-commit mode. Here are the results with this option vs. without it (results are in transactions per second, more is better; this is a sysbench OLTP load).
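To see why holding a mutex across the sync call hurts, here is a toy model (a sketch, not a measurement and not InnoDB's code). It assumes every transaction is ready to commit at the same moment and that one fsync costs a fixed time; the function names and the 10 ms figure are illustrative assumptions.

```python
def simulate_commits(n_txns, fsync_ms, grouped):
    """Return (elapsed_ms, fsync_calls) for n_txns transactions ready at t=0."""
    if n_txns <= 0:
        return 0.0, 0
    if grouped:
        # Group commit: a single fsync makes every log record written so far durable.
        return float(fsync_ms), 1
    # Serialized: the mutex is held across each fsync, so syncs run back to back.
    return float(n_txns * fsync_ms), n_txns


def tps(n_txns, elapsed_ms):
    """Transactions per second for the simulated batch."""
    return n_txns * 1000.0 / elapsed_ms


# Assumed numbers: 16 concurrent transactions, 10 ms per real fsync.
serialized_ms, serialized_syncs = simulate_commits(16, 10.0, grouped=False)
grouped_ms, grouped_syncs = simulate_commits(16, 10.0, grouped=True)
print(tps(16, serialized_ms), serialized_syncs)  # 100.0 16
print(tps(16, grouped_ms), grouped_syncs)        # 1600.0 1
```

Real group commit never batches this perfectly, so treat the gap as an upper bound on what removing the serialization can buy.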

I tested it on a Dell PowerEdge R900 with RAID 10 in WriteThrough mode to emulate the absence of a BBU. With a BBU you will not see this problem (all results will scale well), because the RAID controller’s cache accumulates changes and returns from the fsync() call immediately, without actually syncing data to disk.
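If you are not sure whether your storage behaves like the BBU case or the WriteThrough case, a rough check is to time fsync() yourself. The following is a minimal sketch; the function name, the 512-byte payload, and the iteration count are arbitrary choices for illustration.

```python
import os
import tempfile
import time


def mean_fsync_ms(directory=None, iterations=20, payload=b"x" * 512):
    """Append a small payload and fsync it repeatedly; return mean ms per cycle."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            os.write(fd, payload)
            os.fsync(fd)  # ask the OS to push the data to the device
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return elapsed * 1000.0 / iterations


if __name__ == "__main__":
    # Pass the directory that holds your InnoDB logs to test that volume.
    print("mean fsync latency: %.3f ms" % mean_fsync_ms())
```

As a rule of thumb, several milliseconds per call on rotating disks suggests a real sync, while a small fraction of a millisecond suggests that a cache is answering. Some operating systems and drives acknowledge fsync() before the data is on stable media, so read the number with that in mind.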

So what can go wrong if you run with --innodb-unsafe-group-commit? As I said, there is a possibility that transactions in the binary log will be in a different order than in the InnoDB transaction log. Why is this bad? For example, if the box crashes and InnoDB does recovery, transactions on slaves may have been executed in a different order; that is, you MAY get slaves out of sync with the master. Is the performance benefit worth it? It’s up to you, but I think it is better to have this choice than not to have it.
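As a purely illustrative sketch of why log order can matter, consider two hypothetical transactions that do not commute. Everything here (the `balance` key, the two operations) is made up for the example, and it says nothing about whether InnoDB’s row locking would ever let such a pair reach commit concurrently.

```python
def apply_in_order(initial, txns):
    """Apply each transaction (a function mutating a dict) in the given order."""
    state = dict(initial)
    for txn in txns:
        txn(state)
    return state


def double_balance(state):
    state["balance"] *= 2


def add_ten(state):
    state["balance"] += 10


start = {"balance": 100}
# Order as recorded in the InnoDB transaction log (what the master ends up with).
master = apply_in_order(start, [double_balance, add_ten])
# Order as recorded in the binary log (what a slave replays).
slave = apply_in_order(start, [add_ten, double_balance])
print(master, slave)  # {'balance': 210} {'balance': 220}
```

If the two transactions touched unrelated rows, both orders would give the same state, which is why the risk is only a possibility and not a certainty.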

I do not urge you to use --innodb-unsafe-group-commit; I recommend having a BBU on your RAID. But if you don’t have one and the write load on the server is significant, it may be worth trying --innodb-unsafe-group-commit.

Vadim Tkachenko

Vadim Tkachenko co-founded Percona in 2006 and serves as its Chief Technology Officer. Vadim leads Percona Labs, which focuses on technology research and performance evaluations of Percona’s and third-party products. Percona Labs designs no-gimmick tests of hardware, filesystems, storage engines, and databases that surpass the standard performance and functionality scenario benchmarks. Vadim’s expertise in LAMP performance and multi-threaded programming help optimize MySQL and InnoDB internals to take full advantage of modern hardware. Oracle Corporation and its predecessors have incorporated Vadim’s source code patches into the mainstream MySQL and InnoDB products. He also co-authored the book High Performance MySQL: Optimization, Backups, and Replication 3rd Edition.


  • Really, why have two logs? This whole problem seems like an argument for replicating directly from the InnoDB journal, though I imagine it would be a substantial amount of work to add keys & column metadata. Requiring BBU or any other hardware for that matter is a non-starter in virtual environments like Amazon where you don’t control the actual machines.

  • Ah, okay. If you also break XA – then it’s possible.
    But without XA you can get desynchronized slaves even if you maintain strict commit order 🙂

  • InnoDB stays in sync with the binlog by getting a list of committed transactions from the binlog during crash recovery and doing a commit or rollback for in-doubt transactions (in-doubt == transaction was prepared but not committed). How does this option affect that?

  • Sergei,

    The possibility here is theoretical. We ran some stress tests with many transactions that should break replication if they are replayed in the wrong order… and things just worked fine.

    It is just that we’re not quite sure it will work in 100% of cases.

  • If you’re not sure, you may suspect that a specific sequence of actions will make slaves desynchronized. Such as “first transaction does this. second does that. first starts committing. syncs, and this very moment the second does this-and-that. We pull the plug…” and so on. I’m asking you to show this sequence.

    Because I don’t see how this could ever cause slave desynchronization, even theoretically. As far as I understand, it *only* affects InnoDB hot backup.

  • Hello,

    thanks for the nice article. I have a question for you that’s not related to this post. I am making an album script, and I want it to have a feature where users can set a different order for their photos and subalbums per album. Should I do it with two tables:
    album(id,name,path,col2,col3,col4,col5) and albumorder(id,photoorder,albumorder) or one table album(id,name,photoorder,albumorder,col2,col3,col4,col5)?

  • Hi Vadim,

    Why not fix concurrent commit for real? I submitted a patch with a real fix for this back in August 2008. You can see it at and read about it at

  • David,

    Yes, I saw your patch. We reviewed it, but we actually can’t say whether it is better or worse than our solution. As I understand it, your patch has the same danger for innobackup as ours.

  • Hi Vadim,

    I don’t believe that my patch does have a problem with innobackup. It preserves the order of binlog and innodb commits, although it doesn’t preserve the order of innodb prepare and binlog. This isn’t a problem for crash recovery, since mysql scans the binlog and commits any pending prepares in the same order as commits in the binlog file. Any prepare that doesn’t have a corresponding commit in the binlog file is rolled back.

    Innobase has been annoyingly silent on whether innobackup cares about prepare/binlog ordering, but it seems unlikely. It would seem like a bug to me if innobackup treated prepares differently than mysql does itself during crash recovery.
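The recovery rule described in the comment above can be sketched as follows. This is a simplified model under stated assumptions, not MySQL’s actual code: transaction ids are plain integers, and the function name and its two input lists are hypothetical.

```python
def resolve_in_doubt(prepared_xids, binlog_xids):
    """Decide the fate of prepared (in-doubt) transactions after a crash.

    prepared_xids: ids the storage engine reports as prepared but not committed.
    binlog_xids:   ids found in the binary log, in binlog order.
    Returns (to_commit, to_rollback).
    """
    prepared = set(prepared_xids)
    in_binlog = set(binlog_xids)
    # A prepared transaction that reached the binlog is committed, in binlog order.
    to_commit = [xid for xid in binlog_xids if xid in prepared]
    # A prepared transaction with no binlog entry is rolled back.
    to_rollback = sorted(prepared - in_binlog)
    return to_commit, to_rollback


to_commit, to_rollback = resolve_in_doubt(prepared_xids=[3, 1, 2], binlog_xids=[2, 5, 1])
print(to_commit, to_rollback)  # [2, 1] [3]
```

Transaction 5 appears in the binlog but is not in doubt (it had already committed in the engine), so it needs no action.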

  • Hi David,

    That’s actually the problem: I can’t be sure about either patch. Both require intensive production testing. That’s why we made our own: we know how it is supposed to work (more or less), and it is actually shorter 🙂 But I believe we will consider it again and again.
