Pretending to fix broken group commit

February 3, 2009

Author

Vadim Tkachenko

Percona Software


The problem with broken group commit has been discussed many times; the bug report was filed 3.5 years ago and is still not fixed in MySQL 5.0/5.1 (and most likely will not be in MySQL 5.1). The harsh truth, though, is that this bug is very hard (if at all possible) to fix properly. In short, if you enable replication (log-bin) on a server without a BBU (battery backup unit), your InnoDB write performance under concurrent load drops significantly.
We have also written about it before; see “Group commit and real fsync” and “Group commit and XA”.

The problem is that InnoDB tries to keep the same order of transactions in the binary log and in the transaction log, and acquires a mutex to serialize writes to both logs. We basically propose to break this serialization: in XtraDB release3 (to be announced soon; you can take the current version for testing from Launchpad) we introduce the --innodb-unsafe-group-commit mode. Here are the results with this option vs. without it (results are in transactions per second, more is better; this is a sysbench OLTP load).

I tested it on a Dell PowerEdge R900 with RAID 10 in WriteThrough mode to emulate the absence of a BBU. With a BBU you will not see this problem (all results will scale well), as the internal RAID cache accumulates changes and returns from the fsync() call immediately, without actually syncing data to disk.
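To see why serialized commits hurt so much without a BBU, here is a back-of-the-envelope model. It is only a sketch: the function name and every number in it are assumptions for illustration, not measurements from this test.

```python
# Toy model: durable commit throughput is bounded by how many real fsync()
# calls the storage completes per second, times how many commits share each one.
# All numbers below are hypothetical, not results from this post.

def max_commits_per_second(fsyncs_per_second: int, commits_per_fsync: int) -> int:
    """Upper bound on durable commits per second."""
    return fsyncs_per_second * commits_per_fsync


if __name__ == "__main__":
    disk_fsyncs = 100  # assumed: roughly 10 ms per real fsync without a BBU

    # Serialized commits: every transaction pays for its own fsync.
    print(max_commits_per_second(disk_fsyncs, 1))

    # Group commit: e.g. 16 concurrent transactions share a single fsync.
    print(max_commits_per_second(disk_fsyncs, 16))
```

With a BBU the controller cache acknowledges fsync() almost immediately, so the first factor becomes very large and the lack of grouping matters far less.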

So what can go wrong if you run with --innodb-unsafe-group-commit? As I said, there is a possibility that transactions in the binary log will be in a different order than in the InnoDB transaction log. Why is this bad? For example, if the box crashes and InnoDB does recovery, transactions on slaves may be executed in a different order; that is, you MAY get slaves unsynchronized with the master. Is the performance benefit worth it? It’s up to you, but I think it is better to have this choice than not to have it.
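To make "different order" concrete, the sketch below compares two commit sequences and reports the pairs of transactions whose relative order differs. The function name and transaction ids are hypothetical, and it assumes both logs contain the same set of transactions; it does not read real binary log or InnoDB log files.

```python
from itertools import combinations


def reordered_pairs(binlog_order, innodb_order):
    """Return pairs (a, b) where a precedes b in the binlog but follows it in InnoDB.

    Assumes both sequences contain the same transaction ids.
    """
    position = {txn: i for i, txn in enumerate(innodb_order)}
    return [
        (a, b)
        for a, b in combinations(binlog_order, 2)  # a comes before b in the binlog
        if position[a] > position[b]               # but after b in the InnoDB log
    ]


if __name__ == "__main__":
    binlog = ["t1", "t2", "t3"]   # order the slaves will replay
    innodb = ["t1", "t3", "t2"]   # order the master committed in InnoDB
    print(reordered_pairs(binlog, innodb))
```

An empty result means the two logs agree; any reported pair is a candidate for master/slave divergence if the two transactions touch the same rows.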

I do not urge you to use --innodb-unsafe-group-commit; I recommend having a BBU on your RAID. But if it turns out you don’t have one, and the write load on the server is significant, it may be worth trying --innodb-unsafe-group-commit.

11 Comments

http://scale-out-blog.blogspot.com/

17 years ago

Really, why have two logs? This whole problem seems like an argument for replicating directly from the InnoDB journal, though I imagine it would be a substantial amount of work to add keys & column metadata. Requiring BBU or any other hardware for that matter is a non-starter in virtual environments like Amazon where you don’t control the actual machines.

Sergei Golubchik

17 years ago

I don’t see how this could cause desynchronized slaves.
Could you show it step-by-step ?

Sergei Golubchik

17 years ago

Ah, okay. If you also break XA – then it’s possible.
But without XA you can get desynchronized slaves even if you maintain strict commit order 🙂

Mark Callaghan

17 years ago

InnoDB stays in sync with the binlog by getting a list of committed transactions from the binlog during crash recovery and doing a commit or rollback for in-doubt transactions (in-doubt == transaction was prepared but not committed). How does this option affect that?

Admin

Peter Zaitsev

17 years ago

Sergei,

The possibility here is theoretical. We did some stress tests running many transactions which should break replication if they are replayed in the wrong order… and things just work fine.

It is just that we’re not quite sure it will work in 100% of cases.

Sergei Golubchik

17 years ago

If you’re not sure, you may suspect that a specific sequence of actions will make slaves desynchronized. Such as “first transaction does this. second does that. first starts committing. syncs, and this very moment the second does this-and-that. We pull the plug…” and so on. I’m asking you to show this sequence.

Because I don’t see how this could ever cause slave desynchronization, even theoretically. As far as I understand, it *only* affects innodb hot backup.

Nikola

17 years ago

Hello,

thanks for the nice article. I have a question for you that’s not related to this post. I am making an album script, and I want it to have a feature where users can set a different order for their photos and subalbums per album. Should I do it with two tables:
album(id,name,path,col2,col3,col4,col5) and albumorder(id,photoorder,albumorder) or one table album(id,name,photoorder,albumorder,col2,col3,col4,col5)?

David Lutz

17 years ago

Hi Vadim,

Why not fix concurrent commit for real? I submitted a patch with a real fix for this back in August 2008. You can see it at http://bugs.mysql.com/file.php?id=10008 and read about it at http://blogs.sun.com/dlutz/entry/toward_a_more_scalable_mysql

Vadim

17 years ago

David,

Yes, I saw your patch. We reviewed it, but we actually can’t say whether it is better or worse than our solution. As I understand it, your patch has the same danger for innobackup as ours.

David Lutz

17 years ago

Hi Vadim,

I don’t believe that my patch does have a problem with innobackup. It preserves the order of binlog and innodb commits, although it doesn’t preserve the order of innodb prepare and binlog. This isn’t a problem for crash recovery, since mysql scans the binlog and commits any pending prepares in the same order as commits in the binlog file. Any prepare that doesn’t have a corresponding commit in the binlog file is rolled back.

Innobase has been annoyingly silent on whether innobackup cares about prepare/binlog ordering, but it seems unlikely. It would seem like a bug to me if innobackup treated prepares differently than mysql does itself during crash recovery.
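The recovery rule described in this comment (commit prepared transactions that appear in the binlog, in binlog order, and roll back the rest) can be sketched roughly as follows. This is a simplified illustration with hypothetical names and ids, not MySQL's actual recovery code.

```python
def resolve_prepared(prepared_txns, binlog_commits):
    """Decide the fate of each prepared (in-doubt) transaction after a crash.

    prepared_txns:  ids InnoDB had prepared but not yet committed
    binlog_commits: ids of committed transactions, in binlog order
    Returns (to_commit, to_rollback).
    """
    prepared = set(prepared_txns)

    # Commit, in binlog order, every prepared transaction the binlog recorded.
    to_commit = [txn for txn in binlog_commits if txn in prepared]
    committed = set(to_commit)

    # A prepare with no matching commit in the binlog is rolled back.
    to_rollback = [txn for txn in prepared_txns if txn not in committed]
    return to_commit, to_rollback


if __name__ == "__main__":
    commit, rollback = resolve_prepared(
        prepared_txns=["t3", "t1", "t2"],
        binlog_commits=["t0", "t1", "t3"],
    )
    print("commit:", commit)
    print("rollback:", rollback)
```

Note that the order in which transactions were prepared plays no role in the outcome; only the binlog order does, which is the point being made about prepare/binlog ordering.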

Vadim

17 years ago

Hi David,

That’s actually the problem: I can’t be sure about either patch. Both require intensive production testing. That’s why we made our own: we know how it is supposed to work (more or less), and it is actually shorter 🙂 But I believe we will consider it again and again.