Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

Emulating global transaction ID with pt-heartbeat

November 4, 2011

Author

Baron Schwartz

Insight for DBAs

MySQL

Share this Post:

Global transaction IDs are being considered for a future version of MySQL. A global transaction ID lets you determine a server’s replication position reliably, among other benefits. This is great when you need to switch a replica to another master, or any number of other needs.

Sometimes you can’t wait for the real thing, but an imitation is good enough for your needs. That’s why we built a pseudo global transaction ID into pt-heartbeat, the replication heartbeat tool included with Percona Toolkit. It has some of the benefits of a global transaction ID, but it works on any version of MySQL.

Now why would we want this, and who cares anyway? Good question. Let me tell you a tale of two datacenters. Once upon a time, one of our customers had a very “tall” replication topology, with servers replicating back and forth across datacenters so that if something went really wrong with one of them, there would be a backup on the other. The worst-case scenario happened, and a datacenter was offline for a very long time.

As a result, we had a large number of replicas that we needed to switch to new masters. We had, for example, A-B-C replication, and we needed to start at C and work up the chain to figure out its replication position relative to A. But B was gone, so we couldn’t simply do the usual process of matching up binary log filenames and byte offsets. We had to look in A’s binary logs, find some distinctive pattern of updates, and look in C’s binary logs to find the same thing — and match the two up to figure out C’s position relative to A.

It turned out to be tedious, but not impossible, because pt-heartbeat updates in server A were the most recognizable thing to look for in C’s binary log. That was still very painful, though. We had to find the latest heartbeat in C, then go looking for where it originated in A, in many many binary logs. This was a lot of work, even when automated. Wouldn’t it be nice if the replication heartbeat record on C actually said approximately which binary log and byte offset to look in? That would have saved hours and hours of downtime grepping through binary logs.

That’s what we decided to build into pt-heartbeat. But not only that, we decided that this tool should be usable on any server in the replication topology, so that if the situation happened again, no matter how crazy the topology or how many servers were missing, we’d have some guidepost for where everything is relative to everything else upstream from it.

So how does the new pt-heartbeat work? In a nutshell, this feature enables you to have multiple heartbeat rows in your heartbeat table, and multiple instances of pt-heartbeat, one per server in your replication topology. The heartbeat table is also extended to allow extra information to be maintained. This includes the server’s current binary log position, and its position in replication relative to its master.

In contrast, the old version of this tool supported only a single row in the heartbeat table, and it was just a primary key and a timestamp.

Here’s a sample of what you might see in a three-server replication chain. I’m using sandbox servers on my laptop to illustrate:

 mysql> select * from heartbeatG
 *************************** 1. row ***************************
                    ts: 2011-08-03T18:12:18.002500
             server_id: 5150
                  file: ginger-bin.000033
              position: 7936
 relay_master_log_file: NULL
   exec_master_log_pos: NULL
 *************************** 2. row ***************************
                    ts: 2011-08-03T18:12:18.003000
             server_id: 5508
                  file: ginger-bin.000020
              position: 21198
 relay_master_log_file: ginger-bin.000033
   exec_master_log_pos: 7936
 *************************** 3. row ***************************
                    ts: 2011-08-03T18:12:18.001960
             server_id: 15511
                  file: ginger-bin.000002
              position: 26478
 relay_master_log_file: ginger-bin.000020
   exec_master_log_pos: 21198

mysql> select * from heartbeatG

*************************** 1. row ***************************

ts: 2011-08-03T18:12:18.002500

server_id: 5150

file: ginger-bin.000033

position: 7936

relay_master_log_file: NULL

exec_master_log_pos: NULL

*************************** 2. row ***************************

ts: 2011-08-03T18:12:18.003000

server_id: 5508

file: ginger-bin.000020

position: 21198

relay_master_log_file: ginger-bin.000033

exec_master_log_pos: 7936

*************************** 3. row ***************************

ts: 2011-08-03T18:12:18.001960

server_id: 15511

file: ginger-bin.000002

position: 26478

relay_master_log_file: ginger-bin.000020

exec_master_log_pos: 21198

The servers represent our A-B-C scenario. On the second server we can see that its position relative to its master was ginger-bin.000033:7936, and we can see that the third instance is replicating from ginger-bin.000020:21198 on the second instance.

Now if the second server disappears, and I want to start the third one replicating from the first one, I can deduce that I should look in the first server’s binlogs somewhere near ginger-bin.000033:7936 to find the right location. That saves me a huge amount of time in emergency situations.

To enable this functionality, just drop your heartbeat table (if you already have one), let the tool create a new one for you if you want (or look in the documentation to see how to do it yourself), and start an instance of pt-heartbeat per server. It just works.

Everyone who runs replication should be running pt-heartbeat already to get a reliable indicator of actual replication delay. This avoids the many problems with Seconds_behind_master, which pop up continually. But now it’s time to update that advice: everyone should not only be running pt-heartbeat, but they should be running it with the new version, and making it update every server in their replication topology so that it’s easier to figure out replication positions when things go wrong.

It’s not a real global transaction ID, but it is good enough sometimes!

0 0 votes

Article Rating

6 Comments

Oldest

Newest Most Voted

Mat Keep

14 years ago

Hi Baron
more than just “being considered for a future version of MySQL”.

A preview version has been released and is discussed here:
http://d2-systems.blogspot.com/2011/10/global-transaction-identifiers-feature.html

The preview cab be downloaded today from labs.mysql.com in the mysql-5.6-labs-global-trans-ids build

Dave Juntgen

14 years ago

Baron – can not achieve the same with MHA?

Author

Baron Schwartz

14 years ago

Not if servers are missing. There is no reliable way to correlate positions between servers when there is a server in between that is simply missing. We need TRUE global transaction IDs to do this programmatically and reliably.

Author

Baron Schwartz

14 years ago

Mat, my understanding is that features that are in lab previews aren’t guaranteed to go into the next release, but features that are in development milestone releases are definitely targeted for the next release. Is this incorrect?

Robert Hodges

14 years ago

Why wouldn’t you just recommend Tungsten Replicator in this case? It has global IDs and covers this case as well as normal slave-to-master promotion very well. Even when you know roughly where to look, matching logs is very tedious and could result in prolonged downtime for busy systems.

vishal saxena

14 years ago

Whenever i hear the talk about failover from intermediate slaves to master and knowing that mysql does not have a way match the transactions as ‘global transactions id’.I agree that matching logs is very tedious but how about ‘row based replication’ ? i think it can help.Or may be its a dumb idea but still throwing.

if you have A——>B——->C and B crashed and captured information is telling that B ( relay_master_log_file: test_file , exec_master_log_pos: 2399 ) was replicating from these positions before crashing then we can tell C to start replicating from A with “relay_master_log_file: test_file , exec_master_log_pos: 2399” and ignore the duplicate key errors for some time untill replication catches up.This should work fine in ‘row based replication’ correct ? because even pointing to previous position and by ignoring duplicate key errors we are bringing the rows up to the same level.

that way we do not have to match the logs.