Buy Percona ServicesBuy Now!

Planes, Trains, and Automobiles: MySQL XA Transactions

 | May 16, 2018 |  Posted In: Insight for DBAs, Insight for Developers, MySQL, Percona Server for MySQL

PREVIOUS POST
NEXT POST

MySQL, coupled with the InnoDB engine, provides full ACID compliance. With the addition of “eXtended Architecture” (XA) technology, they can all act in a distributed atomic environment.

In distributed systems, several different technologies, platforms and devices may need to act on separate sets of data atomically. These backend datasets can exist side by side in a datacenter, or across oceans. There can be multiple datasets where atomicity is a necessity. Foreign engines could completely control these datasets, with no concept of cross communication. The XA standard was specified by the Open Group to describe distributed transaction processing. This “eXtended Architecture” describes the relationship between a transaction manager and a transaction resource. You can find the specification HERE.

Transactions are:

  • ATOMIC, committed as a whole
  • CONSISTENT, the state of all data is either valid, or rolled back to last valid state prior to a failed transaction
  • ISOLATED, uncommitted transactions are isolated from other transactions*
  • DURABLE, once committed, transactions stay committed even in the event of a system failure

* transaction isolation level is configurable, allowing for reads of various states (dirty, phantom) of un-committed data, if necessary.

Working with database objects, transactions are committed as a whole (ATOMIC), or not committed at all. This is achieved with the concept of the two-phase commit (2PC) protocol. 2PC describes the method whereby all processes involved in a transaction are first polled for a consistent state, ready to commit (phase 1) and the commit phase where all process commit the changes. If the vote during the polling phase is not unanimous, the entire transaction is rolled back. Sometimes though, the native transaction atomicity in the InnoDB engine is not enough.

The most common example of a distributed transaction is a banking transaction. Let us imagine you go to the local grocery store and pick up a few items. At the register, you pay with a debit card and request some money back. There are several transactions going on here at the same time. From the moment the request for payment has started, despite all the different pieces involved, the entire transaction must be completed as a whole. To maintain a consistent state, the transaction manager assigns an XID that is distributed along with each individual request. The XID is a unique identifier that is persistent for each meta transaction.

Let’s say, for example, the bill for your groceries was $288.50, and you want $50.00 back. There are several separate transactions going on here, but to simplify, there are three main transactions going on:

  1. The withdrawal of $338.50 from your bank account
  2. The withdrawal of $50.00 from the cash register account
  3. The deposit of $338.50 to the master store account
XA transactions
Paper, plastic or XA transaction?

 

These transactions must be done atomically. The transaction manager handles the requests and provides an XID; each resource manager starts a transaction with that XID. The resource managers, in turn, perform the necessary DML. Each resource manager ends the transaction and prepares it for commit, and at this point, those changes are not finalized. But the data is locked, solely writable by this transaction. During this time the transaction manager is polling the resource managers for the XID in prepared state. When the transaction manager receives confirmation from all resource managers that they are prepared, it sends the instruction to commit.

This is a very simplified description of the transaction chain, but it is enough to illustrate the individual nature of the transactions. We have three separate”banks” here: your bank, the physical cash register and the store bank. A separate transaction is prepared on each of the disparate accounts: decrementing $338.50 from your bank (if available of course), decrementing $50 from the cash register local balance, and incrementing the store bank account $338.50. The transaction manager then polls the transaction resources for status and if they are all in agreement that they can complete (PREPARED), they execute as a whole, creating an atomic transaction across the many resources.

Each of these resources may be running a different backend resource. For example, we can imagine your bank is running MySQL community, the cash register is backed by PostgreSQL and the store bank is running Percona Server for MySQL. These completely separate technologies act in concert as directed by the XA transaction manager.

  1. The transaction manager follows a distinct path with each resource:
  2. Transaction Manager (TM) instructs the Resource Manager (RM) to XA START my_XA_transaction
  3. RM executes DML1; DML2; … DMLN; XA END my_XA_transaction; XA PREPARE my_XA_transaction*
  4. TM polls RM for a positive response on PREPARE
  5. When all RMs are in prepared state, the TM instructs the RMs to XA COMMIT/ROLLBACK

*transactions can be “ONE PHASE” and will COMMIT and PREPARE in one step if only a single Transaction Manager is involved

Another example of the XA protocol we might be familiar with is a travel agency. A travel agent may be reserving travel accommodations, lodging, and local transportation as a single transaction in the view of the consumer. Behind the scenes, several disconnected transactions are being executed across several different client resources.

XA Transactions
XA transactions book planes, trains and automobiles at the same time.

Working Directly with XA Transactions

MySQL enables XA transactions by default. However, they are only supported by the InnoDB engine. XA transactions are by design, optimistic: the manager assumes that if a transaction can prepare completely, it can commit completely. This is not always the case.

In MySQL, a transaction may report back to the manager that it is prepared (that is, the transaction can be completed logically and within data integrity rules as enforced by the resource), but may not be able to actually complete that operation due to block corruption, power loss, or any host of reasons a server may halt. The rules of XA transactions do not require a resource to report back success on commit. This can manifest as a database locking incident if a resource manager outside of the MySQL daemon does not successfully prepare its own transaction. The thread will wait, with potentially many rows locked for the resource manager to initiate the execute statement. While rare, I have seen instances where this occurs and XA transactions are left in the “recover” state.

Recover is not the aptest term for the state of the transaction. Recovery implies that something was broken and has now been fixed. That may be the case, a resource manager may have failed and is now back online and a transaction needs to be recovered. But every XA transaction that has been prepared, but not yet executed will show up when XA RECOVER is executed. A more descriptive term might be “uncommitted” and the reason for the lack of commit may be a recovered transaction. In any case, transactions that do not clear have definite locking implications and can cause total application failure if proper error handling at the transaction management layer is not in place.

Take the following example. First, we will start an XA transaction:

And perform locking DML:

With our local work “done”, we end and prepare the transaction:

At this point, in theory, the transaction manager would poll all transaction resources and when they all respond that they are prepared, the order commit is given. However, if one of the resource managers for any reason does not report back the transaction on the MySQL server must wait. This, of course, blocks other transactions from acquiring row locks. For example, another thread tries to update the same row and times out waiting:

And we can see the transaction in the InnoDB status holding the lock:

In order to see what XA transactions have been prepared but not committed, we can execute the recover command:

This output is not useful as the “data” column is in potentially unprintable characters and cannot be used on the transaction. Luckily we can convert the XID to a hexstring:

That is better, but still not totally usable. We still need to parse the value of the data column for the actual gtrid and bqual values of the XID. The gtrid_length and bqual_length fields tell us the position to start and stop. First, we need to strip the characters that denote a HEX value, the preceding 0x. Then we need to take the grtid_length in bytes, and bqual_lengh in bytes and separate out those values.

I started this transaction denoting the XID in hex, so the values were converted (i.e., 1A is a single byte). Armed with this knowledge, we know that the gtrid is “1A2B3C4D5E6F”, and the bqual is “F6E5D4C3B2A1”. Now we can take appropriate action on the stranded transaction. When confronted with this, it is necessary to investigate all transaction managers in order to determine if the transaction should be committed, or rolled back. The transactions are logged in the binary log with the XID in hexstring format*. We can use mysqlbinlog to find a specific XA START command and inspect the transactions:

*hexstrings are case insensitive, when parsing a binary log for the hex string, make sure to ignore case in string searches

When we have made the determination we can either commit the transaction or roll it back*:

*the alternative is XA ROLLBACK XID

Which frees up the object for other transactions:

Alternatively, I could have started the transaction denoting the XID in quoted string form:

Replication

Prior to MySQL minor version 5.7.7, replication was not entirely safe. A crash could result in a transaction that can be recovered by issuing the appropriate XA commit, but the corresponding binary log entry would not exist. We have worked with a client doing distributed financial transactions utilizing XA and traditional replication on older versions. Occasionally transactions were lost on slaves and pt-table-checksum and pt-table-sync were necessary to ensure consistency. This was not best practice, but this topology supported >15000/TPS for years with just a handful of failures! More info on the replication issue prior to 5.7.7 was detailed by Yves.

Note that Percona XtraDB Cluster does not support XA.

Conclusion

When you have need to tie disparate data sources together for distributed atomic transactions, the support is built into MySQL and InnoDB engine by default. Make sure the engineers responsible for the database are aware of the existence of the XA transactions and know how to debug them. And always be certain to keep MySQL up to date.

Find the full documentation in the reference manual.

PREVIOUS POST
NEXT POST
Dov Endress

Dov joined Percona in the fall of 2015 as a senior support engineer. He learned BASIC in elementary school on an Apple IIE, and his first computer was a Commodore 64. Dov started working in the LAMP stack in 1999 and has been doing so ever since. He lives in northern Nevada with his wife, step-daughter, grandson, a clowder of cats and Macy - the best dog a person could meet. In his free time he can be found somewhere outdoors or making things in the garage.

Leave a Reply