Percona XtraDB Cluster replicates actions executed on one node to all other nodes in the cluster and make it fast enough to appear as it if is synchronous (aka virtually synchronous).
There are two main types of actions: DDL and DML. DDL actions are executed using Total Order Isolation (let’s ignore Rolling Schema Upgrade for now) and DML using normal Galera replication protocol.
This manual page assumes the reader is aware of Total Order Isolation and MySQL replication protocol.
DELETE) operations effectively change the state
of the database, and all such operations are recorded in XtraDB by
registering a unique object identifier (aka key) for each change (an update
or a new addition).
append_keyoperation registers the key of the data object that has undergone a change by the transaction. The key for rows can be represented in three parts as
pkis absent, a hash of the complete row is calculated). In short there is quick and short meta information that this transaction has touched/modified following rows. This information is passed on as part of the write-set for certification to all the nodes of a cluster while the transaction is in the commit phase.
cert_index_ng. If the said key is already part of the vector, then conflict resolution checks are triggered.
Changes made to DB objects are bin-logged. This is the same as how MySQL does it for replication with its Master-Slave ecosystem, except that a packet of changes from a given transaction is created and named as a write-set.
Once the client/user issues a
COMMIT, Percona XtraDB Cluster will run a
commit hook. Commit hooks ensure following:
gcs_recv_threadis first to receive the packet, which is then processed through different action handlers.
id, which is a locally maintained counter by each node in sync with the group. When any new node joins the group/cluster, a seed-id for it is initialized to the current active id from group/cluster. (There is an inherent assumption/protocol enforcement that all nodes read the packet from a channel in same order, and that way even though each packet doesn’t carry
idinformation it is inherently established using the local maintained
/* Common situation - * increment and assign act_id only for totally ordered actions * and only in PRIM (skip messages while in state exchange) */ rcvd->id = ++group->act_id_; [This is an amazing way to solve the problem of the id co-ordination in multiple master system, otherwise a node will have to first get an id from central system or through a separate agreed protocol and then use it for the packet there-by doubling the round-trip time].
What happens if two nodes get ready with their packet at same time?
Both nodes will be allowed to put the packet on the channel. That means the channel will see packets from different nodes queued one-behind-another.
It is interesting to understand what happens if two nodes modify same set of rows. For example:
create -> insert (1,2,3,4)....nodes are in sync till this point. node-1: update i = i + 10; node-2: update i = i + 100; Let's associate transaction-id (trx-id) for an update transaction that is executed on node-1 and node-2 in parallel (The real algorithm is bit more involved (with uuid + seqno) but conceptually the same so for ease we're using trx_id here) node-1: update action: trx-id=n1x node-2: update action: trx-id=n2x
Both node packets are added to the channel but the transactions are conflicting. Let’s see which one succeeds. The protocol says: FIRST WRITE WINS. So in this case, whoever is first to write to the channel will get certified. Let’s say node-2 is first to write the packet and then node-1 makes immediately after it.
each node subscribes to all packages including its own package. See below for details.
The certification protocol will be described using the example from above. As discussed above, the central certification vector (CCV) is updated to reflect reference transaction.
n2x, whereas write-set proposes setting it to
n1x. This causes a conflict, which in turn causes the node-1 originated transaction to fail the certification test.
This helps point out a certification failure and the node-1 packet is rejected.
This suggests that the node doesn’t need to wait for certification to complete, but just needs to ensure that the packet is written to the channel. The applier transaction will always win and the local conflicting transaction will be rolled back.
What happens if one of the nodes has local changes that are not synced with group?
create (id primary key) -> insert (1), (2), (3), (4); node-1: wsrep_on=0; insert (5); wsrep_on=1 node-2: insert(5). insert(5) will generate a write-set that will then be replicated to node-1. node-1 will try to apply it but will fail with duplicate-key-error, as 5 already exist. XtraDB will flag this as an error, which would eventually cause node-1 to shutdown.
With all that in place, how is GTID incremented if all the packets are
processed by all nodes (including ones that are rejected due to certification)?
GTID is incremented only when the transaction passes certification and is ready
for commit. That way errant-packets don’t cause GTID to increment. Also, they
don’t confuse the group packet
id quoted above with GTID. Without
errant-packets, you may end up seeing these two counters going hand-in-hand,
but they are no way related.
For general inquiries about our open source software and database management tools, please send us your question and someone will contact you.