September 30, 2014

New wsrep_provider_options in Galera 3.x and Percona XtraDB Cluster 5.6

Now that Percona XtraDB Cluster 5.6 is out in beta, I wanted to start a series talking about new features in Galera 3 and PXC 5.6.  On the surface, Galera 3 doesn’t reveal a lot of new features yet, but there has been a lot of refactoring of the system in preparation for great new features in the future.

Galera vs MySQL options

wsrep_provider_options is a semicolon-separated list of key = value pairs that set low-level Galera library configuration.  These tweak the actual cluster communication and replication in the group communication system.  By contrast, other PXC global variables (like ‘wsrep%’) are set like other mysqld options and generally have more to do with MySQL/Galera integration.  This post covers the Galera options; MySQL-level changes will have to wait for another post.
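
Since the whole option list arrives as one string, it can help to split it into individual settings before comparing two nodes or two versions. The following is a minimal sketch in Python, not anything shipped with PXC; the function name parse_provider_options and the sample string are my own illustrations, built only from options discussed in this post.

```python
def parse_provider_options(raw: str) -> dict:
    """Split a wsrep_provider_options string into a {key: value} dict."""
    options = {}
    for item in raw.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon or empty input
        key, _, value = item.partition("=")
        options[key.strip()] = value.strip()
    return options


# Hypothetical sample; a real server reports many more options than this.
sample = "gmcast.segment = 0; repl.key_format = FLAT8; socket.checksum = 2;"
opts = parse_provider_options(sample)
print(opts["repl.key_format"])  # FLAT8
```

With two such dicts in hand (say, one from a 5.5 node and one from a 5.6 node), comparing their key sets is enough to spot added or renamed options.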

Here are the differences in the wsrep_provider_options between PXC 5.5 (Galera 2) and PXC 5.6 (Galera 3):

gmcast.segment=0

This is a new setting in 3.x and allows us to distinguish between nodes in different WAN segments.  For example, all nodes in a single datacenter would be configured with the same segment number, but each datacenter would have its own segment.

Segments are currently used in two main ways:

  1. Replication traffic between segments is minimized.  Writesets originating in one segment should be relayed through only one node in every other segment.  Each of those local relays then propagates the writeset to the rest of the nodes in its own segment.
  2. Segments are used in donor selection.  Donors in the same segment as the joining node are preferred, but not required.
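
To get a feel for the first point, here is a rough counting model in Python. It is only a sketch: it assumes that, without relays, the originating node would send its own copy of a writeset to every remote node, and it ignores everything else about the real protocol. The function wan_copies and the node names are hypothetical.

```python
def wan_copies(cluster, origin_segment, relay_per_segment=True):
    """Count copies of one writeset that cross a segment boundary.

    cluster maps a segment number to the list of nodes in that segment.
    This is a simplified counting model, not Galera's implementation.
    """
    copies = 0
    for segment, nodes in cluster.items():
        if segment == origin_segment or not nodes:
            continue  # local traffic and empty segments cost no WAN copies
        # One relay per remote segment, or (assumed baseline) one copy per remote node.
        copies += 1 if relay_per_segment else len(nodes)
    return copies


# Two datacenters with three nodes each (node names are made up).
cluster = {0: ["dc1-a", "dc1-b", "dc1-c"], 1: ["dc2-a", "dc2-b", "dc2-c"]}
print(wan_copies(cluster, 0, relay_per_segment=False))  # 3
print(wan_copies(cluster, 0, relay_per_segment=True))   # 1
```

Under that assumption, the saving grows with the number of nodes in each remote segment, which is where WAN bandwidth tends to be most expensive.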

replicator -> repl

The older ‘replicator’ tag has been renamed to ‘repl’, and the causal_read_timeout and commit_order settings have moved there.  No news here really.

repl.key_format = FLAT8

Every writeset in Galera has associated keys.  These keys are effectively a list of primary, unique, and foreign keys associated with all rows modified in the writeset.  In Galera 2 these keys were replicated as literal values, but in Galera 3 they are hashed into either 8-byte or 16-byte values (FLAT8 vs FLAT16).  This should generally make the key sizes smaller, especially with large CHAR keys.

Because the keys are now hashed, there can be collisions where two distinct literal key values result in the same 8-byte hashed value.  In practice, this means the places in Galera that rely on keys may falsely believe that two writesets match when they really do not.  This should be quite rare.  Such a false positive could affect:

  • Local certification failures (Deadlocks on commit) that are unnecessary.
  • Parallel apply – things could be done in a stricter order (i.e., less parallelization) than necessary

Neither case affects data consistency.  The tradeoff is more efficiency in keys and key operations generally making writesets smaller and certification faster.
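
To put "quite rare" into numbers, here is a small Python sketch. It is an illustration under assumptions: BLAKE2b stands in for whatever hash Galera really uses (this post does not name it), the hash is treated as uniformly distributed, and the helper names flat_key and collision_probability are my own. The estimate is the standard birthday bound.

```python
import hashlib
import math


def flat_key(literal: bytes, size: int = 8) -> bytes:
    """Hash a literal key down to a fixed size (8 for FLAT8, 16 for FLAT16).

    BLAKE2b is a stand-in here, not Galera's actual hash function.
    """
    return hashlib.blake2b(literal, digest_size=size).digest()


def collision_probability(num_keys: int, size: int = 8) -> float:
    """Birthday-bound estimate that any two of num_keys distinct keys collide."""
    space = 2.0 ** (8 * size)
    return -math.expm1(-num_keys * (num_keys - 1) / (2.0 * space))


long_key = b"customer:" + b"x" * 200      # a large CHAR-style key, 209 bytes
print(len(long_key), len(flat_key(long_key)))
print(collision_probability(1_000_000))   # 8-byte hashes
print(collision_probability(1_000_000, size=16))
```

Even with a million distinct keys in play at once, the 8-byte estimate comes out on the order of 1e-8, and moving to 16 bytes makes it smaller by many more orders of magnitude at the cost of doubling the key size.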

repl.proto_max

Limits the Galera protocol version that can be used in the cluster.  Codership’s documentation states it is for debugging only.

socket.checksum = 2

This replaces the previous network packet checksum algorithm (CRC32) with CRC32-C, which is hardware accelerated on supported CPUs.  Packet checksums can now also be disabled completely (=0).
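
CRC32 and CRC32-C are the same kind of checksum computed with different polynomials, so they give different results for the same bytes. The Python sketch below shows the difference; it uses the standard library's zlib.crc32 for classic CRC32 and a slow bit-by-bit CRC32-C written for illustration only (it is not Galera's code, and real implementations use table lookups or the CPU instruction).

```python
import zlib


def crc32c(data: bytes) -> int:
    """Bitwise CRC32-C (Castagnoli), using the reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; fold in the polynomial when the low bit was set.
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


payload = b"123456789"                # the conventional CRC check string
print(hex(zlib.crc32(payload)))       # 0xcbf43926 (CRC32)
print(hex(crc32c(payload)))           # 0xe3069283 (CRC32-C)
```

Both checksums are 32 bits wide, so the packet overhead is the same; the appeal of CRC32-C is that CPUs with SSE4.2 can compute it with a dedicated instruction.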

In the near future I’ll write some posts about WAN segments in more detail and about the other global and status variables introduced in PXC 5.6.

About Jay Janssen

Jay joined Percona in 2011 after 7 years at Yahoo working in a variety of fields including High Availability architectures, MySQL training, tool building, global server load balancing, multi-datacenter environments, operationalization, and monitoring. He holds a B.S. of Computer Science from Rochester Institute of Technology.

Comments

  1. Liviu says:

    Should I understand that if I have 3 nodes in one data center and 3 nodes in another data center, for a cluster of 6 nodes in total, I should set gmcast.segment=0 on all the nodes in one data center and gmcast.segment=1 on all the nodes in the other data center?
    Thank you,
    Liviu

  2. Liviu: gmcast.segment would indeed be different in each datacenter.

    However, consider what would happen to your cluster if the WAN link between the two DCs went down. Most two colo architectures like this either need an arbitrator in a 3rd location or need to favor one colo over the other in terms of automatic failover.
