New wsrep_provider_options in Galera 3.x and Percona XtraDB Cluster 5.6
Jay Janssen
Now that Percona XtraDB Cluster (PXC) 5.6 is out in beta, I want to start a series about new features in Galera 3 and PXC 5.6. On the surface, Galera 3 doesn't reveal many new features yet, but much of the system has been refactored in preparation for great new features in the future.
Galera vs MySQL options
wsrep_provider_options is a semicolon-separated list of key = value settings that configure the low-level Galera library. These settings tune the actual cluster communication and replication in the group communication system. By contrast, the other PXC global variables (the ‘wsrep%’ variables) are set like any other mysqld option and generally deal with the integration between MySQL and Galera. This post covers the Galera options; MySQL-level changes will have to wait for another post.
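As a minimal sketch of the format, and assuming no value itself contains a semicolon, the Python below splits a wsrep_provider_options string into key/value pairs and joins it back together. The function names are hypothetical and not part of any Galera or PXC tooling.

```python
def parse_provider_options(raw: str) -> dict[str, str]:
    """Split a 'key = value; key = value' string into a dict."""
    options: dict[str, str] = {}
    for item in raw.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon or empty entries
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"malformed option (no '='): {item!r}")
        options[key.strip()] = value.strip()
    return options


def format_provider_options(options: dict[str, str]) -> str:
    """Join a dict back into a semicolon-separated options string."""
    return "; ".join(f"{key}={value}" for key, value in options.items())


opts = parse_provider_options("gmcast.segment = 0; repl.key_format = FLAT8; socket.checksum = 2")
print(opts["repl.key_format"])  # FLAT8
```

Keeping the options as a dict makes it easy to change a single setting and write the whole string back, which matters because all Galera options share this one variable.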
Here are the differences in the wsrep_provider_options between 5.5 and 5.6:
$ diff 5.5 5.6
> gmcast.segment = 0;
< replicator.causal_read_timeout = PT30S;
< replicator.commit_order = 3;
> repl.causal_read_timeout = PT30S;
> repl.commit_order = 3;
> repl.key_format = FLAT8;
> repl.proto_max = 5;
> socket.checksum = 2
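The comparison above can be reproduced on parsed option sets. This hypothetical Python sketch works on plain dicts (for example, built from the output of SHOW VARIABLES LIKE 'wsrep_provider_options' on each version) and prints `<` for settings only in, or different in, the old set and `>` for the new set, mirroring the diff's notation. It only sees the settings it is given.

```python
def diff_provider_options(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Return diff-style lines for keys that were added, removed, or changed."""
    lines = []
    for key in sorted(old.keys() | new.keys()):
        if key in old and key in new and old[key] == new[key]:
            continue  # identical in both versions: not part of the diff
        if key in old:
            lines.append(f"< {key} = {old[key]}")
        if key in new:
            lines.append(f"> {key} = {new[key]}")
    return lines


v55 = {"replicator.causal_read_timeout": "PT30S", "replicator.commit_order": "3"}
v56 = {
    "gmcast.segment": "0",
    "repl.causal_read_timeout": "PT30S",
    "repl.commit_order": "3",
    "repl.key_format": "FLAT8",
    "repl.proto_max": "5",
    "socket.checksum": "2",
}
for line in diff_provider_options(v55, v56):
    print(line)
```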
gmcast.segment = 0
This setting is new in 3.x and lets us distinguish between nodes in different WAN segments. For example, all nodes in a single datacenter would be configured with the same segment number, while each datacenter would have its own segment.
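To make the segment layout concrete, here is a small Python sketch that gives every node in the same datacenter the same gmcast.segment number and renders the matching my.cnf line. The node and datacenter names are hypothetical, and numbering datacenters in first-seen order is just one convenient choice; any distinct number per datacenter would do.

```python
def assign_segments(node_dc: dict[str, str]) -> dict[str, int]:
    """Map each node to a segment number shared by its whole datacenter."""
    segment_of_dc: dict[str, int] = {}
    result: dict[str, int] = {}
    for node, dc in node_dc.items():
        if dc not in segment_of_dc:
            segment_of_dc[dc] = len(segment_of_dc)  # first-seen DC gets 0, next gets 1, ...
        result[node] = segment_of_dc[dc]
    return result


def provider_option_line(segment: int) -> str:
    """Render the my.cnf line for one node (other provider options would join this string)."""
    return f'wsrep_provider_options="gmcast.segment={segment}"'


layout = {"db1": "us-east", "db2": "us-east", "db3": "eu-west"}
for node, segment in assign_segments(layout).items():
    print(node, provider_option_line(segment))
```

Since wsrep_provider_options is a single variable, any other provider options a node needs must go into the same semicolon-separated string rather than a second line.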
Segments are currently used in two main ways:
- Replication traffic between segments is minimized. A writeset originating in one segment should be relayed to only one node in each other segment. From those local relays, replication is propagated to the remaining nodes in each segment.
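- Segments are used in donor selection: donors in the same segment are preferred, but not required. If no node in the joiner's segment can ever become a donor, a suitable node from another segment is used and a warning like this one is logged: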
2013-11-21 17:26:59 3853 [Warning] WSREP: There are no nodes in the same segment that will ever be able to become donors, yet there is a suitable donor outside. Will use that one.
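Both uses of segments can be illustrated with a deliberately simplified Python model. It counts how many copies of one writeset cross a segment boundary with and without relaying, and picks a donor by preferring the joiner's own segment. It ignores how Galera actually chooses relay nodes and which nodes are eligible donors (for example, whether a node is synced), so treat it as an illustration of the idea, not Galera's algorithm; all names are hypothetical.

```python
from typing import Optional


def cross_segment_messages(segments: dict[str, int], origin: str, relay: bool) -> int:
    """Copies of one writeset that cross a segment boundary (simplified model)."""
    origin_segment = segments[origin]
    remote_nodes = [n for n, s in segments.items() if s != origin_segment]
    if not relay:
        # Without relaying, the origin sends a copy to every remote node in this model.
        return len(remote_nodes)
    # With relaying, one copy goes to a single relay node per remote segment.
    return len({segments[n] for n in remote_nodes})


def pick_donor(segments: dict[str, int], joiner: str, candidates: list[str]) -> Optional[str]:
    """Prefer a donor in the joiner's segment; otherwise fall back to any candidate."""
    same_segment = [n for n in candidates if segments[n] == segments[joiner]]
    if same_segment:
        return same_segment[0]
    if candidates:
        # This fallback corresponds to the warning shown in the log line above.
        return candidates[0]
    return None


segments = {"a1": 0, "a2": 0, "a3": 0, "b1": 1, "b2": 1, "b3": 1, "c1": 2, "c2": 2, "c3": 2}
print(cross_segment_messages(segments, "a1", relay=False))  # 6
print(cross_segment_messages(segments, "a1", relay=True))   # 2
print(pick_donor(segments, "a3", ["b1", "a2"]))             # a2
```

In this three-datacenter example with three nodes each, a writeset from one node crosses the WAN six times without relaying but only twice with it.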
replicator -> repl
The older ‘replicator’ prefix has been renamed to ‘repl’, and the causal_read_timeout and commit_order settings have moved there. Beyond the rename, nothing has changed here.
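If a configuration written for 5.5 sets these options explicitly, the keys need the new prefix. A hypothetical Python helper for that rename might look like the following; it assumes only the prefix changed, which matches the diff for causal_read_timeout and commit_order.

```python
OLD_PREFIX = "replicator."
NEW_PREFIX = "repl."


def migrate_replicator_keys(options: dict[str, str]) -> dict[str, str]:
    """Rename 5.5-style replicator.* keys to 5.6-style repl.* keys; keep everything else."""
    migrated: dict[str, str] = {}
    for key, value in options.items():
        if key.startswith(OLD_PREFIX):
            key = NEW_PREFIX + key[len(OLD_PREFIX):]
        migrated[key] = value  # if both spellings are present, the later one wins
    return migrated


print(migrate_replicator_keys({"replicator.commit_order": "3", "gmcast.segment": "0"}))
```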
repl.key_format = FLAT8
Every writeset in Galera has associated keys. These keys are effectively a list of the primary, unique, and foreign keys associated with all rows modified in the writeset. In Galera 2 these keys were replicated as literal values, but in Galera 3 they are hashed into either 8- or 16-byte values (FLAT8 vs. FLAT16). This should generally make keys smaller, especially for large CHAR keys.
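To illustrate why hashing shrinks large keys, the Python sketch below reduces a multi-part key to a fixed 8- or 16-byte digest. It uses BLAKE2b from hashlib purely as a stand-in; it is not the hash function Galera uses, and the schema/table/value layout of the key parts is an assumption for illustration.

```python
import hashlib


def hash_key(parts: list[bytes], flat_bytes: int = 8) -> bytes:
    """Reduce a multi-part literal key to a fixed-size digest (FLAT8- or FLAT16-style)."""
    if flat_bytes not in (8, 16):
        raise ValueError("flat_bytes must be 8 (FLAT8) or 16 (FLAT16)")
    h = hashlib.blake2b(digest_size=flat_bytes)  # stand-in hash, not Galera's
    for part in parts:
        # Length-prefix each part so (b"ab", b"c") and (b"a", b"bc") hash differently.
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.digest()


# Hypothetical key: schema, table, and a 200-byte CHAR key value.
parts = [b"shop", b"customers", b"x" * 200]
print(sum(len(p) for p in parts), "literal bytes ->", len(hash_key(parts)), "hashed bytes")
```

However long the literal key is, the hashed form stays at 8 or 16 bytes, which is where the size savings for large CHAR keys come from.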
Because the keys are now hashed, there can be collisions, where two distinct literal key values produce the same 8-byte hash. In practice, this means the parts of Galera that rely on keys may falsely conclude that two writesets match when they do not. This should be quite rare. Such false positives could cause:
- Unnecessary local certification failures (deadlocks on commit).
- Overly strict parallel apply: writesets could be applied in a stricter order (i.e., with less parallelization) than necessary.
Neither case affects data consistency. In exchange, keys and key operations become more efficient, which generally makes writesets smaller and certification faster.
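How rare is "quite rare"? A standard birthday-bound estimate gives a feel for it, assuming the hash behaves like a uniformly random function. The relevant n is the number of distinct keys that can actually be compared against each other during certification, not the total number of rows in the database. A minimal Python sketch:

```python
import math


def collision_probability(n_keys: int, hash_bits: int = 64) -> float:
    """Approximate P(at least one collision) among n_keys distinct keys.

    Birthday bound: 1 - exp(-n(n-1) / 2^(b+1)), assuming a uniform hash.
    """
    pairs = n_keys * (n_keys - 1) / 2
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-pairs / 2.0 ** hash_bits)


print(collision_probability(1_000_000))       # FLAT8 (64-bit) estimate
print(collision_probability(1_000_000, 128))  # FLAT16 (128-bit) estimate
```

With one million distinct keys, the 64-bit estimate is roughly 3 in 100 million, and a 128-bit hash pushes it far lower. Even when a collision does occur, it only matters if the two colliding keys meet in certification or parallel apply.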
repl.proto_max = 5
This limits the Galera protocol version that can be used in the cluster. Codership's documentation states it is for debugging only.
socket.checksum = 2
This setting extends the network packet checksum beyond the previous algorithm (CRC32) to support CRC32-C, which is hardware-accelerated on supported hardware. Packet checksums can now also be disabled completely (=0).
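For reference, both checksum algorithms can be computed in a few lines of Python. zlib.crc32 provides the standard CRC32; CRC32-C (the Castagnoli polynomial) is not in the standard library, so the sketch implements it bit by bit. This slow loop yields the same value that hardware instructions such as the SSE4.2 crc32 instruction compute, but it says nothing about Galera's actual implementation. Mapping the value 1 to plain CRC32 follows Galera's documentation for socket.checksum and should be treated as an assumption here.

```python
import zlib
from typing import Optional

CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial 0x1EDC6F41, bit-reversed


def crc32c(data: bytes) -> int:
    """Table-free, bit-at-a-time CRC32-C (slow, but easy to verify)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


def packet_checksum(data: bytes, socket_checksum: int) -> Optional[int]:
    """Checksum a payload according to a socket.checksum-style setting (assumed mapping)."""
    if socket_checksum == 0:
        return None  # checksums disabled
    if socket_checksum == 1:
        return zlib.crc32(data)  # plain CRC32
    if socket_checksum == 2:
        return crc32c(data)  # CRC32-C
    raise ValueError(f"unsupported socket.checksum value: {socket_checksum}")


print(hex(packet_checksum(b"123456789", 1)))  # 0xcbf43926
print(hex(packet_checksum(b"123456789", 2)))  # 0xe3069283
```

The two algorithms differ only in their polynomial; CRC32-C was chosen for hardware support, not for a different checksum structure.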
In the near future I'll write posts covering WAN segments in more detail, as well as the other global and status variables introduced in PXC 5.6.