Thanks to all who attended my webinar yesterday. The slides and recording are available on the webinar’s page. I was a bit overwhelmed with the amount of questions that came in and I’ll try to answer them the best I can here.
Yes, it does. However, locks are not held on all nodes while a transaction is in progress, so there is a possibility for conflicts between simultaneous writes on different nodes. Certification is the process that Galera uses to protect your data integrity, and the problem is pushed back to the application to handle in the form of deadlock errors. As long as you understand this and the conditions where you may see more deadlocks (and your application can properly handle them), multi-node writing is fine. See more detail on this in a blog post I did a while ago.
Yes, there is no sharding in Percona XtraDB Cluster, all data goes to all nodes. You are limited by replication throughput, or more specifically, transaction apply throughput, just like with standard MySQL replication. However, PXC offers true parallel apply, so many transactions can be applied in parallel, which should generally give better throughput than conventional master/slave.
If you listened closely to something Alexey said during the webinar, it’s actually possible that WAN segments will improve commit times because the master node won’t have spend extra time sending the transaction to each remote node. I would expect overall write performance to vary somewhat compared to not using segments, but there are probably factors that may influence it either way including number of remote nodes, bandwidth, latency, etc.
With or without segments, all nodes must acknowledge the transaction, this is a vital part of Galera’s implementation and cannot be relaxed.
Good question, I don’t know. This would be fun to test and would make a good blog post. I’ll put it on my list. I’d bet that segments may increase such a limit, but it probably depends on a lot of factors.
Practically I haven’t seen a cluster with more than a half-dozen nodes, but that doesn’t mean that’s the limit. I’d expect a big cluster would be around 10-12 nodes, but in reality that’s just a gut feeling more than any hard evidence.
Not unless you regularly add AUTO_INCREMENT columns to existing tables using ALTER TABLE. Note that the bug was also fixed in 5.5.34.
Nope. Currently the Codership-mysql patches are for Innodb only.
Any other transactional storage engine can theoretically be supported provided you can implement prioritized transactions in it. However SphinxSE is not transactional, so its support would be similar to MyISAM at best (which is very rudimentary and not likely to change). It would be easy to add such support and it’s possible that the Maria Galera Cluster guys are already considering it since Maria ships with that storage engine.
UPDATE: It was pointed out to me that SphinxSE doesn’t store any actual data, but instead just acts as a client to a distinct Sphinx index (which I just wasn’t thinking of). In that sense, it doesn’t need Galera support: each node can just have a SphinxSE table, to which any DML will update the Sphinx index directly, no replication required by the cluster. SphinxSE should work fine with PXC.
It depends, but I’d consider migrating to PXC to be at least equivalent to the risk of upgrading a major MySQL version and it should be tested thoroughly. To limit your risk, breaking the upgrade into smaller pieces may be prudent, but it is not strictly necessary. The further away you are from 5.6 (like say you are on 5.1 or 5.0), the more likely I’d recommend separate steps.
No, group communication continues as before, it’s just that the routing of the actual writeset content is different.
On a cluster in steady state, it should be easy to do rolling changes. That is, take one node out of the cluster, add some extra storage, and put it back in, repeat. Ideally such rolling changes are done quickly enough and are non-destructive to the datadir so you can IST on restart.
wsrep_retry_autocommit has not changed to my knowledge. The reasoning behind not applying this methodology to explicit transactions was that it was generally assumed that explicit transactions may have application logic being applied between the statements and could we assume it was safe to simply retry the same transaction if the data changed underneath? For example:
SELECT * FROM USERS WHERE ID=100 FOR UPDATE;
# Application applies some business logic based on the row contents here???
UPDATE USERS SET AUTHORIZED=1 WHERE ID=100;
If the UPDATE USERS was an autocommit, then wsrep_retry_autocommit would simply re-broadcast the same writeset (unmodified) with the new RBR row if there was a conflict. It does not re-run the statement. Would it be safe to do this if the explicit transaction got a cluster deadlock? We don’t know. If the user record was modified (which is what a cluster deadlock would indicate), should they still be authorized?
This was actually released in 5.5. The big changes are that the new SST method allows for encrypted SST transfers, compression, rate limiting, progress metering, and other goodness like that. It is not backwards compatible with the old xtrabackup method, and both the donor and joiner must have the new method available (i.e., be running a PXC version that has both) for it to work.
Yes, no problem. The SST method is dictated by the JOINER, meaning whatever the joiner node’s wsrep_sst_method is, that’s what is used (the donor obeys this even if its wsrep_sst_method is different). You can just go change the wsrep_sst_method in all the config files and it will be used next time an SST happens (since SST would only happen on a restart anyway). Just be careful that you test xtrabackup first, since it requires proper mysql credentials to be set in wsrep_sst_auth.
Now that 5.6 is GA, we’ve just passed a milestone where I’d start to recommend 5.6 as a starter for new Percona XtraDB Cluster deployments going forward. Certainly 5.5 is more battle-tested and understood and there still may be good reasons to use it, but from now forward, I’m expecting that the need for 5.5 will only diminish, not increase.
For existing PXC 5.5 deployments, I’d probably recommend waiting a bit unless there is some great need for the new release, but I don’t have any specific bugs or issues I’m thinking of, just the general newness of 5.6.
Nothing specific, these should work fine with Percona XtraDB Cluster. In RBR these only run on the master node (the node taking the write) and the changes done by them are incorporated into the transactions writeset.
Actually, certification results are not shared on the cluster. Certification is deterministic (or should be) and all nodes are expected to either pass or fail a transaction without any mixed results, hence there is no communication about pass/failure and only the master node (the node that originated the transaction) actually does anything about certification failure (i.e., increments a lcf failure counter and deadlock error for the client). Past bugs in PXC have resulted in non-deterministic certification in some edge-cases, and this can then obviously lead to node inconsistencies.
What is sent back is an acknowledgement of receipt of the transaction (which is much smaller) at the replication stage (pre-certification) and my understanding is that all nodes will reply back to the originating node more or less directly. I tend to think of this communication as “out-and-back”, but in reality it’s more nuanced than this; for example an acknowledgement may be piggy-backed with a new transaction from that node.
The act of replication delivers the transaction payload to all nodes, and all nodes acknowledge the transaction to each other, AND within this process a consistent GTID for the transaction is established efficiently. HOW precisely this happens is, as Alexey would say, is an implementation detail. Segments simply modify the method of transaction delivery, but I believe most of the other details are more or less the same.
Lock wait timeouts in the cluster will be happening to transactions on a single node while it waits for other transactions on that same node to commit. Any way that you can reduce the commit time should, in theory, reduce the lock_waits.
Since part of that commit wait is the synchronous replication, it stands to reason that this is perhaps the bulk of the wait time. I haven’t measured actual commit rates comparing full vs minimal row images, so I can’t tell you if this would decrease replication time or not. I can imagine a scenario where a very large transaction would benefit from minimal row images (i.e., by being much smaller and thereby taking less time to transmit over the network), but I’d expect that when the transactions are already small and single-row to begin with, it would make less of an impact.
Are you guys using innodb_flush_log_at_trx_commit=1? Relaxing that (and relying on the cluster replication for durability) may improve your commit times a lot.
I’m not aware of anything that is necessarily Galera 3 specific that is tunable, though there are things baked into Galera 3 that may help, such as the certification improvements. As I mentioned above, minimal row images may help by simply reducing the volume of data that needs transmitting.