Percona XtraDB Cluster: Failure Scenarios and their Recovery
Percona XtraDB Cluster (a.k.a PXC) is an open source, multi-master, high availability MySQL clustering solution. PXC works with your MySQL / Percona Server-created database. Given the multi-master aspect, there are multi-guards to protect a cluster from entering an inconsistent state. Most of these guards are configurable based on their user environment. However, if they are not configured properly they could cause the cluster to stall, fail or error-out.
In this session, we’ll discuss failure scenarios, including a MySQL cluster entering a non-primary state due to network partitioning. We’ll also discuss a cluster stall due to flow control, data inconsistency causing the shutdown of a node and common problems during the initial catch up – a.k.a State Snapshot Transfer (SST). Other issues include delays in the purging of a transaction, a blocking DDL causing the entire cluster to stall and a misconfigured cluster.
We will also go over how to solve some of these problems and how to safely recover from these failures.
About the Authors
Alkin has extensive experience in enterprise relational databases working in various sectors for large corporations. With more then 20 years of industry experience he has acquired skills for managing large projects from ground up to production. For the past six years he's been focusing on e-commerce, SaaS and MySQL technologies. He managed and architected database topologies for high volume site at eBay Intl. He has several years of experience on 24X7 support and operational tasks as well as improving database systems for major companies. He has led MySQL global operations team on Tier 1/2 support for MySQL customers. Recently joined Percona's expert technical management team.