September 2, 2014

High-availability options for MySQL, October 2013 update

The technologies allowing to build highly-available (HA) MySQL solutions are in constant evolution and they cover very different needs and use cases. In order to help people choose the best HA solution for their needs, we decided, Jay Janssen and I, to publish, on a regular basis (hopefully, this is the first), an update on the most common technologies and their state, with a focus on what type of workloads suite them best. We restricted ourselves to the open source solutions that provide automatic failover. Of course, don’t simply look at the number of Positives/Negatives items, they don’t have the same values. Should you pick any of these technologies, heavy testing is mandatory, HA is never beyond scenario that have been tested.

Percona XtraDB Cluster (PXC)

Percona XtraDB Cluster (PXC) is a version of Percona Server implementing the Galera replication protocol from Codeship.

Positive pointsNegative points
  • Almost synchronous replication, very small lag if any
  • Automatic failover
  • At best with small transactions
  • All nodes are writable
  • Very small read after write lag, usually no need to care about
  • Scale reads very well and to some extent, writes
  • New nodes are provisioned automatically through State Snapshot Transfer (SST)
  • Multi-threaded apply, greater write capacity than regular replication
  • Can do geographical disaster recovery (Geo DR)
  • More resilient to unresponsive nodes (swapping)
  • Can resolve split-brain situations by itself
  • Still under development, some rough edges
  • Large transactions like multi-statement transactions or large write operations cause issues and are usually not a good fit
  • For quorum reasons, 3 nodes are needed but one can be a lightweight arbitrator
  • SST can be heavy over a Wan
  • Commit are affected by the network latency, this impacts especially Geo DR
  • To achieve HA, a load balancer, like haproxy, is needed
  • Failover time is determined by the load balancer check frequency
  • Performance is affected by the weakest/busiest node
  • Foreign Keys are potential issues
  • MyISAM should be avoided
  • Can be mixed with regular async replication as master or slave but, slaves are not easy to reconfigure after a SST on their master
  • Require careful setup of the host, swapping can lead to node expulsion from the cluster
  • No manual failover mode
  • Debugging some Galera protocol issues isn’t trivial

 

Percona replication manager (PRM)

Percona replication manager (PRM) uses the Linux HA Pacemaker resource manager to manage MySQL and replication and provide high-availability. Information about PRM can be found here, the official page on the Percona web site is in the making.

Positive pointsNegative points
  • Nothing specific regarding the workload
  • Unlimited number of slaves
  • Slaves can have different roles
  • Typically VIP based access, typically 1 writer VIP and many reader VIPs
  • Also works without VIP (see the fake_mysql_novip agent)
  • Detects if slave lags too much and remove reader VIPs
  • All nodes are monitored
  • The best slaves is picked for master after failover
  • Geographical Disaster recovery possilbe with the lightweight booth protocol
  • Can be operated in manual failover mode
  • Graceful failover is quick, under 2s in normal conditions
  • Ungraceful failover under 30s
  • Distributed operation with Pacemaker, no single point of failure
  • Builtin pacemaker logic, stonith, etc. Very rich and flexible.
  • Still under development, some rough edges
  • Transaction maybe lost is master crashes (async replication)
  • For quorum reasons, 3 nodes are needed but one can be a lightweight arbitrator
  • Only one node is writable
  • Read after write may not be consistent (replication lag)
  • Only scales reads
  • Careful setup for the host, swapping can lead to node expulsion from the cluster
  • Data inconsistency can happen if the master crashes (fix coming)
  • Pacemaker is complex, logs are difficult to read and understand

 

MySQL master HA (MHA)

Like with PRM above, MySQL master HA (MHA), provides high-availability through replication. The approach is different, instead of relying on an HA framework like Pacemaker, it uses Perl scripts. Information about MHA can be found here.

Positive pointsNegative points
  • Mature
  • Nothing specific regarding the workload
  • No latency effects on writes
  • Can have many slaves and slaves can have different roles
  • Very good binlog/relaylog handling
  • Work pretty hard to minimise data loss
  • Can be operated in manual failover mode
  • Graceful failover is quick, under 5s in normal conditions
  • If the master crashes, slaves will be consistent
  • The logic is fairly easy to understand
  • Transaction maybe lost is master crashes (async replication)
  • Only one node is writable
  • Read after write may not be consistent (replication lag)
  • Only scales reads
  • Monitoring and logic are centralized, single-point of failure, a network partition can cause a split-brain
  • Custom fencing devices, custom VIP scripts, no reuse of other projects tools
  • Most of the deployments are using manual failover (at least at Percona)
  • Requires priviledged ssh access to read relay-logs, can be a security concern
  • No monitoring of the slave to invalidate it if it lags too much or if replication is broken, need to be done by external tool like HAProxy
  • Careful setup for the host, swapping can lead to node expulsion from the cluster

 

NDB Cluster

NDB cluster is the most high-end form of high-availability configuration for MySQL. It is a complete shared nothing architecture where the storage engine is distributed over multiple servers (data nodes). Probably the best starting point with NDB is the official document, here.

Positive pointsNegative points
  • Mature
  • Synchronous replication
  • Very good at small transactions
  • Very good at high concurrency (many client threads)
  • Huge transaction capacity, more than 1M trx/s are not uncommon
  • Failover can be ~1s
  • No single point of failure
  • Geographical disaster recovery capacity built-in
  • Strong at async replication, applying by batches gives multithreaded apply at the data node level
  • Can scale reads and writes, the framework implements sharding by hashes
  • Not a drop-in replacement for Innodb, you need to tune the schema and the queries
  • Not a general purpose database, some loads like reporting are just bad
  • Only the Read-commited isolation level is available
  • Hardware heavy, need 4 servers mininum for full HA
  • Memory (RAM) hungry, even with disk-based tables
  • Complex to operate, lots of parameters to adjust
  • Need a load balancer for failover
  • Very new foreign key support, field reports scarce on it

 

Shared storage/DRBD

Achieving high-availability use a shared storage medium is an old and well known method. It is used by nearly all the major databases. The share storage can be a DAS connected to two servers, a LUN on SAN accessible from 2 servers or a DRBD partition replicated synchronously over the network. DRBD is by bar the most common shared storage device used in the MySQL world.

Positive pointsNegative points
  • Mature
  • Synchronous replication (DRBD)
  • Automatic failover is easy to implement
  • VIP based access
  • Write capacity is impacted by network latency for DRBD
  • SANs are expensive
  • Only for InnoDB
  • Standby node, a big server doing nothing
  • Need a warmup period after failover to be fully operational
  • Disk corruption can spread

 

About Yves Trudeau

Yves is a Principal Consultant at Percona, specializing in technologies such as MySQL Cluster, Pacemaker and DRBD. He was previously a senior consultant for MySQL and Sun Microsystems. He holds a Ph.D. in Experimental Physics.

Comments

  1. Hi Yves,

    I’m curious what you think about using MHA w/semi-sync, or mysqlfailover (from the MySQL Utilities) which is designed to work with Replication+GTIDs?

  2. Roman Vynar says:

    >DRBD: Only for InnoDB.
    Guess it’s a mistake?

  3. LordFPL says:

    Good summary… but i’m disapointed as Roman “DRBD: Only for InnoDB.”… gné ?
    For information, i made a HA solution with a mix of drbd/pacemaker/slave.

    3 nodes with 3 type of environnement (prod/dev/valid).

    Example of nominal situation for prod
    - node 1 = drbd master + mysql master
    - node 2 = drbd slave
    - node 3 = mysql slave

    I have a sync and async replication… drbd is managed with pacemaker = 15sec to swith to another server.
    In case of problem, i have the async replication with mysql slave
    (and of course, some dump each night ;)).

    Cost of this = downgrade of write performance due to drbd -> from 3000ops with 64 writers, i fall to 1500… but it’s sufficient for our users (have an average of 500ops).

    I don’t know if i made a big mistake with this… but it’s in production since 1 year without any problem :)

  4. Hi Roman,
    The problems with Myisam are:  no durability and no guaranteed recovery.   If the primary drbd node crashes with non flushed writes to a large myisam table,  it may take hours to recover and data is very likely to be lost.  Of course, a read-only myisam table has no issue with drbd but the general use case is no myisam tables.  Remember, this is an ha setup, if it can’t recover in a predictible way, what’s the benefit.

    Regards,

    Yves

  5. Mha and Prm with semi-sync is interesting although it lowers the write bandwidth a bit. I haven’t played much with the new mysql tool for failover in 5.6, it can reconfigure replication but an ha solution is more than that.

    –Yves

  6. See my reply to Roman.

    –Yves

  7. Marc Langevin says:

    Hi Yves,

    I have experimented with MySQL 5.6 admin replication scripts and they work ok as long as you know what you are doing and you are disciplined. But you are rigth, this is not HA and I see replication a good solution for managed failover , not automatic. For HA, we have experimented with VmWare HA option and found it could give very good results. Of course, this make sense if there is already a Vm infrastructure in place. Also, using database server on Vm can still suffer some overhead depending on the load put on the database.

    Regards,

    - Marc

  8. Yves and Jay, very nice summary on MySQL HA solutions.

    In the future editions I would appreciate you not to restrict yourself to open source only solutions. I would like to see Continuent Tungsten (http://www.continuent.com/solutions/clustering) included.

    I agree that there is some merit on open source only approach. After all, Continuent also values open source and contributes to MySQL community our own open source replication solution, Tungsten Replicator, for various advanced multi-master, multi-site and heterogenous (for example MySQL to Oracle, MySQL to Hadoop, or Oracle to MySQL, etc.) replication needs.

    But there is also case to be made to offer your readers more balanced view of the world. Especially taking account that commercial Continuent Tungsten solution may not end up costing possible users any more than open source only solutions when taken account the work related to deploy and maintain these solutions.

    Additionally, you may not want to limit yourself in HA only. You could also take account DR, which creates whole new set of considerations and requirements. By adding DR in the consideration would give even better guidance which solution actually meets the real world needs.

    Last but not least, significant element that impacts down time is scheduled maintenance operations. Maintenance is the single most significant element impacting application availability. Good question to ask is if the chosen solution allows ‘zero downtime maintenance’ for all needed maintenance activities (schema change, database upgrade, hardware upgrade, etc.).

    Keep up with the good work!

  9. Good job – thanks for the great work @ Percona.

    You could add in the positives/negatives whether it really works in a WAN setup with 100ms latency between data centers.

  10. Sheyda Amini says:

    Does MHA works with MyIsam Storage Engine?

  11. MHA, like PRM, is only using MySQL replication so it works with MyISAM.

Speak Your Mind

*