November 25, 2014

Is DRBD the right choice for me?

It seems pretty common to find customers installing DRBD for the wrong reasons. There are many pros and cons when comparing DRBD to replication, but I’ve managed to cut the spiel I give to customers down to these two points:

  • DRBD’s aim (assuming protocol C) is to provide 100% consistency, and then as much uptime as possible.
  • MySQL Replication (with a manager such as MMM) aims to have 100% availability, at the potential loss of some data surrounding a failure.

So if you are installing DRBD purely for “availability”, and are not worried about losing the last write during the crash of your master database that (hopefully) happens only once every few years, you may be using the wrong technology.

While the prized “1 minute failover” is possible with DRBD, that number doesn’t tell the full story. The typical crash recovery process is a lot longer:

  • After resource transfer, a filesystem check runs (0-5 seconds).
  • mysqld is started (1-5 seconds).
  • InnoDB runs through crash recovery (1 minute – several hours). Peter wrote about this here.
  • The server is then ready to accept connections.
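
The steps above can be rolled into a rough best/worst-case estimate of the failover window. This is only an illustrative sketch: the phase durations come from the list above, and “several hours” is arbitrarily capped at three for the sake of the example.

```python
# Rough failover-window estimate for a DRBD/Heartbeat takeover.
# Durations (in seconds) are the illustrative ranges from the list above;
# the crash-recovery upper bound of 3 hours is an assumption.
phases = {
    "resource transfer + fsck": (0, 5),
    "mysqld startup": (1, 5),
    "InnoDB crash recovery": (60, 3 * 3600),  # 1 minute to "several hours"
}

best = sum(low for low, high in phases.values())
worst = sum(high for low, high in phases.values())
print(f"best case: {best}s, worst case: {worst / 3600:.1f}h")
```

The spread between the two numbers is the point: the InnoDB recovery phase dominates, which is why log file sizing matters so much here.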

Now, having said that: If you have an application that requires 100% consistency, then DRBD is one of your best choices in the MySQL market today.

About Morgan Tocker

Morgan is a former Percona employee.
He was the Director of Training at Percona. He was formerly a Technical Instructor for MySQL and Sun Microsystems. He has also previously worked in the MySQL Support Team, and provided DRBD support.

Comments

  1. Hi Morgan,

    Yep, precisely what I’ve been saying for a while!

    It is also worth noting that when you reach the last step, “The server is then ready to accept connections”, it’s running with essentially cold caches. In a replicated setup, the failover pair very likely both have very similar, hot caches.

    Regards,

    Jeremy

  2. Jeffrey Gilbert says:

    For pure availability without the requirement of consistency, I’ve been reading up on these non-SQL-based options:

    http://www.mongodb.org/
    http://books.couchdb.org/relax/why-couchdb

    Through my research, I’ve learned that I’m far more concerned with consistency than availability, which I can usually add through the use of replication servers and caching mechanisms (memcached), and it doesn’t mean you can’t have both with MySQL. What I also learned is that there are some other ideas afloat about high availability where data sanity is not your paramount concern (like search results from Google; those are cached anyway). For HA options, there are a lot of good ones these days!

  3. Morgan Tocker says:

    Hi Jeremy,

    Very true! Although it’s not without issues in MySQL Replication either (the SELECT queries may need to be mirrored on the slave to warm its caches).

    I used to give a talk comparing DRBD to Replication where I could think of about 20 things to compare. There are a lot of pros and cons for each:
    http://conf.phpquebec.com/slides/2009/A_tour_of_MySQL_High_Availability-phpquebec_2009.pdf

    .. but as I said in my post, it normally just comes down to these two points.

    – Morgan

  4. Morgan Tocker says:

    Hi Jeffrey,

    It’s very application specific which is more important:

    * If I were YouTube/Facebook, I would be far more concerned with how much downtime is publicly visible than anything else. Consistency is “nice to have”, but if you flesh out the requirements enough, a lot of people fit in the bucket of “uptime trumps all”.

    * But if I were responsible for a banking/financial application, I’d be much more concerned with making sure that I have all of the data in more than one place. You can usually tell you fit in this category if “Application down, but all data is safe” is a higher priority than “Secondary server is down, and I’m on the edge of my seat in case the primary fails now!”.

    DRBD is elegant because it retrofits so well to introduce HA to MySQL applications. I’m roughly familiar with MongoDB – the 10gen guys know what they are doing, and it looks promising ;)

  5. peter says:

    I think with mk-table-checksum things got a whole lot better – now it is much easier and less intrusive to check whether there are any inconsistencies between master and slave following a failover.

    I would also note one trick – if you use sync-binlog on the master, you will have all statements in the binary log when it crashes. If you record the slave’s position in the master’s logs when you do the switch, you can examine the statements that were “lost” because of the failover and act upon them.
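
    To make that trick concrete, here is a simplified sketch. In practice you would extract events from the binary log with the mysqlbinlog utility; the event tuples, table names, and positions below are simulated purely for illustration.

```python
# Each binlog event: (end_log_pos, statement). Simulated stand-in for
# what mysqlbinlog would show from the crashed master's binary log.
binlog = [
    (120, "INSERT INTO orders VALUES (1)"),
    (240, "INSERT INTO orders VALUES (2)"),
    (360, "UPDATE accounts SET balance = balance - 10 WHERE id = 7"),
]

def lost_statements(binlog, slave_applied_pos):
    """Statements written to the master's binlog after the master-log
    position the promoted slave had reached at failover time."""
    return [stmt for pos, stmt in binlog if pos > slave_applied_pos]

# The slave had replicated up to master-log position 240 at the crash:
print(lost_statements(binlog, 240))
```

    Everything past the recorded position is the window of writes that never reached the slave, which you can then review and re-apply by hand.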

    There is also a concept (though I’m unaware of any tool implementing it) of using MySQL replication with DRBD replicating the binary log. In that case, if the master crashes, you should be able to apply the remaining events from the binary log replicated by DRBD before promoting the slave to master.

  6. Anders Karlsson says:

    I think there are more things to consider here, like interaction with an HA framework (Linux-HA springs to mind); I think DRBD is in much better shape out of the box there. Also, failover with MySQL replication is far from instantaneous: you still need to wait on the slave for the relay log to be empty, unless you are not concerned with your data being properly serialized. And even if you are not, there is one issue with using MySQL Replication that can make things difficult, which is fail-back:

    Let’s say you have a crash on the master and fail over to the slave. One issue is that you may have “lost” data on the master. But the next issue comes when you fail back to the master, even if your master database is actually intact and can be restarted. You now have two servers, not necessarily with the slave being ahead of the master (which can be fixed), but with the master having one set of data not on the slave, and the other way around.

    One main issue is the “dual lag” you get in the two asynchronous layers here, i.e. from the master to the slave relay log, and from the relay log to the slave database. When failing over and failing back, both of these need to be handled, and this is where the difficulty comes in.

    This is not to say that there is something wrong with MySQL Replication, or that it cannot be used for HA, but that MySQL Replication has many other uses (scale-out, for example) and that as an HA solution, although it works, it is rather difficult to manage at times and will often require manual intervention to work successfully. The latter is mostly due to the asynchronous nature of MySQL Replication, which has its bad sides, as noted above, but also its good points, of course.

  7. Florian Haas says:

    @Morgan: agree (with the original post, but even more specifically on comment 4).

    @Jeremy: I understand people commonly “preheat” caches on MySQL startup using a clever SELECT script with --init-file, so as far as I can see that point is moot – am I mistaken?

    @Anders: I’d love to see someone write a Master/Slave capable, MySQL replication aware version of the Pacemaker mysql OCF resource agent so people could get the best of both worlds while using the same cluster framework all the time. I’ve tried myself; but found myself lacking the insight into MySQL replication to implement this properly. Any takers?

  8. Morgan Tocker says:

    @Florian: The problem with the cache heating scripts is they can take a long time (best measured in hours) on some of the larger servers customers are using these days – 16G/32G/64G++ of RAM seems to be very common.

    I think this is an issue that needs to be addressed by the MySQL Server team, since not only does it make failover with DRBD more difficult, but it makes server restarts to turn on features like ‘log-bin’ very painful.

    Jeremy provided a proof of concept patch to one possible solution:
    http://provenscaling.com/blog/2008/10/06/making-mysql-more-usable-innodb-saverestore-buffer-pool-patch/

    If the idea he mentioned “Add runtime commands to save/restore the buffer pool state from a running server, without shutting down” was implemented, this could be used for DRBD to periodically checkpoint.

  9. Morgan Tocker says:

    @Anders: Agreed that DRBD is better out of the box, but with third party tools these problems can still be handled:

    * MMM acts as the HA framework. On a failover, it will wait for the relay log to be empty on the slave.

    * For failback, it’s true the master can’t be trusted straight away, but as Peter mentioned mk-table-checksum is your friend ;)

    * The ‘dual lag problem’ is also handled by MMM – it will monitor that the slave is not more than a certain number of seconds behind the master, and alert you if it is (this problem relates to bullet point #1).
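
    That kind of lag check boils down to a threshold test on SHOW SLAVE STATUS output. The sketch below simulates the idea; the dict stands in for a real query result, and the 60-second threshold is an arbitrary example, not an MMM default.

```python
# Simulated row from SHOW SLAVE STATUS on the candidate slave.
slave_status = {
    "Slave_IO_Running": "Yes",
    "Slave_SQL_Running": "Yes",
    "Seconds_Behind_Master": 45,
}

MAX_LAG_SECONDS = 60  # arbitrary example threshold

def safe_to_fail_over(status, max_lag=MAX_LAG_SECONDS):
    """A slave is a viable failover target only if both replication
    threads are running and its lag is under the configured threshold."""
    if status["Slave_IO_Running"] != "Yes" or status["Slave_SQL_Running"] != "Yes":
        return False
    lag = status["Seconds_Behind_Master"]  # None when replication is broken
    return lag is not None and lag <= max_lag

print(safe_to_fail_over(slave_status))  # True for the sample row above
```

    A monitor that alerts as soon as this check fails is what keeps the “surviving master is hours behind” scenario from sneaking up on you.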

    – Morgan

  10. Hi Morgan,
    Of course, a DRBD/Heartbeat setup is not for every use case. Like you mentioned, the main part of the failover time is the InnoDB recovery. The one-minute failover time _implies_ small InnoDB log files, usually 2 files of around 100 MB. With DRBD/Heartbeat, once it is set up, you know what the recovery time will be.

    With Master-Master, here are my 2 big cons:
    – How do you fail over if the surviving master is lagging 2 hours behind?
    – Replication often breaks, and while it is broken, HA is very limited.

    To be honest, I must also add a con on the DRBD side: write performance is slower. Depending on the hardware, it can be significant; a 30% drop is not uncommon. Most of the time it is due to the network connection between the two nodes, even if bonding is used to boost the bandwidth.

    Since Master-Master relies on replication, and replication has no built-in self-healing, it is my opinion that a Heartbeat/DRBD HA solution is easier to manage, especially in shops with no full-time DBA.

  11. Morgan Tocker says:

    @Yves:

    “How do you fail over if the surviving master is lagging 2 hours behind?”

    You don’t let it lag 2 hours behind. I know this is easier said than done (with replication being single-threaded), but this would be one of the requirements for replication-based availability.

    “Replication often breaks, while broken, HA is very limited.”

    That’s true. But this comes down to the same argument in my post – “MySQL Replication (with a manager such as MMM) aims to have 100% availability”. In this context, availability means purely “uptime”, not redundancy.

    “Since Master-Master rely on replication and replication as no built-in self-healing, it is my opinion that a Heartbeat/DRBD HA solution is easier to manage, especially in shops where there are no full-time DBA.”

    Ease of use is always a difficult/subjective component to measure – especially when it can get in the way of choosing the best (performance-wise) solution. I don’t dispute how helpful the built-in features of DRBD are (online device verification, replication event checksums, fast resync after a node has been offline), and I wish that MySQL Replication had the same features. But it’s fairly easy to emulate most of what you need with tools like Maatkit and MMM.

  12. peter says:

    What a fun discussion, as usual, on this topic :)

    I would mention a few things:

    1) Wanting the application to be up often applies not only to failures, but also to minimizing the scheduled downtime needed. MMM allows much more here – ALTER TABLEs, MySQL upgrades, storage engine changes, etc.

    2) Anders: The switch to the slave is not instant – it indeed needs to catch up, but in well-configured systems (in particular, ones designed to minimize switch time) the lag should be fractions of a second.

    3) Anders: Indeed, it is possible for the master to remain intact after a failure. It would be reasonable to run mk-table-checksum before putting traffic back on it, or simply to reclone it from the slave using MMM. You may also resolve the changes at the same time.

    4) Yves: The trick is NOT to let your slave get lagged behind. This indeed requires more discipline – if you use DRBD, all the overhead is on the master, so if it can’t handle the load it gets instant attention, while replication lag may be allowed to linger.

    5) Yves: Yes… replication is a risk here. Monitoring for replication must of course be in place, as well as regular checks with mk-table-checksum. In general, replication can run rather stably for most applications.

  13. @Yves

    Please have a look at Tungsten Replicator. We have solutions for the consistency problems that tend to break MySQL replication (your point #2). These include global transaction IDs for safe slave promotion, crash-safe slaves, and built-in consistency checks similar to mk-table-checksum. Also, we will be introducing parallel replication during the next quarter, which is one answer (and a good one) to your point #1 about slave lag.

    Cheers, Robert

    p.s., Not to disagree at all with your conclusions about DRBD. For the right cases it really can be the cat’s miaow.

  14. Apachez says:

    What about those who use a SAN as backend to store the mysqldata?

    Could it be that MySQL Enterprise edition has something for these, compared to the community version?

  15. Morgan Tocker says:

    @Apachez – The only difference I can think of is that the SAN probably has some very large caches which (if still primed with the correct data) should make the crash recovery and initial warming period faster.

    Using MySQL Enterprise edition is not going to change anything.

  16. Apachez, Enterprise is no different from community. In fact, they are discontinuing this split. See http://blogs.sun.com/datacharmer/entry/the_pursuit_of_openness

  17. Jason Daly says:

    The issue we’ve experienced with MySQL replication is resyncing a master and slave on a database with high write activity. Locking the master and dumping the master database via mysqldump can take 20+ minutes on a database of reasonable size — thus limiting or halting all write activity on the master. This essentially places the master ‘out of service’ which is not ideal for an HA environment.

  18. M says:

    Jason,

    This is a pretty old post, but I thought I would mention that mysqldump is rarely a great backup solution beyond a couple of tens of GB of data, due to lengthy logical export and even longer restore times. Even with mysqldump, locking is not required if you use InnoDB exclusively and pass the --single-transaction flag.

    If you do have (large) MyISAM tables that you can’t convert to InnoDB because they use FULLTEXT/RTREE indexes, you might use LVM snapshots rather than mysqldump --single-transaction or xtrabackup. (Though xtrabackup can copy your MyISAM tables, it holds a lock while copying the MYD/MYI files, after having first copied all InnoDB data locklessly.)
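
    The decision logic in that comment can be sketched as a tiny helper. The function name and return strings are purely illustrative (this is not a real tool), but the branching follows the reasoning above.

```python
def lockless_backup_options(engines):
    """Given the set of storage engines in use, suggest backup approaches
    that avoid long write locks on the master."""
    if engines <= {"InnoDB"}:
        # InnoDB-only: a consistent dump or hot copy needs no table locks.
        return ["mysqldump --single-transaction", "xtrabackup"]
    # MyISAM (or mixed): file-level copies still lock for the MyISAM part,
    # so a filesystem-level snapshot is the lockless alternative.
    return ["LVM snapshot"]

print(lockless_backup_options({"InnoDB"}))
print(lockless_backup_options({"InnoDB", "MyISAM"}))
```

    Either way, the point stands: a 20-minute global lock on the master for a resync is avoidable with the right tooling.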
