I was working with a customer today investigating MySQL over DRBD performance issues. His basic question was why there was so much overhead with DRBD in his case, when it is commonly said that DRBD should add no more than 30% overhead.
The truth is – because of how DRBD works, it does not add a static overhead that can be quoted as 10% or 80%. You really need to understand how DRBD works, as well as how the IO system is utilized, to understand how much overhead you should expect.
First let's talk about what kind of IO performance you care about while running MySQL over DRBD. Your reads are going to be serviced from the local hard drive, and it is only writes which suffer the overhead of DRBD.
If you're using MySQL with Innodb (and running MyISAM with DRBD makes little sense anyway), you have to care about two kinds of writes: background random IO coming from buffer flush activity, which is typically not latency critical and rarely the problem, and log writes, which are critical and latency sensitive.
What many people do not realize is that MySQL has to deal with multiple logs and perform multiple flushes when operating in its maximum safety configuration – which is what DRBD users often prefer: if you can afford transaction loss, why bother with DRBD at all instead of using MySQL replication?
So assuming you have innodb_flush_log_at_trx_commit=1 and sync_binlog=1, you will have 4 "sync" operations per commit in MySQL 5.0: an event is written to the binary log to "prepare" the XA transaction, then there is a record in the Innodb log to commit it, and finally there is a record in the binary log to commit the transaction, followed by the transaction commit in the Innodb storage engine.
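For reference, the maximum-safety settings discussed here look like this in my.cnf (the variable names are the standard MySQL ones; the comments are mine):

```ini
[mysqld]
# flush and fsync the Innodb log at every transaction commit
innodb_flush_log_at_trx_commit = 1
# sync the binary log to disk after every write
sync_binlog = 1
```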
Moreover, these operations can cause (depending on the filesystem and configuration) even more synchronous IO operations – MySQL binary logs are not preallocated the way Innodb logs are, which means both data AND metadata have critical changes on each binary log write, and both have to be synced to avoid data loss. Though in the case of "data journaling" this can be done with a single journal write, with the actual modifications performed in parallel.
Anyway, the point is there are a lot of synchronous writes which are fully serialized, because group commit in Innodb was broken in 5.0 and is still not fixed in 5.1 to date. So this is the access pattern which is often going to define your MySQL on DRBD performance.
Let's now see how DRBD works, so we can analyze how much overhead we should expect. In the case of a single outstanding synchronous request it is pretty easy.
When a request goes to the DRBD device, in addition to being performed locally it has to be performed on the remote device, which means sending the block over the network, executing it on the remote node and responding with an ACK – assuming DRBD is configured with maximum durability settings.
So overhead can vary a lot depending on the speed of the disk subsystem and network.
If you do not have a BBU on your disk subsystem, you will be able to do up to about 200 serialized synchronous IO operations per second, meaning each operation takes about 5000 microseconds. At the same time, on a gigabit network the round trip to send 4096 bytes of data (a typical filesystem block size) and return an ACK packet will be around 200 microseconds.
In such a case, even considering extra overhead besides network IO, we're speaking about 300 microseconds vs 5000 microseconds, and DRBD overhead can be well below 10%.
The problem, however, is that such a configuration will likely have extremely poor performance because of the number of synchronous operations required – which we counted to be 4 per transaction commit, or possibly more than 6 depending on how the filesystem does its job. Rates of 40-50 transactions per second are not encouraging for many applications.
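The arithmetic above can be sketched as a tiny latency model. The 5000 microsecond disk sync and ~300 microsecond DRBD round trip are the figures from this post; the 4 and 6 sync counts per commit are the ones we counted above:

```python
# Rough per-commit latency model for MySQL on DRBD without a BBU.
# Figures from the post: ~5000 us per serialized synchronous disk write
# (~200 IOPS), ~300 us network round trip including some DRBD overhead.

DISK_SYNC_US = 5000   # one synchronous write, no BBU
DRBD_RTT_US = 300     # replicate block to peer and get ACK back

def commit_latency_us(syncs_per_commit):
    """Each sync completes locally AND must be acknowledged remotely."""
    return syncs_per_commit * (DISK_SYNC_US + DRBD_RTT_US)

for syncs in (4, 6):  # 4 counted above; 6+ if the FS adds metadata syncs
    latency = commit_latency_us(syncs)
    tps = 1_000_000 / latency
    print(f"{syncs} syncs/commit: {latency} us -> ~{tps:.0f} tps")
```

With 4 syncs per commit this works out to roughly 47 transactions per second, right in the 40-50 tps range mentioned above.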
My typical advice: if you want things to be highly durable and performing well at the same time, you must have a BBU (battery backed up unit) on your hardware RAID card, or something with the same effect, especially as they have become pretty cheap these days.
With a good RAID card, as I have benchmarked, you can get over 10000 req/sec in-cache write speed (and this is what a lot of transaction and binary log writes are) – in this case request execution takes about 100 microseconds, while DRBD overhead remains at 200-300 microseconds per request. This means DRBD slows down writes 3x or more.
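With a BBU the same model flips: the local write becomes cheap while the network round trip stays put, so replication dominates. A sketch with the numbers from this post:

```python
# DRBD overhead once a BBU makes local writes fast.
# Figures from the post: ~100 us in-cache write on BBU-backed RAID
# (~10000 req/sec), 200-300 us DRBD network round trip.

LOCAL_WRITE_US = 100

for rtt_us in (200, 300):
    with_drbd_us = LOCAL_WRITE_US + rtt_us
    slowdown = with_drbd_us / LOCAL_WRITE_US
    print(f"rtt {rtt_us} us: {LOCAL_WRITE_US} us -> {with_drbd_us} us "
          f"({slowdown:.0f}x slower)")
```

The exact same 200-300 microsecond network cost that was negligible next to a 5000 microsecond disk sync now triples or quadruples the write latency.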
There is nothing wrong with DRBD – it is a great piece of software, and it is not that it runs slowly; it is the relative performance between system components that causes such overhead.
If you're looking for less DRBD overhead with fast storage (i.e. BBU), you should be looking at low latency network communications.
It would actually be rather interesting to see DRBD gain direct support for Infiniband or Dolphin interconnect sockets, which are getting low cost these days and could offer significant performance improvements for DRBD. Though you should already be able to use these via standard TCP/IP communication, which by itself makes things a lot faster than 1Gb Ethernet.
Though this is only theory – I have not had a chance to play with DRBD on these kinds of networks yet.
As a summary: in this case we investigated the system, and the "surprising" overhead of DRBD perfectly matched the performance capacity of the system components.
As a lesson – do not take the overhead as a fixed number, but learn where this overhead comes from, so you can figure out how much it would be for your system.