October 25, 2014

Looking for a RAID Controller without Battery Learning Problems?

A lot has been written about battery learning cycle problems and their impact on MySQL performance. Here are a couple of links (1,2). It is good to see, though, that some controllers are coming out which solve this problem, namely the Adaptec 5Z series controllers (Z stands for Zero Maintenance). This is not quite new technology; these controllers appeared on the market about 2 years ago, but it is only now that we can state they have been working well for a number of customers.

As explained in this PDF, ZMCP (Zero-Maintenance Cache Protection) does not use a battery but instead a capacitor plus flash. The capacitor provides enough energy for the contents of DRAM to be flushed to the supplied flash module. This solution not only avoids the battery discharge/learn cycle problems that many Battery Backup Unit (BBU) based controllers have, but also gives you a lot longer to recover the data, as it does not depend on a battery any more.

The models which are known to work are the Adaptec 5405Z, Adaptec 5445Z and Adaptec 5805Z, which differ mainly in the number of internal and external hard drive connections they provide.

Some newer Adaptec controllers also offer using an SSD as a cache with a technology called MaxIQ. In particular, the Adaptec 5805ZQ might be of interest as it combines both ZMCP and MaxIQ technologies. I have not seen much use of this kind of cache with MySQL in practice though. If you’re using it, please share your experiences. If you’re looking for more information, this PDF might be a good place to start.

For reference, here is how the configuration output looks for such controllers:

As you can see, there is no information about a BBU here; instead it is replaced with the status of the ZMM (Zero Maintenance Module). It also reports information about MaxIQ, which is confusing as MaxIQ is not supported by this controller per the technical specs.

If you would like to check the cache status it should be in the logical drive information:

This tells us there is one RAID10 logical drive which is currently operating in write-back cache mode (the Write-Cache mode line), and it is set to “Enabled (write-back) when protected by battery/ZMM”. That is the setting you want, as it ensures the controller falls back to write-through mode if the ZMM fails.
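If you want to automate this check, here is a minimal Python sketch of how it could look. It works on text you have already captured from the controller's command line tool. The field labels "Write-cache mode" and "Write-cache setting", the "Key : Value" layout and the SAMPLE text are assumptions modelled on the description above, not captured output, so adjust them to whatever your controller actually prints.

```python
def parse_fields(text):
    """Parse 'Key : Value' lines into a dict with lower-cased keys."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip().lower()] = value.strip()
    return fields


def cache_is_safely_configured(text):
    """Return True if write-back is active AND conditional on protection.

    The field names below are assumptions, not a documented format.
    """
    fields = parse_fields(text)
    mode = fields.get("write-cache mode", "").lower()
    setting = fields.get("write-cache setting", "").lower()
    return "write-back" in mode and "when protected" in setting


# Hypothetical sample, written by hand for illustration only.
SAMPLE = """\
Logical device name : data
RAID level : 10
Write-cache mode : Enabled (write-back)
Write-cache setting : Enabled (write-back) when protected by battery/ZMM
"""

if __name__ == "__main__":
    print(cache_is_safely_configured(SAMPLE))
```

The check deliberately requires both conditions: write-back being active on its own is not enough, because an unconditional write-back setting would keep caching writes even after the ZMM fails.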

If you have any experience with these controllers or other controllers using similar technology please feel free to share.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. Nils says:

    Looks like the controllers are also a bit better to manage on the command line than back in the dark ages of SCSI RAID. I mostly focused on Areca controllers because I don’t want to deal with the inferior command line tools most companies provide (just think about the LSI tools, they are horrible!).

    Will you do a benchmark/evaluation soon?

  2. Andrew Sitnikov says:

    Dell PERC with NV Cache?

  3. Andrew,

    Which model are you referring to exactly? When I looked at the Dell controller lineup here, it looks like the H700/H800 both come with a BBU, not a capacitor: http://www.dell.com/content/topics/topic.aspx/global/products/pvaul/topics/en/us/raid_controller?c=us&l=en&cs=555

  4. Joe says:

    We have a lot of boxes running the 5405Z and I’ve been very happy with them. Performance is good and they don’t seem to cause any trouble even when full of SSDs. And I do like not having to worry about the battery learn cycles our Dell/LSI controllers are excellent at wrecking us with.

    Have not had any issues and I prefer Adaptec’s arcconf tool over MegaCli/omreport for the Dell controllers as you can get all the info you need (battery/ZMM, individual disks, overall health, cache mode, rebuild progress, etc) with one command.

    We’ve specifically got a lot of Percona 5.1 and 5.5 installs running on these.

  5. George says:

    LSI has followed suit with their CacheVault series as well: http://www.lsi.com/channel/marketing/Pages/LSI-MegaRAID-CacheVault-Technology.aspx

    “CacheVault technology transfers the contents of the DRAM cache to NAND flash using power from the supercap module in the event of a power or server failure. With a traditional battery backup unit, after 72 hours without restored power, the cached data is lost. However, CacheVault technology safely stores the contents of DRAM on NAND flash for up to three years.”

  6. Joe,

    Great to hear it works well for you. Thanks for considering Percona Server!

    George, it is pretty annoying though (though classical marketing) to see essentially the same technology getting many marketing names, confusing us all. Previously it was easy: if a controller supported RAID5 you knew exactly what it meant, and write-back cache with BBU was also called exactly that. I hope that in a couple of years, when most vendors have it, they will come up with some generic, vendor-independent name.

  7. Ragu Bhat says:

    Pete

    Tnx for the ZMCP pdf link! As you have noted, the data write-back should never be battery dependent. Issues that crop up due to bad data write-backs are very hard to zero in on and fix. I had this 3ware BBU card which was a real PITA; it took some time to figure out that the battery discharge-charge learn cycle was causing a big mess-up.

    Give us more of such write-ups focussing on cards/hardware too, real help just in time.



  8. Yes, we’ve been hit by this even on H700, though I think there is an NV option on the H700. We can use API calls to schedule and monitor the learn cycle (despite Dell saying you can’t), but still a pain.

    But the H710 used in Rx20 (R420, 620, 720, etc.) seems to be NV Flash and no battery, so we’ll see how well that works as we are just receiving them now. Hopefully this problem is thus gone forever.

  9. Andy Agarwal says:

    From what I’ve learned, the H710 controller still uses a battery backed cache and uses NV Flash to save the contents of the cache upon power failure. The only advantage that NV provides is up to 10 years of cache storage (vs 72 hours with just a battery).

    It doesn’t seem like Dell provides a product like the Adaptec 5Z series. Here is the info I received from Dell: http://en.community.dell.com/support-forums/servers/f/906/p/19495387/20316449.aspx

  10. Joe says:

    A comment drew my attention to this post. One thing I’ll mention is that those Adaptec 5Z controllers have OK performance and the no-battery part is nice, but we’ve had a lot of issues with them since my last comment way back in January 2012. Specifically, hot swap is extremely unreliable, failing perhaps 1/2 the time. This is based only on my observations, but for what it’s worth, that’s across hundreds of servers and dozens of hot swap attempts. I’ve never seen anything like it before.

    We’re about to roll out a bunch of boxes on HP’s P420i with 2GB cache and capacitor. Performance is excellent and hot swaps have worked reliably across dozens of tests. We’ll see how they hold up once a bunch of them are out in production.

  11. M says:

    http://lists.us.dell.com/pipermail/linux-poweredge/2012-June/046470.html
    transparent learn cycle

    NOTE: Virtual disks stay in Write Back mode, if enabled, during a transparent learn cycle (TLC). When the TLC completes, the controller sets the next TLC to +90 days.
    TLC Time Frame
    The time frame for completion of a learn cycle is a function of the battery charge capacity and the discharge and charge currents used. For PERC H710 or H810 cards, the expected time frame for completion of a learn cycle is approximately seven hours.
