September 2, 2014

FlashCache: tpcc workload with FusionIO card as cache

This run is very similar to what I had on the Intel SSD X25-M, but now I use a FusionIO 80GB SLC card. I chose this card as the smallest available (and therefore the cheapest; on Dell.com you can see it for about $3K). There is also the FusionIO IO-Xtreme 80GB card, but it is MLC-based and may not be the best choice for FlashCache usage (FlashCache puts a high write rate on the card both when reading from and when writing to the disks, so its lifetime could be short).

Also, the Facebook team released a WriteThrough module for FlashCache, which could be a good trade-off if you want an extra guarantee of data consistency and your load is mostly read-bound, so I tested this mode as well.

The whole setup is similar to the previous post, so let me just post the results with FlashCache on FusionIO in 20% dirty pages, 80% dirty pages and write-through modes. I used the full 80GB for caching (the total size of the data is about 100GB).

Conclusions from the graph:

  • With 80% dirty pages we get about 4x better throughput (compared to RAID).
  • Write-through mode gives about a 2x gain. Remember that the load is very write-intensive and all benefits in write-through mode come only from cached reads, so this is a pretty good result for this scenario.
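
To illustrate why write-through gains are bounded by the read share of the workload, here is a minimal back-of-the-envelope sketch. It is not how FlashCache works internally and it does not reproduce the benchmark above: the function names, the latency values (0.1 ms for flash, 5 ms for disk) and the assumption that every write waits for the disk are all hypothetical, chosen only to show the shape of the effect.

```python
def avg_io_time(read_fraction, hit_ratio, t_flash, t_disk):
    """Average time per I/O under a simplified write-through cache model.

    Assumptions (hypothetical, for illustration only):
      - a read hit costs t_flash, a read miss costs t_disk;
      - every write must reach the disk, so it costs t_disk.
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    read_time = hit_ratio * t_flash + (1.0 - hit_ratio) * t_disk
    write_time = t_disk  # write-through: the cache does not absorb writes
    return read_fraction * read_time + (1.0 - read_fraction) * write_time


def write_through_speedup(read_fraction, hit_ratio, t_flash=0.1, t_disk=5.0):
    """Speedup relative to the same workload with no cache at all."""
    baseline = t_disk  # without a cache every I/O goes to disk
    return baseline / avg_io_time(read_fraction, hit_ratio, t_flash, t_disk)


if __name__ == "__main__":
    # Sweep the read share of the workload at a fixed (assumed) 90% hit ratio.
    for read_fraction in (0.2, 0.5, 0.8, 0.95):
        speedup = write_through_speedup(read_fraction, hit_ratio=0.9)
        print(f"reads={read_fraction:.0%}  speedup={speedup:.2f}x")
```

In this model a workload with no reads, or with no cache hits, gets no benefit at all, and the gain grows as the read share grows, which is why write-through mode is expected to look better on read-mostly loads than on this write-intensive one.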

With this post I finish my runs on FlashCache for now. I think it may be considered for real usage; at the very least, you may want to evaluate how it works on your workloads.

About Vadim Tkachenko

Vadim leads Percona's development group, which produces Percona Cloud Tools, the Percona Server, Percona XtraDB Cluster and Percona XtraBackup. He is an expert in solid-state storage, and has helped many hardware and software providers succeed in the MySQL market.

Comments

  1. peter says:

    Vadim,

    What happens at about 600 to 1200 seconds? It looks like things start relatively stable, then there is a strange period with performance results all over the place, much higher and much lower than the average line.

    Regarding benchmarks, I think the test is not really close to the sweet spot in real life. If you have a 100GB database you probably would not get 80GB of flash as a cache – you would likely get enough to fit it all in.

    I would be interested to see 500GB with 80GB cache (or to scale it down 100GB data, 20GB cache and 4GB buffer pool) to see if it makes any difference.

    It would also be very interesting to see some read-mostly workloads, as in a lot of web applications, to show how well write-through mode operates.

  2. Patrick Casey says:

    Why not just buy a whole mess of physical memory? For the price of an 80G Fusion IO card, I can easily buy 10 8G sticks which are going to perform better for random access than the Fusion IO card does. Sure, it’s not persistent over a restart in that case, but neither is an overflow cache like this anyway.

  3. Vadim says:

    Patrick,

    More memory is the way to go.
    However, many boxes have only 6-12 memory slots, which even with (very expensive) 8GB modules gives you 48-96GB of memory,
    and that may not be enough to cover your working dataset.

  4. Patrick Casey says:

    I hear you, Vadim, and you’re right in that some servers are clearly not going to take 80G of RAM, but I think if you’re at a price point where you’re looking at dragging in a Fusion IO card, you can seriously start talking about upgrading most rack mount servers for that much money.

    You can get a mid-range dell rack mount like an r710 with 64G of RAM for under $5k retail (sure it’ll be more like $6500 when you drop a second CPU and some decent drives in it), but that’s list. Any decent bulk sales agreement and you get that same rack mount, fully tricked out, for less than $3500, and that’s more or less the price of an 80G fusion IO card.

    Point I guess is that using an expensive Fusion IO card like this strikes me as a weird way to spend my hardware budget. I’d just replace my current server with a bigger rack-mount with more memory on it.

  5. Patrick Casey says:

    Edit: Think I’m being a wee bit optimistic on my rack mount pricing, so Vadim’s a bit more right than I gave him credit for :)

    The Dell I outlined above, with a good RAID controller, 6 15k drives, and a DRAC card is going to run you between 4500 and 5500 depending on your Dell discount level and volume. For 3500 – 4200 you can get it with one disk and no DRAC card, but that’s not a reasonable config.

  6. Vadim says:

    Patrick,

    So I mean it should be a balanced decision on budget/performance; there is no clear winner.
    And I agree that PCI-E cards are overpriced, in my opinion, so I am looking forward to more players and price drops
    in the coming years.

  7. Andy says:

    Is FusionIO the only supplier of PCI-e SSD? If so, what’s stopping other suppliers from entering this market?

    Is there any legit reason why a PCI-e SSD should be so much more expensive than a regular SATA SSD like X25-E? Or is this just a case of “they can charge whatever they want” because of no competition?

  8. Andy,

    I’m pretty sure OCZ offers PCI-e SSDs with their Z-Drive series as well. Last I’d checked, all their drives cost upwards of $900.

  9. peter says:

    Patrick,

    I would see it as a concept test, not exact numbers. You may have a box with 64GB of memory which will still be a small portion of your total database size, and getting, say, a 320GB FusionIO card for cache may be quite a gain. Also, depending on your workload, you may be able to get cheap gains using SSD disks rather than cards.

    The other benefit of Flash is that it is persistent, unlike DRAM: the more memory you have, the longer warmup will take. Also, no matter how much RAM you have, you still have to do writes, which can be expensive.

    I think the 15K hard drives you mentioned are getting squeezed in the middle – if you need high performance, SSDs are the way to go. If you need more space at lower cost, larger slower hard drives will do it for you.

  10. Vadim says:

    Andy,

    LSI/Seagate plans to release a PCI-E card later this year
    http://storage.networksasia.net/content/lsi-reveals-6-gbps-nand-flash-pcie-card

    I think the current price for cards is determined by the following factors:
    - no competitors
    - consolidation factor: with a single card you may replace 2-5 servers, and the single card is cheaper in this case.

  11. Ryan White says:

    Speaking as someone who has spent a great deal of money on FusionIO hardware in the last 2 years (we started with them in the early alpha days when the cards/drivers were far from stable), and has done endless amounts of testing of various products and constant keeping up with new products on the market to make sure there’s not something better, I can say that FusionIO blows the pants off of anything else on the market from an ROI perspective.
    The solution is extremely fast, extremely reliable, and extremely tunable. The reason for the cost of the cards is that half of their cost (if not more) is the development of the software to run the cards (the driver, the groomer, etc). There’s a lot more to designing a PCI-E NAND flash card that will run stable in a 24×7 multi-thousand-transactions-per-second environment than throwing a bunch of flash and a controller on a PCI-E card. Flash is very complicated to run stable at high write cycles 24×7 and not have it slow down or crap out on you.

    Andy – You asked why nobody else is in the market. The answer is because it’s not easy to build this thing, and for anyone entering the market, FusionIO is at least 1.5 years ahead on development (and that doesn’t just mean performance). We’ve been able to make incredible progress on shrinking our MySQL environment, getting rid of piles of SAN arrays we spent $50k-$500k on just for IOPS performance, with just 2x160GB cards per system (and we only do 2 so that we can stripe them to fit our entire dataset on them, otherwise 1 would be plenty.)

  12. Andy says:

    Ryan,

    Thanks for the info.

    Did you test X25-E/M in production? Just wondered how they would hold up after some aging. From benchmarks (like this: http://www.mysqlperformanceblog.com/2009/05/01/raid-vs-ssd-vs-fusionio/) they seem to represent an even better ROI than FusionIO – 50-60% of FusionIO’s performance at 10-20% of its cost. Don’t know if they could sustain that performance after they’re no longer new.

    Also isn’t striping your data over 2 cards a bit dangerous? Has that caused any data loss?

  13. Ryan White says:

    I forgot to mention on the topic that I’m *extremely* excited to test FlashCache soon. We have several DB’s that are too large to fit on an affordable amount of FusionIO storage, but that probably have *working* sets that would fit in the FusionIO card in a cached manner.

    *drools*

  14. Andy says:

    Ryan,

    Did you test X25-E/M in production? Percona’s benchmarks showed them to have even better ROI than FusionIO – 50-60% of FusionIO’s performance at 10-20% of its cost. But I wondered if they could maintain their performance level when they’re no longer new.

    Isn’t it dangerous to stripe data on 2 FusionIO cards? Has that caused any data loss?

  15. Ryan White says:

    Andy – I actually do consulting on the side in addition to my day job, and worked with a couple small shops that thought they could do better than FusionIO with Intel X25-E’s on their MySQL systems. They tried several configurations (usually 4 drives, they tried hardware RAID, software RAID, RAID 1+0 and RAID 0), but in all cases the RAID controller or software couldn’t predict when the drives would run out of free LEB’s (the free NAND cells you need to write new data), and a drive would drop out of the array due to timeouts when it goes into emergency grooming mode and crash the array. Most times it turned into serious corruption or total data loss. This is where the Intel and other drives are a complete black box. You have no idea what they’re doing internally, so you can’t predict or monitor anything.

    With the FusionIO cards + drivers, we do constant monitoring of estimated remaining life (based on number of writes to each cell), free LEB’s, and other important data so we can predict any bad behavior or data loss. You can also down-format the usable NAND flash on the FusionIO cards to improve write performance during emergency groomer maintenance when the card is out of free LEB’s, which makes the card perform better during these situations (all flash devices slow down by orders of magnitude during LEB exhaustion unless you build in a ton of intelligence and a very large LEB pool to move around in). In fact, we have a few systems that run so hot at writes 24×7 that after a week of uptime, we always out pace the groomer, and we’re always out of free LEB’s, and we don’t notice any performance degradation. In those systems we down format the cards by 25-35% to get that performance during LEB exhaustion.

    As for redundancy and striping FIO cards: outside of a few hardware failures on the cards in early manufacturing, and the driver instabilities we went through 2 years ago that have since been worked out, our systems run at 2-10k IOPS 24×7, 100% stable. We use all HP Proliant servers with our FusionIO cards. Every once in a great while we’ll see a server crash, usually unrelated to FusionIO, like a kernel panic or something, and the FusionIO cards will do their unsafe shutdown scans (7 minutes on first boot per 160GB of FusionIO), and then we’re able to run fsck just fine and bring the file system and MySQL back online.

    Because we stripe the data across two cards with no redundancy in the system, we used a percentage of the money we saved to just buy more systems with more FusionIO cards and add more MySQL replication slaves =) So, IE, instead of 2 servers each with $200,000 storage arrays, we have 6 servers each with 2x160GB FusionIO cards, and we still saved money in the end, not to mention the power, rack space, cooling, etc.

    Are there other competitors trying to get into the market? Yes. Are there applications where cheaper SSD solutions (IE, the mustang) will probably make better monetary sense than FusionIO (IE, the ferrari)? Yes. Competition is good. And it all depends on your application. In our case, we run very high transaction rate databases that would knock just about every other flash solution on the market on its rear end.

  16. Ryan White says:

    Oh, I forgot to mention something VERY important to an Enterprise customer like us: Support. I’ll quickly discuss two points:

    1) FusionIO has been impeccable at support. If a card fails (as I mentioned there were some early manufacturing defects), we page them in the middle of the night and they find a way to get us a replacement card (we’ve had sales reps drive to meet us somewhere with a replacement card).

    2) After being in the industry a long time, I demand single vendor support. I no longer am willing to buy whitebox hardware. We’ve had too many issues that require too much time and resources to diagnose parts made by different folks (RAID card, drive, backplane, etc) that isn’t integrated by a single vendor into a single hardware monitoring/diagnostics package. So, we buy HP so we can sleep at night. And that means the Intel and other SSD’s are out because HP isn’t going to support us taking their hot-swap drive sleds and screwing in SSD’s and expecting the HP RAID controller to properly support it.

  17. Eric Stone says:

    Guys,

    Update from the field, we have gone through extensive testing 08-09 with FusionIO and even Violin Memory in terms of SSD / RAM based systems for our very large InnoDB-based database (dataset about 100 GB).

    We have tested extensively with these products, and eventually settled on OCZ Z-Drive R2′s which we put into production last week.

    Our testing revealed not much difference between the three vendors. What we really discovered was that InnoDB’s built-in deficiencies / problems at scale are the lowest common denominator in the entire equation.

    So there’s no reason to spend the big bucks on the Fusion IO — the OCZ R2 is 1/3 of the price and well worth it.

  18. Vadim says:

    Eric,

    With the built-in InnoDB you really would not see a difference between SSD cards.
    To see an improvement you should try InnoDB-plugin, or XtraDB for even better performance.

  19. Ryan White says:

    Eric, how high-rate is your DB? Our dataset is about 200GB, runs at 10,000+ transactions/second, and we found the exact opposite results. We always outpace the groomer on any SSD/Flash product, so the performance with no free LEB’s is what matters. Did you test long-term performance? How does the OCZ product perform under LEB exhaustion?

  20. Andy says:

    Ryan,

    What InnoDB config do you use to get sustained 10K+ transactions/sec? Are you using XtraDB?

    Do you put transaction logs on a separate HDD or do you put them on FusionIO as well?

    Do you use binlog? Group commit is reportedly still broken when binlog is enabled (http://kristiannielsen.livejournal.com/12254.html), so I don’t know if it’s possible to get sustained 10K+ trans/sec with binlog enabled.
