Virident tachIOn: New player in the Flash PCI-E card market

(Note: The review was done as part of our consulting practice, but is totally independent and fully reflects our opinion)

In my talk at MySQL Conference and Expo 2010, “An Overview of Flash Storage for Databases”, I mentioned that other players would most likely be coming soon. I was not actually aware of any real names at that time; it was just a guess, as the PCI-E market is really attractive, so FusionIO can’t stay alone for long. So I am not surprised to see a new card from Virident, and I was lucky enough to test a pre-production sample of the Virident tachIOn 400GB SLC card.

I think it is fair to say that Virident targets the market where FusionIO right now has a monopoly, and it will finally bring some competition, which I believe is good for end users. I am looking forward to price competition (not having real numbers, I can only guess that vendors still put a high margin in the price), as well as competition in performance in general and stable performance under high load in particular, and also in capacity and data reliability.

The price list for Virident tachIOn cards already shows the price competition: the indicative price for the tachIOn 400GB is $13,600 (that is $34/GB), and the entry-level card is 200GB at $6,800 (there is also a 300GB card in the product line). The price for the FusionIO 160GB SLC (from dell.com, price on 14-Jun-2010) is $6,308.99 (that is about $39.4/GB).
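
The $/GB figures are simply list price divided by nominal capacity. A minimal Python sketch of that arithmetic, using only the prices and capacities quoted in this post (the function name is just for illustration):

```python
def price_per_gb(price_usd, capacity_gb):
    """Return list price divided by nominal capacity, in $/GB."""
    return price_usd / capacity_gb

# Prices and capacities as quoted in this post.
cards = {
    "Virident tachIOn 400GB": (13600.00, 400),
    "Virident tachIOn 200GB": (6800.00, 200),
    "FusionIO 160GB SLC": (6308.99, 160),
}

for name, (price, capacity) in cards.items():
    print(f"{name}: {price_per_gb(price, capacity):.2f} $/GB")
```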

A couple of words about the product: I know that the Virident engineering team concentrated on getting stable write performance in long-running write activity and in cases when space utilization is close to 100%. As you may know (check my presentation), SSD design requires background “garbage collector” activity, which needs spare space to operate, and the Virident card already has enough space reserved to deliver stable write performance even when the disk is almost full.

As for reliability, I think the design of the card is quite neat. The card itself contains a bunch of replaceable flash modules, and each individual module can be changed in case of failure. Internally, the modules are also joined in a RAID (fully transparent to the end user).

All this gives a good level of confidence in data reliability: if a single module fails, the internal RAID allows operations to continue, and after the module is replaced, the RAID is rebuilt. This still leaves the controller on the card as a single point of failure, but in that case all flash modules can be safely relocated to a new card with a working controller. (Note: this was not tested by Percona engineers; it is taken from the vendor’s specification.)

As for power failures, the flash modules also come with capacitors, which guarantee that data reaches the final media even if power is lost on the main host. (Note: this was not tested by Percona engineers; it is taken from the vendor’s specification.)

Now to the most interesting part: performance numbers. I used the sysbench fileio benchmark with a 16KB block size to see the maximal performance we can expect.

Server specification is:

  • Supermicro X8DTH series motherboard
  • 2 x Xeon E5520 (2.27GHz) processors w/HT enabled (16 logical cores)
  • 64GB of ECC/Registered DDR3 DRAM
  • CentOS 5.3, 2.6.18-164 kernel
  • Filesystem is XFS, formatted with option (size=4096, the sector size; it is very important to have aligned IO requests) and mounted with option
  • Benchmark: sysbench fileio on 100GB file, 16KB blocksize
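
The post does not list the exact sysbench invocation, so the following is only a sketch of how runs matching the settings above (100GB file, 16KB block size) could be assembled. The flag names are those of the sysbench 0.4 fileio test; the helper name, the thread counts, the 180-second duration and the use of `--file-extra-flags=direct` are assumptions, not necessarily the settings used for these results.

```python
def sysbench_fileio_cmd(mode, threads, total_size="100G",
                        block_size=16384, max_time=180):
    """Build an argv list for one sysbench 0.4 fileio run.

    mode is a sysbench file test mode such as "rndrd", "rndwr",
    "rndrw" or "seqwr". max_time (seconds) is an assumed value.
    """
    return [
        "sysbench",
        "--test=fileio",
        f"--file-total-size={total_size}",
        f"--file-test-mode={mode}",
        f"--file-block-size={block_size}",
        "--file-extra-flags=direct",   # assumption: bypass the page cache
        f"--num-threads={threads}",
        f"--max-time={max_time}",
        "--max-requests=0",            # no request limit, run for max_time
        "run",
    ]

# Print the command lines for a sweep over thread counts (assumed values).
for threads in (1, 4, 8, 16, 32, 64):
    print(" ".join(sysbench_fileio_cmd("rndwr", threads)))
```

Before the `run` step, the test files have to be created once with the same size options and the `prepare` command.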

The raw results are available on Wiki

And here are the graphs for random reads, random writes and sequential writes:

I think it is very interesting to see the distribution of the 95% response time results (the 0 time is obviously a problem in sysbench, which does not have enough timer resolution for such very fast operations).
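
For readers unfamiliar with the metric: the 95% response time is the value below which 95% of the measured request times fall. A minimal nearest-rank sketch in Python follows. It is not sysbench's own implementation; the sample latencies are made up for illustration, and the 1 ms timer resolution is an assumed value chosen only to show how a coarse timer can lead to a reported time of 0.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100.0)  # 1-based rank
    return ordered[max(rank, 1) - 1]

def truncate_to_resolution(value_ms, resolution_ms):
    """Model a timer that can only report multiples of resolution_ms."""
    return math.floor(value_ms / resolution_ms) * resolution_ms

# Hypothetical request times in milliseconds (not measured data):
# 19 fast requests and one slow outlier.
latencies_ms = [0.1 + 0.01 * i for i in range(19)] + [2.5]

print(percentile(latencies_ms, 95))   # 95% time with a precise timer

# Same requests seen through an assumed 1 ms resolution timer.
coarse = [truncate_to_resolution(v, 1.0) for v in latencies_ms]
print(percentile(coarse, 95))         # every sub-millisecond request reads as 0
```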

As you can see, we can get about 400MB/sec random write bandwidth with 8-16 threads, with response time below 3.1ms (for 8 threads) and 3.8ms (16 threads) in 95% of cases.
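
With a fixed 16KB block size, bandwidth converts directly into requests per second. A small sketch of that arithmetic, assuming MB means 2^20 bytes and KB means 2^10 bytes (the post does not say which convention the graphs use):

```python
def requests_per_second(bandwidth_mb_s, block_size_kb=16):
    """Convert bandwidth (MB/sec) into requests/sec for a fixed block size.

    Assumes binary units: 1 MB = 1024 KB.
    """
    return bandwidth_mb_s * 1024 / block_size_kb

print(requests_per_second(400))   # 25600.0 requests/sec at 400MB/sec
```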

One issue I should mention: despite the good response time results, the maximal response time can in some cases jump to 300 ms per request. I was told this corresponds to garbage collector activity and will be fixed in the production release of the driver.

I think it is fair to make a comparison with a FusionIO card, especially for the write pressure case. As you may know, FusionIO recommends reserving space to get sustainable write performance (Tuning Techniques for Writes).

I took a FusionIO ioDrive 160GB SLC card and tested it fully formatted (file size 145GB) and formatted with 25% space reservation (file size 110GB), against the Virident card with a 390GB file. This also lets us see whether the Virident tachIOn card can sustain writes when it is fully utilized.

As a disclaimer, I want to mention that the Virident tachIOn card was fine-tuned by Virident engineers, while the FusionIO card was tuned only by me, and I may not have all the knowledge needed for FusionIO tuning.

The first graph is random reads, to compare read performance:

As you can see, with 1 and 4 threads FusionIO is better, while with more threads the Virident card scales better.

And now random writes:

You can see that FusionIO definitely needs space reservation to provide high write bandwidth, and that comes with a cost hit (25% space reservation -> about 33% increase in $/GB of usable space).
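
The cost of space reservation can be made concrete. Reserving a fraction r of the card leaves 1 - r usable, so the price per usable GB grows by a factor of 1 / (1 - r); for r = 25% that is about a third more per usable GB. A minimal sketch using the FusionIO price quoted earlier (the function name is illustrative):

```python
def price_per_usable_gb(price_usd, capacity_gb, reserved_fraction=0.0):
    """$/GB counting only the capacity left after space reservation."""
    if not 0.0 <= reserved_fraction < 1.0:
        raise ValueError("reserved_fraction must be in [0, 1)")
    usable_gb = capacity_gb * (1.0 - reserved_fraction)
    return price_usd / usable_gb

# FusionIO 160GB SLC at the dell.com price quoted in this post.
full = price_per_usable_gb(6308.99, 160)
reserved = price_per_usable_gb(6308.99, 160, reserved_fraction=0.25)

print(f"no reservation:  {full:.2f} $/GB")
print(f"25% reservation: {reserved:.2f} $/GB")
print(f"increase:        {(reserved / full - 1) * 100:.1f}%")
```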

In conclusion I can highlight:

  • I am impressed with the architecture design with replaceable individual flash modules; I think it establishes a new high-end standard for flash devices
  • With a single card you can get over 1GB/sec bandwidth in random reads (16-64 working threads), which is the best result I’ve seen so far (again, for a single card)
  • Random write bandwidth exceeds 400MB/sec (8-16 working threads)
  • Random read/write mix results are also impressive, which can be quite important in workloads like FlashCache, where the card is under concurrent read and write pressure
  • Quite stable sequential write performance (important for log-related activity in MySQL)

I am looking forward to presenting results for sysbench oltp and tpcc workloads, and also in FlashCache mode.

Comments

  1. Mitch Crane says

    Vadim, I see that you compared only one Fusion-io module to several modules of Virident.  Did you consider using an IO-duo from Fusion-io in the test?  Those would more closely match the capacity (320GB) and price point ($11,670), and yields about 1.5 GB/s of bandwidth vs Virident’s 1.2GB/s …  but the real value is in the MLC products.  There, a 640GB duo gives roughly the same performance advantage over Virident, but is 33% cheaper on a $/GB basis ($23/GB).  Let me know if you’d like assistance tuning as well, I can probably help.
     
    Regards,
     
    Mitch

  2. says

    Awesome work, thanks for comparing the two products. I’ve been thinking about demo’ing the Virident cards but hadn’t had the time to request one yet. Competition is great!

  3. Andy says

    Vadim

    What value of innodb_io_capacity did you use?

    Do you think 5.5’s async IO would help performance in this case?

  4. says

    Andy,

    These are sysbench fileio benchmarks, not MySQL oltp,
    so there is no innodb_io_capacity.

    This benchmark is supposed to show maximal throughput and response time when working directly with a file.

  6. says

    Mitch,

    That would be interesting to compare, but I do not have access to FusionIO duo cards right now,
    so I can’t do that experiment.
    Help with tuning would be appreciated.

    How would you propose to use Duo cards ? RAID0?

  7. Andy says

    Vadim,

    Can you talk more about aligned IO request?

    Does “size=4096” always lead to aligned IO or does the actual value of “size” depend on the actual SSD used?

  8. says

    Andy,

    It was the recommendation from Virident engineers.
    size=4096 means using a 4096-byte sector size for the XFS filesystem (instead of the default 512), and, as I understand it,
    this ensures that all IO for the Virident card is aligned, but I can’t speak for other vendors.

  9. Mitch Crane says

    Hi Vadim,
    Yes, I would absolutely recommend RAID0 on ioDrives (including ioDuo, which will show up as two block devices in Linux). Due to the nature of Flashback protection, our drives effectively (and transparently) maintain a sort of parallelized RAID5 across the NAND devices on a single ioDimm with a dedicated NAND for parity. We can tolerate not only losing (and quarantining) bad blocks, but also losing an entire NAND chip from each bank, with no impact on performance. That’s a very simplified description, but you get the idea. One can be confident striping a duo (or any pair of ioDrives, for that matter) using RAID0.. additionally the stripe will provide approximately linear scaling of bandwidth and IOPS.

    In terms of the comparison, it appeared to me from first look at that device that there are multiple, removable NAND modules populated on one PCIe carrier. The question in my mind is how does the device map the PCIe lanes to the controller(s), and on to the chips to provide the stated bandwidth. It is not clear to me whether they will provide multiple bridged connections back across the bus from multiple controllers in a single card, or if it’s just a single controller fronting all of the NAND. In any case, we’re all very excited to see some healthy competition!

    Regards,
    Mitch (from Fusion-io)

  10. says

    Mitch,

    Honestly, using RAID0 is somewhat scary for me. Can’t you have controller failures?

    Also, we had a FusionIO 320 Duo card failure at one of our customers, and the card is to be replaced.
    Data can still be read, but with RAID0 that means we need to copy the whole content to a safe location.

  11. Mitch Crane says

    We’ve typically found controller failures to be about as common as, say, DRAM failures.. that is, after burn-in the probability of such a failure mode is extremely slim. Our manufacturing folks are certainly aware of the need for appropriate burn-in and it’s quite rare that these cases of “infant mortality” go uncaught.

    What we find is that people generally implement business continuity (high availability) by using multiple servers and replication for MySQL. There is 1 master and multiple slaves. If one of the slaves goes down, the other slaves keep supporting the business. If the master goes down, then one of the slaves is promoted to be the master. So keeping that in mind, people like to implement RAID 0 because it gives them better performance per server and higher usable capacity. The various technologies we have designed to architect our ioDrive ensure that the card level failure is at minimum.

    Regards,
    Mitch

  12. says

    Mitch,

    DRAM failures are rare, but they still happen. So if you recommend RAID0, I think
    it would be more correct to add that RAID0 is recommended together with a proper master-slave configuration.

    However, there is another catch: as FusionIO can handle a very high write load, it is quite possible
    that a slave will not be able to keep up with the master. In my observation, as the slave is single-threaded,
    it can probably handle 3x-5x less load than the master. In that case it is a good question whether
    it is worth setting up a RAID0 configuration on the master if the slave cannot keep up with that performance anyway. It is very workload dependent, though, and a topic for additional research.

  13. Tom C says

    Unfortunately with both Fusion-io and Virident you still have a single point of failure. If a PCI card fails, even with two PCI-e cards as RAID 1, you will end up with kernel panic or blue screen! Bottom line, sacrificing availability and reliability for simple performance is not an Enterprise solution. Of course you can have multiple servers with multiple cards, but you just doubled your cost! Great for my home PC if it was bootable and was cheaper than my car but not ideal to risk my datacenter! If it is not hot-swappable is not for Enterprise!

    Oh yeah, try running these cards for a week under any real write benchmark! Spikes in writes will make you think twice!

  14. Vadim says

    Tom C,

    It all comes down to the price/performance question.
    If one does not have the budget to allow redundancy at the server level, then they should go with an alternative, cheaper solution.

    I am not sure what you mean about spikes in writes during a week under high load; if you have real data, I’d like to see it.

  15. Joshua says

    The sequential write performance of Fusion-IO is less than 200MB/s? Why so low? The 16KB block size is for both sequential and random tests? And the threads of a sequential test write sequentially to different files in an interleaving manner? It seems to me basically the same as random tests. I guess that is why all the corresponding sequential and random performance are similar to each other. Sometimes the random one is even better than the sequential one.

  16. Mitch says

    Joshua: in fact sysbench does not interleave its sequential fileio test, so the test is more of a file locking test than an I/o test, when multiple threads are used. I can point you to the code, if you like.. With a little hacking it can be made to interleave.
    Mitch. (from fusion)
