September 17, 2014

Clustrix benchmarks under tpcc-mysql workload

I’ve been working with the Clustrix team for a long time on evaluating the Clustrix product, and this is a report on the performance characteristics of Clustrix under a tpcc-mysql workload.

I tested tpcc 5000W (~500GB of data in InnoDB) on Clustrix systems with 3, 6, and 9 nodes. To have a base for comparison, I also ran the same workload on an HP ProLiant DL380 G6 powered by a Fusion-io card, and on a SuperMicro server powered by 7 Intel SSD 320 drives (this server is equivalent to the hardware that Clustrix uses for its nodes).

The full report is available on our page with whitepapers, and in this post I would like to highlight the most interesting points.

Chart: comparison of all systems (results are throughput per 10 sec; more is better)

So my conclusions from this benchmark:

  • Clustrix shows very good scalability under a highly concurrent workload as nodes are added.
    In fact, throughput improves by more than 2 times (3 times) when the number of nodes is doubled (tripled). This is possible because Clustrix automatically distributes data across the new nodes, so the data/memory ratio decreases, which allows better throughput per node.
  • Clustrix is able to handle a workload as complex as tpcc, and automatically distributes load between nodes despite multi-statement transactions and foreign key relations.
  • For a workload with a small number of threads, Clustrix does not perform as well as the system with Fusion-io cards.
  • We should also take into account that Clustrix automatically provides high availability by maintaining redundant information on each node. The other systems in the comparison are not fault- or crash-tolerant.
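
To make the data/memory argument concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the ~500GB figure from above (which is the InnoDB footprint, not necessarily the size Clustrix stores) and 48GB of RAM per node, as mentioned for the Clustrix node hardware in the comments. The `copies` parameter is a hypothetical knob for redundancy overhead, and the function names are my own illustration, not part of Clustrix or tpcc-mysql.

```python
DATA_GB = 500          # approximate tpcc 5000W size quoted in the post (InnoDB footprint)
RAM_PER_NODE_GB = 48   # per-node RAM mentioned in the comments; assumed equal on all nodes


def data_to_memory_ratio(nodes, data_gb=DATA_GB,
                         ram_per_node_gb=RAM_PER_NODE_GB, copies=1):
    """Stored data divided by total cluster RAM.

    `copies` is a hypothetical redundancy factor; the post does not say
    how many copies of the data Clustrix keeps.
    """
    return (data_gb * copies) / (nodes * ram_per_node_gb)


def scaling_efficiency(base_nodes, base_tput, nodes, tput):
    """Speedup divided by node-count growth; above 1.0 is better than linear."""
    return (tput / base_tput) / (nodes / base_nodes)


# Ratio for each tested cluster size, ignoring redundancy (copies=1).
ratios = {n: round(data_to_memory_ratio(n), 2) for n in (3, 6, 9)}
print(ratios)
```

Under these assumptions the ratio drops from roughly 3.5 on 3 nodes to roughly 1.2 on 9 nodes, so a much larger share of the working set can sit in memory. `scaling_efficiency` can be fed the measured throughput numbers from the full report; a value above 1.0 corresponds to the better-than-linear scaling described above.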

Looking at the results, Clustrix might not be your first choice for single-thread or low-concurrency workloads from a performance point of view, but consider other factors such as high availability and transparent auto-rebalancing out of the box. For highly concurrent workloads, Clustrix provides great performance, and if you need better throughput, just add more nodes.

The other factor that would be interesting to compare, but which I did not cover in this research, is the total cost of the system. I need to ask Clustrix how the cost of 3-, 6-, and 9-node systems compares to the other systems in the comparison.

Standard disclaimer: this post is part of a paid evaluation we perform for Clustrix, but it is totally independent and fully reflects our opinion.


About Vadim Tkachenko

Vadim leads Percona's development group, which produces Percona Cloud Tools, Percona Server, Percona XtraDB Cluster and Percona XtraBackup. He is an expert in solid-state storage, and has helped many hardware and software providers succeed in the MySQL market.

Comments

  1. Andy says:

    Interesting that at high concurrency 7 Intel 320 perform pretty much the same as FusionIO.

    7 320s cost about $2K. How much does that FusionIO cost – around $10K?

    When you RAID-0 7 SSD, did SATA become the bottleneck?

  2. Is there a HW specification of the Clustrix systems available as well?

  3. The Clustrix node is mostly off the shelf hardware with a couple extra components in it. It has 7 Intel 320 drives, 48GB RAM, dual 4 core Westmere processors. The extra components are Infiniband for inter-node communication and an NVRAM for very low latency writes with guaranteed durability.

    The hardware is important, of course, but the real magic is in the Clustrix software. The software is what allows it to seamlessly scale a single database to span multiple nodes.

  4. Andy,

    under high concurrency in MySQL, what really comes into play is internal locking, and the underlying hardware has less effect.
    I did not look into what the bottleneck is with the 7 Intel 320 SSD drives, as the main focus was on the Clustrix systems. However, it is worth mentioning that it was software RAID0, not hardware RAID.

  5. marrtins says:

    How does this compare to Galera replication?

  6. Vadim says:

    marrtins,

    I am going to write a detailed blog post about Clustrix; in short, it is very different from Galera replication.

    Galera is based on MySQL/InnoDB and is open source; each node contains a full copy of the data.

    Clustrix is proprietary software that uses only the MySQL protocol; internally it is an entirely different solution.
    Clustrix does not keep a full copy of the data on each node, only part of it, and it automatically rebalances load and data distribution.

  7. marrtins says:

    Oh, thanks for the insight! At a quick glance they looked very similar to me. Looking forward to the detailed article.

  8. Very interesting performance numbers. Can anyone comment on the overall stability of Clustrix today? I am basing this on second-hand knowledge from colleagues who tested it, but my understanding was that it still had many bugs to iron out. In the end we chose not to use it due to bugs. This was about a year ago, however.

    Cheers,

    Tim

  9. Dan Pollack says:

    We have had Clustrix in production for quite some time. It lives at the core of an internal production storage service. It performs well, is scalable, and is stable. Check out the white paper here – http://www.clustrix.com/uploads/documents/Clustrix_Use_Case_AOL.pdf
    If you have questions, I’d be happy to talk about our experience with Clustrix.

  10. Pawel Sidoryk says:

    Hello,
    I came across this post because I would like to learn more about Clustrix. The comparison of Clustrix to MySQL is very interesting, but I have doubts about one detail of your testing methodology. You used 4 clients to test Clustrix and only 1 client to test MySQL. Why? This seems strange, since I suspect that it was the client that got saturated in the MySQL case, not the MySQL server. Or maybe I am wrong and there was a reason to use 4 clients to test Clustrix and only 1 client to test MySQL?
    Could you please give evidence that it was really the server that was saturated in the MySQL test, and that the client was NOT saturated?
