October 21, 2014

Percona Server scalability on multi-core servers

We now have hardware in our test lab that represents the next generation of commodity servers for databases. It’s a Cisco UCS C250 server, powered by two Intel Westmere CPUs (X5670 @ 2.93GHz). Each CPU has 6 cores and 12 threads. The most amazing part is the amount of memory: it has 384GB of RAM, which is actually more space than the disks contain. The disks total 270GB, configured as RAID10 over eight 2.5″ 15K RPM disks. To make the system even more powerful, I put a FusionIO 320GB SLC card in the PCI-E slot. Here is a link to the box specs.

The server was generously provided by Cisco Systems, Inc.

So, obviously I’m anxious to see how Percona Server with XtraDB scales on this hardware, and you can expect a series of benchmarks. An especially interesting topic is what we can get from “threads”, as there are only 12 “real” cores, with each core having two “threads”.

So, I took Percona Server 5.1.47-11.2 and ran the sysbench oltp read-only and read-write benchmarks using from 1 to 32 threads. The database size was 100 million rows (about 23GB of data). (Starting with Percona Server 5.1.49-12.0, we are going to provide regular builds dedicated to the Cisco UCS platform.)

The full results are available on the Wiki, and the graphical representation follows:

You can see from the graph that it scales pretty well even up to 24 threads, despite the fact that half of them are not real CPU cores. For up to 10 threads, the scale factor is quite impressive: it is 8.2 for read-only and 9.2 for read-write.  (I calculate the scale factor as the result with 10 threads divided by the result with 1 thread.) Above 10 threads, the rate of increase is not as large, and for 24 threads we have a scale factor of 12.6 for read-only and 13.3 for read-write.
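As a rough illustration of the scale-factor arithmetic, here is a minimal Python sketch. It uses only the four scale factors quoted above (the raw throughput numbers are on the Wiki and are not reproduced here). The `efficiency` helper, which divides the scale factor by the thread count to show how close the result is to ideal linear scaling, is an added assumption for illustration and not part of the original analysis; all function and variable names are hypothetical.

```python
# Scale factors quoted in the post: result with N threads / result with 1 thread.
REPORTED = {
    "read-only": {10: 8.2, 24: 12.6},
    "read-write": {10: 9.2, 24: 13.3},
}


def scale_factor(result_n, result_1):
    """Result with N threads divided by the result with 1 thread."""
    if result_1 <= 0:
        raise ValueError("single-thread result must be positive")
    return result_n / result_1


def efficiency(factor, threads):
    """Fraction of ideal linear scaling achieved (1.0 would be perfect).

    This metric is an illustrative addition, not something the post reports.
    """
    return factor / threads


if __name__ == "__main__":
    for workload, factors in REPORTED.items():
        for threads, factor in sorted(factors.items()):
            eff = efficiency(factor, threads)
            print(f"{workload:10s} {threads:2d} threads: "
                  f"scale factor {factor:4.1f}, efficiency {eff:.2f}")
```

Read this way, the 10-thread results sit fairly close to linear scaling, while the 24-thread results land at a bit over half of it, which fits the observation that 12 of those 24 hardware threads are not real cores.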

Of course, it will be interesting to compare these results with the latest MySQL 5.5 releases, especially in cases where the number of threads is > 100. I’m going to do these comparisons in my next round of testing.

(Edited by Fred Linhoss)

About Vadim Tkachenko

Vadim leads Percona's development group, which produces Percona Cloud Tools, Percona Server, Percona XtraDB Cluster and Percona XtraBackup. He is an expert in solid-state storage, and has helped many hardware and software providers succeed in the MySQL market.

Comments

  1. Shane says:

    I’ve been setting up a server at the moment with a pair of the same x5670 processors, 24Gb RAM, 22x10krpm raptors on adaptec raid. Seeing good readwrite sysbench results (3360 r/w tps @ 24 threads). Interestingly though my readonly sysbench results are only a fraction of my readwrite (650 tps @ 24 threads). Weird huh.

  2. Andy says:

    384GB of RAM in a 2U commodity server is pretty awe-inspiring. Didn’t know Cisco makes servers too. How much does a server like that cost? What can Cisco do that Dell or Supermicro can’t?

    By the way why make a build dedicated to Cisco servers? Aren’t they just regular Linux boxes running CentOS?

  3. Wow, that is big hardware. Now you need to run tests with a several-hundred-GB InnoDB buffer pool. The option you added to preserve the buffer pool across restarts should be great for that case.

  4. Nils says:

    Andy, here is a basic description of the technology used to increase the amount of RAM: http://www.cisco.com/en/US/prod/collateral/ps10265/ps10280/ps10300/white_paper_c11-525300_ps10279_Products_White_Paper.html

    It’s hard to get some decent information though, most of it is marketing.

  5. Jeffrey Gilbert says:

    Who does better with threading? Intel with its fast multicore chips and fake cores via HyperThreading, or AMD with its Magny-Cours chips with a ton of actual CPU cores running at slightly lower speeds? I’d be interested in seeing benchmarks there with Percona builds, as would many system architects out there, I imagine.

  6. Great proposal, I think this evolution is very necessary for MySQL development.

  7. Back in the day when I worked with Sun customers who liked using Sparc and Solaris, I both heard and saw with my own eyes that an additional HW thread adds about 20% performance compared to a core with just one thread. This seems to be more or less the ratio for these Intel CPU’s too.

    Doing benchmarks becomes trickier, or rather reading the graphs, since at some stage in the curve you hit the point where the OS tells you there are more CPUs to utilize, but in reality it’s not a full core, just more threads. So the curve doesn’t look as good from 12-24 threads/CPUs, but actually the performance is increasing pretty much as you would expect it to. To expect a 1-to-1 performance increase beyond the 12 first cores would be physically impossible, as there aren’t 24 independent cores.

  8. Vadim says:

    @Andy,

    The cost is high. I do not have final numbers on hand, but the memory alone, 48 DIMMs at 8GB each (~$300/DIMM), will come to about $15,000.
    So the price should be in the $X0,000 range, where 2 <= X <= 9.

    I do not see why Dell or SuperMicro can't offer a similar configuration; maybe high-memory boxes are not interesting to them yet, or it takes some serious engineering effort to fit 48 DIMM slots.

    About the specific optimization: it is much easier to build an optimized version when the restrictions are well defined. In this case
    we can optimize for the Intel CPU, i.e. for SSE4.2 instructions. We can also get benefits for InnoDB if we know that we operate on 128+ GB of memory.

  9. Seen on Twitter: “384G ought to be enough for anybody.”

  10. nate says:

    Vadim – Others don’t offer it because the 48 DIMM slots are achieved via a custom ASIC that Cisco has as a result of a company they bought a few years ago. From what I understand the premium associated with that is pretty high, but I haven’t priced them out.

    With the launch of the Xeon EX series (8 cores), the latest Cisco equipment does not use the memory extender technology that they have on the 4/6-core systems; I assume they didn’t port the tech to the new Intel stuff. EX does support higher memory numbers than the older Intel gear though.

    Myself, I’m a fan of the HP BL685c G7 blade: 48 cores, 32 DIMMs, and of course HP has Fusion IO expansion boards for their blades. A fully loaded chassis with 384 cores and 4TB of memory comes in at about 6,700W in 10U.

  11. Maybe you can get one of those HPs and let Vadim run some benchmarks on it ;-)

  12. nate says:

    hoping to get some soon! Though won’t have fusion IO on them, looking to get a small 64x15k RPM drive 3PAR F400 to drive my I/O initially. We have about 45 mysql databases, though very low load at the moment, lots of caching in the application layer.

  13. Gricey says:

    Dell do boxes with up to 1TB ram – check out the R910 (it’s 4U)
    http://www.dell.com/us/business/p/poweredge-r910/pd

    They also have a number of 2U boxes that take up to 512GB, e.g. the R810 / R815.

    Gricey.

  14. István Tóth says:

    To the 1st poster:

    I had the same problem, you MUST disable query cache on multicore systems, to get reasonable sysbench (and in my case, real workload) performance.
