
Percona Server scalability on multi-core servers

September 29, 2010 | Posted In: Benchmarks, MySQL


We now have hardware in our test lab that represents the next generation of commodity servers for databases. It’s a Cisco UCS C250 server, powered by two Intel Westmere CPUs (X5670 @ 2.93GHz). Each CPU has 6 cores and 12 threads. The most amazing part is the amount of memory: it has 384GB of RAM, which is actually more space than the disks contain. The disks provide 270GB in total, configured as RAID10 over eight 2.5″ 15K RPM drives. To make the system even more powerful, I put a FusionIO 320GB SLC card in the PCI-E slot. Here is a link to the box specs.

The server was generously provided by Cisco Systems, Inc.

So, obviously I’m anxious to see how Percona Server with XtraDB scales on this hardware, and you can expect a series of benchmarks. An especially interesting topic is what we can get from “threads”, as there are only 12 “real” cores, with each core having two “threads”.

So, I took Percona Server 5.1.47-11.2 and ran the sysbench oltp read-only and read-write benchmarks with 1 to 32 threads. The database size was 100 million rows (about 23GB of data). (Starting with Percona Server 5.1.49-12.0, we are going to provide regular builds dedicated to the Cisco UCS platform.)
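A thread sweep like this is easy to script. The sketch below only builds sysbench command lines and does not run them. It assumes the sysbench 0.4-style `oltp` test options; the helper name `sysbench_cmd`, the chosen thread counts, and the omission of connection and duration options are illustrative assumptions, since the post does not give the exact invocation.

```python
# Sketch: build sysbench 0.4-style command lines for a thread sweep.
# Assumptions (not from the post): helper name, thread counts, and that
# connection/duration options are left at their defaults.

TABLE_ROWS = 100_000_000  # 100 million rows, as stated in the post


def sysbench_cmd(threads: int, read_only: bool, rows: int = TABLE_ROWS) -> str:
    """Return one sysbench oltp command line for the given thread count."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    args = [
        "sysbench",
        "--test=oltp",
        f"--oltp-table-size={rows}",
        f"--oltp-read-only={'on' if read_only else 'off'}",
        f"--num-threads={threads}",
        "run",
    ]
    return " ".join(args)


def sweep(thread_counts, read_only: bool):
    """Return the command lines for every thread count in the sweep."""
    return [sysbench_cmd(t, read_only) for t in thread_counts]


if __name__ == "__main__":
    # Illustrative thread counts within the 1..32 range used in the post.
    for mode_read_only in (True, False):
        for cmd in sweep((1, 10, 24, 32), mode_read_only):
            print(cmd)
```

Printing the commands rather than executing them keeps the sketch safe to run on a machine without sysbench installed.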

The full results are available on the wiki, and the graphical representation follows:

You can see from the graph that it scales pretty well even up to 24 threads, despite the fact that half of them are not real CPU cores. Up to 10 threads, the scale factor is quite impressive: 8.2 for read-only and 9.2 for read-write. (I calculate the scale factor as the result with 10 threads divided by the result with 1 thread.) Above 10 threads, the rate of increase is not as large, and at 24 threads we have a scale factor of 12.6 for read-only and 13.3 for read-write.
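The scale-factor arithmetic can be sketched in a few lines of Python. The function `scale_factor` follows the definition given above (result at N threads divided by result at 1 thread). The "efficiency" figure, scale factor divided by thread count, is an extra derived metric added here for illustration and is not part of the original analysis. Only the scale factors quoted in the post are used as input; no raw throughput numbers are assumed.

```python
# Sketch: scale factor as defined in the post, plus a derived
# "fraction of linear scaling" metric (an illustrative addition).


def scale_factor(result_n_threads: float, result_1_thread: float) -> float:
    """Throughput at N threads divided by throughput at 1 thread."""
    if result_1_thread <= 0:
        raise ValueError("single-thread result must be positive")
    return result_n_threads / result_1_thread


def efficiency(scale: float, threads: int) -> float:
    """Fraction of ideal linear scaling (1.0 means perfect scaling)."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return scale / threads


# Scale factors quoted in the post, keyed by workload and thread count.
REPORTED = {
    "read-only": {10: 8.2, 24: 12.6},
    "read-write": {10: 9.2, 24: 13.3},
}


def efficiency_table(reported):
    """Map each workload and thread count to its fraction of linear scaling."""
    return {
        workload: {t: round(efficiency(s, t), 3) for t, s in by_threads.items()}
        for workload, by_threads in reported.items()
    }


if __name__ == "__main__":
    for workload, by_threads in efficiency_table(REPORTED).items():
        for threads, eff in by_threads.items():
            print(f"{workload:10s} {threads:2d} threads: {eff:.1%} of linear")
```

Read this way, the 10-thread results sit at 82% and 92% of linear scaling, while the 24-thread results are a little over half of linear, which is consistent with 12 of the 24 hardware threads not being full cores.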

Of course, it will be interesting to compare these results with the latest MySQL 5.5 releases, especially in cases where the number of threads is greater than 100. I’m going to do these comparisons in my next round of testing.

(Edited by Fred Linhoss)

Vadim Tkachenko

Vadim Tkachenko co-founded Percona in 2006 and serves as its Chief Technology Officer. Vadim leads Percona Labs, which focuses on technology research and performance evaluations of Percona’s and third-party products. Percona Labs designs no-gimmick tests of hardware, filesystems, storage engines, and databases that surpass the standard performance and functionality scenario benchmarks. Vadim’s expertise in LAMP performance and multi-threaded programming helps optimize MySQL and InnoDB internals to take full advantage of modern hardware. Oracle Corporation and its predecessors have incorporated Vadim’s source code patches into the mainstream MySQL and InnoDB products. He also co-authored the book High Performance MySQL: Optimization, Backups, and Replication 3rd Edition.


  • I’ve been setting up a server at the moment with a pair of the same X5670 processors, 24GB RAM, and 22 x 10K RPM Raptors on an Adaptec RAID controller. Seeing good read-write sysbench results (3360 r/w tps @ 24 threads). Interestingly, though, my read-only sysbench results are only a fraction of my read-write results (650 tps @ 24 threads). Weird, huh.

  • 384GB of RAM in a 2U commodity server is pretty awe-inspiring. Didn’t know Cisco makes servers too. How much does a server like that cost? What can Cisco do that Dell or Supermicro can’t?

    By the way why make a build dedicated to Cisco servers? Aren’t they just regular Linux boxes running CentOS?

  • Wow, that is big hardware. Now you need to run tests with a several-hundred-GB InnoDB buffer pool. The option you added to preserve the buffer pool across restarts should be great for that case.

  • Andy, here is a basic description of the technology used to increase the amount of RAM:

    It’s hard to get decent information, though; most of it is marketing.

  • Who does better with threading? Intel, with its fast multicore chips and fake cores via HyperThreading, or AMD, with its Magny-Cours chips and a ton of actual CPU cores running at slightly lower speeds? I’d be interested in seeing benchmarks of that with Percona builds, as would many system architects out there, I imagine.

  • Back in the day when I worked with Sun customers who liked using SPARC and Solaris, I both heard and saw with my own eyes that an additional HW thread adds about 20% performance compared to a core with just one thread. This seems to be more or less the ratio for these Intel CPUs too.

    Doing benchmarks, or rather reading the graphs, becomes trickier, since at some stage in the curve you hit the point where the OS tells you there are more CPUs to utilize, but in reality they are not full cores, just more threads. So the curve doesn’t look as good from 12 to 24 threads/CPUs, but the performance is actually increasing pretty much as you would expect. A 1-to-1 performance increase beyond the first 12 cores would be physically impossible, as there aren’t 24 independent cores.

  • @Andy,

    The cost is high. I do not have final numbers on hand, but the memory alone, 48 DIMMs at 8GB each (~$300/DIMM), comes to about $15,000.
    So the price should be in the $X0,000 range, where 2 <= X <= 9.

    I do not see why Dell or SuperMicro can't offer a similar configuration. Maybe high-memory boxes are not interesting to them yet, or it takes serious engineering effort to fit 48 DIMM slots.

    About the specific optimization: it is much easier to produce an optimized version when we have defined restrictions. In this case we can optimize for the Intel CPU, for example by using SSE4.2 instructions. We can also get benefits for InnoDB if we know that we operate on 128+ GB of memory.

  • Vadim – Others don’t offer it because the 48 DIMM slots are achieved via a custom ASIC that Cisco has as a result of a company they bought a few years ago. From what I understand, the premium associated with that is pretty high, but I haven’t priced them out.

    With the launch of the Xeon EX series (8 cores), the latest Cisco equipment does not use the memory extender technology that they have on the 4/6-core systems. I assume they didn’t port the tech to the new Intel stuff. EX does support higher memory numbers than the older Intel gear, though.

    Myself, I’m a fan of the HP BL685c G7 blade: 48 cores, 32 DIMMs, and of course HP has Fusion IO expansion boards for their blades. A fully loaded chassis with 384 cores and 4TB of memory comes in at about 6,700W in 10U.

  • Hoping to get some soon! Though I won’t have Fusion IO on them; I’m looking to get a small 64 x 15K RPM drive 3PAR F400 to drive my I/O initially. We have about 45 MySQL databases, though with very low load at the moment and lots of caching in the application layer.

  • Dell do boxes with up to 1TB RAM – check out the R910 (it’s 4U)

    They also have a number of 2U boxes that take up to 512GB, e.g. the R810 / R815.


  • To the 1st poster:

    I had the same problem. You MUST disable the query cache on multicore systems to get reasonable sysbench (and, in my case, real workload) performance.
