Percona Server scalability on multi-core servers

We now have hardware in our test lab that represents the next generation of commodity servers for databases. It’s a Cisco UCS C250 server, powered by two Intel Westmere CPUs (X5670 @ 2.93GHz). Each CPU has 6 cores and 12 threads. The most amazing part is the amount of memory: 384GB of RAM, which is actually more than the total disk capacity. The disks provide 270GB in total, configured as RAID10 over eight 2.5″ 15K RPM disks. To make the system even more powerful, I put a FusionIO 320GB SLC card in the PCI-E slot. Here is a link to the box specs.

The server was generously provided by Cisco Systems, Inc.

So, obviously I’m anxious to see how Percona Server with XtraDB scales on this hardware, and you can expect a series of benchmarks. An especially interesting question is how much we gain from Hyper-Threading, as there are only 12 “real” cores, each of which exposes two hardware “threads”.

So, I took Percona Server 5.1.47-11.2 and ran the sysbench OLTP read-only and read-write benchmarks with 1 to 32 threads. The database size was 100 million rows (about 23GB of data). (Starting with Percona Server 5.1.49-12.0, we are going to provide regular builds dedicated to the Cisco UCS platform.)

The full results are available on our wiki, and the graphical representation follows:

You can see from the graph that it scales pretty well even up to 24 threads, despite the fact that half of them are not real CPU cores. I calculate the scale factor as the result with N threads divided by the result with 1 thread. Up to 10 threads, scaling is quite impressive: the scale factor at 10 threads is 8.2 for read-only and 9.2 for read-write. Above 10 threads, the rate of increase is smaller, and at 24 threads the scale factor is 12.6 for read-only and 13.3 for read-write.
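As a rough illustration of how these numbers relate, here is a small Python sketch (written for this note, not part of the benchmark tooling) that turns the quoted scale factors into parallel efficiency (scale factor divided by thread count) and compares them with a crude Hyper-Threading ceiling. The ceiling is an assumption: it treats the first 12 threads as running on real cores and credits each additional hardware thread with about 20% of a core, the ratio reported in the comments from SPARC systems. The names `efficiency`, `smt_ceiling` and `SMT_GAIN` are hypothetical, and the raw tps figures from the wiki are not reproduced here.

```python
# Sketch: relate the scale factors quoted in the post to parallel efficiency
# and to a crude Hyper-Threading (SMT) ceiling. Only the scale factors come
# from the post; PHYSICAL_CORES follows the hardware description and
# SMT_GAIN is an assumption (~20% per extra hardware thread).

PHYSICAL_CORES = 12  # 2 x Xeon X5670, 6 cores each
SMT_GAIN = 0.20      # assumed throughput added by a core's second thread

# (workload, threads) -> throughput(threads) / throughput(1 thread)
QUOTED_SCALE = {
    ("read-only", 10): 8.2,
    ("read-write", 10): 9.2,
    ("read-only", 24): 12.6,
    ("read-write", 24): 13.3,
}


def efficiency(scale_factor: float, threads: int) -> float:
    """Achieved speedup divided by ideal linear speedup."""
    return scale_factor / threads


def smt_ceiling(threads: int, cores: int = PHYSICAL_CORES,
                gain: float = SMT_GAIN) -> float:
    """Hypothetical upper bound on speedup for a given thread count.

    The first `cores` threads each get a full core; every further thread
    shares a core and contributes only `gain` of a core. At most one extra
    thread per core is counted (two hardware threads per core).
    """
    if threads <= cores:
        return float(threads)
    extra = min(threads - cores, cores)
    return cores + extra * gain


if __name__ == "__main__":
    # Print rows ordered by thread count, then workload name.
    for (workload, threads), scale in sorted(QUOTED_SCALE.items(),
                                             key=lambda kv: (kv[0][1], kv[0][0])):
        ceiling = smt_ceiling(threads)
        print(f"{workload:<10} {threads:>2} threads: scale {scale:4.1f}, "
              f"efficiency {efficiency(scale, threads):.0%}, "
              f"{scale / ceiling:.0%} of SMT ceiling {ceiling:.1f}")
```

Under this model the ceiling at 24 threads is 12 + 12 × 0.2 = 14.4, so the measured 12.6 (read-only) and 13.3 (read-write) sit at roughly 88% and 92% of it, while at 10 threads no core is shared and the ceiling is simply 10. Beyond 24 threads the model adds nothing, since every hardware thread is already busy.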

Of course, it will be interesting to compare these results with the latest MySQL 5.5 releases, especially with more than 100 threads. I’m going to do these comparisons in my next round of testing.

(Edited by Fred Linhoss)



Comments (14)

  • Shane

    I’m currently setting up a server with a pair of the same X5670 processors, 24GB RAM, and 22 × 10K RPM Raptors on Adaptec RAID. I’m seeing good read-write sysbench results (3360 tps @ 24 threads). Interestingly, though, my read-only sysbench results are only a fraction of my read-write ones (650 tps @ 24 threads). Weird, huh?

    September 29, 2010 at 10:33 pm
  • Andy

    384GB of RAM in a 2U commodity server is pretty awe-inspiring. I didn’t know Cisco made servers too. How much does a server like that cost? What can Cisco do that Dell or Supermicro can’t?

    By the way, why make a build dedicated to Cisco servers? Aren’t they just regular Linux boxes running CentOS?

    September 29, 2010 at 10:50 pm
  • Mark Callaghan

    Wow, that is big hardware. Now you need to run tests with a buffer pool of several hundred GB. The option you added to preserve the buffer pool across restarts should be great for that case.

    September 29, 2010 at 10:58 pm
  • Nils

    Andy, here is a basic description of the technology used to increase the amount of RAM:

    It’s hard to find decent information, though; most of it is marketing.

    September 30, 2010 at 7:20 am
  • Jeffrey Gilbert

    Who does better with threading: Intel, with its fast multicore chips and “fake” cores via Hyper-Threading, or AMD, with its Magny-Cours chips that have a ton of actual CPU cores running at slightly lower speeds? I’d be interested in seeing benchmarks of that with Percona builds, as would many system architects out there, I imagine.

    September 30, 2010 at 8:07 am
  • Lucas Schirm

    Great proposal. I think this evolution is very necessary for MySQL development.

    September 30, 2010 at 9:59 am
  • Henrik Ingo

    Back in the day, when I worked with Sun customers who liked using SPARC and Solaris, I both heard and saw with my own eyes that an additional hardware thread adds about 20% performance compared to a core with just one thread. This seems to be more or less the ratio for these Intel CPUs too.

    Benchmarking, or rather reading the graphs, becomes trickier, since at some point on the curve the OS tells you there are more CPUs to use, but in reality they are not full cores, just more threads. So the curve doesn’t look as good from 12 to 24 threads/CPUs, but performance is actually increasing pretty much as you would expect. Expecting a 1-to-1 performance increase beyond the first 12 cores would be physically impossible, as there aren’t 24 independent cores.

    September 30, 2010 at 10:36 am
  • Vadim

    The cost is high. I do not have final numbers on hand, but the memory alone, 48 DIMMs of 8GB each (~$300/DIMM), comes to about $15,000.
    So the price should be somewhere in the $20,000 to $90,000 range.

    I do not see why Dell or Supermicro couldn’t offer a similar configuration. Maybe high-memory boxes are not interesting to them yet, or maybe it takes serious engineering effort to fit 48 DIMM slots.

    About the platform-specific build: it is much easier to produce an optimized version when the target hardware is well defined. In this case we can optimize for Intel CPUs, e.g. use SSE4.2 instructions. We can also tune InnoDB if we know that we operate on 128+ GB of memory.

    September 30, 2010 at 12:18 pm
  • Baron Schwartz

    Seen on Twitter: “384G ought to be enough for anybody.”

    October 1, 2010 at 5:03 am
  • nate

    Vadim, others don’t offer it because the 48 DIMM slots are achieved via a custom ASIC that Cisco has as a result of a company it bought a few years ago. From what I understand, the premium associated with that is pretty high, but I haven’t priced them out.

    With the launch of the Xeon EX series (8 cores), the latest Cisco equipment does not use the memory extender technology found on the 4/6-core systems; I assume they didn’t port the tech to the new Intel chips. EX does support more memory than the older Intel gear, though.

    Myself, I’m a fan of the HP BL685c G7 blade: 48 cores, 32 DIMMs, and of course HP has Fusion IO expansion boards for its blades. A fully loaded chassis with 384 cores and 4TB of memory comes in at about 6,700W in 10U.

    October 1, 2010 at 10:31 am
  • Baron Schwartz

    Maybe you can get one of those HPs and let Vadim run some benchmarks on it 😉

    October 1, 2010 at 2:41 pm
  • nate

    Hoping to get some soon! They won’t have Fusion IO, though; I’m looking to get a small 3PAR F400 with 64 × 15K RPM drives to handle my I/O initially. We have about 45 MySQL databases, though with very low load at the moment and lots of caching in the application layer.

    October 1, 2010 at 2:46 pm
  • Gricey

    Dell does boxes with up to 1TB of RAM; check out the R910 (it’s 4U).

    They also have a number of 2U boxes that take up to 512GB, e.g. the R810 / R815.


    October 5, 2010 at 4:54 am
  • István Tóth

    To the first poster:

    I had the same problem: you MUST disable the query cache on multicore systems to get reasonable sysbench (and, in my case, real-workload) performance.

    October 12, 2010 at 11:25 pm
