Intel Nehalem vs AMD Opteron shootout in sysbench workload

April 25, 2011
Author
Vadim Tkachenko

Having two big boxes in our lab, one based on Intel Nehalem (Cisco UCS C250) and the second on AMD Opteron (Dell PowerEdge R815), I decided to run a simple sysbench benchmark to compare how both CPUs perform and what kind of scalability we can expect.

It is hard to make an apples-to-apples comparison, but I think it is still interesting.
The Cisco UCS C250 has a total of 12 cores / 24 threads of Intel Nehalem X5670, and the Dell PowerEdge R815 has 48 cores of AMD Opteron 6168.
One of the biggest differences is that the Cisco is running CentOS 5.5 and the Dell R815 is running RedHat EL 6. I will probably need to rerun the benchmark after upgrading the Cisco to CentOS 6 (will it even be released, or is it dead?).
For the benchmark I took sysbench oltp (both read-only and read-write) and MySQL 5.6.2, as it seems to be the most scalable version at the moment. All data fits into memory, so it is a fully CPU-bound benchmark.

The full numbers, script and config are on our Benchmark Wiki; there are some graphs as well.

OK. Despite claims I have heard that AMD Opteron is much slower than Intel Nehalem, I do not see it in the results.
Both systems scale pretty decently up to 32 threads, with AMD a little bit slower.

After 32 threads, the Intel-based system handles 48-256 threads pretty decently, but on the R815 / AMD something
goes wrong. We do not see a good improvement from 32 to 48 threads (though I did expect one, as we have 48 cores),
and after 48 threads throughput drops significantly. I actually suspect this is rather a MySQL problem, and 32 cores may be the limit of scaling for MySQL 5.6.
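
To make "scales pretty decently" measurable, here is a minimal sketch of a scaling-efficiency calculation. The function name and the input format (a mapping from thread count to measured throughput) are assumptions for illustration; the real numbers would have to come from the Benchmark Wiki.

```python
def scaling_efficiency(throughput_by_threads, cores):
    """Return {threads: efficiency} relative to the lowest thread count measured.

    An efficiency of 1.0 means throughput grew linearly with the number of
    busy cores; values well below 1.0 point at contention or other limits.
    """
    base_threads = min(throughput_by_threads)
    # Throughput per busy core at the smallest measured thread count.
    base_per_core = throughput_by_threads[base_threads] / min(base_threads, cores)
    result = {}
    for threads, tps in sorted(throughput_by_threads.items()):
        # Threads beyond the core count cannot keep additional cores busy.
        busy_cores = min(threads, cores)
        result[threads] = tps / (base_per_core * busy_cores)
    return result
```

Feeding it the per-thread-count results for each box (with `cores=48` for the R815) would show at which thread count the efficiency starts to fall away.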

Anyway, we have PERFORMANCE_SCHEMA, and in the next run I will try to get more information on which “wait” event is hit the most and does not allow MySQL to scale.
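
As a rough idea of what that could look like, the sketch below ranks wait events by total wait time using the `events_waits_summary_global_by_event_name` summary table. It assumes the relevant wait instruments and consumers are enabled, and that `cursor` is any Python DB-API cursor connected to the server; the helper name `top_wait_events` is made up for this example.

```python
TOP_WAITS_SQL = (
    "SELECT EVENT_NAME, COUNT_STAR, SUM_TIMER_WAIT "
    "FROM performance_schema.events_waits_summary_global_by_event_name "
    "WHERE COUNT_STAR > 0 "
    "ORDER BY SUM_TIMER_WAIT DESC "
    "LIMIT {limit}"
)


def top_wait_events(cursor, limit=10):
    """Return the wait events with the largest accumulated wait time.

    `cursor` is assumed to be a DB-API cursor (hypothetical connection setup
    not shown). SUM_TIMER_WAIT is reported by the server in picoseconds.
    """
    # int() keeps the LIMIT value numeric before it is placed into the SQL text.
    cursor.execute(TOP_WAITS_SQL.format(limit=int(limit)))
    return cursor.fetchall()
```

Running it once at 32 threads and once at 64 threads on the AMD box and comparing the top rows would be one way to see which wait grows.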

41 Comments
Baron Schwartz
15 years ago

You could bind mysqld to 32 cores on the AMD box too, and see if it scales better after that.
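
One way to try this from a script, as a minimal Linux-only sketch: restrict the launcher's own CPU mask and then exec the server, so that every thread it creates inherits the mask. The helper names and the choice of the lowest-numbered CPUs are assumptions; on a 4-socket box the choice of which 32 cores to use (for example, whole sockets) may matter.

```python
import os


def first_n_cpus(n, available):
    """Pick the n lowest-numbered CPU ids from the available set."""
    return set(sorted(available)[:n])


def launch_pinned(cmd, n):
    """Replace the current process with `cmd`, limited to n CPUs (Linux only).

    The affinity mask is set on the calling process before exec, so all
    threads started by the new program inherit it.
    """
    allowed = os.sched_getaffinity(0)  # CPUs this process may currently use
    os.sched_setaffinity(0, first_n_cpus(n, allowed))
    os.execvp(cmd[0], cmd)  # does not return on success
```

Something like `launch_pinned(["mysqld_safe"], 32)` would be the intended use; the command line is only an example and depends on how the server is started on that machine.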

tobi
15 years ago

“I actually suspect this is rather MySQL problem, and 32 cores may be limit of scaling for MySQL 5.6.” But MySQL is a constant factor in both benchmarks. The CPUs and OS are the only difference(?). Maybe the Opteron CPUs have lower bandwidth for coherency traffic? Investigating this could reveal powerful insights.

Raine
15 years ago

Nahalem? Did you mean Nehalem?

🙂

George
15 years ago

Surprising results; I would have thought a Nehalem-based processor running at a much higher clock speed would have a more substantial lead.

How would innodb_buffer_pool_instances come into play; would it make a difference? I noticed the Intel and AMD servers have different vm.dirty_ratio values of 40% and 20%.

Patrick Casey
15 years ago

Could it be choking on memory access? With that many cores thrashing the registers at once, you’d expect the instruction caches on the chips to be pretty saturated.

I’ve yet to run a benchmark myself where memory access time mattered all that much, but you might have found the use case :).

Jeffrey Gilbert
15 years ago

You sure that’s the right chip model? That’s a six core chip. 32 / 6 = 5.3333

George
15 years ago

Yeah Xeon X5670 is dual hexa-core = 12 cores + 12 virtual = 24 cores.

Peter Zaitsev
Admin
15 years ago

Vadim,

So are you saying the performance is basically the same for a single core, or is it just how the graph makes it look?

If this is the case, it is interesting to see that Opteron cores are not slower. Overall, however, we get about the same performance from 2-socket Intel as from 4-socket Opteron, which should be more expensive and take more power… which means 2-socket Intel is the way to go these days.

Jeffrey Gilbert
15 years ago

From wiki:

http://en.wikipedia.org/wiki/Nehalem_(microarchitecture)

6 cores (12 threads) @ 2.93 GHz
6 x 4 = 24 (so 4 sockets)

And the Opterons of course have slower clocked cores but more per socket, so

12 cores (12 threads) @ 1.9GHz
4 x 12 = 48 (still 4 sockets)

That’s my understanding of how this benchmark is being run and why it’s so even between the two. It is interesting to see that the chips with more physical cores cannot end up scaling past the chips with the faster clock, but for energy consumption during average use, it’s probably closer to AMD than Intel.

Peter Zaitsev
Admin
15 years ago

Jeffrey,

Vadim has corrected it: it is 12 cores / 24 threads here. It is 2-socket Intel vs 4-socket AMD.

EBob
15 years ago

You are all confusing the issue with the cores:

Intel: 12 cores + hyperthreading.
AMD: 48 cores.

Forget about “threads” and “virtual cores”; it's not a number you should consider. Just think 5-10% faster than without hyperthreading.

This seems to prove definitively that fewer faster cores are better than more slower cores for MySQL. I’m shocked that a 48-core system did not obliterate the 12-core one. Simply shocked.

Jeffrey Gilbert
15 years ago

Now that I have thought about it a little while, it’s not THAT surprising that Intel’s faster chips have a leg up on the scalability side under MySQL workloads. The AMD chips are far better suited for true multi-threaded applications (e.g. Apache), where there are no lock conditions. What’s happening, from my understanding, is that once contention for transactions becomes a matter of waiting for a process to finish executing, Intel will win because it can execute that process faster and release the lock. On AMD, the individual cores can only execute instructions so fast, so their speed bottlenecks will show up at higher tps loads.

Under Apache or something where resource locking isn’t an issue, I think the AMD chips would have a better leg up because they could spawn a process, throw it to a core, and forget about it. Just a guess based on how I understand both pieces of software to work.

Patrick Casey
15 years ago

I dunno, my rule of thumb is that I’d almost always prefer to have my MIPS on a small number of cores rather than on a large number, since that’ll work well under various loads. I’m sure there are other workloads where having a huge core count beats having fast cores, but most of the ones I run into tend to be the opposite.

If you imagine 1 big CPU that does 16 “units of work/time” vs 16 little ones that do 1 “unit/time” each, then they both have the same theoretical throughput, but the single big CPU is much more likely to achieve that in practice. The only case where having a family of little CPUs would help is if you have a near perfectly parallelized problem.

I’ve been burned pretty hard with a couple of clients, for example, who bought Sun Niagara chips because the Sun salesman convinced them it was the cheapest way to get lots of clock cycles, which was true for a perfectly parallelized compute problem, but grossly untrue for the kind of database workloads the customer had (they were much better off on dual-core Intels).
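
The 1 big CPU versus 16 little CPUs comparison above can be put into numbers with a small Amdahl's-law style sketch. The function name and the idea of a single "serial fraction" are simplifying assumptions for illustration; a real database workload has many different locks rather than one serial section.

```python
def completion_time(work, serial_fraction, cpus, speed_per_cpu):
    """Time to finish `work` units when part of it cannot run in parallel.

    The serial part runs on one CPU at `speed_per_cpu`; the remainder is
    spread evenly over all `cpus`.
    """
    serial = work * serial_fraction / speed_per_cpu
    parallel = work * (1.0 - serial_fraction) / (cpus * speed_per_cpu)
    return serial + parallel


# Same theoretical throughput: 1 CPU at speed 16 versus 16 CPUs at speed 1.
big = completion_time(work=16, serial_fraction=0.1, cpus=1, speed_per_cpu=16)
small = completion_time(work=16, serial_fraction=0.1, cpus=16, speed_per_cpu=1)
```

With a serial fraction of zero both configurations finish in the same time; as soon as some of the work is serialized (10% in the example), the 16 slow CPUs fall behind because the serial part runs at the speed of one slow core.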

Denis M
15 years ago

Vadim,
I think it’s interesting to measure what portion of performance hyper-threading contributes.

Laurynas
15 years ago

Vadim –

Was the mysqld binary built optimized for each CPU?

Laurynas
15 years ago

Vadim –

I do not know how exactly the generic binary was built, but in the best case it could have used the -mtune=generic option (one size fits all, or both CPUs slightly penalized, perhaps by differing amounts). Since this is a CPU/memory-bound benchmark, it’d be interesting to see if building for each CPU with -march=native --param l1-cache-size=x --param l1-cache-line-size=y --param l2-cache-size=z changes the results. Although I’d expect any differences to show up with smaller numbers of threads only.

Peter Zaitsev
Admin
15 years ago

Laurynas,

It might be an interesting test indeed. Though I remember some 6 years ago we played a lot with different compiler settings in an attempt to get gains. We could get a couple of percent tops at that time, which was not worth it. Maybe architectures have diverged now and compilers have improved, so the numbers are different.

whatever
15 years ago

Hi, Vadim Tkachenko:

According to the hardware specs, you are comparing the Cisco C250 with FusionIO SSD + 384GB RAM against the R815 with a 6x 7200rpm Western Digital RAID10 setup + 160GB RAM? Then this test is an IO test, not a CPU comparison?

Ketil
15 years ago

I can’t help but notice that the Intel system has a lot more memory than the AMD system, and I’d be inclined to suspect this is the reason for the performance dropoff on AMD.

Anyway, the interesting question is how much performance do I need, and how much do I have to pay for it. I just checked Dell, and a 32 core R910 Intel system (2.26GHz) is almost twice the price of an R815 with 48 2.3GHz cores – both with 256GB memory. So although the Intel system might still be faster, I think the AMD system is a much better deal.
