Intel Nehalem vs AMD Opteron shootout in sysbench workload

Having two big boxes in our lab, one based on Intel Nehalem (Cisco UCS C250) and the second on AMD Opteron (Dell PowerEdge R815), I decided to run a simple sysbench benchmark to compare how both CPUs perform and what kind of scalability we can expect.

It is hard to make an apples-to-apples comparison, but I think it is still interesting.
The Cisco UCS C250 has a total of 12 cores / 24 threads of Intel Nehalem X5670, and the Dell PowerEdge R815 has 48 cores of AMD Opteron 6168.
One of the biggest differences is that the Cisco box runs CentOS 5.5 while the Dell R815 runs RedHat EL 6. I will probably need to rerun the benchmark after upgrading the Cisco box to CentOS 6 (will it even be released, or is the project dead?).
For the benchmark I took sysbench OLTP (both read-only and read-write) and MySQL 5.6.2, as it seems to be the most scalable release at the moment. All data fits in memory, so it is a fully CPU-bound benchmark.

The full numbers, the script, and the config are on our Benchmark Wiki, along with some graphs.
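
For a rough idea of the shape of each run, the invocations look something like the following (a sketch only; the exact table size, thread counts, and connection options are in the script on the wiki, and the flag names assume sysbench 0.4):

    # prepare one OLTP table small enough to fit entirely in memory
    sysbench --test=oltp --oltp-table-size=10000000 \
             --mysql-socket=/var/lib/mysql/mysql.sock \
             --mysql-user=root --mysql-db=sbtest prepare

    # read-only run at a given concurrency; drop --oltp-read-only=on for the
    # read-write variant and repeat for 1..256 threads to get the scaling curve
    sysbench --test=oltp --oltp-table-size=10000000 --oltp-read-only=on \
             --num-threads=32 --max-time=60 --max-requests=0 \
             --mysql-socket=/var/lib/mysql/mysql.sock \
             --mysql-user=root --mysql-db=sbtest run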

OK, despite claims I have heard that AMD Opteron is much slower than Intel Nehalem, I do not see it in the results.
Both systems scale pretty decently up to 32 threads, with AMD a little bit slower.

Beyond 32 threads, the Intel-based system handles 48-256 threads pretty decently, but on the R815 / AMD something goes wrong. We do not see a good improvement from 32 to 48 threads (though I did expect one, as we have 48 cores), and after 48 threads throughput drops significantly. I actually suspect this is a MySQL problem, and 32 cores may be the scaling limit for MySQL 5.6.

Anyway, we have PERFORMANCE_SCHEMA, and in the next run I will try to get more information about which "wait" event is hit the most and keeps MySQL from scaling.
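
For example, a query along these lines against the 5.6 summary tables should show where the time goes (a sketch; it assumes the relevant wait instruments and consumers are enabled in setup_instruments and setup_consumers):

    mysql -e "
        SELECT EVENT_NAME, COUNT_STAR, SUM_TIMER_WAIT
          FROM performance_schema.events_waits_summary_global_by_event_name
         WHERE COUNT_STAR > 0
         ORDER BY SUM_TIMER_WAIT DESC
         LIMIT 10;"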


Comments (22)

  • Vadim Tkachenko Reply

    @whatever,

    The disk system does not matter, as all data is stored in memory; this test was 100% CPU-bound.
    But I expected that objection. On the R815 the main storage was a Virident tachIOn card, but again, it does not matter at all in this test.

    April 25, 2011 at 12:00 am
  • whatever Reply

    Hi, Vadim Tkachenko:

    According to the hardware specs, you are comparing the Cisco C250 with a FusionIO SSD + 384GB RAM against the R815 with a 6x 7200rpm Western Digital RAID10 setup + 160GB RAM? Then this test is an I/O test, not a CPU comparison?

    April 25, 2011 at 12:00 am
  • Ketil Reply

    I can't help noticing that the Intel system has a lot more memory than the AMD system, and I'd be inclined to suspect this is the reason for the AMD performance drop-off.

    Anyway, the interesting question is how much performance do I need, and how much do I have to pay for it. I just checked Dell, and a 32-core R910 Intel system (2.26GHz) is almost twice the price of an R815 with 48 2.3GHz cores, both with 256GB of memory. So although the Intel system might still be faster, I think the AMD system is a much better deal.

    April 25, 2011 at 12:00 am
  • Baron Schwartz Reply

    You could bind mysqld to 32 cores on the AMD box too, and see if it scales better after that.

    April 25, 2011 at 11:13 am
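
    A minimal way to try that on Linux would be to start mysqld under a restricted affinity mask (a sketch; the core range and defaults file below are placeholders, and numactl --physcpubind would be an alternative):

        # child threads inherit the mask, so all of mysqld stays on cores 0-31
        taskset -c 0-31 mysqld_safe --defaults-file=/etc/my.cnf &
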
  • tobi Reply

    “I actually suspect this is a MySQL problem, and 32 cores may be the scaling limit for MySQL 5.6.” But MySQL is a constant factor in both benchmarks. The CPUs and OS are the only differences(?). Maybe the Opteron CPUs have lower bandwidth for coherency traffic? Investigating this could reveal powerful insights.

    April 25, 2011 at 11:33 am
  • Raine Reply

    Nahalem? Did you mean Nehalem?

    🙂

    April 25, 2011 at 11:51 am
  • George Reply

    Surprising results; I would have thought a Nehalem-based processor running at a much higher clock speed would have a more substantial lead.

    How would innodb_buffer_pool_instances come into play? Would it make a difference? I noticed the Intel and AMD servers have different vm.dirty_ratio settings of 40% and 20%.

    April 25, 2011 at 12:33 pm
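
    To take those two settings out of the equation, one could equalize them on both boxes before re-running (a sketch; the values are only examples, and vm.dirty_ratio should barely matter for a purely in-memory run):

        # align the kernel writeback setting on both servers
        sysctl -w vm.dirty_ratio=20

        # and, under [mysqld] in my.cnf, split the buffer pool into several
        # instances to reduce mutex contention (available since MySQL 5.5):
        #   innodb_buffer_pool_instances = 8
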
  • Patrick Casey Reply

    Could it be choking on memory access? With that many cores thrashing the registers at once, you’d expect the instruction caches on the chips to be pretty saturated.

    I’ve yet to run a benchmark myself where memory access time mattered all that much, but you might have found the use case :).

    April 25, 2011 at 12:45 pm
  • Jeffrey Gilbert Reply

    You sure that’s the right chip model? That’s a six core chip. 32 / 6 = 5.3333

    April 25, 2011 at 1:11 pm
  • George Reply

    Yeah Xeon X5670 is dual hexa-core = 12 cores + 12 virtual = 24 cores.

    April 25, 2011 at 1:27 pm
  • Vadim Tkachenko Reply

    Yes, my bad, it is 24 cores on Cisco / Intel Nehalem

    April 25, 2011 at 2:10 pm
  • Peter Zaitsev Reply

    Vadim,

    So are you saying the performance is basically the same for a single core, or is that just how the graph makes it look?

    If that is the case, it is interesting to see the Opteron cores are not slower. Overall, however, we get about the same performance from the 2-socket Intel as from the 4-socket Opteron, which should be more expensive and take more power… which means 2-socket Intel is the way to go these days.

    April 25, 2011 at 2:45 pm
  • Jeffrey Gilbert Reply

    From wiki:

    http://en.wikipedia.org/wiki/Nehalem_(microarchitecture)

    6 cores (12 threads) @ 2.93 GHz
    6 x 4 = 24 (so 4 sockets)

    And the Opterons of course have slower clocked cores but more per socket, so

    12 cores (12 threads) @ 1.9GHz
    4 x 12 = 48 (still 4 sockets)

    That’s my understanding of how this benchmark is being run and why it’s so even between the two. It is interesting to see that the chips with more physical cores cannot end up scaling past the chips with the faster clock, but for energy consumption during average use, it’s probably closer to AMD than Intel.

    April 25, 2011 at 4:27 pm
  • Peter Zaitsev Reply

    Jeffrey,

    Vadim has corrected it: it is 12 cores / 24 threads here. It is 2-socket Intel vs. 4-socket AMD.

    April 25, 2011 at 7:34 pm
  • EBob Reply

    You are all confusing the issue with the cores:

    Intel: 12 cores + hyperthreading.
    AMD: 48 cores.

    Forget about “threads” and “virtual cores”; they are not numbers you should consider. Just think of hyperthreading as making things 5-10% faster than without it.

    This seems to prove definitively that fewer, faster cores are better than more, slower cores for MySQL. I'm shocked that the 48-core system did not obliterate the 12-core one. Simply shocked.

    April 25, 2011 at 7:40 pm
  • Jeffrey Gilbert Reply

    Now that I think about it a little, it's not THAT surprising that Intel's faster chips have a leg up on the scalability side under MySQL workloads. The AMD chips are far better suited for truly multi-threaded applications (e.g. Apache), where there are no lock conditions. What's happening, from my understanding, is that once contention for transactions becomes a matter of waiting for a process to finish executing, Intel will win because it can execute that process faster and release the lock. On AMD, the individual cores can only execute instructions so fast, so their speed bottlenecks will show up at higher tps loads.

    Under Apache or something else where resource locking isn't an issue, I think the AMD chips would have a better leg up because they could spawn a process, throw it to a core, and forget about it. Just a guess based on how I understand both pieces of software to work.

    April 26, 2011 at 6:41 am
  • Patrick Casey Reply

    I dunno; my rule of thumb is that I'd almost always prefer to have my MIPS concentrated in a small number of fast cores rather than spread across a big number of slow ones, since that'll work well under various loads. I'm sure there are workloads where having a huge core count beats having fast cores, but most of the ones I run into tend to be the opposite.

    If you imagine 1 big CPU that does 16 "units of work/time" vs 16 little ones that do 1 "unit/time", they both have the same theoretical throughput, but the single big CPU is much more likely to achieve it in practice. The only case where having a family of little CPUs would help out is if you have a near perfectly parallelized problem.

    I've been burned pretty hard with a couple of clients, for example, who bought Sun Niagara chips because the Sun salesman convinced them it was the cheapest way to get lots of clock cycles, which was true for a perfectly parallelized compute problem, but grossly untrue for the kind of database workloads the customer had (they were much better off on dual-core Intels).

    April 26, 2011 at 12:45 pm
  • Denis M Reply

    Vadim,
    I think it’s interesting to measure what portion of performance hyper-threading contributes.

    April 27, 2011 at 6:57 am
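
    A rough way to measure that would be to take the second hardware thread of each physical core offline and re-run the benchmark (a sketch; it assumes Linux sysfs CPU hotplug and requires root):

        for cpu in /sys/devices/system/cpu/cpu[0-9]*; do
            # thread_siblings_list looks like "0,12"; keep only the first
            # listed sibling of each physical core online
            siblings=$(cat "$cpu/topology/thread_siblings_list")
            first=${siblings%%[,-]*}
            [ "${cpu##*/cpu}" != "$first" ] && echo 0 > "$cpu/online"
        done
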
  • Laurynas Reply

    Vadim –

    Was the mysqld binary built optimized for each CPU?

    April 27, 2011 at 8:02 am
  • Vadim Tkachenko Reply

    Laurynas,

    Nope, it was the generic Linux binary tar.gz from dev.mysql.com.

    April 27, 2011 at 8:03 am
  • Laurynas Reply

    Vadim –

    I do not know exactly how the generic binary was built, but in the best case it could have used the -mtune=generic option (one size fits all, with both CPUs slightly penalized, perhaps by differing amounts). Since this is a CPU/memory-bound benchmark, it'd be interesting to see if building for each CPU with -march=native --param l1-cache-size=x --param l1-cache-line-size=y --param l2-cache-size=z changes the results. Although I'd expect any differences to show up at smaller numbers of threads only.

    April 27, 2011 at 8:13 am
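
    For reference, rebuilding 5.6 tuned for the host CPU would look roughly like this (a sketch; 5.6 builds with CMake, and the --param cache-size values mentioned above would be appended to the same flag strings):

        cd mysql-5.6.2
        cmake . -DCMAKE_C_FLAGS="-O3 -march=native" \
                -DCMAKE_CXX_FLAGS="-O3 -march=native"
        make -j"$(getconf _NPROCESSORS_ONLN)"
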
  • Peter Zaitsev Reply

    Laurynas,

    It might be an interesting test indeed. Though I remember that some 6 years ago we played a lot with different compiler settings in an attempt to get gains. We could get a couple of percent at most at that time, which was not worth it. Maybe architectures have diverged and compilers have improved enough now that the numbers are different.

    April 27, 2011 at 9:36 am
