November 27, 2014

Scaling to 256-way the Sun way

As you may have seen recently, there are some articles about scaling MySQL on a 256-way system.

I thought: wow, did they really make it work, considering how many bottlenecks remain in MySQL?

What does the article really tell us?

First, the number 256: this is not the number of cores but the number of hardware threads, which is not the same thing. Each T2 Plus CPU has 8 cores with 8 threads each, giving 64 threads per chip, or 256 threads altogether.
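As a quick sanity check of that arithmetic, here is a minimal Python sketch. The per-chip figures and the 256 total come from the text above; the chip count is not stated there and is simply derived from them.

```python
# Figures quoted in the post: 8 cores per T2 Plus chip, 8 hardware threads per core.
CORES_PER_CHIP = 8
THREADS_PER_CORE = 8
TOTAL_THREADS = 256  # the "256-way" headline number

threads_per_chip = CORES_PER_CHIP * THREADS_PER_CORE

# Derived value (an assumption of this sketch, not a number quoted in the post).
chips = TOTAL_THREADS // threads_per_chip
total_cores = chips * CORES_PER_CHIP

print(f"threads per chip: {threads_per_chip}")
print(f"chips:            {chips}")
print(f"physical cores:   {total_cores}")
```

So "256-way" here means 32 physical cores, each presenting 8 hardware threads to the operating system.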

Now what about MySQL scaling to use these 32 cores with 256 threads? Especially with the goal of “Do it with minimal tuning i.e as close as possible as out-of-the-box”? Do we simply start the MySQL server and change a couple of defaults to make it work?

No! To get really good performance we have to set up 32 (!) MySQL instances on this box (approximately one per core).

Comparing single-instance performance (4350 statements/sec) to the total performance achieved with 32 instances (79334 statements/sec), we can see that a single instance gets about 1/18 of the performance achievable on the system, which is of course an extremely poor number.
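The ratio can be reproduced with a few lines of Python. This is only a sketch using the two throughput figures quoted above; the per-instance average is a simple division and assumes the load was spread evenly across the 32 instances, which the benchmark article may or may not confirm.

```python
# Throughput figures quoted in the post (statements per second).
SINGLE_INSTANCE_TPS = 4350
MULTI_INSTANCE_TPS = 79334
INSTANCES = 32

# How many times faster the whole box is than one instance running alone.
speedup = MULTI_INSTANCE_TPS / SINGLE_INSTANCE_TPS

# Share of the box's achievable throughput that a single instance delivers.
fraction = SINGLE_INSTANCE_TPS / MULTI_INSTANCE_TPS

# Average per-instance throughput, assuming an even spread (an assumption).
per_instance = MULTI_INSTANCE_TPS / INSTANCES

print(f"speedup with {INSTANCES} instances: {speedup:.1f}x")
print(f"single instance share:     {fraction:.1%}")
print(f"average per instance:      {per_instance:.0f} statements/sec")
```

The speedup works out to a little over 18x, which is where the "1/18" figure comes from.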

Scale-out, and the ability to scale by using as many MySQL instances as you like, is a very good application architecture, but unless you have everything absolutely automated the Google way, I do not think you would enjoy running 32 MySQL instances on one box for a single application just to get decent speed.

Now I should point back to my old post about T2000 Performance with MySQL. There Matt Ingenthron commented: “Sun” is not, to my knowledge, “aggressively pushing T2000 as Scalable MySQL Platforms”. Well, what would this one be? Of course it is not the T2000 any more and CPU performance got a bit better, but as I understand it single-thread performance is still many times slower than that of recent Xeons, which makes it a really tough call for MySQL, which can’t perform many operations in parallel.

Now you might have got the feeling that 79,000 queries/sec is a lot? It is hard to tell without knowing what the queries are, but if you just want to look at query counts, check this out: these are year-and-a-half-old benchmarks on a single MySQL instance showing over 50,000 queries/sec on 8 cores (and close to 40,000 queries/sec on 4 slow AMD cores).

Do not get me wrong. The multi-threading architecture Sun has is great for many applications. However, it is NOT great for MySQL unless you really, really do not care about single-thread performance (REPLICATION, ALTER TABLE etc) and are willing to run an insane number of MySQL instances per server.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. Karsten Silz says:

    Hi!

    So does that mean that on a multi-core box, you’re forever stuck with one MySQL server instance never using more than one core?

    Karsten

  2. Thanks for the pointer to the former benchmark. However, the numbers mentioned are not comparable. Running a single trivial query zillions of times is very different from running business transactions with multi-table join queries. CMT platforms are the perfect fit for MySQL multi-instance scalability. I dare you to show me linear scalability to 24 instances of MySQL on any Linux server.

  3. Thanks for your interesting comments. Unfortunately you are mixing Peak Throughput and Response times in your notes. The sentence “To get really good performance you need 32 instances” does not make any sense to me. We are getting great performance from 1 to 28 instances and excellent scalability.
    In addition, you cannot compare the referred benchmark results to my results. The referred workload sends a trivial select zillions of times, whereas my benchmark performs a business transaction composed of 4 selects doing up to 6 table joins, 1 insert and 1 delete.

    CMT servers are a perfect fit for multi-instance MySQL environment. And they will also give you exceptional TCO. I dare you to show me 24 instances of MySQL producing linear scalability on any Linux machine. Deal or no Deal ?

  4. peter says:

    MrBenchmark,

    Sure, the numbers are not comparable. My point is simply that showing numbers without any baseline to compare against is not very helpful.

    MultiCore scalability of MySQL, in my view, is the scalability of a single instance… though of course you can define it differently.

    I can’t tell you about scaling to 24 instances on Linux server because there is really little need for that. Single instance perfectly works with several cores and there are no x86 systems with same amount of cores as there are threads in Sun systems.

    My main point is – the numbers shown in this case are barely relevant because running so many MySQL instances is highly artificial.

  5. peter says:

    Karsten,

    No. Depending on the workload, MySQL can work efficiently with significantly more than one core, though it is very load specific. I would say 4 cores is what works reasonably well; some workloads can scale to 8+ cores (considering x86 Intel/AMD cores).

  6. We are working on several key customers deployment in real environments that will show you otherwise. Stay tuned…

  7. Tom says:

    Peter,

    You say …

    “I can’t tell you about scaling to 24 instances on Linux server because there is really little need for that. Single instance perfectly works with several cores and there are no x86 systems with same amount of cores as there are threads in Sun systems.”

    Are you saying that scale-out to multiple instances is not important ? I thought this was the preferred architecture for MySQL database scaling ?
    Or are you saying you rarely see need to scale-out as far as 24 instances ?

    The reason for asking is that the benefit I see from this performance demonstration/workload from Sun is the ability to run in a scale-out fashion but with the benefits of consolidation which may (and likely do include) space,power and cost savings.

    thanks
    Tom

  8. Mark Callaghan says:

    I think we all will be using big SMP servers as a cluster of smaller SMP servers in the near future. But I don’t think that the tested system was balanced. It needs a lot more RAM than 64GB and more disks than 10 15k (~2000 IOPs) to host 28 instances. A common high-performance commodity x86 server for MySQL uses 16GB or 32GB of RAM and 10 or more disks. By those standards, the Sun box would need 500GB to 1TB of RAM and 56,000 IOPs. As this storage won’t be direct attached, it is likely to be expensive. That much memory in a box will also be very expensive.

    SSD can make the box more interesting. It should be possible to get 56,000 IOPs from direct attached SSD and by the 5 minute rule (http://www.acmqueue.com/modules.php?name=Content&pa=showpage&pid=549&page=1) the box won’t need as much RAM when SSD is used.

  9. peter says:

    MrBenchmark,

    You can enjoy picking on the language. English is not my native language and I do not expect to use it perfectly. When speaking about Performance and Scalability casually, you use the terms more broadly :) You can be speaking, for example, about the peak performance attained with a certain configuration, which could mean peak throughput.

    Scalability is basically a comparison – you can look at scalability in terms of various hardware configurations, number of disks, database size, number of connections.

    As for 24 instances on any Linux machine, as I mentioned, I do not know – I have not seen such configurations so I can’t comment on how it works.

  10. Mark;

    I can see by your comments that you are used to large scale MySQL deployments.
    You are right on your sizing note and my initial intent was to have 128GB of RAM on the T5440 but I could not make it happen. My article [http://blogs.sun.com/MrBenchmark] is a first attempt on the T5440 and I am convinced that many other experts will publish on this topic in the near future.

  11. peter says:

    Tom,

    Let me explain.

    First – sharding, which is the most common architecture for MySQL, is a pain. You shard when you have to, and if you were able to stay away from it (while having performance, cost etc where they should be) you would not do it. In fact, because of problems with MySQL single-instance scalability and operational problems (backups, alter table etc), people have to shard sooner than they would if MySQL were more scalable.

    Second – when sharding, the number of boxes (and really instances) matters because of operational complexity etc. Performance optimization and management are more complicated with multiple instances on one server than with instances on different servers, because of the shared hardware. People sharding at the mid range would much rather have fewer MySQL instances. For example, 5 shards are much easier to deal with than 50. At the higher end, dealing with 100 shards or with 1000 does not really matter because you have to have it automated.

    In the end, scaling a single instance applies to all segments, while getting scalability with multiple instances applies only to some of them.

    But what really is a show stopper for me on Sun T1/T2 architectures is the “single thread performance” – if I saw even 50% of what you get from recent Opterons/Xeons, I would praise it for many apps.

  12. peter says:

    Mark,

    I think you’re right about using a large SMP box as multiple boxes; however, I think for medium-to-large scale applications this happens only because we are forced into it by limited single-instance scalability and operational issues. Even in this case we would like to see a relatively small number of instances running. For example, running 2 instances on a 16-core box, not 16.

    Now about performance/memory/disks – the balance here is highly application specific. Some people are able to saturate multiple cores with 2GB worth of data; some need much more data per shard to saturate the CPU. Memory is another big factor – some people are running databases which pretty much fit in memory and have very little IO, others have workloads which require much more IO capacity. In many cases there is some optimal ratio between IO system capacity and memory, though you can’t always freely trade one for the other.

    And of course we should not forget about operational pains :)

  13. peter says:

    MrBenchmark:
    “We are working on several key customers deployment in real environments that will show you otherwise. Stay tuned…”

    I’m not sure what point you are trying to make. Many customers will buy what they are sold, and there are multiple ways to make things work. The point is whether it is the optimal choice, and the fact that a customer has implemented it does not say anything about how optimal the solution was.
