October 31, 2014

How the Network Can Impact MySQL Operations

This week I worked with a customer during a maintenance window on a job that involved copying a lot of data between MySQL boxes. We had prepared well: we had measured how fast we could copy data between servers of this kind connected to the same network, and we had done this sort of work before. Using a simple tar+netcat based copy you can get 80-90MB/sec on 1GigE, assuming the RAID array is powerful enough. This applies to large InnoDB tables with a not overly fragmented tablespace; otherwise it is easy to become IO bound rather than network bound.
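For planning a maintenance window it helps to turn that rate into time. A back-of-the-envelope sketch (the 80MB/sec figure is the rate measured here; the 500GB data size is just an example):

```python
# Rough copy-time estimate for a tar+netcat style transfer.
# The 80 MB/s default is the measured rate from this post, not a constant.

def copy_time_seconds(data_gb, rate_mb_per_sec=80):
    """Estimated wall-clock time to stream data_gb gigabytes at a given MB/s."""
    return data_gb * 1024 / rate_mb_per_sec

# A 500GB datadir at the measured 80 MB/s takes close to two hours:
print(round(copy_time_seconds(500) / 3600, 1))  # hours
```

The same one-liner, run with the rate you actually observe, tells you quickly whether a window is realistic.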

As I have mentioned before, you can do even better using fast compression such as LZO or QuickLZ, but there was no need for it in this case.
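For completeness, the reasoning behind why fast compression helps: the link carries compressed bytes, so well-compressible data effectively multiplies the pipe, as long as the compressor itself keeps up. A sketch with illustrative numbers (the 2x ratio and 300MB/sec compressor speed are assumptions, not measurements):

```python
# Effective transfer rate when piping a copy through fast compression
# (e.g. LZO/QuickLZ). All numbers here are illustrative assumptions.

def effective_rate(link_mb_per_sec, compression_ratio, compress_mb_per_sec):
    """Uncompressed MB/s you can push: the link carries compressed bytes,
    but you can never go faster than the compressor consumes input."""
    return min(link_mb_per_sec * compression_ratio, compress_mb_per_sec)

# ~110 MB/s usable on 1GigE, 2x-compressible data, compressor doing
# 300 MB/s of input -> the network bottleneck roughly doubles:
print(effective_rate(110, 2.0, 300))
```

This is also why slow, high-ratio compressors (gzip -9 and the like) can make a network copy slower, not faster: the min() flips to the compressor side.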

So the estimates were good, but once we started the real copy process we saw a copy speed of about 20MB/sec instead of the projected 80MB/sec. IO and CPU usage on both the source and target servers was low, so it had to be the network, even though there was no other traffic between these two servers.

The mystery was easily resolved by looking at the network topology: some database servers were connected to Switch A and others to Switch B, with only a 1Gbit link between the two switches.

During the maintenance window, multiple tasks involving different servers made this inter-switch link the bottleneck.
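The arithmetic matches the symptoms. A sketch (the stream count is my guess to reconcile the numbers; ~120MB/sec is typical usable TCP throughput on 1GigE):

```python
# Bandwidth sharing on a saturated inter-switch link. 1 Gbit/s is 125 MB/s
# raw; usable TCP throughput is typically closer to 110-120 MB/s.

def per_stream_mb_per_sec(link_mb_per_sec, concurrent_streams):
    """Fair-share throughput each copy gets on a saturated shared link."""
    return link_mb_per_sec / concurrent_streams

# With ~6 copies crossing the same 1Gbit inter-switch link at once,
# each one drops to about the 20 MB/s observed:
print(round(per_stream_mb_per_sec(120, 6), 1))
```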

What does this tell us? Even if you are a DBA, you had better understand the network topology so you know what performance, availability, and failure scenarios to expect. If your network is too complicated to map out fully, it is at least worth knowing the numbers.

This applies not only to the network but to any resource. For example, what if you have a catastrophic event and now need to restore all 50 servers from backup… in parallel? Will your backup system be able to restore them in parallel efficiently? Will there be enough network bandwidth to pipe them through? These and similar questions are what you should be asking yourself.
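As a sketch of the kind of estimate you should be able to produce for that scenario (every number here is made up for illustration, not a recommendation):

```python
# Back-of-the-envelope check for a mass restore: can the backup host's
# uplink feed N servers in parallel? All numbers are illustrative.

def restore_hours(servers, gb_per_server, backup_uplink_mb_per_sec):
    """Time to push all backups out of a single backup host's uplink,
    assuming the uplink (not disks or CPU) is the bottleneck."""
    total_mb = servers * gb_per_server * 1024
    return total_mb / backup_uplink_mb_per_sec / 3600

# 50 servers x 200GB each through one 10GigE uplink (~1100 MB/s usable):
print(round(restore_hours(50, 200, 1100), 1))  # hours
```

If the answer is measured in days rather than hours, you want to know that before the catastrophic event, not during it.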

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. Another network-level performance concern is faulty cables. When I worked in tech support for InterBase, we helped customers whose performance problems were solved when they replaced the CAT5 cables between their app server and database server.

    Cables contain tiny copper filaments, and these suffer metal fatigue if they are bent too often. If too many filaments break this way, the cable becomes unreliable. TCP retransmits packets that are lost or corrupted, so the connection keeps working, but those retransmissions cut down on throughput.

    We helped a customer who had up to 90% packet loss (which means each packet needs to be re-sent up to 10 times) because of his old, damaged cables.
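The 10x figure follows directly from the math: with independent loss at rate p, the number of transmissions per packet is geometric with mean 1/(1-p). A sketch (note that real TCP also collapses its congestion window at loss rates like this, so actual throughput degrades far worse than the raw resend count suggests):

```python
# Expected transmissions per packet at loss rate p: each attempt succeeds
# with probability 1-p, so the attempt count is geometric with mean 1/(1-p).

def expected_sends(loss_rate):
    return 1.0 / (1.0 - loss_rate)

# At the 90% loss mentioned above, each packet is sent ~10 times on average:
print(round(expected_sends(0.9)))
```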

  2. Just to correct myself: those anecdotes apply to stranded coaxial network cables, not CAT5.

  3. peter says:

    Bill,

    Sure, you want the network to be working normally. I would watch the error rate on the switches as well as on the local nodes; on local network traffic you should expect it to be close to zero. If it is not, something may be faulty or misconfigured.
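On Linux, one place to watch those counters on the nodes themselves is /proc/net/dev (the same numbers ifconfig and ip -s link report). A minimal parsing sketch (the sample text below is fabricated for illustration):

```python
# Minimal parser for Linux /proc/net/dev to spot interface errors/drops.
# Per line after "iface:": rx bytes packets errs drop fifo frame compressed
# multicast, then tx bytes packets errs drop fifo colls carrier compressed.

def iface_errors(proc_net_dev_text):
    """Return {iface: (rx_errs, rx_drop, tx_errs, tx_drop)}."""
    stats = {}
    for line in proc_net_dev_text.splitlines()[2:]:  # skip two header lines
        iface, data = line.split(":", 1)
        f = data.split()
        stats[iface.strip()] = (int(f[2]), int(f[3]), int(f[10]), int(f[11]))
    return stats

sample = """Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
  eth0: 1000000 2000 5 1 0 0 0 0 3000000 4000 0 2 0 0 0 0
"""
print(iface_errors(sample))  # eth0 shows rx errors -- worth investigating
```

Nonzero counters that keep growing on an otherwise idle LAN segment are exactly the "something may be faulty" signal described above.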

  4. Pat says:

    Another thing worth pointing out is to check the latency of your network connections as well as the throughput. A high-throughput, high-latency link can often give you worse performance (despite the high throughput) than a narrow pipe without latency problems.

    Especially if your app makes a lot of very fast queries, you may find the network round trip takes longer than the actual query execution.
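A quick sketch of that effect for a serial workload, where every query pays one round trip (all numbers are illustrative):

```python
# When queries are fast and numerous, round-trip time dominates total time.
# Serial model: each query costs its execution time plus one network RTT.

def total_time_ms(num_queries, query_ms, rtt_ms):
    return num_queries * (query_ms + rtt_ms)

# 10,000 queries that each execute in 0.1ms:
print(total_time_ms(10_000, 0.1, 0.5))   # same-LAN RTT
print(total_time_ms(10_000, 0.1, 20.0))  # WAN-ish RTT
```

With the WAN-like RTT the run takes over 30x longer, even though the server-side work is identical — which is the point Pat is making.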

  5. peter says:

    Pat,

    Good point, though in this case I was mainly speaking about a switched LAN, which normally has quite good latency. A bunch of routers between servers can affect both latency and throughput dramatically, but distance is your worst enemy here.

    I remember a customer who had performance issues; we asked him how fast the connection between the 2 nodes was, and he told us it was a direct 1Gbit connection… though he did not mention the servers were actually 100 miles from each other :)
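Distance sets a hard floor on latency no matter how fat the pipe is: light in fiber travels at roughly 200,000 km/s (about two-thirds of c). A sketch applying that to the 100-mile example above:

```python
# Minimum possible round trip imposed by distance alone, ignoring all
# switching/routing delays. Fiber propagation is ~200,000 km/s.

def min_rtt_ms(distance_km, fiber_speed_km_per_s=200_000):
    return 2 * distance_km / fiber_speed_km_per_s * 1000

# 100 miles is ~160 km, so that "1Gbit direct connection" cannot have an
# RTT below ~1.6ms -- versus ~0.1ms typical on a local switched LAN:
print(round(min_rtt_ms(160), 1))
```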
