I generally thought of MySQL replication as having quite low overhead on the master, depending on the number of slaves. What kind of extra load does each slave cause? Well, it just gets a copy of the binary log streamed to it. All slaves typically read the last few events in the binary log, so those are in the cache. In most cases having some 10 slaves is relatively low overhead and does not impact master performance a lot. However, I ran into a case where performance was significantly impacted by going from 2 to 4 slaves on the box. Why?
The reason was that the master had a lot of very small updates going on – over 15,000 transactions a second. Each time an event is logged to the binary log, all 4 threads feeding the binary log to the slaves have to be woken up to send the update notification. With 4 slaves connected, this makes 60,000 thread wakeups a second, sending some 60,000 packets (there may be some buffering taking place on the TCP side, merging multiple sends, but still).
I guess this scenario just has not caught any developer's attention yet, as it should be rather easy to optimize. The same way network cards are designed to throttle the number of interrupts they generate, processing several packets at a time, we could make replication threads be woken up in batches. For example, we could tune the system to wake up the thread feeding a slave no more often than 1,000 times a second, with each wakeup sending multiple events to the slave. It should be possible to make this number tunable, as rarer wakeups mean less overhead but can also impact replication latency a bit. It is also possible to get some auto-detection in place, timing how long it really takes all threads to send their data to the slaves. If you have a large number of slaves, the delay from an event executing on the master to the last thread sending its packet to a slave can be significant.
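To illustrate the batching idea (this is not MySQL's actual implementation – `BatchedFeeder`, `max_wakeups_per_sec`, and the in-memory `batches` list are all hypothetical names), here is a minimal Python sketch of a feeder thread that is woken up at most N times a second and drains all pending events in one batch per wakeup:

```python
import threading
import time
from collections import deque

class BatchedFeeder:
    """Sketch of a throttled feeder: instead of one wakeup and one
    send per binlog event, the feeder thread wakes at most
    max_wakeups_per_sec times a second and sends all events that
    accumulated since its last send as a single batch."""

    def __init__(self, max_wakeups_per_sec=1000):
        self.min_interval = 1.0 / max_wakeups_per_sec
        self.events = deque()            # pending binlog events
        self.cond = threading.Condition()
        self.batches = []                # stand-in for network sends
        self.stopped = False

    def log_event(self, event):
        # Producer side (the master writing to the binary log).
        with self.cond:
            self.events.append(event)
            self.cond.notify()

    def stop(self):
        with self.cond:
            self.stopped = True
            self.cond.notify()

    def feeder(self):
        # Consumer side (the thread feeding one slave).
        last_send = 0.0
        while True:
            with self.cond:
                while not self.events and not self.stopped:
                    self.cond.wait()
                if self.stopped and not self.events:
                    return
                wait_left = self.min_interval - (time.monotonic() - last_send)
            if wait_left > 0:
                # Throttle: let more events accumulate before sending.
                time.sleep(wait_left)
            with self.cond:
                batch = list(self.events)
                self.events.clear()
            last_send = time.monotonic()
            self.batches.append(batch)   # one "packet" for many events

feeder = BatchedFeeder(max_wakeups_per_sec=100)
t = threading.Thread(target=feeder.feeder)
t.start()
for i in range(200):                     # a burst of small transactions
    feeder.log_event(i)
feeder.stop()
t.join()
print(len(feeder.batches), "sends for 200 events")
```

With a burst of 200 events, the feeder ends up doing only a handful of sends instead of 200, which is exactly the interrupt-mitigation trade-off: fewer wakeups at the cost of up to `min_interval` of extra latency per event.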
What does this case teach me in general? To always look at the data and question your assumptions. If something is unlikely to be the bottleneck, it does not mean it is not.