MySQL Replication is asynchronous which causes problems if you would like to use MySQL Slave as it can contain stale data. It is true delay is often insignificant but in times of heavy load or in case you was running some heavy queries on the master which not take time to replicate to the slave replication lag can be significant. Also even very small lag can cause the problems – for example you’ve posted comment on the blog and on next page reload you do not see it as it was read from the slave millisecond later…. this is something you would not like to happen.
I’ll list some techniques here which I found to be helpful for offloading load to the slave without causing application to be have crazy. The same approach can be used in Master-Master replication in Active-Passive mode, just think about passive node as a slave.
Query type based distribution Elect some queries as “time critical” and other queries as “non time critical” in terms of requirements regarding reading stale data. run time critical queries on the master and non time critical on the slave, or balance them if you have underloaded master. For example search queries or reporting queries are very good candidates. The simplified version of this approach is using Slave for reporting while Master for OLTP queries. This solution works well if there are enough queries which can be non real time.
Session based approach You mainly care about real-time behavior for users which participate on the site, like they’ve added comment and it does not show up. If other visitors see a posts few seconds after it was created by an author they would not even notice, so you can use session based persistence and map users which are passive to the slave while if user had written something you can make him to read from the master for certain period of time. It is best to use this technique with assigning visitor to the dedicated slave for the duration of session which is both good from performance standpoint as he is likely to touch the same data and because all slaves may have different replication lags and you do not want visitors to see objects disappearing on page reloads. This approach is cool but can be a bit complicated and it also does not work well if you’re doing writes for passive visitors which when directly visible to them.
Change time based approach If your operations are specific to particular “objects” – blogs, movies, profiles, threads in forums etc you can track when object was modified last time. For example when new post was added to the blog, or edited. Now for object which were not changed for significant amount of time you can do reads from the slave and for objects which just were updated you can read them from the master. “Last updated” time is always to be queried from the master and better cached in something like memcache. This technique especially works great with caching things in memcache as it means actively updated and read stuff will be read from memcache and old stuff can be read from the slaves, reducing master load and improving scalability by use of simple replication. If using this approach you should chose granularity properly or use different ones – for example you may operate on blog level or on post level – checking when post was last modified/commented and using slaves to work with inactive posts.
Obviously there are other methods you can use for loading the slave, I just feel these free are relatively simple to use and usable for wide variety of applications. Whichever method you’re using you should not forget to monitor if replication is running properly and if replication lag is within the limit. It may be good idea to exclude slaves from the reads if lag is high or adjust behavior to make sure you only read slaves if you can afford reading data which is that stale.
Resources
RELATED POSTS