Filtered MySQL ReplicationPeter Zaitsev
To get this straight – I’m not a big fan of filtered or partial MySQL Replication (as of version MySQL 5.0) – there is enough gotchas with replication itself and getting things right with filtering can get quite bumpy road. In some applications however it is very helpful so lets see what one should do or should not do to make it work.
Experience shared here mainly applies to logical level replication. MySQL 5.1 can use row level replication which alleviates some of these issues but I guess it will still take some time before it becomes massively used.
First thing you need to know about partial replication – you can do filtering on the Master or on the Slave. Filtering on the Slave is normally preferred as in this case you get full binary log on the Master which is not only helpful for replication needs but also if you ever need to do roll forward recovery while recovering from backup. There are however some cases when you really do not need events in binary log – even for backup recovery, for example dealing with temporary tables (unless they are used to populate normal tables using SQL).
On the Master you can do filtering using binlog-do-db and binlog-ignore-db commands. This is there a lot of people stop reading documentation, specify databases they want to replicate (or ignore) and hoping it would work.
In reality however these filters are very fragile – you need to always have default database specified and you only need to update this default database for filtering to work normally. If you’re using this rules you have to be both sure your application(s) only write to default database and everyone who has access to the database always keeps this in mind. In quite a lot of cases I’ve seen such filtering being broken by developer forgetting this restriction and running one time update query without selecting proper default database.
Using this option takes a lot of discipline and so do not work well for a lot of installations, especially as MySQL Server do not offer any enforcements – ie there is no way to forbid updating not default database so you can ensure your application and staff members really follow these rules.
Another option to filter writes on the Master is of course manual filtering – using SET SQL_LOG_BIN=0 if you need to skip some statements from binary logging. This command requires SUPER Privilege and extreme care in use but it can be very helpful in advanced cases allowing replication to control what will be logged to binary log and so replicated.
On the Slave you have more options to filter the data. There is replicate-do-db and replicate-ignore-db options which closely mirror behavior of their binlog-do-* counterparts and which are as easy to break. So I generally would not recommend using these.
There are also replicate-do-table, replicate-ignore-table, replicate-wild-do-table and replicate-wild-ignore-table options which act differently. Instead of looking at the query and analyzing which tables this query is using.
This works in much larger amount of cases, but not all of them. For example stored procedures are handled (because binary logging is done on statements as executed inside stored procedures), however Stored Functions are not – so if you have updates done by stored functions the tables are not taken into account.
The traditional pre MySQL 5.0 way to break such replication is also using multi-table update or delete statements, if you happen to replicate one table or another this would not work. Though this is rather easy to avoid, but it is still remains a way how one can break filtered replication by executing some statements on the master.
Even though replicate-*-table options also have some issues they are much safer than database based counterparts.
In some cases however you can’t really do filtering on the slave or it gets really inefficient – what if you’re replicating over long distance and would like to save on traffic or if write load is high and it would be waste to pull binary logs to large variety of slaves?
This is the case when BLACKHOLE storage engine can come into the play. In the nutshell you can create intermediately slave which has BLACKHOLE as default-storage engine which would fetch all binary logs from the master, filter out only what you need replicated and pass it to the group of slaves. If you have several group of slaves which need different data sets replicated to them – you can use number of such filtering servers.
The good thing about BLACKHOLE filtered server is – it is generally rather lightweight as it only needs to fetch logs filter and write filtered logs – the queries are formally executed but it is rather fast as there is no any real data to modify.
Daniel Schneller wrote pretty good tutorial on creation for setting up BLACKHOLE Based Replication Filtering so I will not go into much of details but just note couple of issues you need to watch out for.
First having BLACKHOLE Slave in the middle you get into classical issues of chain replication – the latency will increase a bit (though not much because query execution on such slave is very fast) and what is more important you will get into more complicated math of dealing with binary log positions. For example cloning Master (and dropping not replicated tables) to set up replication becomes more tricky because you need to perform mapping between master positions which you get with backup to distribution slave position. You can also clone other slaves which are getting the same filtered stream of data though it does not help if you would like to add more tables to be replicated to the slaves.
Another issue is of course creating and altering tables. If you have Innodb tables you’re normally OK because you can use –skip-innodb on the filtering slave and default-storage-engine=BLACKHOLE to ensure all Innodb tables are created as BLACKHOLE. However you can’t do the same with MyISAM tables, because MyISAM can’t be disabled and so if you create tables as MYISAM or ALTER them to MyISAM you will get them as MyISAM on BLACKHOLE server as well. So you need to watch out for this one as well.
As you can see no matter which road you take with partial MySQL Replication there are things to be careful with, so if you’re considering to implement it make sure there are skills and discipline in your team to make sure you do not shot yourself in the foot.
If you’re not doing things which do not work Filtered Replication can work pretty well for you.
P.S It is in my todo to see how well MySQL 5.1 row based replication works and performs and I will be testing filtered scenarios as well. Hopefully everything is taken care of in this case.