September 1, 2014

Why MySQL’s binlog-do-db option is dangerous

I see a lot of people filtering replication with binlog-do-db, binlog-ignore-db, replicate-do-db, and replicate-ignore-db. Although there are uses for these, they are dangerous and in my opinion, they are overused. For many cases, there’s a safer alternative.

The danger is simple: they don’t work the way you think they do. Consider the following scenario: you set binlog-ignore-db to “garbage” so data in the garbage database (which doesn’t exist on the slave) isn’t replicated. (I’ll come back to this in a second, so if you already see the problem, don’t rush to the comment form.)

Now you do the following:

You just broke replication, twice. Once, because your slave is going to execute the first query and there’s no such table “garbage.junk” on the slave. The second time, silently, because the update to production.users isn’t replicated, so now the root user isn’t disabled on the slave.

Why? Because binlog-ignore-db doesn’t do what you think. The phrase I used earlier, “data in the garbage database isn’t replicated,” is a fallacy. That’s not what it does. In fact, it filters out binary logging for statements issued from connections whose default database is “garbage.” In other words, filtering is not based on the contents of the query — it is based on what database you USE.

The other configuration options I mentioned work similarly. The binlog-do-db and binlog-ignore-db statements are particularly dangerous because they keep statements from ever being written to the binary log, which means you can’t use the binary log for point-in-time recovery of your data from a backup.

In a carefully controlled environment, these options can have benefits, but I won’t talk about that here. (We covered that in our book.)

The safer alternative is to configure filters on the slave, with options that actually operate on the tables mentioned in the query itself. These are replicate-wild-* options. For example, the safer way to avoid replicating data in the garbage database is to configure replicate-wild-ignore-table=garbage.%. There are still edge cases where that won’t work, but it works in more cases and has fewer gotchas.

If you are confused, you should read the replication rules section of the manual until you know it by heart :-)

About Baron Schwartz

Baron is the lead author of High Performance MySQL.
He is a former Percona employee.

Comments

  1. Baron Schwartz says:

    Not really, the filtering behavior is still not what users expect. If you use row-based logging, replication (probably) won’t break as much (there are still cases where it can!), but you’ll still have the problem of some changes potentially not being logged, thus breaking your point-in-time-recovery ability.

  2. J Bruni says:

    Hi. I always set up binary logs as it “saved my life” once, allowing me to perform a “point in time recovery”.

    I have just read both the article and the current MySQL manual…

    Please correct me if I’m wrong… Maybe I missed something, but it seems to me that most of the dangerous scenario described in this article can be avoided by changing binlog_format option from STATEMENT (the default) to ROW… isn’t it?

    Thank you very much.

  3. John Swindells says:

    This is good to know. Is it true, therefore, that binlog-do-db behaves itself if you have always selected your database beforehand? When you say ‘default database’, does that include a database selected by USE DATABASE?

  4. The other important issue to consider is that binary logs are not only used for replication. They are incremental backups, and if you chose to use the “do” statements, you are effectively erasing history, and should you have a disaster and need the incremental backups, there is no way to retrieve the ignored information.

  5. John, yes that’s true.

  6. peter says:

    Sheeri,

    Indeed – so any binlog filtering is evil if you care about your data.
    With row level replication it is probably safe to skip tables which you do not care about such as temporary tables from the logging but this is about it.

  7. Master side filtering has a host of pitfalls, but even so it is sometimes necessary. Applications can turn off the binlog for selected statements using SET SQL_LOG_BIN=0. That’s better because at least you presumably know exactly what you are doing at the application level. We use this feature for Tungsten Replicator catalogs–replicating them would break our application. We have very flexible filters both on the master as well as the slave but our experience has been exactly what Baron found, namely that filtering on the slave is best.

  8. Morgan Christiansson says:

    Which is why this behaviour is very clearly documented in the MySQL manual for this setting.

    http://dev.mysql.com/doc/refman/5.0/en/replication-options-binary-log.html

  9. Morgan, I have slowly come to realize that most people never read the manual. They read wikihow.com or some other garbage and think they know things. Sad but true.

  10. >> Because binlog-ignore-db doesn’t do what you think.
    I wish it did what it says. It would have brought down the network traffic when I have to update the slaves located in other cities.

  11. Morgan Christiansson says:

    Shantanu, there is also –replicate-do-table and –replicate-wild-do-table which have different behaviour than –binlog-do-db

    See
    http://dev.mysql.com/doc/refman/5.0/en/replication-options-slave.html#option_mysqld_replicate-do-table
    http://dev.mysql.com/doc/refman/5.0/en/replication-options-slave.html#option_mysqld_replicate-wild-do-table

    The documentation even says “This works for cross-database updates, in contrast to –replicate-ignore-db.”

    But you still need to be aware of it’s quirks, queries that JOIN the wrong tables in updates could leave your replication out of sync.

  12. Simon Mudd says:

    Row based replication in 5.1 makes this sort of thing clearer and that’s almost certainly one of the reasons it was added. For certain SQL statements it can also be much quicker.

    The whole replicate-wild* or replicate*ignore options would be so much better handled if they were done differently, for example as done by Sybase were you can easily configure replication on a per table basis.

    Even having a few mysql.XXXX tables which define the replication rules would be better than using the my.cnf only options. I’ve only looked in detail at Sybase replication, so am not sure what is offered by Oracle or other commercial RDBMS vendors. MySQL’s replication facilities for simple stuff is great, but the moment you want to do things in a slightly more complex fashion opens the way for potential headaches if you are not really aware of how replication works in MySQL.

    Which is why I stand by the claim in my blog posting a while back that it would be so much better if replication were pulled out of mysqld and moved to a separate process dedicated to the task.

    Then as mentioned the binlog could be used for what it’s supposed to be: point in time recovery and the replication process could be improved without having to worry so much about what goes on in the database.

  13. I seem to have a case in an existing setup where binlog_do_db and binlog_ignore_db do not appear as advertized in the manual.

    The mysql config file has:

    binlog_do_db=mydb
    binlog_ignore_db=mysql
    binlog_ignore_db=test

    But when I login using ‘mysql -u -p’ and then issue a

    create database percona;

    Then the percona database is not replicated to the slave. It is if I use binlog_do_db=percona. In this case it is replicated to the client. But this is in contrast to the manual because ‘percona’ does not match any of the binlog_do_db configs in the original setup so then the ignore_db rules are matched. Since none of those rules match, it should replicate but it doesn’t.

    This is happening with mysql-server 5.1.61 on centos 6.2. What could be wrong here?

  14. I also tested what the default database was using

    select database();

    but this returns NULL so the default database does not match anything or does NULL match with the ignore db settings somehow?

  15. Erik – there are flow charts at http://dev.mysql.com/doc/refman/5.1/en/replication-rules-db-options.html – so if it’s not following the rules of that flow chart (it seems not to be following those rules) then you should file a bug at http://bugs.mysql.com.

  16. Akshay says:

    Hello i have setup binlog_do_db filter on my production server,please tell me how to safely remove this option?

Speak Your Mind

*