GET 24/7 LIVE HELP NOW

Announcement

Announcement Module
Collapse
No announcement yet.

Master/Master between two clusters

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • Master/Master between two clusters

    I'm trying to set up a cross-datacenter replication topology. I was looking at Percona for its multi-node capabilities. I'm trying to see if I can set up two independent clusters, one in each data center, and then pick one node from each and set up master/master replication.

    In the FAQ on the Percona home page, there is some indication that this would work:

    Q: Does it work with regular MySQL replication?
    A: Yes. On the node you are going to use as master, you should enable log-bin and log-slave-update options.

    I'm guessing the intent of that question was to use one node as a master to send a standard MySQL replication stream to a slave. But I want to take it a step further.

    I've set up the two independent clusters, and then I set up standard mysql master/master replication between node 1 in DC1 and node 1 in DC2. The MySQL slave threads are up and running no problem.

    When I insert a row into any node in the DC1 cluster, the row shows up on all nodes in DC1. Then the mysql replication pushes that row to node 1 in DC2. But that's where it stops! The row doesnt make it to nodes 2,3,4 in DC2. It seems that the insert statement by the MySQL slave thread is ignored by the underlying XtraDB cluster mechanism (which I'm assuming is Galera). It's almost like the Galera-equivalent parameter of "log-slave-enabled" is not turned on.

    So how do I get this to work? Is there some wsrep setting that will allow the XtraDB cluster components to recognize the incoming transaction from the MySQL slave thread and apply that to all other nodes in that cluster?

    (To anticipate the first question I think I will get: No I don't want to set up one big cluster across the two DCs. The two DCs are on opposite sides of the world, and the asynchronous "eventually consistent" replication manner of MySQL is better for us than setting up synchronous commit cluster cross the WAN.)

  • #2
    Ok, here's some new information. After I set up the master/master between the two clusters, I restarted mysql on all nodes, and it started replicating!!

    So now if I insert a row into any node in DC1, it shows up in all other nodes in that DC, and then mysql replication pushes it over to the other node and it shows up there as well. But now, it wont stop replicating!! It's like an infinite loop! It keeps replicating the same row into the table on all the nodes. Here's the table

    create table test1 (col1 int);

    All I did was insert a number 1 a total of one time. And within seconds, every node had hundreds of rows with 1 in it! Its like it doesnt know that it already received that row and its trying to re-apply it!! I cant make it stop! Even when I drop the table, and then recreate it, replication KEEPS GOING! New rows just keep showing up!

    Now I have to go figure out how to clear up all the queued up replication that I DON'T want to apply.

    How do I stop this? Would global transaction IDs help here? But that's not available until MariaDB 10.0.2.

    What's the solution here?

    Comment


    • #3
      Did you ever come up with a solution? I am trying to do something similar to you and I came across the same issue.

      I have 2 sites. A single mysql instances in a circular replication fashion with a cluster. Whenever I did an insert in one of the nodes then it kept on adding rows and all the servers got 100k rows lol.

      Comment


      • #4
        GTIDs are available in PXC 5.6 as well.
        Also - did you use unique server_id on each node?

        Comment


        • #5
          i have 3 nodes in cluster 1 and 3 nodes in cluster 2.Then i setup master-master replication between node 3 in cluster 1 and node 1 cluster 2.
          enabled log-slave-updates on both nodes and provided unique server-ids for all nodes in both cluster.
          When creating a database and dropping database it is causing infinite loop.
          Please let me know what are the changes i have to make to config in order to avoid infinite loop
          Appreciate your help

          Comment


          • #6
            The issue for infinite loop is due to enabling log slave updates on both nodes node 3 of cluster 1 and node 1 of cluster 2.disabling of log slave updates on node 1 of cluster 2 will resolve the infinite loop issue.

            disabling of log slave updates on node 1 of cluster 2 have following issues: (replication between node 3 of cluster 1(master) and node 1 of cluster 2 (master))
            1. any changes made from node 3 of cluster 1 will reflect to node 1 of cluster 2, but not to remaining nodes in cluster 2
            2.any changes made from any nodes other than node 1 in cluster 2 will not reflect to any of the nodes in cluster 1

            we will see consistent data on all nodes in both clusters, only if changes are made from node1 in cluster 2

            without enabling log slave updates on both nodes we can't achieve consistent data on all nodes on other hand it will cause infinite loop issue.

            master-master replication from one node in cluster 1 and other node in cluster 2 is not possible.

            Please let me if there are any other suggestions/recommendations.


            Comment


            • #7
              Sorry for late reply, busy times.
              So indeed the replication loop happens when you do master-master topology between two PXC clusters when you use unique server_id on each node. This is because the asynchronous replication relies on server_id to make the decision whether to execute the transaction or not.
              Here, for example you do insert on node1 with server_id=1, this is being replicated via Galera to other nodes, including the async-master node (say this one has server_id=2), this master node writes this transaction to it's own binary log, but still with server_id=1. Then async slave executes it, write to it's own binary log (with server_id = 1 again), and this is pulled back to cluster 1 by it's async slave (may be same node that is async-master for cluster 2). As server_id is different from the one the slave has, it executes it again, and again,...

              To prevent that, you have to consider one PXC cluster as one server from the asynchronous replication point of view, thus give the same server_id for each node within the same cluster. Galera does not care about server_id at all, but as you can see it's very important for standard - async replication.

              Comment

              Working...
              X