
New Cluster locked after starting second node




    I am trying to move our 270 GB database to a new cluster.

    To do this I installed a 3-node cluster running CentOS 6 and XtraDB Cluster, and everything worked fine.

    Then I shut down the 3 nodes, took an xtrabackup from our old server, started the first node, and configured it as a slave to keep the data up to date; everything worked fine.

    Then I started the second node and it began its state transfer, but as soon as the transfer finished and MySQL started on the second node, both nodes entered what I think is a dead zone and got locked.

    I can't run any command on either node. When I try to USE the database on the second node I get an "unknown command" error. On the first node I tried STOP SLAVE, but it was stuck until I ran /etc/init.d/mysql stop on the second node; then the first node was freed and continued normally, while the second node was still stuck in the shutdown process at:

    120914 21:20:22 [Note] WSREP: recv_thread() joined.
    120914 21:20:22 [Note] WSREP: Closing slave action queue.

    So I am very confused: the nodes worked fine before copying our database, but as soon as I copy our database something goes wrong, and nothing says what it is about.


    I think I am having a split brain here, so I will now try to stop the replication to have time to start both node02 and node03, and we will see how it goes.

    Any idea how to start node02 and node03 without stepping into split brain, knowing that when I start node02 it goes directly into this state without my even running any query on the system?

    Any ideas?
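
    As a first diagnostic, checking the wsrep status on each node would show whether the cluster still considers itself Primary and whether each node is actually synced (a sketch using standard Galera status variable names):

    ```sql
    -- Run on each node:
    -- wsrep_cluster_status should be 'Primary' on every healthy node,
    -- wsrep_local_state_comment should be 'Synced',
    -- wsrep_cluster_size should equal the number of running nodes.
    SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';
    SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';
    SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
    ```

    If wsrep_cluster_status is not Primary on some node, that would point to a genuine split brain rather than a stalled applier.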

  • #2

    I did start the three nodes, but still, about 3 minutes after running START SLAVE, they got stuck.

    I was able to run SELECT on all three, but not any INSERT or UPDATE.

    I checked the wsrep status and the nodes were all ready and synced.

    When running SHOW FULL PROCESSLIST; on the second node, it shows the output below.


    +----+-------------+-----------+------+---------+-------+------------------------------------------+-----------------------+-----------+---------------+-----------+
    | Id | User        | Host      | db   | Command | Time  | State                                    | Info                  | Rows_sent | Rows_examined | Rows_read |
    +----+-------------+-----------+------+---------+-------+------------------------------------------+-----------------------+-----------+---------------+-----------+
    |  1 | system user |           | NULL | Sleep   | 70948 | applied write set 3178572                | NULL                  |         0 |             0 |         1 |
    |  2 | system user |           | NULL | Sleep   |  2431 | wsrep aborter idle                       | NULL                  |         0 |             0 |         1 |
    |  3 | system user |           | NULL | Sleep   | 70948 | Delete_rows_log_event::find_row(3178571) | NULL                  |         0 |             0 |         1 |
    |  5 | root        | localhost | NULL | Query   |     0 | sleeping                                 | SHOW FULL PROCESSLIST |         0 |             0 |         1 |
    +----+-------------+-----------+------+---------+-------+------------------------------------------+-----------------------+-----------+---------------+-----------+
    4 rows in set (0.00 sec)

    Any ideas?
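
    The Delete_rows_log_event::find_row state above suggests the applier thread has been stuck on a single write set for nearly 20 hours, which would make Galera flow control pause writes cluster-wide. A sketch of the standard status variables that would confirm this:

    ```sql
    -- On the slow (second) node:
    -- a growing receive queue plus a flow-control paused fraction
    -- near 1.0 means this node's applier is throttling the cluster.
    SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue';
    SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused';
    ```

    That would also explain why SELECTs still work while every INSERT or UPDATE hangs.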


    • #3

      It seems the second node is taking too much time running


      What is this, and why is it locking everything?



      • #4

        It seems the problem comes from deleting a big chunk of data, or from a LOCK TABLE.

        I am not sure whether, since we are doing MIXED replication, it still executes the LOCK TABLE query?
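
        Two things worth checking here (a hedged sketch; the table and column names in the second statement are made up for illustration): first, how statements are being binlogged, since Galera-based clusters rely on row-based events and a MIXED setting on the source can produce statement events that replicate poorly; second, whether the big delete can be split into batches so each replicated write set stays small enough for the applier to keep up.

        ```sql
        -- Check how the source server logs changes; ROW is what
        -- XtraDB Cluster expects for reliable replication.
        SHOW VARIABLES LIKE 'binlog_format';

        -- Hypothetical batched purge: delete in small chunks and
        -- repeat until 0 rows are affected, instead of one huge
        -- DELETE that becomes a single enormous write set.
        DELETE FROM app_log WHERE created_at < '2012-01-01' LIMIT 10000;
        ```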