Cluster nodes not connecting

  • #1

    Hey all, I followed the Percona guide on how to install Percona XtraDB Cluster on CentOS, but I'm still not able to get it up and running.

    I can start the first node fine, just like the guide says. But when I try to add the next node, it doesn't connect. The mysqld command returns these errors:


    130405 9:30:57 [Warning] WSREP: last inactive check more than PT1.5S ago, skipping check
    130405 9:31:26 [Note] WSREP: view((empty))
    130405 9:31:26 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out) at gcomm/src/pc.cpp:connect():139
    130405 9:31:26 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():195: Failed to open backend connection: -110 (Connection timed out)
    130405 9:31:26 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1290: Failed to open channel 'my_centos_cluster' at 'gcomm://172.16.174.11': -110 (Connection timed out)
    130405 9:31:26 [ERROR] WSREP: gcs connect failed: Connection timed out
    130405 9:31:26 [ERROR] WSREP: wsrep::connect() failed: 6
    130405 9:31:26 [ERROR] Aborting
    130405 9:31:26 [Note] WSREP: Service disconnected.
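    A timeout like this usually means the joining node cannot reach the Galera communication ports on the first node. A quick reachability sketch from the second node (the ports are Galera defaults, the IP comes from the log above):

```shell
# Galera defaults: 4567 group communication, 4444 SST, 4568 IST.
# /dev/tcp is a bash feature, so no extra tools are needed.
for port in 4567 4444 4568; do
    if timeout 3 bash -c "cat < /dev/null > /dev/tcp/172.16.174.11/$port" 2>/dev/null; then
        echo "port $port reachable"
    else
        echo "port $port unreachable"
    fi
done
```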


    iptables is off and SELinux is disabled. I'm not sure what else would be causing it. Here are my my.cnf files:

    node1:

    [mysqld]
    datadir=/var/lib/mysql/data
    user=mysql
    # Path to Galera library
    wsrep_provider=/usr/lib64/libgalera_smm.so
    # Cluster connection URL contains IPs of node#1, node#2 and node#3
    wsrep_cluster_address=gcomm://172.16.174.11,172.16.174.12
    # In order for Galera to work correctly binlog format should be ROW
    binlog_format=ROW
    # MyISAM storage engine has only experimental support
    default_storage_engine=InnoDB
    # This is a recommended tuning variable for performance
    innodb_locks_unsafe_for_binlog=1
    # This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
    innodb_autoinc_lock_mode=2
    # Node #2 address
    wsrep_node_address=172.16.174.11
    # SST method
    wsrep_sst_method=xtrabackup
    # Authentication for SST method
    wsrep_sst_auth="sstuser:s3cret"


    node2:

    [mysqld]
    datadir=/var/lib/mysql/data
    user=mysql
    # Path to Galera library
    wsrep_provider=/usr/lib64/libgalera_smm.so
    # Cluster connection URL contains the IPs of node#1, node#2 and node#3
    wsrep_cluster_address=gcomm://172.16.174.11
    # In order for Galera to work correctly binlog format should be ROW
    binlog_format=ROW
    # MyISAM storage engine has only experimental support
    default_storage_engine=InnoDB
    # This is a recommended tuning variable for performance
    innodb_locks_unsafe_for_binlog=1
    # This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
    innodb_autoinc_lock_mode=2
    # Node #1 address
    wsrep_node_address=172.16.174.12
    # SST method
    wsrep_sst_method=xtrabackup
    # Cluster name
    wsrep_cluster_name=my_centos_cluster
    # Authentication for SST method
    wsrep_sst_auth="sstuser:s3cret"


    Any help would be appreciated.

  • #2
    Hi,

    I think it is because you don't set wsrep_cluster_name on node 1, and the default cluster name is 'my_wsrep_cluster'. Please check the node1 error log; if it is not a communication error, you will see there what is happening.

    You will see something like:

    130406 8:49:29 [Note] WSREP: handshake failed, my group: 'my_wsrep_cluster', peer group: 'my_centos_cluster'
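    In other words, the line missing from node1's config would be (value taken from the node2 config and from the channel name in the error log above):

```ini
# Add to node1's [mysqld] section so both nodes agree on the group name
wsrep_cluster_name=my_centos_cluster
```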

    Regards,

    Martin Arrieta.
    @martinarrietac



    • #3
      Well, that seems to have connected them. Thanks!

      I am a little worried that when I check the status of MySQL on the second node, it says it is running but cannot update the PID. At the same time, though, it is synced and working.



      • #4
        Glad to hear that.

        Could you please send us the exact error you see about not being able to update the PID?

        Also:

        - mysql> SHOW VARIABLES LIKE 'pid_file';
        - shell> ls -la on the path that pid_file reports
        - your SELinux or AppArmor status.
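        A sketch of that checklist as a script; the pid file path is an assumption based on the datadir in your my.cnf, so adjust it to whatever SHOW VARIABLES reports:

```shell
# Assumed location; confirm with: mysql -e "SHOW VARIABLES LIKE 'pid_file'"
pidfile=/var/lib/mysql/data/$(hostname).pid

if [ -e "$pidfile" ]; then
    ls -la "$pidfile"            # check owner/permissions (should be mysql:mysql)
else
    echo "pid file not found: $pidfile"
fi

# SELinux status (getenforce only exists on SELinux systems)
command -v getenforce >/dev/null 2>&1 && getenforce || echo "getenforce not available"
```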

        Regards,

        Martin Arrieta
        @martinarrietac



        • #5
          I seem to have fixed it.
          I noticed that the PID file was named hostname.pid, while MySQL was looking for hostname.domain.pid.

          I simply copied hostname.pid to hostname.domain.pid and the errors went away. I'm not sure why it didn't create the PID file with the correct name in the first place.
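          For anyone hitting the same thing: the mismatch is between the short and fully qualified hostname. A quick way to see which names are in play (the datadir is assumed from the configs above); pinning pid-file explicitly in my.cnf is a more durable fix than copying the file:

```shell
datadir=/var/lib/mysql/data            # assumed from the my.cnf above
echo "short name: ${datadir}/$(hostname -s).pid"
echo "FQDN name:  ${datadir}/$(hostname -f).pid"
# Durable fix: set the pid file name explicitly in my.cnf, e.g.
#   [mysqld_safe]
#   pid-file=/var/lib/mysql/data/mysqld.pid
```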
