Vagrant+VBox, first steps with XtraDB: bootstrapping the first node shows an error

  • Vagrant+VBox, first steps with XtraDB: bootstrapping the first node shows an error

    Hi

    I've deployed a small test environment with Vagrant and VirtualBox, with 3 nodes (Ubuntu LTS).

    The IPs are on a private network, one per node:

    192.168.33.50
    192.168.33.51
    192.168.33.52

    1 CPU and 1 GB RAM per node as well.
    No firewall inside Ubuntu.
    Each node can ping the others, and all of them know the other nodes' hostnames.

    I've read some guides about XtraDB Cluster; in the end I followed this one:

    http://redcrackle.com/blog/how-set-p...cluster-ubuntu

    but when I try to restart the first node with bootstrap, it finishes with an error:

    2015-03-16 12:13:59 0 [Warning] Using unique option prefix key_buffer instead of key_buffer_size is deprecated and will be removed in a future release. Please use the full name instead.
    2015-03-16 12:13:59 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
    2015-03-16 12:13:59 10647 [Note] WSREP: Read nil XID from storage engines, skipping position init
    2015-03-16 12:13:59 10647 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
    2015-03-16 12:13:59 10647 [Note] WSREP: wsrep_load(): Galera 3.9(r93aca2d) by Codership Oy <info@codership.com> loaded successfully.
    2015-03-16 12:13:59 10647 [Note] WSREP: CRC-32C: using "slicing-by-8" algorithm.
    2015-03-16 12:13:59 10647 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
    2015-03-16 12:13:59 10647 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 192.168.33.50; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false
    2015-03-16 12:13:59 10647 [Note] WSREP: Service thread queue flushed.
    2015-03-16 12:13:59 10647 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
    2015-03-16 12:13:59 10647 [Note] WSREP: wsrep_sst_grab()
    2015-03-16 12:13:59 10647 [Note] WSREP: Start replication
    2015-03-16 12:13:59 10647 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
    2015-03-16 12:13:59 10647 [Note] WSREP: protonet asio version 0
    2015-03-16 12:13:59 10647 [Note] WSREP: Using CRC-32C for message checksums.
    2015-03-16 12:13:59 10647 [Note] WSREP: backend: asio
    2015-03-16 12:13:59 10647 [Note] WSREP: restore pc from disk successfully
    2015-03-16 12:13:59 10647 [Note] WSREP: GMCast version 0
    2015-03-16 12:13:59 10647 [Note] WSREP: (673c2adf, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
    2015-03-16 12:13:59 10647 [Note] WSREP: (673c2adf, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
    2015-03-16 12:13:59 10647 [Note] WSREP: EVS version 0
    2015-03-16 12:13:59 10647 [Note] WSREP: gcomm: bootstrapping new group 'PXC'
    2015-03-16 12:13:59 10647 [Note] WSREP: start_prim is enabled, turn off pc_recovery
    2015-03-16 12:13:59 10647 [Note] WSREP: Node 673c2adf state prim
    2015-03-16 12:13:59 10647 [Note] WSREP: view(view_id(PRIM,673c2adf,3) memb {
    673c2adf,0
    } joined {
    } left {
    } partitioned {
    })
    2015-03-16 12:13:59 10647 [Note] WSREP: save pc into disk
    2015-03-16 12:13:59 10647 [Note] WSREP: discarding pending addr without UUID: tcp://192.168.33.50:4567
    2015-03-16 12:13:59 10647 [Note] WSREP: discarding pending addr proto entry 0x20e1d60
    2015-03-16 12:13:59 10647 [Note] WSREP: discarding pending addr without UUID: tcp://192.168.33.51:4567
    2015-03-16 12:13:59 10647 [Note] WSREP: discarding pending addr proto entry 0x20ea4f0
    2015-03-16 12:13:59 10647 [Note] WSREP: discarding pending addr without UUID: tcp://192.168.33.52:4567
    2015-03-16 12:13:59 10647 [Note] WSREP: discarding pending addr proto entry 0x20f2ce0
    2015-03-16 12:13:59 10647 [Note] WSREP: clear restored view
    2015-03-16 12:13:59 10647 [ERROR] WSREP: failed to open gcomm backend connection: 11: : 11 (Resource temporarily unavailable)
    at gcs/src/gcs_gcomm.cpp:connect():253
    2015-03-16 12:13:59 10647 [ERROR] WSREP: gcs/src/gcs_core.cpp:long int gcs_core_open(gcs_core_t*, const char*, const char*, bool)():206: Failed to open backend connection: -11 (Resource temporarily unavailable)
    2015-03-16 12:13:59 10647 [ERROR] WSREP: gcs/src/gcs.cpp:long int gcs_open(gcs_conn_t*, const char*, const char*, bool)():1379: Failed to open channel 'PXC' at 'gcomm://192.168.33.50,192.168.33.51,192.168.33.52': -11 (Resource temporarily unavailable)
    2015-03-16 12:13:59 10647 [ERROR] WSREP: gcs connect failed: Resource temporarily unavailable
    2015-03-16 12:13:59 10647 [ERROR] WSREP: wsrep::connect() failed: 7
    2015-03-16 12:13:59 10647 [ERROR] Aborting

    2015-03-16 12:13:59 10647 [Note] WSREP: Service disconnected.
    2015-03-16 12:14:00 10647 [Note] WSREP: Some threads may fail to exit.
    2015-03-16 12:14:00 10647 [Note] Binlog end
    2015-03-16 12:14:00 10647 [Note] /usr/sbin/mysqld: Shutdown complete

    150316 12:14:00 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended


    It's very hard for me to debug what is happening... any ideas?

    Thanks!

  • #2
    Hi again

    I was finally able to fix this issue. It was related to the available RAM: I set the following parameter in my.cnf and then restarted the node.

    performance_schema = off

    * Bootstrapping Percona XtraDB Cluster database server mysqld [ OK ]
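    For context, the relevant my.cnf fragment might look like the sketch below. The wsrep values are taken from the logs earlier in this thread; everything else (file location, exact variable spelling of the node address) is an assumption, not a tested configuration.

```ini
# /etc/mysql/my.cnf -- minimal sketch based on the values visible in
# the logs above; not a complete or tested PXC configuration.
[mysqld]
wsrep_provider        = /usr/lib/libgalera_smm.so
wsrep_cluster_name    = PXC
wsrep_cluster_address = gcomm://192.168.33.50,192.168.33.51,192.168.33.52
wsrep_node_address    = 192.168.33.50

# The fix from this post: disable performance_schema so mysqld's
# memory footprint fits in a 1 GB guest with no swap.
performance_schema    = OFF
```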

    Thanks




    • #3
      This is an interesting case. I've got a Galera cluster with 3 nodes running on CentOS 6.6, and I'm using Vagrant to run all the VMs as well, but I didn't see the same behaviour here. Let's take a look:

      #: enabling the performance_schema on the very first node and bootstrapping the cluster

      Code:
      [vagrant@sql01 ~]$ my_print_defaults mysqld
      --user=mysql
      --server_id=1
      --datadir=/var/lib/mysql
      --performance_schema=1   # enabling performance_schema
      --log_bin=node01-bin
      --log_slave_updates=true
      --enforce_gtid_consistency=true
      --binlog_format=ROW
      --default_storage_engine=innodb
      --innodb_locks_unsafe_for_binlog=1
      --innodb_autoinc_lock_mode=2
      --innodb_log_group_home_dir=/var/lib/mysql
      --innodb_log_files_in_group=2
      --innodb_log_file_size=48M
      --wsrep_provider=/usr/lib64/libgalera_smm.so
      --wsrep_cluster_name=sqlmag
      --wsrep_node_address=192.168.0.101
      --wsrep_node_name=node01
      --wsrep_cluster_address=gcomm://192.168.0.101:4567,192.168.0.102:4567,192.168.0.103:4567
      --wsrep_slave_threads=2
      --wsrep_sst_method=xtrabackup-v2
      --wsrep_sst_auth=sst:123
      --wsrep_provider_options=pc.ignore_quorum=true
      And bootstrapping the cluster...

      Code:
      [vagrant@sql01 ~]$ sudo service mysql bootstrap-pxc
      Bootstrapping PXC (Percona XtraDB Cluster)Starting MySQL (Percona XtraDB Cluster)....... SUCCESS!
      
      mysql> select @@performance_schema;
      +----------------------+
      | @@performance_schema |
      +----------------------+
      |                    1 |
      +----------------------+
      1 row in set (0.00 sec)
      BTW, going ahead to start the second node...

      Code:
      [vagrant@sql02 ~]$ sudo service mysql start
      Starting MySQL (Percona XtraDB Cluster)....... SUCCESS!
      [vagrant@sql02 ~]$ mysql -u root -e "select @@performance_schema"  # performance_schema enabled
      +----------------------+
      | @@performance_schema |
      +----------------------+
      |                    1 |
      +----------------------+
      And finally the third node...

      Code:
      [vagrant@sql03 ~]$ sudo service mysql start
      Starting MySQL (Percona XtraDB Cluster).............. SUCCESS!
      [vagrant@sql03 ~]$ mysql -u root -e "select @@performance_schema"  # performance_schema enabled
      +----------------------+
      | @@performance_schema |
      +----------------------+
      |                    1 |
      +----------------------+
      I don't have an environment running Galera on Ubuntu, though, so I'm not sure whether it behaves differently because of the packages.

      BTW, the critical part of the logs you posted is this section below:

      Code:
      2015-03-16 12:13:59 10647 [ERROR] WSREP: failed to open gcomm backend connection: 11: : 11 (Resource temporarily unavailable)
      at gcs/src/gcs_gcomm.cpp:connect():253
      2015-03-16 12:13:59 10647 [ERROR] WSREP: gcs/src/gcs_core.cpp:long int gcs_core_open(gcs_core_t*, const char*, const char*, bool)():206: Failed to open backend connection: -11 (Resource temporarily unavailable)
      2015-03-16 12:13:59 10647 [ERROR] WSREP: gcs/src/gcs.cpp:long int gcs_open(gcs_conn_t*, const char*, const char*, bool)():1379: Failed to open channel 'PXC' at 'gcomm://192.168.33.50,192.168.33.51,192.168.33.52': -11 (Resource temporarily unavailable)
      2015-03-16 12:13:59 10647 [ERROR] WSREP: gcs connect failed: Resource temporarily unavailable
      2015-03-16 12:13:59 10647 [ERROR] WSREP: wsrep::connect() failed: 7
      2015-03-16 12:13:59 10647 [ERROR] Aborting
      It seems there's something strange with the gcomm values, I mean, the configured IPs and ports. Or even... have you bootstrapped the cluster on node01 as I showed here?
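      To help rule out basic networking between the members, one could check from the node being bootstrapped that the other nodes' Galera port (4567, as seen in the logs above) is reachable. This is just a quick sketch using bash's built-in /dev/tcp; the IPs are the ones from this thread.

```shell
# Reachability check of the Galera group-communication port (4567, per
# the logs above) on each configured member. Uses bash's /dev/tcp so
# no extra tools beyond coreutils' `timeout` are needed.
for host in 192.168.33.50 192.168.33.51 192.168.33.52; do
    if timeout 2 bash -c "exec 3<>/dev/tcp/$host/4567" 2>/dev/null; then
        echo "$host:4567 reachable"
    else
        echo "$host:4567 NOT reachable"
    fi
done
```

      If a member shows as not reachable, the problem is the private network or the listener, not the wsrep configuration itself.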

      Cheers,
      WB



      • #4
        Hi

        it also depends on how much memory you give your VMs. In my case, as I only have a MacBook Pro with 8 GB of RAM, I allocated just 1 GB of RAM per node, with no swap at all.

        Maybe in your environment your VMs have more RAM than mine?
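        As a quick way to compare environments, one could check what the guest actually has. This is a sketch; the ~1.5 GB threshold below is just an illustrative guess for "tight for PXC with performance_schema ON", not a Percona recommendation.

```shell
# How much RAM and swap does this guest really have?
grep -E '^(MemTotal|SwapTotal)' /proc/meminfo

# Warn when RAM is under ~1.5 GB and there is no swap -- the situation
# in which mysqld with performance_schema ON failed to start here.
awk '/^MemTotal/  { mem  = $2 }
     /^SwapTotal/ { swap = $2 }
     END { if (mem < 1572864 && swap == 0)
               print "warning: low RAM and no swap; consider performance_schema = OFF" }' \
    /proc/meminfo
```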

        Cheers

        R



        • #5
          Hmmm, interesting point. Checking the configuration written in my VMs' Vagrantfile, I found this:

          Code:
          # -*- mode: ruby -*-
          # vi: set ft=ruby :
          
          # Vagrantfile API/syntax version. Don't touch unless you know what you're doing!
          VAGRANTFILE_API_VERSION = "2"
          
          Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
          #  config.vm.provision "shell", path: "setup_env.sh"
            config.vm.define "node01" do |node01|
              node01.vm.box = "centos65-x86_64"
              node01.vm.hostname="sql01"
              node01.vm.network "private_network", virtualbox__intnet: "galera", ip: "192.168.0.101"
              node01.vm.provider "virtualbox" do |v|
                v.memory = 256
                v.cpus   = 1
              end
            end
            config.vm.define "node02" do |node02|
              node02.vm.box = "centos65-x86_64"
              node02.vm.hostname="sql02"
              node02.vm.network "private_network", virtualbox__intnet: "galera", ip: "192.168.0.102" 
              node02.vm.provider "virtualbox" do |v|
                v.memory = 256
                v.cpus   = 1
              end
            end
            config.vm.define "node03" do |node03|
              node03.vm.box="centos65-x86_64"
              node03.vm.hostname="sql03"
              node03.vm.network "private_network", virtualbox__intnet: "galera", ip: "192.168.0.103" 
              node03.vm.provider "virtualbox" do |v|
                v.memory = 256
                v.cpus   = 1
              end
            end
            end
          Checking MemTotal from /proc/meminfo:

          Code:
          [vagrant@sql01 ~]$ cat /proc/meminfo | grep MemTotal
          MemTotal:         244176 kB
          
          [vagrant@sql02 ~]$ cat /proc/meminfo | grep MemTotal
          MemTotal:         244176 kB
          
          [vagrant@sql03 ~]$ cat /proc/meminfo | grep MemTotal
          MemTotal:         244176 kB
          All of this confirms that the VMs are using just 256 MB of RAM each on my host, which is a MacBook Pro with 8 GB of RAM as well. So the next step is to start up the Galera Cluster...

          Code:
          [vagrant@sql01 ~]$ sudo service mysql bootstrap-pxc
          Bootstrapping PXC (Percona XtraDB Cluster)Starting MySQL (Percona XtraDB Cluster).................................................. SUCCESS!
          [vagrant@sql01 ~]$ mysql -u root -e "select @@performance_schema"
          +----------------------+
          | @@performance_schema |
          +----------------------+
          |                    1 |
          +----------------------+
          It's OK. Now I'm going to start up the second (SQL02) and third (SQL03) nodes:

          Code:
          [vagrant@sql02 ~]$ sudo service mysql start
          Starting MySQL (Percona XtraDB Cluster)................ SUCCESS!
          [vagrant@sql02 ~]$ mysql -u root -e "select @@performance_schema"
          +----------------------+
          | @@performance_schema |
          +----------------------+
          |                    1 |
          +----------------------+
          
          [vagrant@sql03 ~]$ sudo service mysql start
          Starting MySQL (Percona XtraDB Cluster)................. SUCCESS!
          [vagrant@sql03 ~]$ mysql -u root -e "select @@performance_schema"
          +----------------------+
          | @@performance_schema |
          +----------------------+
          |                    1 |
          +----------------------+
          And finally...

          Code:
          [vagrant@sql01 ~]$ mysql -u root -e "show global status like 'wsrep_cluster_size'"
          +--------------------+-------+
          | Variable_name      | Value |
          +--------------------+-------+
          | wsrep_cluster_size | 3     |
          +--------------------+-------+
          I'm still wondering which resource was temporarily unavailable in terms of gcomm, the group communication system used by the Galera API, as per the documentation here: http://galeracluster.com/documentati...hitecture.html . I tried to reproduce the stated error, but even after configuring my gcomm with a nonexistent IP address, the node was able to bootstrap with clean startup logs, and the second node joined the cluster view.

          Code:
          2015-03-18 18:51:01 3024 [Note] WSREP: Quorum results:
              version    = 3,
              component  = PRIMARY,
              conf_id    = 1,
              members    = 2/2 (joined/total),
              act_id     = 10166,
              last_appl. = 0,
              protocols  = 0/6/3 (gcs/repl/appl),
              group UUID = f9cdb70c-8483-11e4-86af-4b70fd8e7a88
          2015-03-18 18:51:01 3024 [Note] WSREP: Flow-control interval: [23, 23]
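          For completeness, a few wsrep status variables worth checking once the nodes are up (a sketch to run in the mysql client on any node; the comments state what a healthy cluster of this thread's size should show, not captured output):

```sql
-- Cluster health checks after bootstrap and joins
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';        -- expect 3 in this setup
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';      -- 'Primary' on a healthy cluster
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment'; -- 'Synced' on a healthy node
SHOW GLOBAL STATUS LIKE 'wsrep_incoming_addresses';  -- the members gcomm knows about
```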
          WB

