
Write performance is horrible on 2nd node


  • #1

    Hey guys, I'm new to Percona and I looked before posting but couldn't find anyone having similar issues, so apologies if I missed one.

    I'm looking at migrating a standard MySQL replicated setup to a Percona XtraDB Cluster, for obvious reasons. I recently swapped my slave over to XtraDB Cluster and started a new server for testing. I've just got these communicating with each other and have found that the write performance is pretty horrible on the new node.

    The nodes are on separate VM hosts and neither of them is suffering from any hardware performance issues that I can see (disk I/O was what I checked first). Prior to using the XtraDB Cluster software there wasn't an issue with replication between the master and slave, but now the slave isn't able to "catch up" to the master (going on 570000 seconds now). I assumed the extra latency from the synchronous replication between the slave and the new node was slowing down the replication process, but it doesn't seem to be a latency thing, as there isn't and wasn't an issue with the link before. On the new node the receive queue fills to max, flow control engages and starts sending pauses back to the slave (I assume this is why the slave never catches up to the master).
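
    For context on reading the flow control numbers below: wsrep_flow_control_paused is cumulative since the last FLUSH STATUS, so the way I've been sampling it is to reset the counters and read again after a fixed window. A minimal sketch; the window length is an arbitrary choice:
    Code:
    mysql> FLUSH STATUS;
    -- wait a fixed window (e.g. ~10 seconds) under normal write load, then:
    mysql> SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused';
    -- a value near 1.0 means the node spent nearly the whole window paused by flow control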

    Here are the configs I'm using on the two nodes. The latency between them averages about 0.5 ms, peaking at 1 ms with a low end of 0.1 ms. Excuse where I have removed IPs and passwords.

    The slave node
    Code:
    [mysqld]
    user=mysql
    max_connections=5000
    max_heap_table_size=12884901888
    datadir=/var/lib/mysql
    socket=/var/lib/mysql/mysql.sock
    max_allowed_packet = 64M
    innodb_buffer_pool_size=6368709120
    innodb_write_io_threads=20
    server-id=2
    replicate-do-db=wikidb
    replicate-do-db=ElevationData
    replicate-do-db=Portal
    replicate-do-db=BizMon
    relay_log_space_limit=50G
    expire-logs-days=7
    slave-skip-errors = 1317
    log-slave-updates=TRUE
    binlog_format=ROW
    innodb_file_per_table=1
    innodb_autoinc_lock_mode=2
    innodb_locks_unsafe_for_binlog=1
    relay_log_index=mysqld-relay-bin.index
    relay_log=mysqld-relay-bin
    skip-slave-start=TRUE
    
    wsrep_provider=/usr/lib64/libgalera_smm.so
    
    wsrep_cluster_address=gcomm://xx.xx.xx.xx,xx.xx.xx.xx
    
    wsrep_node_address=xx.xx.xx.xx
    wsrep_cluster_name=PortalMaster
    
    wsrep_slave_threads=16
    wsrep_provider_options = "gcache.size=6G; gcs.fc_limit = 512; gcs.fc_factor = 1.0; gcs.fc_master_slave=YES; evs.keepalive_period = PT3S; evs.inactive_check_period = PT10S; evs.suspect_timeout = PT30S; evs.inactive_timeout = PT1M; evs.install_timeout = PT1M"
    wsrep_retry_autocommit=3
    wsrep_sst_method=xtrabackup
    wsrep_sst_auth="BackupAdmin:xxxxxxx"
    
    [mysqld_safe]
    log-error=/var/log/mysqld.log
    pid-file=/var/run/mysqld/mysqld.pid
    After 10 seconds, these are the wsrep status variables:
    Code:
    mysql> SHOW GLOBAL STATUS LIKE 'wsrep%';
    +----------------------------+--------------------------------------+
    | Variable_name              | Value                                |
    +----------------------------+--------------------------------------+
    | wsrep_local_state_uuid     | 368afb41-ddf1-11e2-bbfa-3e6dd3ff3bcf |
    | wsrep_protocol_version     | 4                                    |
    | wsrep_last_committed       | 3118248                              |
    | wsrep_replicated           | 22184                                |
    | wsrep_replicated_bytes     | 32007311                             |
    | wsrep_received             | 37                                   |
    | wsrep_received_bytes       | 655                                  |
    | wsrep_local_commits        | 22184                                |
    | wsrep_local_cert_failures  | 0                                    |
    | wsrep_local_bf_aborts      | 0                                    |
    | wsrep_local_replays        | 0                                    |
    | wsrep_local_send_queue     | 1                                    |
    | wsrep_local_send_queue_avg | 0.000000                             |
    | wsrep_local_recv_queue     | 0                                    |
    | wsrep_local_recv_queue_avg | 0.000000                             |
    | wsrep_flow_control_paused  | 0.931073                             |
    | wsrep_flow_control_sent    | 0                                    |
    | wsrep_flow_control_recv    | 33                                   |
    | wsrep_cert_deps_distance   | 1.087500                             |
    | wsrep_apply_oooe           | 0.000000                             |
    | wsrep_apply_oool           | 0.000000                             |
    | wsrep_apply_window         | 1.000000                             |
    | wsrep_commit_oooe          | 0.000000                             |
    | wsrep_commit_oool          | 0.000000                             |
    | wsrep_commit_window        | 1.000000                             |
    | wsrep_local_state          | 4                                    |
    | wsrep_local_state_comment  | Synced                               |
    | wsrep_cert_index_size      | 934                                  |
    | wsrep_causal_reads         | 0                                    |
    | wsrep_incoming_addresses   | xx.xx.xx.xx:3306,xx.xx.xx.xx:3306    |
    | wsrep_cluster_conf_id      | 2                                    |
    | wsrep_cluster_size         | 2                                    |
    | wsrep_cluster_state_uuid   | 368afb41-ddf1-11e2-bbfa-3e6dd3ff3bcf |
    | wsrep_cluster_status       | Primary                              |
    | wsrep_connected            | ON                                   |
    | wsrep_local_index          | 1                                    |
    | wsrep_provider_name        | Galera                               |
    | wsrep_provider_vendor      | Codership Oy <info@codership.com>    |
    | wsrep_provider_version     | 2.6(r152)                            |
    | wsrep_ready                | ON                                   |
    +----------------------------+--------------------------------------+
    The new node connecting to the slave
    Code:
    [mysqld]
    user=mysql
    max_connections=5000
    max_heap_table_size=12884901888
    datadir=/var/lib/mysql
    socket=/var/lib/mysql/mysql.sock
    max_allowed_packet = 64M
    innodb_buffer_pool_size=6368709120
    innodb_write_io_threads=20
    
    server-id=3
    relay_log_space_limit=50G
    expire-logs-days=7
    binlog_format=ROW
    innodb_file_per_table=1
    innodb_autoinc_lock_mode=2
    innodb_locks_unsafe_for_binlog=1
    
    wsrep_provider=/usr/lib64/libgalera_smm.so
    wsrep_cluster_address=gcomm://xx.xx.xx.xx,xx.xx.xx.xx
    wsrep_node_address=xx.xx.xx.xx
    wsrep_cluster_name=PortalMaster
    wsrep_slave_threads=16
    wsrep_provider_options = "gcache.size=6G; gcs.fc_limit = 512; gcs.fc_factor = 1.0; gcs.fc_master_slave=YES; evs.keepalive_period = PT3S; evs.inactive_check_period = PT10S; evs.suspect_timeout = PT30S; evs.inactive_timeout = PT1M; evs.install_timeout = PT1M"
    wsrep_retry_autocommit=3
    wsrep_sst_method=xtrabackup
    wsrep_sst_auth="BackupAdmin:xxxxxxx"
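    For reference, my reading of the flow-control settings in wsrep_provider_options above, going off the Galera 2.x docs (this is my understanding, so treat it as an assumption):
    Code:
    # gcs.fc_limit = 512        -- flow control engages once the recv queue grows
    #                              past roughly 512 write-sets
    # gcs.fc_factor = 1.0       -- and disengages as soon as the queue drops back
    #                              below 1.0 * fc_limit
    # gcs.fc_master_slave = YES -- don't scale the limit with cluster size; meant
    #                              for setups where writes enter via a single node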
    After 10 seconds, these are the wsrep status variables:
    Code:
    mysql> show status like 'wsrep%';
    +----------------------------+--------------------------------------+
    | Variable_name              | Value                                |
    +----------------------------+--------------------------------------+
    | wsrep_local_state_uuid     | 368afb41-ddf1-11e2-bbfa-3e6dd3ff3bcf |
    | wsrep_protocol_version     | 4                                    |
    | wsrep_last_committed       | 3119023                              |
    | wsrep_replicated           | 0                                    |
    | wsrep_replicated_bytes     | 0                                    |
    | wsrep_received             | 23012                                |
    | wsrep_received_bytes       | 32795292                             |
    | wsrep_local_commits        | 0                                    |
    | wsrep_local_cert_failures  | 0                                    |
    | wsrep_local_bf_aborts      | 0                                    |
    | wsrep_local_replays        | 0                                    |
    | wsrep_local_send_queue     | 0                                    |
    | wsrep_local_send_queue_avg | 0.000000                             |
    | wsrep_local_recv_queue     | 257                                  |
    | wsrep_local_recv_queue_avg | 254.522222                           |
    | wsrep_flow_control_paused  | 0.935110                             |
    | wsrep_flow_control_sent    | 44                                   |
    | wsrep_flow_control_recv    | 44                                   |
    | wsrep_cert_deps_distance   | 1.098338                             |
    | wsrep_apply_oooe           | 0.088398                             |
    | wsrep_apply_oool           | 0.000000                             |
    | wsrep_apply_window         | 15.049724                            |
    | wsrep_commit_oooe          | 0.000000                             |
    | wsrep_commit_oool          | 0.000000                             |
    | wsrep_commit_window        | 1.099448                             |
    | wsrep_local_state          | 4                                    |
    | wsrep_local_state_comment  | Synced                               |
    | wsrep_cert_index_size      | 1189                                 |
    | wsrep_causal_reads         | 0                                    |
    | wsrep_incoming_addresses   | xx.xx.xx.xx:3306,xx.xx.xx.xx:3306    |
    | wsrep_cluster_conf_id      | 2                                    |
    | wsrep_cluster_size         | 2                                    |
    | wsrep_cluster_state_uuid   | 368afb41-ddf1-11e2-bbfa-3e6dd3ff3bcf |
    | wsrep_cluster_status       | Primary                              |
    | wsrep_connected            | ON                                   |
    | wsrep_local_index          | 0                                    |
    | wsrep_provider_name        | Galera                               |
    | wsrep_provider_vendor      | Codership Oy <info@codership.com>    |
    | wsrep_provider_version     | 2.6(r152)                            |
    | wsrep_ready                | ON                                   |
    +----------------------------+--------------------------------------+
    Steve

  • #2
    Hi Steve,
    If you have a node sending flow control, that's a pretty good indicator of some problem directly with that node or its tuning. My first glance would suggest that you haven't set innodb_log_file_size on these nodes, and that will severely limit InnoDB's throughput. There may be some other basic InnoDB tuning to apply.
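
    As a minimal sketch of what I mean, assuming a 5.5-era PXC node -- the sizes below are illustrative, not a recommendation for your workload, and note that changing the redo log size needs a clean shutdown with the old ib_logfile* files moved aside first:
    Code:
    [mysqld]
    # the stock redo log is tiny (5M); a larger log lets InnoDB absorb
    # write bursts instead of stalling on aggressive checkpointing
    innodb_log_file_size = 512M
    innodb_log_buffer_size = 16M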

    It does also happen that PXC slaves simply can't keep up with standalone masters because of sheer volume -- each transaction coming through async replication (typically every statement with autocommit) must get synchronously replicated on the slave cluster and sometimes it's just not possible to keep up with a single async replication thread.
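
    A quick way to see which side the backlog sits on under that model (standard commands, nothing exotic):
    Code:
    -- on the cluster node that is the async slave of the old master:
    mysql> SHOW SLAVE STATUS\G
    --   Seconds_Behind_Master = the async replication lag
    -- on the other cluster node:
    mysql> SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue';
    --   a persistently large value means the applier side is the bottleneck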

    • #3
      Hey Jayj

      Turns out we were sold a server solution with a RAID card with the performance of a... slow nana in a wheelchair?.. nah, that's being nice hahaha.

      After we got the hardware we were supposed to, I tried it again and it seems to be fine now. I'll be testing a 3-node setup later in the week.

      Thanks for your help

      Steve
