GET 24/7 LIVE HELP NOW

Announcement

Announcement Module
Collapse
No announcement yet.

Transient differences in pt-table-checksum

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • Transient differences in pt-table-checksum

    Background:

    We have a five-server Percona Server group, using ROW-based binlog replication to keep four standby servers up to date with changes on our master. In the process of chasing down a bug, we noticed that our replicas had fallen out of sync with the master somewhere along the line, and used pt-table-checksum and pt-table-sync to bring them back into sync.

    There are a number of databases whose names follow the pattern "liveNN" for some number. In each database is a table named "user", with identical structure but differing data. These tables all have the schema

    Code:
    Create Table: CREATE TABLE `user` (
      `userid` int(11) NOT NULL AUTO_INCREMENT,
      `systemid` int(11) NOT NULL,
      `username` varchar(50) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
      `password` varbinary(50) DEFAULT NULL,
      `level` smallint(6) NOT NULL DEFAULT '0',
      `fname` varchar(50) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
      `lname` varchar(50) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
      `last_login` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
      `num_logins` int(11) NOT NULL DEFAULT '0',
      `signup_date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
      `email` varchar(100) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
      `ownerid` int(11) NOT NULL DEFAULT '0',
      `organization` varchar(255) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
      `p_street` varchar(50) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
      `p_city` varchar(50) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
      `p_province` varchar(50) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
      `p_country` varchar(50) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
      `p_code` varchar(13) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
      `bus_phone` varchar(25) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
      `mob_phone` varchar(25) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
      `fax` varchar(25) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
      `s_street` varchar(50) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
      `s_city` varchar(50) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
      `s_province` varchar(50) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
      `s_country` varchar(50) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
      `s_code` varchar(13) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
      `active` int(1) NOT NULL DEFAULT '1',
      `setup` tinyint(4) unsigned DEFAULT '0',
      `notified` tinyint(3) unsigned DEFAULT '0',
      `p_street2` varchar(255) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
      `s_street2` varchar(255) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
      `home_phone` varchar(25) COLLATE utf8_unicode_ci DEFAULT NULL,
      `rate` decimal(25,8) DEFAULT NULL,
      `note` mediumtext COLLATE utf8_unicode_ci,
      `support_order` tinyint(4) NOT NULL DEFAULT '0',
      `sort_department` varchar(30) COLLATE utf8_unicode_ci DEFAULT '',
      `individual` tinyint(4) NOT NULL DEFAULT '0',
      `archive` tinyint(4) DEFAULT '0',
      `sort_gmail` varchar(30) COLLATE utf8_unicode_ci DEFAULT '',
      `pref_email` tinyint(4) DEFAULT '1',
      `pref_gmail` tinyint(4) DEFAULT '0',
      `timesheetDate` varchar(4) COLLATE utf8_unicode_ci DEFAULT '2',
      `ext_systemid` int(11) DEFAULT NULL,
      `sort_contractors` varchar(30) COLLATE utf8_unicode_ci DEFAULT NULL,
      `mapped_taskid` int(11) DEFAULT NULL,
      `updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
      `api_token` char(32) COLLATE utf8_unicode_ci DEFAULT NULL,
      `language` varchar(12) COLLATE utf8_unicode_ci NOT NULL DEFAULT 'en',
      `currency_code` char(3) COLLATE utf8_unicode_ci DEFAULT NULL,
      `vat_name` varchar(30) COLLATE utf8_unicode_ci DEFAULT NULL,
      `vat_number` varchar(30) COLLATE utf8_unicode_ci DEFAULT NULL,
      `allow_late_notifications` tinyint(1) NOT NULL DEFAULT '1',
      `allow_late_fees` tinyint(1) NOT NULL DEFAULT '1',
      `company_size` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
      `company_industry` varchar(100) COLLATE utf8_unicode_ci DEFAULT NULL,
      PRIMARY KEY (`systemid`,`userid`),
      UNIQUE KEY `username` (`systemid`,`username`),
      KEY `level_idx` (`systemid`,`level`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
    (Yes, yes, MyISAM, I know. There's a plan for that. Bear with me.)

    The weirdness:

    After running

    Code:
    pt-table-sync \
         --tables 'user' \
         --databases 'live,live15,live16,live17,live18,live19,live2,live21,live3,live5,live6,live7,live8,live9' \
         --user 'percona-toolkit' \
         --replicate 'percona.checksums' \
         --check-master \
         --check-slave \
         --wait 5 \
         --execute \
         --verbose \
         --sync-to-master \
         h=172.30.30.5
    to bring the "user" table back into sync on one replica, we ran

    Code:
    pt-table-checksum \
         --quiet \
        --nocheck-replication-filters \
        --no-check-binlog-format \
        --tables 'user' \
        --user percona-toolkit \
        --host db-master
    three times to validate that the affected tables were in sync. The results were, shall we say, surprising:

    Code:
                TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
    06-14T13:17:48      0      2   220253       8       0   5.868 live22.user
    Code:
                TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
    06-14T13:47:28      0      2   141113       6       0   9.036 live24.user
    Code:
                TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
    06-14T14:13:36      0      2   522900      17       0  12.887 live9.user
    The question:

    What the heck is going on here? Why are tables becoming out of sync, and then becoming in sync again without further action? Will using InnoDB help, leaving aside all of the other very good reasons to switch ASAP?

  • #2
    You might be facing this bug: https://bugs.launchpad.net/percona-toolkit/+bug/1080765

    Comment

    Working...
    X