Announcement

Announcement Module
Collapse
No announcement yet.

pt-table-checksum - miscellaneous questions

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • pt-table-checksum - miscellaneous questions

    Taking a look at pt-table-checksum, a couple of questions came up. One of which I feel may be an omission from the docs:

    1) What are the minimum user privileges required for pt-table-checksum to function? Of course you need a user/password to connect as, but what privileges does that user require? I have only seen reference to "all privileges" in a couple of forum posts, and was wondering if there isn't a smaller/safer set?

    2) I am seeing weird output in my dev system (testing before running this in production):
    01-05T16:05:45 Skipping chunk 1 of zarafa.tproperties because MySQL used only 6 bytes of the PRIMARY index instead of 12. See the --[no]check-plan documentation for more information.
    01-05T16:05:45 0 0 0 16 16 0.101 zarafa.tproperties
    Does this mean I need to do some kind of (index) repair on zarafa.tproperties?

    3) How do you actually "fix" diffs? For example:
    01-05T16:09:37 0 1 7 1 0 0.450 zarafa.syncs
    There is one diff here. How can I see what the diff is or simply just fix the one diff? I am assuming the answer is "pt-table-sync"?

    When I run pt-table-sync with some options:
    pt-table-sync --verbose --user pt --password ptpassword --databases zarafa --sync-to-master shrugged.dev --print
    # Syncing h=shrugged.dev,p=...,u=pt
    # DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE
    # 0 0 0 0 Chunk 16:27:58 16:27:58 0 zarafa.abchanges
    # 0 0 0 0 Nibble 16:27:58 16:27:58 0 zarafa.acl
    # 0 0 0 0 Chunk 16:27:58 16:27:59 0 zarafa.changes
    # 0 0 0 0 Chunk 16:27:59 16:27:59 0 zarafa.clientupdatestatus
    # 0 0 0 0 Chunk 16:27:59 16:27:59 0 zarafa.deferredupdate
    # 0 0 0 0 Chunk 16:27:59 16:27:59 0 zarafa.hierarchy
    # 0 0 0 0 Nibble 16:27:59 16:27:59 0 zarafa.indexedproperties
    # 0 0 0 0 Nibble 16:27:59 16:27:59 0 zarafa.lob
    # 0 0 0 0 Nibble 16:27:59 16:27:59 0 zarafa.mvproperties
    # 0 0 0 0 Chunk 16:27:59 16:27:59 0 zarafa.names
    # 0 0 0 0 Chunk 16:27:59 16:27:59 0 zarafa.object
    # 0 0 0 0 Nibble 16:27:59 16:27:59 0 zarafa.objectmvproperty
    # 0 0 0 0 Nibble 16:27:59 16:27:59 0 zarafa.objectproperty
    # 0 0 0 0 Nibble 16:27:59 16:27:59 0 zarafa.objectrelation
    # 0 0 0 0 Nibble 16:27:59 16:27:59 0 zarafa.outgoingqueue
    # 0 0 0 0 Nibble 16:27:59 16:27:59 0 zarafa.properties
    # 0 0 0 0 Chunk 16:27:59 16:27:59 0 zarafa.receivefolder
    # 0 0 0 0 Nibble 16:27:59 16:27:59 0 zarafa.searchresults
    # 0 0 0 0 Chunk 16:27:59 16:27:59 0 zarafa.settings
    # 0 0 0 0 Nibble 16:27:59 16:27:59 0 zarafa.singleinstances
    # 0 0 0 0 Chunk 16:27:59 16:27:59 0 zarafa.stores
    # 0 0 0 0 Nibble 16:27:59 16:27:59 0 zarafa.syncedmessages
    REPLACE INTO `zarafa`.`syncs`(`id`, `sourcekey`, `change_id`, `sync_type`, `sync_time`) VALUES ('2740', '', '12791', '1', '2014-01-05 16:27:58') /*percona-toolkit src_db:zarafa src_tbl:syncs src_dsn:P=3306,h=atlas.localeject,p=...,u=pt dst_db:zarafa dst_tbl:syncs dst_dsn:h=shrugged.dev,p=...,u=pt lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:26155 user:root host:atlas.dev.jsstest.net*/;
    # 0 1 0 0 Chunk 16:27:59 16:27:59 2 zarafa.syncs
    # 0 0 0 0 Nibble 16:27:59 16:28:00 0 zarafa.tproperties
    # 0 0 0 0 Chunk 16:28:00 16:28:00 0 zarafa.users
    # 0 0 0 0 Nibble 16:28:00 16:28:00 0 zarafa.versions
    And then actually execute, if I run checksum again:
    01-05T16:27:45 0 1 7 1 0 0.347 zarafa.syncs
    If I run this replace manually on the slave server:
    mysql> REPLACE INTO `zarafa`.`syncs`(`id`, `sourcekey`, `change_id`, `sync_type`, `sync_time`) VALUES ('2740', '', '12791', '1', '2014-01-05 16:26:54');
    Query OK, 2 rows affected (0.10 sec)
    The timestamp is basically "now". I'm wondering if I'm catching an item that is changing while the checksum is running, that hasn't been picked up during replication yet by the slave?

    Thanks all!

  • #2
    4) I also see this error:
    Error setting innodb_lock_wait_timeout: DBD::mysql::db do failed: Unknown system variable 'innodb_lock_wait_timeout'. The current value for innodb_lock_wait_timeout is 50. If the variable is read only (not dynamic), specify --set-vars innodb_lock_wait_timeout=50 to avoid this warning, else manually set the variable and restart MySQL.
    I do have perl-DBD-MySQL installed:
    perl-DBD-MySQL.x86_64 3.0007-2.el5

    This is MySQL 5.0 on EL5:
    mysql.x86_64 5.0.95-5.el5_9

    Comment


    • #3
      Any thoughts here?

      Comment


      • #4
        1) https://blueprints.launchpad.net/per...checksum-privs
        2) What version of pt-table-checksum you are running ? I noticed couple of bugs reported. Can you please try with latest version.
        3) Diffs should be fixed with pt-table-sync tool. You can inspect the queries which will execute to fix the differences on slave via pt-table-sync --print option. Running diff queries manually on slave is not recommended. pt-table-sync executes REPLACE queries on master so this not change any data on master and changes will be propagated to slave via replication which is safe way to do it. Not sure, what's your question regarding to NOW() function this may help you on this http://www.mysqlperformanceblog.com/...able-checksum/
        4) innodb_lock_wait_timeout is not dynamic. pt-table-checksum sets its session-level innodb_lock_wait_timeout to 1 second. You can override the default value with --set-vars option. Check here for details http://www.percona.com/doc/percona-t...ksum--set-vars

        Comment


        • #5
          1) Great info! This should probably be somewhere in the pt-table-checksum docs?

          2) pt-table-checksum 2.2.6, which is the latest from the website.

          3) Yeah, I read the docs a little later and noticed not to run them on the slave. I guess the question is -- I'm running the REPLACE, but if I then run pt-table-checksum after running the REPLACE, it still says there's a diff. If I run the table-sync --print again, it shows the same exact REPLACE statement as before. It's as if nothing changed on the slave. But the slave is properly running (seconds behind = 0, IO running, etc).

          What I mean by "now" is not NOW() the MySQL command, but rather if you look at the output of the replace statement, it has a timestamp on it. For example:
          '2014-01-05 16:26:54'

          Here is some output:
          date && pt-table-sync --execute --verbose --user pt --password ptpassword --databases zarafa --sync-to-master shrugged.dev --print | grep -i replace
          Sat Jan 11 01:11:07 GMT 2014
          # DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE
          REPLACE INTO `zarafa`.`syncs`(`id`, `sourcekey`, `change_id`, `sync_type`, `sync_time`) VALUES ('2740', '', '12791', '1', '2014-01-11 01:11:05')
          You can see that the timestamp of just before we ran pt-table-sync and the timestamp of the thing that is not sync'd is almost identical (within seconds). So this makes me wonder if the pt-table-sync is "seeing" out of sync before the slave table actually gets updated by the original insert?

          Every time I run the above command (date && pt-table-sync...) there is only one result, and it's within seconds of the sync being run, and always a new sync_time value.

          4)
          /etc/puppet]$ pt-table-checksum --user pt --password ptpassword --nocheck-plan --set-vars innodb_lock_wait_timeout=1
          Error setting innodb_lock_wait_timeout: DBD::mysql::db do failed: Unknown system variable 'innodb_lock_wait_timeout'. The current value for innodb_lock_wait_timeout is 50. If the variable is read only (not dynamic), specify --set-vars innodb_lock_wait_timeout=50 to avoid this warning, else manually set the variable and restart MySQL.
          It doesn't appear that with mysql Ver 14.12 Distrib 5.0.95, for redhat-linux-gnu (x86_64) using readline 5.1 this works. Is a timeout of 50 a bad thing? Is 1 a sane level for normal operations? Should this work with --set-vars against 5.0.95?

          5) It seems one of my original questions is still unanswered:
          I am seeing weird output in my dev system (testing before running this in production):
          01-05T16:05:45 Skipping chunk 1 of zarafa.tproperties because MySQL used only 6 bytes of the PRIMARY index instead of 12. See the --[no]check-plan documentation for more information.
          01-05T16:05:45 0 0 0 16 16 0.101 zarafa.tproperties


          Does this mean I need to do some kind of (index) repair on zarafa.tproperties? If I run pt-table-checksum with --nocheck-plan it will check this table, but the description for --nocheck-plan makes it sound "dangerous":
          http://www.percona.com/doc/percona-t...ble-checksum--[no]check-plan

          Thoughts here?

          Thanks!

          Comment


          • #6
            Is this bug perhaps related?

            https://bugs.launchpad.net/percona-toolkit/+bug/1130498

            Comment


            • #7
              @thoraxe

              For #2 - what is the structure of you zarafa.tproperties? I would imagine the PRIMARY key to be composite of at least 2 columns. The warning means that an EXPLAIN output executed by pt-table-checksum (to make sure chunk-size*chunk-size-limit is not exceeded) says that not all columns are used on the index. This is usually not a problem - but if it does, you can tune --chunk-size and --chunk-size-limit in case the chunks becomes oversized and of course the --no-check-plan option (but I do not recommend that).
              Our documentation has a lot of answers about common questions on Percona software, have you checked there before posting that question here? http://www.percona.com/forums/core/i...lies/smile.png

              Join us at the annual Percona Live MySQL Users Conference - http://www.percona.com/live/mysql-conference-2014/

              Comment


              • #8
                If I don't use --no-check-plan it doesn't appear to check those tables at all.

                Here is the describe for tproperties:

                Code:
                mysql> describe tproperties;
                +-------------+----------------------+------+-----+---------+-------+
                | Field       | Type                 | Null | Key | Default | Extra |
                +-------------+----------------------+------+-----+---------+-------+
                | folderid    | int(11) unsigned     | NO   | PRI | 0       |       |
                | hierarchyid | int(11) unsigned     | NO   | PRI | 0       |       |
                | tag         | smallint(6) unsigned | NO   | PRI | 0       |       |
                | type        | smallint(6) unsigned | NO   | PRI | NULL    |       |
                | val_ulong   | int(11) unsigned     | YES  |     | NULL    |       |
                | val_string  | longtext             | YES  |     | NULL    |       |
                | val_binary  | longblob             | YES  |     | NULL    |       |
                | val_double  | double               | YES  |     | NULL    |       |
                | val_longint | bigint(20)           | YES  |     | NULL    |       |
                | val_hi      | int(11)              | YES  |     | NULL    |       |
                | val_lo      | int(11) unsigned     | YES  |     | NULL    |       |
                +-------------+----------------------+------+-----+---------+-------+
                11 rows in set (0.00 sec)

                Comment

                Working...
                X