Emergency

pt-table-checksum not able to get differences for columns

Collapse
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • pt-table-checksum not able to get differences for columns

    Hello Folks,

    I have an old case that I'm trying to get solved. I'm seeing that, pt-table-checksum is not able to verify that the same table, on master and on slave, they have different values for some of te rows although both tables have the same number of rows. It's not that hard to get and I would like to know if there is a way I can better check the data for complete consistency beyond of the number of rows (we cannot really say a table is 100% consistent on master and slaves if count() is all the same, IMHO).

    Let's consider a simple example:

    Code:
    box01 [foo]> select i,j from foo01;
    +---+------+
    | i | j    |
    +---+------+
    | 1 | a    |
    | 2 | b    |
    | 3 | c    |
    | 4 | d    |
    | 5 | e    |
    +---+------+
    5 rows in set (0.00 sec)
    
    box02 [foo]> select * from foo.foo01;
    +---+------+
    | i | j    |
    +---+------+
    | 1 | a    |
    | 2 | b    |
    | 3 | c    |
    | 4 | f    |
    | 5 | g    |
    +---+------+
    5 rows in set (0.00 sec)
    
    [root@box01 ~]# pt-table-checksum --recursion-method=dsn="h=192.168.50.11,D=bianchi,t=dsns" --replicate bianchi.checksums --no-check-binlog-format --no-version-check --no-check-replication-filters --databases foo --tables foo01 -c j
                TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
    08-04T20:53:34      0      0        5       1       0   0.016 foo.foo01
    Above, I attempted to force it to checksum only this comma-separated list of columns as my intention is to really just check this table, no other one.

    By the way, after running the below simple checksum, I verified the checksums table and ...

    Code:
    box02 [foo]> select db,tbl,this_cnt,master_cnt,this_crc,master_crc from bianchi.checksums\G
    *************************** 1. row ***************************
            db: foo
           tbl: foo01
      this_cnt: 5
    master_cnt: 5
      this_crc: 2e243fc8
    master_crc: 2e243fc8
    1 row in set (0.00 sec)
    Anything to share here on this case folks? Thanks a lot!!

    PS.: of course in the case in the case I detect that this is happening, a good solution will be to check what dataset is the most accurate one, working with the customer and agree that we need to merge data or even, using pt-table-sync --sync-to-master, but, if table-checksum was able to do it at some point comparing a e.g. CRC32 per row basis, it would be good stuff as well.
    Last edited by wagnerbianchi; 08-05-2017, 02:32 PM. Reason: Adding more content.
    WB

  • #2
    Hey Wagner, I hope you're doing well!

    Have you tried other hashing functions such as MD5 or MURMUR_HASH perhaps?
    https://www.percona.com/doc/percona-...cksum-function

    Comment


    • #3
      Hey Jericho, all good around here man, how about you? I hope everything is OK with you as well.

      Unless I'm missing something on the below test, all the functions appearing on the docs are not able to catch that kind pf diffs (unless I'm missing something here):

      Code:
      [root@box01 plugin]# for i in {"FNV1A_64()" "MURMUR_HASH()" "SHA1()" "MD5()" "CRC32()"}; do echo "Testing with --function: $i"; pt-table-checksum --recursion-method=dsn="h=192.168.50.11,t=bianchi.dsns" --replicate bianchi.checksums --no-check-binlog-format --no-version-check --no-check-replication-filters --databases foo --tables foo01 --function="$i"; done
      
      Testing with --function: FNV1A_64()
                  TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
      08-06T03:28:13      0      0        5       1       0   0.015 foo.foo01
      
      Testing with --function: MURMUR_HASH()
                  TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
      08-06T03:28:13      0      0        5       1       0   0.016 foo.foo01
      
      Testing with --function: SHA1()
                  TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
      08-06T03:28:13      0      0        5       1       0   0.015 foo.foo01
      
      Testing with --function: MD5()
                  TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
      08-06T03:28:13      0      0        5       1       0   0.014 foo.foo01
      
      Testing with --function: CRC32()
                  TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
      08-06T03:28:13      0      0        5       1       0   0.015 foo.foo01
      Are you seeing any other stuff I'm not considering here? Thanks man, keep in touch!
      WB

      Comment


      • #4
        Wagner, this looks like a bug in 3.0.4 since I get the diff when I use pt-tc 3.0.3:

        Code:
        [root@rpl56a ~]# pt-table-checksum --recursion-method=dsn=D=percona,t=dsns,u=pt,p=pt,h=10.0.3.179 --no-check-binlog-format --no-version-check --no-check-replication-filters --databases test --tables foo --columns s
                    TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
        08-05T23:57:11      0      0        6       1       0   0.035 test.foo
        [root@rpl56a ~]# pt-table-checksum --version
        pt-table-checksum 3.0.4
        [root@rpl56a ~]# 
        [root@rpl56a ~]# ./percona-toolkit-3.0.3/bin/pt-table-checksum --recursion-method=dsn=D=percona,t=dsns,u=pt,p=pt,h=10.0.3.179 --no-check-binlog-format --no-version-check --no-check-replication-filters --databases test --tables foo --columns s
                    TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
        08-05T23:57:30      0      1        6       1       0   0.022 test.foo

        Comment


        • #5
          Cool, thanks for testing Jericho! I will add the old version to the environment I've been using for tests and I will let you know fir sure soon. Thanks again, we talk soon!
          WB

          Comment


          • #6
            Hi,

            I've created an issue in Jira: https://jira.percona.com/browse/PT-187
            Please check my comment there.

            Thanks.

            Comment


            • #7
              Hola Carlos, que tal? Thanks for opening a JIRA with this matter, I will keep track on JIRA as well. Definitely something we need to be looking into. Un abrazo!
              WB

              Comment


              • #8
                Reading material:
                https://www.percona.com/blog/2014/10...rc32-function/
                https://www.percona.com/doc/percona-...a_toolkit.html

                Comment


                • #9
                  I've got an interesting case with pt-table-checksum, look this:

                  Code:
                  #: master
                  x0002 [x0002]> SELECT COUNT(*) FROM cc;
                  +----------+
                  | COUNT(*) |
                  +----------+
                  |    33042 |
                  +----------+
                  1 row in set (0.01 sec)
                  
                  #: slave
                  x0002-005 [x0002]> SELECT COUNT(*) FROM cc;
                  +----------+
                  | COUNT(*) |
                  +----------+
                  |    21274 |
                  +----------+
                  1 row in set (0.01 sec)
                  
                  #: running pt-table-checksum on master
                  pt-table-checksum --recursion-method=dsn="h=localhost,D=x0002,t=dsns" --replicate x0002.checksums --nocheck-replication-filters --no-check-binlog-format --no-version-check --user=admin --password=xxxxxx --databases x0002,foo
                  
                  #: checksums table on slave
                  x0002-005 [x0002]> select * from checksums WHERE tbl='cc';
                  +--------------+---------------+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+
                  | db           | tbl           | chunk | chunk_time | chunk_index | lower_boundary | upper_boundary | this_crc | this_cnt | master_crc | master_cnt | ts                  |
                  +--------------+---------------+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+
                  | x0002 | cc |     1 |   0.101848 | NULL        | NULL           | NULL           | f7b3664b |    33039 | f7b3664b   |      33039 | 2017-09-28 22:20:23 |
                  +--------------+---------------+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+
                  1 row in set (0.00 sec)
                  As you can see above, the checksum table gives it having the same number of rows on both sides, on master and slave, but, actually it is not consistent, as master has more rows than slave for that table. Additionally, the remote access with the admin user is working properly between servers, so, not sure what's going on here...any comments? This the Percona Toolkit 3.0.4-1.el6.

                  I removed the version 3.0.4-1.el6 and setup 2.2.18-1 and now it's correct, diffs were found correctly with the same scenario aforementioned. Something buggy is happening with pt-table-checksum on package 3.0.4-1.el6.

                  Thanks folks!
                  Last edited by wagnerbianchi; 09-28-2017, 08:42 PM. Reason: Adding more text about downgrading the package's version.
                  WB

                  Comment


                  • #10
                    Hi,

                    There was an error with no-check-binlog-format.
                    It was fixed in the latest package in the repo: https://raw.githubusercontent.com/pe...table-checksum

                    Regards

                    Comment


                    • #11
                      Gracias Carlos, I'm gonna test that out and let you know what is the new outcome after this fix. By the way, could you point me to a JIRA where I can open a feature request to the pt-table-checksum? Thanks a lot, look forward to hear from you. Hasta pronto!
                      WB

                      Comment


                      • #12
                        Just create a ticket in launchpad and I will create the corresponding ticket in Jira.

                        Comment


                        • #13
                          Hi,
                          pt-table-checksum is failed against AWS RDS in the latest version 3.0.5 with following errors:

                          12-07T15:15:46 Failed to /*!50108 SET @@binlog_format := 'STATEMENT'*/: DBD::mysql::db do failed: Access denied; you need (at least one of) the SUPER privilege(s) for this operation [for Statement "/*!50108 SET @@binlog_format := 'STATEMENT'*/"] at /tmp/percona-toolkit-3.0.5/bin/pt-table-checksum line 9403.


                          This tool requires binlog_format=STATEMENT, but the current binlog_format is set to MIXED and an error occurred while attempting to change it. If running MySQL 5.1.29 or newer, setting binlog_format requires the SUPER privilege. You will need to manually set binlog_format to 'STATEMENT' before running this tool.

                          Comment


                          • #14
                            hi,
                            does it work with AWS RDS for MySQL

                            Comment


                            • #15
                              Originally posted by carlos.salguero View Post
                              Hi,

                              There was an error with no-check-binlog-format.
                              It was fixed in the latest package in the repo: https://raw.githubusercontent.com/pe...table-checksum

                              Regards
                              i installed the latest version 3.0.5 and ran against AWS RDS for MySQL 5.7, failed with following message:

                              12-07T16:46:47 Failed to /*!50108 SET @@binlog_format := 'STATEMENT'*/: DBD::mysql::db do failed: Access denied; you need (at least one of) the SUPER privilege(s) for this operation [for Statement "/*!50108 SET @@binlog_format := 'STATEMENT'*/"] at /tmp/percona-toolkit-3.0.5//bin/pt-table-checksum line 9403.




                              This tool requires binlog_format=STATEMENT, but the current binlog_format is set to MIXED and an error occurred while attempting to change it. If running MySQL 5.1.29 or newer, setting binlog_format requires the SUPER privilege. You will need to manually set binlog_format to 'STATEMENT' before running this tool.

                              Comment

                              Working...
                              X