Timezone and pt-table-checksum

Timezone and pt-table-checksum


I recently worked through an issue with a client trying to detect data drift across some servers that were located in different timezones.  Unfortunately, several of the tables had timestamp fields and were set to a default value of CURRENT_TIMESTAMP.  From the manual, here is how MySQL handles timezone locality with timestamp fields:

Values for TIMESTAMP columns are converted from the current time zone to UTC for storage, and from UTC to the current time zone for retrieval.

This can result in every row being deemed out of sync when each server is using in it’s own timezone.  Fortunately, there is a workaround for this as a result of how pt-table-checksum/pt-table-sync calculate their checksums.  Lets look at an example:

Node1 is configured in PDT time while node2 is configured in EDT time.  Now, lets insert some rows into a table with timestamp field configured to use CURRENT_TIMESTAMP as the default and verify that they look different when no timezone manipulation takes place:

So, when we run pt-table-checksum against node1, we see that even though MySQL is storing these values in UTC internally, we have “data drift” on both nodes:

pt-table-checksum –replicate=percona.checksums –databases=foo h=localhost

Naturally, pt-table-sync finds this and reports that all 3 rows in this chunk are different and gives the REPLACE INTO statements bring node2 in sync:

This is where we can use the @@session.time_zone variable to our advantage.  If both servers were set up using UTC as the timezone (which is the recommended practice – store everything in UTC and then perform the timezone logic in the application), we wouldn’t see any data drift.  And as explained above, the timestamps are actually stored in UTC on the server regardless of the timezone.  So, to overcome the timezone inconsistencies, we can use the –set-vars option with pt-table-checksum like this:

pt-table-checksum –replicate percona.checksums –databases=foo –set-vars=”time_zone=’+00:00′” h=localhost

This effectively sets both servers to UTC (in terms of display logic) and the checksums are calculated in the same locality:

This same option can and should be applied to pt-table-sync as well, especially if the checksums were created without it.  Since it uses the checksum table to find the chunks but still recalculates the checksums on the fly to get the actual rows in need of syncing, it will not find those rows even though pt-table-checksum reported the chunk as out of sync.  You can verify this yourself looking at the actual SQL generated by pt-table-sync when looking at rows:

SELECT /*foo.bar:1/1*/ 0 AS chunk_num, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS(‘#’, id, data, date_created + 0, CONCAT(ISNULL(data)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM foo.bar FORCE INDEX (PRIMARY) WHERE (1=1) FOR UPDATE;

When you update your session timezone (SET @@session.time_zone = “+01:00” for example), you will see different output for this.  Experiment with different timezones and each time, the checksum will change.  However, the underlying data isn’t touched and remains the same.

In general, it is safest to always use UTC as your timezone for MySQL to prevent this extra logic from being required, but the use of –set-vars can be a very powerful tool when you need to influence the session variables on different servers for whatever reason.


Share this post

Comment (1)

  • Derek Downey Reply

    Hey Mike, thanks for sharing this experience. I’ve never had to manage cross-timezone servers, but this is an apt reminder to ‘plan for the future’ and so I’ll be making sure my server config files include the default_time_zone option…just in case I forgot to set the server locale to be UTC 😉

    September 25, 2012 at 1:38 pm

Leave a Reply