November 27, 2014

Percona Toolkit Webinar followup Q&A

First, a thank you to everyone who attended the webinar Today, I appreciate your time and nice comments. As promised, here are answers to questions that couldn’t be answered during the talk:

 

Q: How do you install the tools?

The manual has full details, but it’s important to know that the latest release for every tool is also available for direct download in a url like this: http://www.percona.com/get/<tool-name>. I use this a lot when I just need a specific tool and I’m working on a server without the toolkit installed.

Q: How to make the tools connect to MySQL

The slides are now available here, and slide 7 provides a basic introduction to Data Source Names, with more details available in the manual.

 

Q: pt-table-checksum can’t connect to my slave server

As part of its safeguards, pt-table-checksum monitors lag on replicas, and pauses as needed to let them catch up (it also pauses if replication stops on a replica). In order to do this, it must connect to all replicas off the master server, and it tries to use the same credentials used in the DSN specified for the master. This is why I recommended having consistent credentials across hosts, but if your setup is different, you can control the way the tool discovers and connects to slaves with the –recursion-method option.

 

Q: In the case of slave behind master, can you explain how to run pt-table-checksum and pt-table-sync from slave?

This was replied during the webinar, but I think it is important enough to repeat it here: Both these tools should be run from the master. Writing to slaves is the #1 cause I have seen for replication problems, so it only makes sense that the tools are designed to write to the master.

Q: How about pt-table-checksum with databases greater than 1TB?

It will work just fine, though it will obviously take more time! The tool processes the database one table at a time, and processes each table one chunk at a time, checksumming each chunk with a single query. It monitors master load to suspend itself if there is impact, and has other safeguards (as mentioned in a previous reply), so it is safe to use even for very large data sets.

Q: Can I see a live example of pt-table-sync?

Here‘s a Vagrant box my colleague Marcos Albe and myself prepared for our Percona Toolkit Tutorial at the last Percona Live New York, and it includes an example of pt-table-checksum and pt-table-sync, among others.

 

Q: Which options were used to get this output?

Unfortunately, I don’t know about which specific output this question is, so I’ll review all the slides which include output from the tools just in case:

  • Slide 37 is the default report output by pt-query-digest. You can see the command line used to generate it at the top, it’s just pt-query-digest master/data/telecaster-slow.log. As mentioned during the webinar, the tool auto detects the log type so typically you will invoke it this way.
  • Slide 38 is just the detail for the first query found. This is part of the same output from which the previous screenshot was taken
  • Slide 41 is obtained by using –report-all and –review with pt-query-digest. For example, with the sandbox server I used for the examples, it would be pt-query-digest –review S=/tmp/mysql_sandbox5527.sock,u=msandbox,p=msandbox,D=percona,t=reviews –report-all master/data/telecaster-slow.log
  • Slide 45 is the default output for pt-upgrade, obtained with the invocation on slide 44
  • Slide 48 is the default output for pt-online-schema-change, ran as detailed on slide 47
  • Slide 57 is the detail screen for one capture as shown by pt-sift. This is printed after one capture is selected from the timestamp list (the timestamp list is the view you get when first running pt-sift).

Q: For pt-online-schema-change, does renaming the table take a long time, and what about the transaction in the meanwhile?

The rename table itself should be fast, but as it needs to lock the table, long running transactions that lock (part of) the table can make the tool abort with a ‘Lock wait timeout exceeded’ error. The table and triggers created during the process are dropped, and the original table is left unmodified.

The same attendee asked about any performance differences when using the tool with or without innodb_file_per_table, and the first thing that comes to my mind is that when using a single shared tablespace, using this on very large tables can make your ibdata file grow a lot. As is known, this space won’t be recovered when the auxiliary table is dropped unless you’re using innodb_file_per_table.

 

Q: Does pt-online-schema-change work for fast index drops and creates?

Yes, though you may want to test, as even for large data set sizes, it may be faster to run the alter table command directly (specially so when dropping an index, based on my experience).

 

Q: I see lot of info on cacti are they going to be all translated to nagios plugins as all nagios plugin doesn’t output performance data?

The Percona Monitoring Plugins include cacti templates and nagios plugins, and each should be used for its appropriate use case. The cacti templates provide very good performance metrics.

 

Q: Does pt-sift read pt-stalk’s output?

Yes, pt-sift does not collect any diagnosis data. It is only a way to get a bird’s eye view of a capture (or a set of captures) obtained with pt-stalk. If the latest release is invoked without arguments, it looks for captures under /var/lib/pt-stalk/

 

Q: Is there a way with pt-table-sync to determine when the differences are large enough to favor manual synchronization?

 

This is a very good question, and in the end, it is the DBA’s judgement call. As I explained in the webinar, once you have data differences between two (or more) servers, if using statement based replication, it is possible that new differences are introduced while the sync is happening. In other words: bad data usually leads to more bad data! My preference is to use pt-table-sync if the damage is contained (i.e. one or a few tables, in rows that probably won’t be updated short term), or if the data set size, storage engine, network speed, or other factors make cloning a server too difficult, and just clone the host otherwise. If you’re using Innodb only, streaming a backup with xtrabackup could be a good and safe alternative.

 

 

Thanks again to all those who attended, if you have further questions, just reply here or drop me an email.

About Fernando Ipar

Fernando is part of Percona's team working as Senior Consultant. Prior to joining Percona, Fernando worked as a consultant for financial services institutions, telcos, and technology providers.

Comments

  1. Hi,

    Pt-table sync in my experience was not good enough when I was trying to sync two servers with all MYISAM tables. There was a huge difference between them and pt-table-sync was causing table locks on master as expected. So when you have Myisam tables and you have data difference this tool is not recommended atleast without arranging a maintainence window.

  2. Andy says:

    I am wondering if pt-table-checksum can be set to statement based replication in the run time. I am using row based. binlog_format is dynamic variable.

Speak Your Mind

*