pt-kill: How it Works

pt-kill MySQLI receive many questions about errors being reported about pt-kill… and often those errors are not real. The main problem I have found is that pt-kill is hard to understand and the documentation doesn’t have many examples (I’ll add them), so, I decided to write a post about the most common usage of pt-kill: to kill the queries that match a regexp (usually a SELECT) and have been running for more than n seconds.

Let’s assume that we are running a tool like sysbench that executes queries like:

and at the same time, in a different process, we are running:

Then, while those queries are running, we start pt-kill with these parameters:

In the output, we will have something like this:

It is clear that SELECT SLEEP(10)  should be killed since it exceeds the --busy-time 2s but why are the other queries being killed too if the execution time is 0 seconds?

Well, it is because they match the specified criteria. How so? Let’s analyze how the program works.

First, it creates “buckets”,  grouping the queries according to the --group-by criteria. On our command line, there are no grouping criteria so there will only be one bucket with all the found queries in it. The second step here is to check the matching criteria. It is important to point out that all the specified criteria are combined via the logical OR.

We have two matching conditions:

--busy-time 2s
--match-info "(select|SELECT)"

The SELECT SLEEP(10)  will match both, so, it should be killed but all other selects will match --match-info "(select|SELECT)". Since the matching criteria is applied via OR, where at least ONE of the conditions should match, all other queries (the ones from sysbench) will be eligible for kill (or print).

‑‑each-busy-time

According to the documentation:

‑‑each-busy-time

type: time; group: Class Matches

Match query class if each query has been running for longer than this time. “Longer than” means that if you specify 10, for example, the class will only match if each and every query has been running for greater than 10 seconds.

What happens if we use --each-busy-time instead of --busy-time?

Answer: no queries will be killed.

Why? The reason for not killing either SELECT SLEEP(10)  or SELECT c FROM sbtest24 WHERE id=4997  is that --each-busy-time will kill the queries only if all the queries in the bucket exceed the specified period (2 seconds in this example) and since we haven’t specified a group by condition, we have only one bucket with all the queries in it. Let’s make it easier to visualize:

Here, in the same bucket (group) we have queries running for 10 seconds and fast queries that take 0 seconds to run. Going back to the documentation, --each-busy-time  matches query class (bucket) if each query has been running for longer than this time and here not all queries have been running for more than 2 seconds; we have one query running for more than 2 seconds but all the others were running for 0 seconds. So, the criteria do not match and no queries are going to be killed.

‑‑group-by to the rescue

From the previous example, it is clear that not all queries should be considered together. Using --group-by will let us separate them into different buckets (classes).
Let’s try pt-kill using these parameters: --kill-busy-commands=Query --run-time 30m --interval 1 --each-busy-time 2s --print --match-info "(select|SELECT)" --group-by=fingerprint .

Now, pt-kill will have 2 buckets as follows:

The matching criterion (–each-busy-time) will be applied to each group independently. For the first group (SELECT SLEEP), it does match since SELECT has been running for more than 2 seconds and then it will be killed. But, for the second group, no queries will match since the execute time for all of them is 0 and to be considered for killing the execute time for all of them should be higher than 2 seconds. So, the queries in the second group won’t be killed and we have achieved what we wanted in the first place.

Happy query killing!

Share this post

Comments (3)

  • viswanath basu Reply

    How to stop this error in errorlog.
    2020-04-18T10:31:56.991322Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use –explicit_defaults_for_timestamp server option (see documentation for more details).
    2020-04-18T10:31:56.995167Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.24-0ubuntu0.18.04.1) starting as process 979 …
    2020-04-18T10:31:57.062600Z 0 [Note] InnoDB: PUNCH HOLE support available
    2020-04-18T10:31:57.062662Z 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
    2020-04-18T10:31:57.062674Z 0 [Note] InnoDB: Uses event mutexes
    2020-04-18T10:31:57.062684Z 0 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
    2020-04-18T10:31:57.062693Z 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
    2020-04-18T10:31:57.062702Z 0 [Note] InnoDB: Using Linux native AIO
    2020-04-18T10:31:57.063935Z 0 [Note] InnoDB: Number of pools: 1
    2020-04-18T10:31:57.065765Z 0 [Note] InnoDB: Using CPU crc32 instructions
    2020-04-18T10:31:57.069476Z 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
    2020-04-18T10:31:57.086387Z 0 [Note] InnoDB: Completed initialization of buffer pool
    2020-04-18T10:31:57.089919Z 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
    2020-04-18T10:31:57.106927Z 0 [Note] InnoDB: Highest supported file format is Barracuda.
    2020-04-18T10:31:57.244792Z 0 [Note] InnoDB: Creating shared tablespace for temporary tables
    2020-04-18T10:31:57.244867Z 0 [Note] InnoDB: Setting file ‘./ibtmp1’ size to 12 MB. Physically writing the file full; Please wait …
    2020-04-18T10:31:57.288836Z 0 [Note] InnoDB: File ‘./ibtmp1’ size is now 12 MB.
    2020-04-18T10:31:57.290546Z 0 [Note] InnoDB: 96 redo rollback segment(s) found. 96 redo rollback segment(s) are active.
    2020-04-18T10:31:57.290575Z 0 [Note] InnoDB: 32 non-redo rollback segment(s) are active.
    2020-04-18T10:31:57.291047Z 0 [Note] InnoDB: Waiting for purge to start
    2020-04-18T10:31:57.342809Z 0 [Note] InnoDB: 5.7.24 started; log sequence number 211612994526
    2020-04-18T10:31:57.342986Z 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
    2020-04-18T10:31:57.343807Z 0 [Note] Plugin ‘FEDERATED’ is disabled.
    2020-04-18T10:31:57.390888Z 0 [Warning] Failed to set up SSL because of the following SSL library error: SSL context is not usable without certificate and private key
    2020-04-18T10:31:57.390922Z 0 [Note] Server hostname (bind-address): ‘10.99.21.215’; port: 3306

    April 18, 2020 at 12:10 pm
  • viswanath basu Reply

    & also /var/lib/mysql/database size increases to 15GB .How to stop the above error & shrink the database

    April 18, 2020 at 12:12 pm
  • Carlo Salguero Reply

    Hello. It is not clear for me what’s the error. I see MySQL initialization output.
    Could you clarify please?

    Thanks

    April 20, 2020 at 8:11 am

Leave a Reply