I receive many questions about errors being reported about pt-kill… and often those errors are not real. The main problem I have found is that pt-kill is hard to understand and the documentation doesn’t have many examples (I’ll add them), so, I decided to write a post about the most common usage of pt-kill: to kill the queries that match a regexp (usually a SELECT) and have been running for more than n seconds.
Let’s assume that we are running a tool like sysbench that executes queries like:
1 2 3 4 5 |
select distinct c from sbtest1 where id between 1 and 20 order by c select distinct c from sbtest2 where id between 5 and 25 order by c select distinct c from sbtest3 where id between 3 and 7 order by c select distinct c from sbtest1 where id between 2 and 8 order by c select distinct c from sbtest3 where id between 15 and 30 order by c |
and at the same time, in a different process, we are running:
1 |
SELECT SLEEP 10 |
Then, while those queries are running, we start pt-kill with these parameters:
1 |
--kill-busy-commands=Query --run-time 30m --interval 1 --busy-time 2s --print --match-info "(select|SELECT)" |
In the output, we will have something like this:
1 2 3 4 5 |
# 2019-06-20T11:40:23 KILL 31 (Query 2 sec) SELECT SLEEP(10) # 2019-06-20T11:40:23 KILL 513 (Execute 0 sec) SELECT DISTINCT c FROM sbtest48 WHERE id BETWEEN 4980 AND 5079 ORDER BY c # 2019-06-20T11:40:23 KILL 524 (Execute 0 sec) SELECT c FROM sbtest24 WHERE id=4997 # 2019-06-20T11:40:23 KILL 512 (Execute 0 sec) SELECT c FROM sbtest48 WHERE id BETWEEN 5001 AND 5100 ORDER BY c # 2019-06-20T11:40:23 KILL 516 (Execute 0 sec) SELECT SUM(k) FROM sbtest41 WHERE id BETWEEN 4988 AND 5087 |
It is clear that SELECT SLEEP(10) should be killed since it exceeds the --busy-time 2s but why are the other queries being killed too if the execution time is 0 seconds?
Well, it is because they match the specified criteria. How so? Let’s analyze how the program works.
First, it creates “buckets”, grouping the queries according to the --group-by criteria. On our command line, there are no grouping criteria so there will only be one bucket with all the found queries in it. The second step here is to check the matching criteria. It is important to point out that all the specified criteria are combined via the logical OR.
We have two matching conditions:
--busy-time 2s
--match-info "(select|SELECT)"
The SELECT SLEEP(10) will match both, so, it should be killed but all other selects will match --match-info "(select|SELECT)". Since the matching criteria is applied via OR, where at least ONE of the conditions should match, all other queries (the ones from sysbench) will be eligible for kill (or print).
‑‑each-busy-time
According to the documentation:
‑‑each-busy-time
type: time; group: Class Matches
Match query class if each query has been running for longer than this time. “Longer than” means that if you specify 10, for example, the class will only match if each and every query has been running for greater than 10 seconds.
What happens if we use --each-busy-time instead of --busy-time?
Answer: no queries will be killed.
Why? The reason for not killing either SELECT SLEEP(10) or SELECT c FROM sbtest24 WHERE id=4997 is that --each-busy-time will kill the queries only if all the queries in the bucket exceed the specified period (2 seconds in this example) and since we haven’t specified a group by condition, we have only one bucket with all the queries in it. Let’s make it easier to visualize:
1 2 3 4 5 6 |
Bucket: default Queries: - SELECT SLEEP (10) - SELECT c FROM sbtest24 WHERE id=4997 |
Here, in the same bucket (group) we have queries running for 10 seconds and fast queries that take 0 seconds to run. Going back to the documentation, --each-busy-time matches query class (bucket) if each query has been running for longer than this time and here not all queries have been running for more than 2 seconds; we have one query running for more than 2 seconds but all the others were running for 0 seconds. So, the criteria do not match and no queries are going to be killed.
‑‑group-by to the rescue
From the previous example, it is clear that not all queries should be considered together. Using
--group-by will let us separate them into different buckets (classes).
Let’s try pt-kill using these parameters:
--kill-busy-commands=Query --run-time 30m --interval 1 --each-busy-time 2s --print --match-info "(select|SELECT)" --group-by=fingerprint .
Now, pt-kill will have 2 buckets as follows:
1 2 3 4 5 |
Bucket: select sleep(?) Queries: - (Query 3 sec) SELECT SLEEP(10) |
1 2 3 4 5 6 7 8 9 10 11 |
Bucket: select c from sbtest? where id=? Queries: # (Execute 0 sec) SELECT DISTINCT c FROM sbtest9 WHERE id BETWEEN 4985 AND 5084 ORDER BY c # (Execute 0 sec) SELECT SUM(k) FROM sbtest32 WHERE id BETWEEN 4986 AND 5085 # (Execute 0 sec) SELECT c FROM sbtest8 WHERE id=5983 # (Execute 0 sec) SELECT DISTINCT c FROM sbtest20 WHERE id BETWEEN 5570 AND 5669 ORDER BY c # (Execute 0 sec) SELECT c FROM sbtest30 WHERE id BETWEEN 4504 AND 4603 ORDER BY c # (Execute 0 sec) SELECT c FROM sbtest8 WHERE id BETWEEN 4988 AND 5087 # (Execute 0 sec) SELECT c FROM sbtest27 WHERE id=5013 |
The matching criterion (–each-busy-time) will be applied to each group independently. For the first group (SELECT SLEEP), it does match since SELECT has been running for more than 2 seconds and then it will be killed. But, for the second group, no queries will match since the execute time for all of them is 0 and to be considered for killing the execute time for all of them should be higher than 2 seconds. So, the queries in the second group won’t be killed and we have achieved what we wanted in the first place.
Happy query killing!
Comments (3)
How to stop this error in errorlog.
2020-04-18T10:31:56.991322Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use –explicit_defaults_for_timestamp server option (see documentation for more details).
2020-04-18T10:31:56.995167Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.24-0ubuntu0.18.04.1) starting as process 979 …
2020-04-18T10:31:57.062600Z 0 [Note] InnoDB: PUNCH HOLE support available
2020-04-18T10:31:57.062662Z 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2020-04-18T10:31:57.062674Z 0 [Note] InnoDB: Uses event mutexes
2020-04-18T10:31:57.062684Z 0 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2020-04-18T10:31:57.062693Z 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2020-04-18T10:31:57.062702Z 0 [Note] InnoDB: Using Linux native AIO
2020-04-18T10:31:57.063935Z 0 [Note] InnoDB: Number of pools: 1
2020-04-18T10:31:57.065765Z 0 [Note] InnoDB: Using CPU crc32 instructions
2020-04-18T10:31:57.069476Z 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
2020-04-18T10:31:57.086387Z 0 [Note] InnoDB: Completed initialization of buffer pool
2020-04-18T10:31:57.089919Z 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2020-04-18T10:31:57.106927Z 0 [Note] InnoDB: Highest supported file format is Barracuda.
2020-04-18T10:31:57.244792Z 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2020-04-18T10:31:57.244867Z 0 [Note] InnoDB: Setting file ‘./ibtmp1’ size to 12 MB. Physically writing the file full; Please wait …
2020-04-18T10:31:57.288836Z 0 [Note] InnoDB: File ‘./ibtmp1’ size is now 12 MB.
2020-04-18T10:31:57.290546Z 0 [Note] InnoDB: 96 redo rollback segment(s) found. 96 redo rollback segment(s) are active.
2020-04-18T10:31:57.290575Z 0 [Note] InnoDB: 32 non-redo rollback segment(s) are active.
2020-04-18T10:31:57.291047Z 0 [Note] InnoDB: Waiting for purge to start
2020-04-18T10:31:57.342809Z 0 [Note] InnoDB: 5.7.24 started; log sequence number 211612994526
2020-04-18T10:31:57.342986Z 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
2020-04-18T10:31:57.343807Z 0 [Note] Plugin ‘FEDERATED’ is disabled.
2020-04-18T10:31:57.390888Z 0 [Warning] Failed to set up SSL because of the following SSL library error: SSL context is not usable without certificate and private key
2020-04-18T10:31:57.390922Z 0 [Note] Server hostname (bind-address): ‘10.99.21.215’; port: 3306
& also /var/lib/mysql/database size increases to 15GB .How to stop the above error & shrink the database
Hello. It is not clear for me what’s the error. I see MySQL initialization output.
Could you clarify please?
Thanks