Request for Comments: Global Processlist in Percona Monitoring and Management

Global Processlist in Percona Monitoring and ManagementOne piece of feedback I often hear from users of Percona Monitoring and Management  (PMM) is that while the Query Analytics feature is great and provides a lot of insights into queries the server handled, it can’t help us to see which queries are running now.

Problem Statement

Real-time access to queries that are running right now can be extremely helpful in case of pileups if the optimizer gets crazy, a bad query id is deployed, or some unexpected locking situation takes place. The usual result is that many queries of the same kind pile up… and if you’re not lucky, they may not complete for many minutes or even hours, all this time invisible in PMM.

Proposed Solution

In addition to Query History, what Query Analytics really provides now is access to “Live Queries”.  This basically gathers currently-running queries from all the nodes that a user currently observes (could be one node or could be a hundred), where queries can be grouped and sliced in a way similar to how Query Analytics works.

For example, for a given a QueryId (Query Pattern), we can see how many instances of such a query are running right now, what the maximum and average execution time is so far, what database hosts and what databases it is active for, what client IPs and users this query are coming from, etc.

Some other Query Analytics features such as EXPLAIN for a query, information about involved tables, etc., also remain relevant for running queries too.

Also, working with current events and not just history means we can do more than just observe them. I could imagine killing some particular query instance or even all queries which match a particular pattern, which would be handy too.

What do you think? Would having such a “Global Processlist” feature in Percona Monitoring and Management be helpful for you?   Anything else we should consider? Let me know in the comments!

Share this post

Comments (10)

  • tanuj Reply

    it will be extremely helpful to have a processlist view for any of the database choosen and have the option to kill the query /pid

    February 12, 2021 at 9:28 am
  • rautamiekka Reply

    Sounds like very useful !

    February 12, 2021 at 12:11 pm
  • Daniel Reply

    Adding a “quick analysis” or “live analysis” section to PMM can be a huge WIN. Here’s an example: Currently while using PMM to monitor Postgres RDS everything related to prometheus (actually Victoriametrics) works fine in terms of data freshness. However, QAN get lagged due the amount of nodes being monitored.

    The workaround was to write a python script that get the last 1MB of the pg log, run pgBadger to analyze it and then use that for “hot debug”

    What I will like to see on PMM? Something like that integrated, that doesn’t necessary go through the regular QAN code path but that feed the ClickHouse db so one can use the “Query Analyzer” dashboard. A kind of a merge.

    For MySQL it can be the same with a pt-query-digest integrated, etc etc.

    February 12, 2021 at 1:23 pm
  • Jean-François Gagné Reply

    This looks useful. In addition to that, the possibility to remotely run ps-top on a server. The use-case: a MySQL node is not healthy, in addition to knowing what is running now, I wan to see which table are queried (both in number of ops per second and latency as shown by ps-top) and which files are read from and written to (as shown by ps-top). The other view of ps-top are probably useful, but these 3 views would already be a good start.

    February 12, 2021 at 7:17 pm
  • Aakash Reply

    Sounds good. Also,it will be much helpful if we have options to list / ignore sleep or queries from particular host or particular pattern of queries along with kill and explain a connection as well.

    February 15, 2021 at 7:31 am
  • Francisco Miguel Biete Banon Reply

    How different would this be to use performance_schema as the source of QAN data?
    We have custom dashboards that show the current processlist of the server and allows to kill sessions, but we are missing the option to explain an active statement and see its history in the same screen.

    February 15, 2021 at 10:57 am
    • Daniel Reply

      You already can use peformance_schema as the source of QAN. The problem is that for example in Postgres there’s no P_S and the monitoring extension that Percona made is not available on RDS so one is stuck

      February 15, 2021 at 11:51 am
  • Thomas Reply

    Absolutely yes. When I check an unknown system, I always run “select * from information_schema.processlist where time > 1 and command != ‘Sleep’;” multiple times per minute to get an overall picture.

    February 21, 2021 at 11:19 am

Leave a Reply