pmm-server was unable to connect pmm-client to collect linux:metrics

  • pmm-server was unable to connect pmm-client to collect linux:metrics

    pmm-server was unable to connect to pmm-client to collect linux:metrics. The following is the output of pmm-admin check-network. I checked pmm-admin list, and linux:metrics shows as running. I also checked for firewall issues: nothing between pmm-server and pmm-client is blocking port 42000.

    * Connection: Client <-- Server
    -------------- ----------------- ------------------------ ------- ---------- ---------
    SERVICE TYPE   NAME              REMOTE ENDPOINT          STATUS  HTTPS/TLS  PASSWORD
    -------------- ----------------- ------------------------ ------- ---------- ---------
    linux:metrics  pmm-client        client_ip_address::42000 DOWN    YES        -
    mysql:metrics  dbcrpmysqlsbxha2  client_ip_address:42002  OK      YES        -

    -------------- --------------------- ----------- -------- --------------------------------------------- ------------------------
    SERVICE TYPE   NAME                  LOCAL PORT  RUNNING  DATA SOURCE                                   OPTIONS
    -------------- --------------------- ----------- -------- --------------------------------------------- ------------------------
    linux:metrics  pmm_client ip address 42000       YES      -


    Thanks,
    Vishnu

  • #2
    Hi Vishnu,

    Let's check node_exporter.
    Can you run the following command on the PMM client itself and on the PMM server?
    Code:
    wget https://pmm-client-ip:42000/metrics --no-check-certificate
    I am interested in the output of the wget command itself, not the content of the URL.
    Last edited by Mykola; 03-30-2017, 02:24 AM.

    • #3
      Thanks Mykola for your help!

      I tried the command (wget https://10.49.xx.xx:42000/metrics --no-check-certificate) from both the client and the server and got a "Connection refused" error.

      The following is the error message from the pmm-server targets page (http://10.49.xx.xx/prometheus/targets):

      State: Down
      Error: context deadline exceeded


      The following is the error from pmm-linux-metrics-42000.log:

      time="2017-03-29T12:09:44-07:00" level=info msg="Starting node_exporter (version=1.1.1, branch=master, revision=2d78e22000779d63c714011e4fb30c65623b9c77) " source="node_exporter.go:170"
      time="2017-03-29T12:09:44-07:00" level=info msg="Build context (go=go1.7.4, user=, date=)" source="node_exporter.go:171"
      time="2017-03-29T12:09:44-07:00" level=info msg="Enabled collectors:" source="node_exporter.go:190"
      time="2017-03-29T12:09:44-07:00" level=info msg=" - stat" source="node_exporter.go:192"
      time="2017-03-29T12:09:44-07:00" level=info msg=" - vmstat" source="node_exporter.go:192"
      time="2017-03-29T12:09:44-07:00" level=info msg=" - filesystem" source="node_exporter.go:192"
      time="2017-03-29T12:09:44-07:00" level=info msg=" - meminfo" source="node_exporter.go:192"
      time="2017-03-29T12:09:44-07:00" level=info msg=" - netdev" source="node_exporter.go:192"
      time="2017-03-29T12:09:44-07:00" level=info msg=" - netstat" source="node_exporter.go:192"
      time="2017-03-29T12:09:44-07:00" level=info msg=" - uname" source="node_exporter.go:192"
      time="2017-03-29T12:09:44-07:00" level=info msg=" - diskstats" source="node_exporter.go:192"
      time="2017-03-29T12:09:44-07:00" level=info msg=" - filefd" source="node_exporter.go:192"
      time="2017-03-29T12:09:44-07:00" level=info msg=" - loadavg" source="node_exporter.go:192"
      time="2017-03-29T12:09:44-07:00" level=info msg=" - time" source="node_exporter.go:192"
      time="2017-03-29T12:09:44-07:00" level=info msg="HTTPS/TLS is enabled" source="node_exporter.go:235"
      time="2017-03-29T12:09:44-07:00" level=info msg="Listening on 10.49.xx.xx:42000" source="node_exporter.go:238"
      2017/03/29 12:09:55 http: TLS handshake error from 10.49.xx.xx:44816: tls: first record does not look like a TLS handshake


      Thanks,
      Vishnu

      • #4
        So it looks like a firewall or network configuration issue.
        Prometheus (on the PMM Server side) fetches the https://pmm-client-ip:42000/metrics URL every second.
        This URL must be accessible from PMM Server.
        Can you open ports 42000 and 42002 for PMM Server?

        • #5
          It doesn't seem to be a firewall issue, because I am able to telnet from pmm-server to pmm-client on these ports.

          [root@dbcrpmysqlsbxha3 log]# telnet dbcrpmysqlsbxha2 42000
          Trying 10.49.80.46...
          Connected to dbcrpmysqlsbxha2.

          [root@dbcrpmysqlsbxha3 log]# telnet dbcrpmysqlsbxha2 42002
          Trying 10.49.80.46...
          Connected to dbcrpmysqlsbxha2.

          What does the "context deadline exceeded" error mean, and where can I see error logs for more information?

          Thanks,
          Vishnu

          • #6
            Hm,

            Is the following command working fine on the PMM client?
            Code:
             
             wget https://pmm-client-ip:42000/metrics --no-check-certificate

            • #7
              Yes, it seems to be working fine. See the output of the wget https://dbcrpmysqlsbxha2:42000/metrics --no-check-certificate command:

              [root@dbcrpmysqlsbxha2 ~]# wget https://dbcrpmysqlsbxha2:42000/metrics --no-check-certificate
              --2017-03-30 10:22:37-- https://dbcrpmysqlsbxha2:42000/metrics
              Resolving dbcrpmysqlsbxha2... 10.49.80.46
              Connecting to dbcrpmysqlsbxha2|10.49.80.46|:42000... connected.
              WARNING: cannot verify dbcrpmysqlsbxha2’s certificate, issued by “/O=PMM Client”:
              Unable to locally verify the issuer’s authority.
              WARNING: certificate common name “” doesn't match requested host name “dbcrpmysqlsbxha2”.
              HTTP request sent, awaiting response... 200 OK
              Length: 17784427 (17M) [text/plain]
              Saving to: “metrics”

              100%[======================================>] 17,784,427 6.35M/s in 2.7s

              2017-03-30 10:24:06 (6.35 MB/s) - “metrics” saved [17784427/17784427]




              Thanks,
              Vishnu

              • #8
                In addition, how do I configure the SMTP options to enable email alert notifications? I am using the following options and am not receiving notifications. Please let me know what I need to change in the SMTP configuration below.

                #################################### SMTP / Emailing ##########################
                [smtp]
                ;enabled = true
                ;host = localhost:25
                ;user = ivishnu7@gmail.com
                ;password =
                ;cert_file =
                ;key_file =
                ;skip_verify = false
                ;from_address = admin@grafana.localhost

                • #9
                  About SMTP: I created a separate topic, https://www.percona.com/forums/quest...configure-smtp

                  About network connectivity:
                  does the same wget command work on the PMM Server machine?

                  • #10
                    Yes, the wget https://dbcrpmysqlsbxha2:42000/metrics --no-check-certificate command works fine on pmm-server as well.

                    [root@dbcrpmysqlsbxha3 ~]# wget https://dbcrpmysqlsbxha2:42000/metrics --no-check-certificate
                    --2017-03-31 10:10:09-- https://dbcrpmysqlsbxha2:42000/metrics
                    Resolving dbcrpmysqlsbxha2... 10.49.80.46
                    Connecting to dbcrpmysqlsbxha2|10.49.80.46|:42000... connected.
                    WARNING: cannot verify dbcrpmysqlsbxha2’s certificate, issued by “/O=PMM Client”:
                    Unable to locally verify the issuer’s authority.
                    WARNING: certificate common name “” doesn't match requested host name “dbcrpmysqlsbxha2”.
                    HTTP request sent, awaiting response... 200 OK
                    Length: 17784140 (17M) [text/plain]
                    Saving to: “metrics.1”

                    100%[======================================>] 17,784,140 2.41M/s in 7.0s

                    2017-03-31 10:11:37 (2.41 MB/s) - “metrics.1” saved [17784140/17784140]

                    [root@dbcrpmysqlsbxha3 ~]#
                    [root@dbcrpmysqlsbxha3 ~]#
                    [root@dbcrpmysqlsbxha3 ~]# telnet dbcrpmysqlsbxha2 42000
                    Trying 10.49.80.46...
                    Connected to dbcrpmysqlsbxha2.

                    • #11
                      Can you check the status of https://dbcrpmysqlsbxha2:42000/metrics on the targets page?
                      http://pmm-server-ip/prometheus/targets
                      It should be "UP".

                      • #12
                        Thanks Mykola, the issue is resolved now. The problem was the interval and timeout that pmm-server uses when connecting to pmm-client. Previously it was 1 second; I have now changed it to 5 seconds. After that, everything seems fine.

                        These are my current configuration variables in /etc/prometheus.yml:

                        scrape_interval: 5s
                        scrape_timeout: 5s
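
To double-check which values Prometheus will actually read, something like the following could be used. It is only a sketch: `show_scrape_settings` is a made-up helper name, and the default path is simply the one quoted in this post. Keep in mind that Prometheus refuses a configuration in which scrape_timeout is larger than scrape_interval, so the two usually have to be raised together.

```shell
# Hypothetical helper, not part of PMM or Prometheus.
# Prints every scrape_interval / scrape_timeout line, with its line number,
# from the given config file (default: the path quoted in this post).
show_scrape_settings() {
  grep -nE '^[[:space:]]*scrape_(interval|timeout):' "${1:-/etc/prometheus.yml}"
}
```

Called with no argument it reads /etc/prometheus.yml; a per-job override, if there is one, would show up as extra lines further down the file.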

                        • #13
                          It is recommended to keep 1s, because debugging internal database performance requires high resolution.
                          If you want to keep scrape_interval at 1s, you need to find which monitoring query is slow and disable it via mysqld_exporter options.
                          Sometimes the mysqld_exporter queries are fine; in that case the servers need to be placed in the same physical network, or on a high-performance network without delays.
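
One rough way to see whether an exporter can answer inside the timeout is to time a single scrape by hand. The wget runs earlier in this thread needed 2.7 s and 7.0 s to download roughly 17 MB from port 42000, which is already well above a 1 s timeout. The sketch below assumes curl is installed; `scrape_seconds` and `fits_timeout` are hypothetical helper names, and `pmm-client-ip` is the same placeholder used in the wget commands above.

```shell
# Hypothetical helpers, not part of PMM.

# Print the number of seconds curl needs to fetch one exporter's /metrics page.
# -k skips certificate verification, like wget --no-check-certificate.
scrape_seconds() {
  curl -k -s -o /dev/null -w '%{time_total}\n' "https://$1/metrics"
}

# Succeed if duration $1 (seconds) is below timeout $2 (seconds).
fits_timeout() {
  awk -v d="$1" -v t="$2" 'BEGIN { exit !(d < t) }'
}

# Example (not run here):
#   t=$(scrape_seconds pmm-client-ip:42000)
#   fits_timeout "$t" 1 || echo "scrape took ${t}s, longer than 1s"
```

Running the same check against port 42002 would show whether the delay comes from the Linux exporter, the MySQL exporter, or both. A measurement taken from the PMM Server host is the one that matters, since that is where Prometheus scrapes from.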

                          • #14
                            Hi Mykola,

                            How can I find the exact reason for the "context deadline exceeded" error? As you suggested, I tried again with scrape_interval set to 1s, but I am seeing the context deadline exceeded error for some monitored servers, not all of them. At present I am monitoring 5 MySQL instances from a remote PMM server with 16 GB RAM and 4 CPUs. I can collect Linux and MySQL metrics for 2 instances without any issue, but I cannot collect Linux metrics for the remaining 3 instances; the prometheus/targets page shows context deadline exceeded errors for them. How can I make Linux metrics monitoring more efficient? There is no firewall issue.

                            • #15
                              Hi Vishnu,

                              `context deadline exceeded` means that mysqld_exporter takes longer than Prometheus expects (it cannot finish its work within 1 second).
                              So mysqld_exporter puts prolonged additional load on the database server.
                              mysqld_exporter runs many queries against the database, so we can disable some checks to speed it up.
                              Usually the longest query is 'tablestats'; it can be disabled with the following commands.
                              Code:
                              pmm-admin remove mysql:metrics
                              pmm-admin add mysql:metrics --disable-tablestats
                              The --disable-userstats, --disable-processlist and --disable-binlogstats options are also available.
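
If more than one collector has to go, the remove/add pair can be wrapped so that all flags are passed in one step. This is a sketch only: `readd_mysql_metrics` is a hypothetical name, and it uses nothing beyond the pmm-admin commands and flags named in this post. Any other arguments the instance originally needed for `pmm-admin add` would presumably have to be passed again as well.

```shell
# Hypothetical wrapper, not part of pmm-admin.
# Removes the mysql:metrics service, then adds it again with whatever
# flags are given; stops if the removal fails.
readd_mysql_metrics() {
  pmm-admin remove mysql:metrics && pmm-admin add mysql:metrics "$@"
}

# Example (not run here):
#   readd_mysql_metrics --disable-tablestats --disable-userstats
```

After re-adding, the targets page should show whether the scrape now finishes within the timeout.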
