Emergency

pmm-server was unable to connect pmm-client to collect linux:metrics


  • pmm-server was unable to connect pmm-client to collect linux:metrics

    pmm-server was unable to connect to pmm-client to collect linux:metrics. The following is the output of pmm-admin check-network. I checked pmm-admin list, and linux:metrics shows as running. I also checked for firewall problems: there is no firewall issue between pmm-server and pmm-client on port 42000.

    * Connection: Client <-- Server
    -------------- ----------------- ------------------------- ------- ---------- ---------
    SERVICE TYPE   NAME              REMOTE ENDPOINT           STATUS  HTTPS/TLS  PASSWORD
    -------------- ----------------- ------------------------- ------- ---------- ---------
    linux:metrics  pmm-client        client_ip_address::42000  DOWN    YES        -
    mysql:metrics  dbcrpmysqlsbxha2  client_ip_address:42002   OK      YES        -




    -------------- ---------------------- ----------- -------- ------------ --------
    SERVICE TYPE   NAME                   LOCAL PORT  RUNNING  DATA SOURCE  OPTIONS
    -------------- ---------------------- ----------- -------- ------------ --------
    linux:metrics  pmm_client ip address  42000       YES      -


    Thanks,
    Vishnu

  • #2
    Hi Vishnu,

    Let's check node_exporter.
    Can you run the following command on the PMM client itself and on the PMM Server?
    Code:
    wget https://pmm-client-ip:42000/metrics --no-check-certificate
    I am interested in the output of the wget command itself (not the URL content).
    Last edited by Mykola; 03-30-2017, 02:24 AM.



    • #3
      Thanks Mykola for your help!

      I ran the command (wget https://10.49.xx.xx:42000/metrics --no-check-certificate) from both the client and the server and got a Connection refused error.

      The following is the error message from the pmm-server targets page (http://10.49.xx.xx/prometheus/targets):

      State: Down
      Error: context deadline exceeded


      The following is the output from pmm-linux-metrics-42000.log:

      time="2017-03-29T12:09:44-07:00" level=info msg="Starting node_exporter (version=1.1.1, branch=master, revision=2d78e22000779d63c714011e4fb30c65623b9c77) " source="node_exporter.go:170"
      time="2017-03-29T12:09:44-07:00" level=info msg="Build context (go=go1.7.4, user=, date=)" source="node_exporter.go:171"
      time="2017-03-29T12:09:44-07:00" level=info msg="Enabled collectors:" source="node_exporter.go:190"
      time="2017-03-29T12:09:44-07:00" level=info msg=" - stat" source="node_exporter.go:192"
      time="2017-03-29T12:09:44-07:00" level=info msg=" - vmstat" source="node_exporter.go:192"
      time="2017-03-29T12:09:44-07:00" level=info msg=" - filesystem" source="node_exporter.go:192"
      time="2017-03-29T12:09:44-07:00" level=info msg=" - meminfo" source="node_exporter.go:192"
      time="2017-03-29T12:09:44-07:00" level=info msg=" - netdev" source="node_exporter.go:192"
      time="2017-03-29T12:09:44-07:00" level=info msg=" - netstat" source="node_exporter.go:192"
      time="2017-03-29T12:09:44-07:00" level=info msg=" - uname" source="node_exporter.go:192"
      time="2017-03-29T12:09:44-07:00" level=info msg=" - diskstats" source="node_exporter.go:192"
      time="2017-03-29T12:09:44-07:00" level=info msg=" - filefd" source="node_exporter.go:192"
      time="2017-03-29T12:09:44-07:00" level=info msg=" - loadavg" source="node_exporter.go:192"
      time="2017-03-29T12:09:44-07:00" level=info msg=" - time" source="node_exporter.go:192"
      time="2017-03-29T12:09:44-07:00" level=info msg="HTTPS/TLS is enabled" source="node_exporter.go:235"
      time="2017-03-29T12:09:44-07:00" level=info msg="Listening on 10.49.xx.xx:42000" source="node_exporter.go:238"
      2017/03/29 12:09:55 http: TLS handshake error from 10.49.xx.xx:44816: tls: first record does not look like a TLS handshake
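The last log line typically appears when a client speaks plain HTTP to a TLS-enabled port, for example by requesting http:// instead of https://. A minimal Python sketch of that kind of check, assuming only the standard TLS record layout (first byte 0x16 for a handshake record, second byte 0x03 for the protocol major version). The function name is hypothetical and this is not node_exporter's actual code:

```python
def looks_like_tls_handshake(first_bytes: bytes) -> bool:
    """Return True if the bytes could start a TLS handshake record.

    A TLS record starts with a content-type byte (0x16 = handshake)
    followed by the protocol major version (0x03 for SSL 3.0 / TLS 1.x).
    """
    return (
        len(first_bytes) >= 2
        and first_bytes[0] == 0x16
        and first_bytes[1] == 0x03
    )


# A TLS ClientHello begins with 0x16 0x03 ...
print(looks_like_tls_handshake(bytes([0x16, 0x03, 0x01, 0x02, 0x00])))
# A plain HTTP request begins with ASCII text such as "GET ".
print(looks_like_tls_handshake(b"GET /metrics HTTP/1.1\r\n"))
```

If the check fails for a request, the client side is the place to look: the URL scheme it uses has to match the HTTPS/TLS setting shown by pmm-admin.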


      Thanks,
      Vishnu



      • #4
        So it looks like a firewall or network configuration issue.
        Prometheus (on the PMM Server side) fetches the https://pmm-client-ip:42000/metrics URL every second.
        This URL must be accessible from PMM Server.
        Can you open ports 42000 and 42002 for PMM Server?



        • #5
          It doesn't seem to be a firewall issue, because I am able to telnet from pmm-server to pmm-client on both ports.

          [root@dbcrpmysqlsbxha3 log]# telnet dbcrpmysqlsbxha2 42000
          Trying 10.49.80.46...
          Connected to dbcrpmysqlsbxha2.

          [root@dbcrpmysqlsbxha3 log]# telnet dbcrpmysqlsbxha2 42002
          Trying 10.49.80.46...
          Connected to dbcrpmysqlsbxha2.

          What does the context deadline exceeded error mean, and where can I find error logs with more information?

          Thanks,
          Vishnu



          • #6
            Hm,

            Does the following command work fine on the PMM client?
            Code:
            wget https://pmm-client-ip:42000/metrics --no-check-certificate



            • #7
              Yes, it seems to be working fine. See the output of the wget https://dbcrpmysqlsbxha2:42000/metrics --no-check-certificate command:

              [root@dbcrpmysqlsbxha2 ~]# wget https://dbcrpmysqlsbxha2:42000/metrics --no-check-certificate
              --2017-03-30 10:22:37-- https://dbcrpmysqlsbxha2:42000/metrics
              Resolving dbcrpmysqlsbxha2... 10.49.80.46
              Connecting to dbcrpmysqlsbxha2|10.49.80.46|:42000... connected.
              WARNING: cannot verify dbcrpmysqlsbxha2’s certificate, issued by “/O=PMM Client”:
              Unable to locally verify the issuer’s authority.
              WARNING: certificate common name “” doesn't match requested host name “dbcrpmysqlsbxha2”.
              HTTP request sent, awaiting response... 200 OK
              Length: 17784427 (17M) [text/plain]
              Saving to: “metrics”

              100%[==========================================================================================================================================================>] 17,784,427 6.35M/s in 2.7s

              2017-03-30 10:24:06 (6.35 MB/s) - “metrics” saved [17784427/17784427]




              Thanks,
              Vishnu



              • #8
                In addition to this, how do I configure the SMTP options for mail notification alerts? I am using the following options and am not receiving notifications. Please let me know what I need to change in the SMTP configuration below.

                #################################### SMTP / Emailing ##########################
                [smtp]
                ;enabled = true
                ;host = localhost:25
                ;user = ivishnu7@gmail.com
                ;password =
                ;cert_file =
                ;key_file =
                ;skip_verify = false
                ;from_address = admin@grafana.localhost
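
One thing worth checking in the block above: in Grafana's ini format a leading semicolon marks a line as a comment, so every setting shown is inactive and the defaults apply. A small Python sketch illustrating this with the standard `configparser` module, which also treats `;` as a comment prefix. The helper name is hypothetical, and the uncommented values are placeholders copied from the post, not a recommended configuration:

```python
import configparser

COMMENTED = """
[smtp]
;enabled = true
;host = localhost:25
;from_address = admin@grafana.localhost
"""

UNCOMMENTED = """
[smtp]
enabled = true
host = localhost:25
from_address = admin@grafana.localhost
"""


def active_settings(ini_text: str, section: str = "smtp") -> dict:
    """Return the settings in `section` that are not commented out."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return dict(parser[section])


print(active_settings(COMMENTED))    # no active keys
print(active_settings(UNCOMMENTED))  # enabled, host, from_address
```

Removing the semicolons is only the first step; the host, credentials, and sender address still have to match a mail server that is reachable from the PMM Server.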



                • #9
                  Regarding SMTP, I created a separate topic: https://www.percona.com/forums/quest...configure-smtp

                  Regarding network connectivity:
                  does the same wget command work fine on the PMM Server machine?



                  • #10
                    Yes, the wget https://dbcrpmysqlsbxha2:42000/metrics --no-check-certificate command works fine on pmm-server as well.

                    [root@dbcrpmysqlsbxha3 ~]# wget https://dbcrpmysqlsbxha2:42000/metrics --no-check-certificate
                    --2017-03-31 10:10:09-- https://dbcrpmysqlsbxha2:42000/metrics
                    Resolving dbcrpmysqlsbxha2... 10.49.80.46
                    Connecting to dbcrpmysqlsbxha2|10.49.80.46|:42000... connected.
                    WARNING: cannot verify dbcrpmysqlsbxha2’s certificate, issued by “/O=PMM Client”:
                    Unable to locally verify the issuer’s authority.
                    WARNING: certificate common name “” doesn't match requested host name “dbcrpmysqlsbxha2”.
                    HTTP request sent, awaiting response... 200 OK
                    Length: 17784140 (17M) [text/plain]
                    Saving to: “metrics.1”

                    100%[==========================================================================================================================================================>] 17,784,140 2.41M/s in 7.0s

                    2017-03-31 10:11:37 (2.41 MB/s) - “metrics.1” saved [17784140/17784140]

                    [root@dbcrpmysqlsbxha3 ~]#
                    [root@dbcrpmysqlsbxha3 ~]#
                    [root@dbcrpmysqlsbxha3 ~]# telnet dbcrpmysqlsbxha2 42000
                    Trying 10.49.80.46...
                    Connected to dbcrpmysqlsbxha2.
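
The two wget runs in this thread already hint at the cause: the /metrics payload is about 17 MB and took 2.7 s on the client and 7.0 s from the PMM Server, both longer than a one-second scrape. A rough Python sketch of that arithmetic, using the sizes and times reported by wget. The function names are hypothetical, and the 1 s deadline is an assumption based on the one-second scrape mentioned earlier in the thread:

```python
MIB = 1024 * 1024


def required_mib_per_s(size_bytes: int, deadline_s: float) -> float:
    """Throughput needed to transfer size_bytes within deadline_s seconds."""
    return size_bytes / deadline_s / MIB


def observed_mib_per_s(size_bytes: int, seconds_taken: float) -> float:
    """Throughput actually achieved by a transfer."""
    return size_bytes / seconds_taken / MIB


# (payload size in bytes, download time in seconds) from the wget output
runs = {
    "pmm-client (local)": (17_784_427, 2.7),
    "pmm-server (remote)": (17_784_140, 7.0),
}
deadline_s = 1.0  # assumed scrape deadline

for name, (size, took) in runs.items():
    need = required_mib_per_s(size, deadline_s)
    got = observed_mib_per_s(size, took)
    status = "fits" if took <= deadline_s else "exceeds deadline"
    print(f"{name}: need {need:.1f} MiB/s, observed {got:.1f} MiB/s -> {status}")
```

Either the payload has to shrink or the deadline has to grow for the scrape to complete in time.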



                    • #11
                      Can you check the status of https://dbcrpmysqlsbxha2:42000/metrics on the targets page?
                      http://pmm-server-ip/prometheus/targets
                      It should be "UP".



                      • #12
                        Thanks Mykola, the issue is resolved now. The problem was the scrape interval and timeout that pmm-server uses when connecting to pmm-client. Previously it was 1 second; I have now changed it to 5 seconds, and everything seems fine.

                        These are my current settings in /etc/prometheus.yml:

                        scrape_interval: 5s
                        scrape_timeout: 5s
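
Prometheus rejects a configuration in which scrape_timeout is larger than scrape_interval, so the two values have to be changed together. A minimal Python sketch of that check, assuming simple single-unit durations such as 5s or 1m. The helper names are hypothetical, and the parser covers only a subset of Prometheus's duration syntax:

```python
import re

# Seconds per unit for the subset of duration units handled here.
_UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def parse_duration(text: str) -> float:
    """Convert a duration such as '5s' or '1m' to seconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h)", text.strip())
    if not match:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def check_scrape_settings(interval: str, timeout: str) -> None:
    """Raise ValueError if the timeout is longer than the interval."""
    if parse_duration(timeout) > parse_duration(interval):
        raise ValueError(
            f"scrape_timeout ({timeout}) is greater than scrape_interval ({interval})"
        )


# The values from the post pass the check.
check_scrape_settings("5s", "5s")
print("scrape settings are consistent")
```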



                        • #13
                          It is recommended to keep 1s, because debugging internal database performance requires high resolution.
                          If you want to keep scrape_interval at 1s, you need to find out which monitoring query is slow and disable it via mysqld_exporter options.
                          Sometimes the mysqld_exporter queries are fine; in that case the servers need to be placed in the same physical network or on a high-performance network without delays.
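
To see which endpoint is slow, one option is to time a full download of each exporter's /metrics URL from the PMM Server and compare it with the scrape timeout. A Python sketch under a few assumptions: the function names are hypothetical, certificate verification is disabled in the same way as wget's --no-check-certificate, and the 30 s socket timeout is an arbitrary upper bound rather than a Prometheus setting:

```python
import ssl
import time
import urllib.request


def time_scrape(url: str, socket_timeout: float = 30.0) -> tuple:
    """Download url once and return (elapsed_seconds, bytes_read)."""
    # Skip certificate checks, like wget --no-check-certificate.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=socket_timeout, context=ctx) as resp:
        size = len(resp.read())
    return time.monotonic() - start, size


def verdict(elapsed_s: float, scrape_timeout_s: float) -> str:
    """Compare a measured download time with the scrape timeout."""
    if elapsed_s <= scrape_timeout_s:
        return "ok"
    return "too slow: %.1fs > %.1fs timeout" % (elapsed_s, scrape_timeout_s)


# Example, to be run on the PMM Server (the host name is a placeholder):
# elapsed, size = time_scrape("https://pmm-client-ip:42000/metrics")
# print(size, "bytes:", verdict(elapsed, 1.0))
```

Running it against port 42000 (linux:metrics) and port 42002 (mysql:metrics) separately shows whether the delay comes from one exporter or from the network path to the host.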



                          • #14
                            Hi Mykola,

                            How can I find the exact reason for the context deadline exceeded error? As you suggested, I tried scrape_interval 1s again, but I still see context deadline exceeded errors for some monitored servers (not all of them). At present I am monitoring 5 MySQL instances from a remote PMM server with 16 GB RAM and 4 CPUs. I can collect linux and mysql metrics for 2 instances without any issue, but I cannot collect linux metrics for the remaining 3 instances; they show context deadline exceeded errors on the prometheus/targets page. How can I control linux metrics monitoring more efficiently? There is no firewall issue.
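
A 17 MB response is large for a one-second scrape, so it may help to see which metric families dominate the payload before deciding what to disable. A minimal Python sketch that counts samples per metric name in the text of a saved /metrics download, such as the `metrics` file written by wget earlier in the thread. The function name is hypothetical, the sample text is made up for illustration, and the parsing is deliberately simplified:

```python
from collections import Counter


def count_samples(exposition_text: str) -> Counter:
    """Count sample lines per metric name in Prometheus text format."""
    counts = Counter()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        counts[name] += 1
    return counts


# Made-up sample input, for illustration only.
sample = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_filesystem_size{device="/dev/sda1",mountpoint="/"} 1.0e+10
node_filesystem_size{device="tmpfs",mountpoint="/run"} 8.0e+09
"""

for name, n in count_samples(sample).most_common():
    print(name, n)
```

Reading the saved file with `count_samples(open("metrics").read())` and looking at the top entries shows which collector produces most of the series on the three failing hosts.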

