
prometheus high cpu

  • prometheus high cpu

    We are seeing the following host load conditions on our PMM server (details in the attached files).

    Attached Files

  • #2
    what problems do you have (except high cpu numbers)?
    do you see any issues in prometheus log?
    you can open it with the following command help
    Code:
    docker exec -it pmm-server less /var/log/prometheus.log
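
    If the log is long, it can also be followed live or searched for the specific warnings; a couple of optional checks using the same container name as above:
    Code:
    docker exec -it pmm-server tail -f /var/log/prometheus.log
    docker exec -it pmm-server grep -c "sample timestamp out of order" /var/log/prometheus.log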



    • #3
      We are seeing the same issue since moving to 1.2.0. We removed the pmm-data volumes and started fresh two days ago (7/18, roughly 12pm). Prometheus CPU usage jumped, and disk I/O and load have been climbing steadily since install time.

      prometheus.log is filled with the following:

      time="2017-07-20T18:40:10Z" level=warning msg="Scrape duration sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="db2", job="mysql"} => 0.882654579 @[1500576009.988] source="scrape.go:590"
      time="2017-07-20T18:40:10Z" level=warning msg="Scrape sample count sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="db2", job="mysql"} => 0.882654579 @[1500576009.988] source="scrape.go:593"
      time="2017-07-20T18:40:10Z" level=warning msg="Scrape sample count post-relabeling sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="db2", job="mysql"} => 0.882654579 @[1500576009.988] source="scrape.go:596"
      time="2017-07-20T18:40:12Z" level=warning msg="Storage has entered rushed mode." chunksToPersist=1032 memoryChunks=37175 source="storage.go:1867" urgencyScore=0.803
      time="2017-07-20T18:40:12Z" level=info msg="Completed initial partial maintenance sweep through 763 in-memory fingerprints in 25.691535331s." source="storage.go:1398"
      time="2017-07-20T18:40:14Z" level=info msg="Storage has left rushed mode." chunksToPersist=1002 memoryChunks=37242 source="storage.go:1857" urgencyScore=0.569

      Time is synced between the hosts and within Docker, except that the Docker container is on UTC.

      nms1:~ : date
      Thu Jul 20 12:32:37 PDT 2017

      nms1:~ : ssh db1 date
      Thu Jul 20 12:32:37 PDT 2017

      nms1:~ : ssh db2 date
      Thu Jul 20 12:32:37 PDT 2017

      nms1:~ : sudo docker exec -it pmm-server date
      Thu Jul 20 19:32:38 UTC 2017
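
      As a stricter check than the wall-clock output above, comparing epoch seconds side by side (same hosts and container as above; seconds since the epoch are timezone-independent) should expose any real offset:
      Code:
      date -u +%s
      ssh db1 date -u +%s
      ssh db2 date -u +%s
      sudo docker exec pmm-server date -u +%s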



      • #4
        The issue is resolved. We removed all client services (pmm-admin rm --all), removed pmm-server and pmm-data, re-added pmm-server with the -e METRICS_RESOLUTION=5s -e METRICS_MEMORY=786432 options, and then added back all the clients the same as before. Adding those two options seems to have done the trick: load and disk I/O on the Prometheus server are fine and steady, and the storage no longer enters rushed mode. We are still seeing "sample timestamp out of order" and "sample discarded" messages, though.
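
        For reference, here is roughly the recreate sequence, assuming the standard PMM 1.x Docker setup from the docs (image tag, port mapping, and volume paths may differ in your environment):
        Code:
        # on each client: drop all monitored services
        pmm-admin rm --all

        # on the PMM server host: remove the old containers
        docker stop pmm-server
        docker rm pmm-server pmm-data

        # recreate the data container and the server with the new options
        docker create -v /opt/prometheus/data -v /opt/consul-data -v /var/lib/mysql -v /var/lib/grafana --name pmm-data percona/pmm-server:1.2.0 /bin/true
        docker run -d -p 80:80 --volumes-from pmm-data --name pmm-server -e METRICS_RESOLUTION=5s -e METRICS_MEMORY=786432 --restart always percona/pmm-server:1.2.0

        # then on each client, re-add the monitored services (MySQL example)
        pmm-admin add mysql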



        • #5
          cloud-admin@sbwell.com, thank you for your feedback. We are going to increase the default METRICS_MEMORY value soon.



          • #6
            Is there any update on the "sample timestamp out of order" log messages?

            Thanks



            • #7
              We are seeing excessive CPU use from Prometheus, and there are gaps in the data. Where is the log hidden in the current version (1.9.1)? There is no prometheus.log in /var/log or anywhere else on the filesystem.
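
              For reference, a generic way to check inside the container (nothing specific to 1.9.1; adjust the container name if yours differs):
              Code:
              docker exec -it pmm-server find / -name "prometheus*log*" 2>/dev/null
              docker exec -it pmm-server ls -la /var/log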
