In his excellent blog post, Pavel Trukhanov showed the value of collecting S.M.A.R.T. metrics, so I wondered how hard it would be to enable their collection in Percona Monitoring and Management (PMM).
A quick search led me to SmartMon, a text_collector plugin that can be easily integrated with any Prometheus installation.
For PMM, Vadim Yalovets recently showed how to do custom integrations based on text_collector.
Let’s put those together:
    echo "*/5 * * * * root bash /usr/local/bin/smartmon.sh > /tmp/smart_metrics.prom" > /etc/cron.d/smartmon
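One thing to watch with a plain shell redirect is that the output file is truncated first and filled in while the script runs, so a scrape that lands in that window could see an empty or partial file. A common way around this is to write to a temporary file and rename it into place. The sketch below is only an illustration under that assumption: `write_metrics_atomically` is a hypothetical helper name, not part of SmartMon or PMM.

```shell
#!/usr/bin/env bash
# Sketch: run a metrics-producing command and publish its output atomically.
# write_metrics_atomically is a hypothetical helper, not part of smartmon.sh or PMM.
write_metrics_atomically() {
    local target="$1"
    shift
    local tmp
    # Create the temporary file next to the target so that mv is a rename
    # on the same filesystem rather than a copy.
    tmp="$(mktemp "${target}.XXXXXX")" || return 1
    if "$@" > "$tmp"; then
        # mktemp creates the file with mode 0600; make it readable for the collector.
        chmod 644 "$tmp"
        mv -f "$tmp" "$target"
    else
        # The command failed: drop the partial output and keep the previous file.
        rm -f "$tmp"
        return 1
    fi
}
```

With the paths from the cron entry above, a call would look like `write_metrics_atomically /tmp/smart_metrics.prom bash /usr/local/bin/smartmon.sh`. If the script fails, the previous metrics file is left untouched.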
That’s it! Your data should start flowing. Now you can use Prometheus to query device information:

Or, if you want a specific S.M.A.R.T. value, such as the media wearout indicator:
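To sanity-check a single metric before it reaches Prometheus, one option is to read it straight from the generated file. The following is a minimal sketch, not part of SmartMon or PMM: `smart_metric` is a hypothetical helper name, and it assumes the file uses the plain Prometheus text format that smartmon.sh prints.

```shell
#!/usr/bin/env bash
# Sketch: print the samples of one metric from a textfile-collector file.
# smart_metric is a hypothetical helper name used only for illustration.
smart_metric() {
    local file="$1" name="$2"
    # Keep lines that start with the exact metric name followed by "{" or a space.
    # "# HELP" and "# TYPE" comment lines never start with the name, so they are skipped.
    awk -v name="$name" '
        index($0, name "{") == 1 || index($0, name " ") == 1 { print }
    ' "$file"
}
```

For example, `smart_metric /tmp/smart_metrics.prom smartmon_device_info` would print the device information sample for each disk. In the Prometheus expression browser, the bare metric name (for example `smartmon_device_info`) is already a valid query. On drives that report wearout, the value would presumably appear under a name like `smartmon_media_wearout_indicator_value`; that name is an assumption, since not every device exposes this attribute.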

If you would like to see a nicer visualization in Grafana, you can install the appropriate dashboard from the Grafana web site.

The number and kind of metrics you get depend on the storage device vendor and model. Here is an example list from one of my test systems:
    # HELP smartmon_smartctl_version SMART metric smartctl_version
    # TYPE smartmon_smartctl_version gauge
    smartmon_smartctl_version{version="6.5"} 1
    # HELP smartmon_current_pending_sector_raw_value SMART metric current_pending_sector_raw_value
    # TYPE smartmon_current_pending_sector_raw_value gauge
    smartmon_current_pending_sector_raw_value{disk="/dev/sda",type="sat",smart_id="197"} 0.000000e+00
    # HELP smartmon_current_pending_sector_threshold SMART metric current_pending_sector_threshold
    # TYPE smartmon_current_pending_sector_threshold gauge
    smartmon_current_pending_sector_threshold{disk="/dev/sda",type="sat",smart_id="197"} 0
    # HELP smartmon_current_pending_sector_value SMART metric current_pending_sector_value
    # TYPE smartmon_current_pending_sector_value gauge
    smartmon_current_pending_sector_value{disk="/dev/sda",type="sat",smart_id="197"} 100
    # HELP smartmon_current_pending_sector_worst SMART metric current_pending_sector_worst
    # TYPE smartmon_current_pending_sector_worst gauge
    smartmon_current_pending_sector_worst{disk="/dev/sda",type="sat",smart_id="197"} 100
    # HELP smartmon_device_info SMART metric device_info
    # TYPE smartmon_device_info gauge
    smartmon_device_info{disk="/dev/sda",type="sat",vendor="",product="",revision="",lun_id="",model_family="",device_model="Crucial_CT275MX300SSD1",serial_number="16431465B53F",firmware_version="M0CR031"} 1
    # HELP smartmon_device_smart_available SMART metric device_smart_available
    # TYPE smartmon_device_smart_available gauge
    smartmon_device_smart_available{disk="/dev/sda",type="sat"} 1
    # HELP smartmon_device_smart_enabled SMART metric device_smart_enabled
    # TYPE smartmon_device_smart_enabled gauge
    smartmon_device_smart_enabled{disk="/dev/sda",type="sat"} 1
    # HELP smartmon_device_smart_healthy SMART metric device_smart_healthy
    # TYPE smartmon_device_smart_healthy gauge
    smartmon_device_smart_healthy{disk="/dev/sda",type="sat"} 1
    # HELP smartmon_end_to_end_error_raw_value SMART metric end_to_end_error_raw_value
    # TYPE smartmon_end_to_end_error_raw_value gauge
    smartmon_end_to_end_error_raw_value{disk="/dev/sda",type="sat",smart_id="184"} 0.000000e+00
    # HELP smartmon_end_to_end_error_threshold SMART metric end_to_end_error_threshold
    # TYPE smartmon_end_to_end_error_threshold gauge
    smartmon_end_to_end_error_threshold{disk="/dev/sda",type="sat",smart_id="184"} 0
    # HELP smartmon_end_to_end_error_value SMART metric end_to_end_error_value
    # TYPE smartmon_end_to_end_error_value gauge
    smartmon_end_to_end_error_value{disk="/dev/sda",type="sat",smart_id="184"} 100
    # HELP smartmon_end_to_end_error_worst SMART metric end_to_end_error_worst
    # TYPE smartmon_end_to_end_error_worst gauge
    smartmon_end_to_end_error_worst{disk="/dev/sda",type="sat",smart_id="184"} 100
    # HELP smartmon_offline_uncorrectable_raw_value SMART metric offline_uncorrectable_raw_value
    # TYPE smartmon_offline_uncorrectable_raw_value gauge
    smartmon_offline_uncorrectable_raw_value{disk="/dev/sda",type="sat",smart_id="198"} 0.000000e+00
    # HELP smartmon_offline_uncorrectable_threshold SMART metric offline_uncorrectable_threshold
    # TYPE smartmon_offline_uncorrectable_threshold gauge
    smartmon_offline_uncorrectable_threshold{disk="/dev/sda",type="sat",smart_id="198"} 0
    # HELP smartmon_offline_uncorrectable_value SMART metric offline_uncorrectable_value
    # TYPE smartmon_offline_uncorrectable_value gauge
    smartmon_offline_uncorrectable_value{disk="/dev/sda",type="sat",smart_id="198"} 100
    # HELP smartmon_offline_uncorrectable_worst SMART metric offline_uncorrectable_worst
    # TYPE smartmon_offline_uncorrectable_worst gauge
    smartmon_offline_uncorrectable_worst{disk="/dev/sda",type="sat",smart_id="198"} 100
    # HELP smartmon_power_cycle_count_raw_value SMART metric power_cycle_count_raw_value
    # TYPE smartmon_power_cycle_count_raw_value gauge
    smartmon_power_cycle_count_raw_value{disk="/dev/sda",type="sat",smart_id="12"} 2.000000e+01
    # HELP smartmon_power_cycle_count_threshold SMART metric power_cycle_count_threshold
    # TYPE smartmon_power_cycle_count_threshold gauge
    smartmon_power_cycle_count_threshold{disk="/dev/sda",type="sat",smart_id="12"} 0
    # HELP smartmon_power_cycle_count_value SMART metric power_cycle_count_value
    # TYPE smartmon_power_cycle_count_value gauge
    smartmon_power_cycle_count_value{disk="/dev/sda",type="sat",smart_id="12"} 100
    # HELP smartmon_power_cycle_count_worst SMART metric power_cycle_count_worst
    # TYPE smartmon_power_cycle_count_worst gauge
    smartmon_power_cycle_count_worst{disk="/dev/sda",type="sat",smart_id="12"} 100
    # HELP smartmon_power_on_hours_raw_value SMART metric power_on_hours_raw_value
    # TYPE smartmon_power_on_hours_raw_value gauge
    smartmon_power_on_hours_raw_value{disk="/dev/sda",type="sat",smart_id="9"} 1.313300e+04
    # HELP smartmon_power_on_hours_threshold SMART metric power_on_hours_threshold
    # TYPE smartmon_power_on_hours_threshold gauge
    smartmon_power_on_hours_threshold{disk="/dev/sda",type="sat",smart_id="9"} 0
    # HELP smartmon_power_on_hours_value SMART metric power_on_hours_value
    # TYPE smartmon_power_on_hours_value gauge
    smartmon_power_on_hours_value{disk="/dev/sda",type="sat",smart_id="9"} 100
    # HELP smartmon_power_on_hours_worst SMART metric power_on_hours_worst
    # TYPE smartmon_power_on_hours_worst gauge
    smartmon_power_on_hours_worst{disk="/dev/sda",type="sat",smart_id="9"} 100
    # HELP smartmon_raw_read_error_rate_raw_value SMART metric raw_read_error_rate_raw_value
    # TYPE smartmon_raw_read_error_rate_raw_value gauge
    smartmon_raw_read_error_rate_raw_value{disk="/dev/sda",type="sat",smart_id="1"} 0.000000e+00
    # HELP smartmon_raw_read_error_rate_threshold SMART metric raw_read_error_rate_threshold
    # TYPE smartmon_raw_read_error_rate_threshold gauge
    smartmon_raw_read_error_rate_threshold{disk="/dev/sda",type="sat",smart_id="1"} 0
    # HELP smartmon_raw_read_error_rate_value SMART metric raw_read_error_rate_value
    # TYPE smartmon_raw_read_error_rate_value gauge
    smartmon_raw_read_error_rate_value{disk="/dev/sda",type="sat",smart_id="1"} 100
    # HELP smartmon_raw_read_error_rate_worst SMART metric raw_read_error_rate_worst
    # TYPE smartmon_raw_read_error_rate_worst gauge
    smartmon_raw_read_error_rate_worst{disk="/dev/sda",type="sat",smart_id="1"} 100
    # HELP smartmon_reallocated_sector_ct_raw_value SMART metric reallocated_sector_ct_raw_value
    # TYPE smartmon_reallocated_sector_ct_raw_value gauge
    smartmon_reallocated_sector_ct_raw_value{disk="/dev/sda",type="sat",smart_id="5"} 0.000000e+00
    # HELP smartmon_reallocated_sector_ct_threshold SMART metric reallocated_sector_ct_threshold
    # TYPE smartmon_reallocated_sector_ct_threshold gauge
    smartmon_reallocated_sector_ct_threshold{disk="/dev/sda",type="sat",smart_id="5"} 10
    # HELP smartmon_reallocated_sector_ct_value SMART metric reallocated_sector_ct_value
    # TYPE smartmon_reallocated_sector_ct_value gauge
    smartmon_reallocated_sector_ct_value{disk="/dev/sda",type="sat",smart_id="5"} 100
    # HELP smartmon_reallocated_sector_ct_worst SMART metric reallocated_sector_ct_worst
    # TYPE smartmon_reallocated_sector_ct_worst gauge
    smartmon_reallocated_sector_ct_worst{disk="/dev/sda",type="sat",smart_id="5"} 100
    # HELP smartmon_reported_uncorrect_raw_value SMART metric reported_uncorrect_raw_value
    # TYPE smartmon_reported_uncorrect_raw_value gauge
    smartmon_reported_uncorrect_raw_value{disk="/dev/sda",type="sat",smart_id="187"} 0.000000e+00
    # HELP smartmon_reported_uncorrect_threshold SMART metric reported_uncorrect_threshold
    # TYPE smartmon_reported_uncorrect_threshold gauge
    smartmon_reported_uncorrect_threshold{disk="/dev/sda",type="sat",smart_id="187"} 0
    # HELP smartmon_reported_uncorrect_value SMART metric reported_uncorrect_value
    # TYPE smartmon_reported_uncorrect_value gauge
    smartmon_reported_uncorrect_value{disk="/dev/sda",type="sat",smart_id="187"} 100
    # HELP smartmon_reported_uncorrect_worst SMART metric reported_uncorrect_worst
    # TYPE smartmon_reported_uncorrect_worst gauge
    smartmon_reported_uncorrect_worst{disk="/dev/sda",type="sat",smart_id="187"} 100
    # HELP smartmon_smartctl_run SMART metric smartctl_run
    # TYPE smartmon_smartctl_run gauge
    smartmon_smartctl_run{disk="/dev/sda",type="sat"} 1535666337
    # HELP smartmon_temperature_celsius_raw_value SMART metric temperature_celsius_raw_value
    # TYPE smartmon_temperature_celsius_raw_value gauge
    smartmon_temperature_celsius_raw_value{disk="/dev/sda",type="sat",smart_id="194"} 3.100000e+01
    # HELP smartmon_temperature_celsius_threshold SMART metric temperature_celsius_threshold
    # TYPE smartmon_temperature_celsius_threshold gauge
    smartmon_temperature_celsius_threshold{disk="/dev/sda",type="sat",smart_id="194"} 0
    # HELP smartmon_temperature_celsius_value SMART metric temperature_celsius_value
    # TYPE smartmon_temperature_celsius_value gauge
    smartmon_temperature_celsius_value{disk="/dev/sda",type="sat",smart_id="194"} 69
    # HELP smartmon_temperature_celsius_worst SMART metric temperature_celsius_worst
    # TYPE smartmon_temperature_celsius_worst gauge
    smartmon_temperature_celsius_worst{disk="/dev/sda",type="sat",smart_id="194"} 59
    # HELP smartmon_udma_crc_error_count_raw_value SMART metric udma_crc_error_count_raw_value
    # TYPE smartmon_udma_crc_error_count_raw_value gauge
    smartmon_udma_crc_error_count_raw_value{disk="/dev/sda",type="sat",smart_id="199"} 0.000000e+00
    # HELP smartmon_udma_crc_error_count_threshold SMART metric udma_crc_error_count_threshold
    # TYPE smartmon_udma_crc_error_count_threshold gauge
    smartmon_udma_crc_error_count_threshold{disk="/dev/sda",type="sat",smart_id="199"} 0
    # HELP smartmon_udma_crc_error_count_value SMART metric udma_crc_error_count_value
    # TYPE smartmon_udma_crc_error_count_value gauge
    smartmon_udma_crc_error_count_value{disk="/dev/sda",type="sat",smart_id="199"} 100
    # HELP smartmon_udma_crc_error_count_worst SMART metric udma_crc_error_count_worst
    # TYPE smartmon_udma_crc_error_count_worst gauge
    smartmon_udma_crc_error_count_worst{disk="/dev/sda",type="sat",smart_id="199"} 100
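Each attribute in this list comes as a `_value`/`_threshold` pair, which makes a simple health check possible. The sketch below pairs the two by their labels and reports attributes whose normalized value has dropped to or below a non-zero threshold. It rests on two assumptions: that a threshold of 0 marks an informational attribute that never trips, and that label values contain no spaces or braces, as in the sample above. `smart_at_threshold` is a hypothetical helper name, not part of SmartMon or PMM.

```shell
#!/usr/bin/env bash
# Sketch: list SMART attributes whose normalized value is at or below the threshold.
# smart_at_threshold is a hypothetical helper name used only for illustration.
smart_at_threshold() {
    awk '
        # Look only at *_value and *_threshold samples; *_raw_value is vendor specific.
        /^smartmon_.*_(value|threshold)[{]/ && !/_raw_value[{]/ {
            brace  = index($0, "{")
            metric = substr($0, 1, brace - 1)
            labels = substr($0, brace, index($0, "}") - brace + 1)
            sample = $NF
            if (metric ~ /_threshold$/) {
                # Strip the suffix so value and threshold share one key per disk and attribute.
                key = substr(metric, 1, length(metric) - length("_threshold")) labels
                thr[key] = sample
            } else {
                key = substr(metric, 1, length(metric) - length("_value")) labels
                val[key] = sample
            }
        }
        END {
            for (key in val)
                if ((key in thr) && thr[key] + 0 > 0 && val[key] + 0 <= thr[key] + 0)
                    print key, "value=" val[key], "threshold=" thr[key]
        }
    ' "$1"
}
```

Run against the sample above, this prints nothing: the only attribute with a non-zero threshold is `reallocated_sector_ct`, and its value of 100 is well above the threshold of 10.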