In this blog post, I’ll look at the types of NVMe flash health information you can get from using the NVMe command line tools.
Checking SATA-based drive health is easy. Whether it’s an SSD or an older spinning drive, you can use the smartctl command to get a wealth of information about the device’s performance and health. As an example:
    root@blinky:/var/lib/mysql# smartctl -A /dev/sda
    smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-62-generic] (local build)
    Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

    === START OF READ SMART DATA SECTION ===
    SMART Attributes Data Structure revision number: 16
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0
      5 Reallocated_Sector_Ct   0x0032   100   100   010    Old_age   Always       -       0
      9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       41
     12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       2
    171 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
    172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
    173 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       1
    174 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
    183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
    184 End-to-End_Error        0x0032   100   100   000    Old_age   Always       -       0
    187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
    194 Temperature_Celsius     0x0022   065   059   000    Old_age   Always       -       35 (Min/Max 21/41)
    196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
    197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
    202 Unknown_SSD_Attribute   0x0030   100   100   001    Old_age   Offline      -       0
    206 Unknown_SSD_Attribute   0x000e   100   100   000    Old_age   Always       -       0
    246 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       145599393
    247 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       4550280
    248 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       582524
    180 Unused_Rsvd_Blk_Cnt_Tot 0x0033   000   000   000    Pre-fail  Always       -       1260
    210 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
While smartctl might not know all vendor-specific SMART values (the Unknown_Attribute rows above), you can typically Google the drive model along with “smart attributes” and find documents like this to get more details.
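If you want to watch these attributes from a script rather than by eye, the table that smartctl -A prints is regular enough to parse. Below is a minimal Python sketch, not part of smartmontools: the names parse_attributes and at_or_below_threshold are made up for this illustration, the two sample rows are copied from the output above, and treating a THRESH of 000 as "no threshold" is an assumption of the sketch.

```python
import re

# Header plus two attribute rows copied from the smartctl -A output above.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   010    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   065   059   000    Old_age   Always       -       35 (Min/Max 21/41)
"""

# Columns: ID, name, flag, value, worst, thresh, type, updated, when_failed,
# then the raw value, which may contain spaces and so takes the rest of the line.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(.*)$"
)


def parse_attributes(text):
    """Return one dict per attribute row; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # banner, header, or blank line
        rows.append({
            "id": int(m.group(1)),
            "name": m.group(2),
            "value": int(m.group(4)),
            "worst": int(m.group(5)),
            "thresh": int(m.group(6)),
            "type": m.group(7),
            "raw": m.group(10).strip(),
        })
    return rows


def at_or_below_threshold(rows):
    """Attributes whose normalized VALUE has dropped to a non-zero THRESH.

    Assumption of this sketch: THRESH 000 means the attribute has no threshold.
    """
    return [r for r in rows if r["thresh"] > 0 and r["value"] <= r["thresh"]]


attrs = parse_attributes(SAMPLE)
for a in attrs:
    print(a["id"], a["name"], a["value"], a["thresh"], a["raw"])
print("at or below threshold:", [a["name"] for a in at_or_below_threshold(attrs)])
```

In practice you would feed the function the captured output of smartctl -A instead of the SAMPLE string. The comparison uses the normalized VALUE column, not RAW_VALUE, because the raw numbers are vendor-specific.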
Checking NVMe Flash Health
If you move to newer-generation NVMe-based flash storage, smartctl won’t work anymore, at least not with the packages available for Ubuntu 16.04 (which is what I’m running). It looks like support for NVMe in Smartmontools is coming, and it would be great to get a single tool that supports both SATA and NVMe flash storage.
In the meantime, you can use the nvme tool available from the nvme-cli package. It provides some basic information for NVMe devices.
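For monitoring scripts, the plain-text output of the nvme smart-log subcommand (shown further down in this post) consists of "key : value" lines that are easy to turn into numbers. The following Python sketch is one possible way to do that under stated assumptions: parse_smart_log, health_warnings and data_units_to_bytes are hypothetical helper names, the sample lines are copied from this post's output, and the warning rules are a simple reading of the fields rather than an official health policy. The byte conversion relies on the NVMe specification reporting data units in thousands of 512-byte units.

```python
import re

# A few lines copied from the `nvme smart-log` output shown in this post.
SAMPLE = """\
Smart Log for NVME device:nvme0 namespace-id:ffffffff
critical_warning                    : 0
temperature                         : 34 C
available_spare                     : 100%
available_spare_threshold           : 10%
percentage_used                     : 0%
data_units_written                  : 9,014,689
media_errors                        : 0
"""

# key, " : ", a number with optional thousands separators, optional C or % unit.
LINE = re.compile(r"^\s*([A-Za-z0-9_ ]+?)\s*:\s*(\d[\d,]*)\s*(?:C|%)?\s*$")


def parse_smart_log(text):
    """Map each numeric `key : value` line to an int; other lines are skipped."""
    fields = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            fields[m.group(1)] = int(m.group(2).replace(",", ""))
    return fields


def health_warnings(f):
    """Return human-readable warnings (illustrative rules, not a standard)."""
    warnings = []
    if f.get("critical_warning", 0) != 0:
        warnings.append("critical_warning is non-zero")
    if f.get("available_spare", 100) < f.get("available_spare_threshold", 0):
        warnings.append("available_spare is below its threshold")
    if f.get("percentage_used", 0) >= 100:
        warnings.append("percentage_used has reached 100%")
    if f.get("media_errors", 0) > 0:
        warnings.append("media_errors is non-zero")
    return warnings


def data_units_to_bytes(units):
    """One data unit is 1000 blocks of 512 bytes."""
    return units * 1000 * 512


log = parse_smart_log(SAMPLE)
print(health_warnings(log))
print(data_units_to_bytes(log["data_units_written"]) / 1e12, "TB written")
```

The header line is skipped because its value is not numeric. To use the sketch against a live device, replace SAMPLE with the captured output of nvme smart-log for that device.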
To get information about the NVMe devices installed:
    root@alex:~# nvme list
    Node             SN                   Model                                    Version  Namespace Usage                      Format           FW Rev
    ---------------- -------------------- ---------------------------------------- -------- --------- -------------------------- ---------------- --------
    /dev/nvme0n1     S3EVNCAHB01861F      Samsung SSD 960 PRO 1TB                  1.2      1         689.63 GB / 1.02 TB        512 B + 0 B      1B6QCXP7
To get SMART information:
    root@alex:~# nvme smart-log /dev/nvme0
    Smart Log for NVME device:nvme0 namespace-id:ffffffff
    critical_warning                    : 0
    temperature                         : 34 C
    available_spare                     : 100%
    available_spare_threshold           : 10%
    percentage_used                     : 0%
    data_units_read                     : 3,465,389
    data_units_written                  : 9,014,689
    host_read_commands                  : 89,719,366
    host_write_commands                 : 134,671,295
    controller_busy_time                : 310
    power_cycles                        : 11
    power_on_hours                      : 21
    unsafe_shutdowns                    : 8
    media_errors                        : 0
    num_err_log_entries                 : 1
    Warning Temperature Time            : 0
    Critical Composite Temperature Time : 0
    Temperature Sensor 1                : 34 C
    Temperature Sensor 2                : 47 C
    Temperature Sensor 3                : 0 C
    Temperature Sensor