Self-Monitoring, Analysis, and Reporting Technology, or S.M.A.R.T., is a monitoring system for computer hard disks that detects and reports on various indicators of reliability in order to help anticipate failures.
PC techguide's page on S.M.A.R.T. (2003) comments that the technology has gone through three phases: "In its original incarnation SMART provided failure prediction by monitoring certain online hard drive activities. A subsequent version improved failure prediction by adding an automatic off-line read scan to monitor additional operations. The latest SMART III technology not only monitors hard drive activities but adds failure prevention by attempting to detect and repair sector errors. Also, whilst earlier versions of the technology only monitored hard drive activity for data that was retrieved by the operating system, SMART III tests all data and all sectors of a drive by using off-line data collection to confirm the drive's health during periods of inactivity."
S.M.A.R.T. History and predecessors
S.M.A.R.T. Information
The inability to read some sectors is not always an indication that the drive is about to fail. Unreadable sectors can be created even when the drive is functioning within specification, for example if the power fails while the drive is writing. Even if the physical disk is damaged in one location so that a sector is unreadable, the disk may be able to use spare space to replace the bad area so that the sector can be overwritten.
A drive supporting SMART may optionally support a number of self-test or maintenance routines, and the results of the tests are kept in the self-test log. The self-test routines can be used to detect unreadable sectors on the disk so that they can be restored from backup (for example, from other disks in a RAID). This reduces the risk of a situation where one sector on a disk becomes unreadable, then the backup is damaged, and the data is lost forever.
S.M.A.R.T. Standards and Implementation
Some S.M.A.R.T.-enabled motherboards and related software may not communicate with certain S.M.A.R.T.-capable drives, depending on the type of interface. Few external drives connected via USB and FireWire correctly send S.M.A.R.T. data over those interfaces. With so many ways to connect a hard drive (e.g. SCSI, Fibre Channel, ATA, SATA, SAS and SSA), it is difficult to predict whether S.M.A.R.T. reports will function correctly.
Even on hard drives and interfaces that support it, S.M.A.R.T. data may not be reported correctly to the computer's operating system. Some disk controllers can duplicate all write operations on a secondary "backup" drive in real time. This feature is known as "RAID mirroring". However, many programs designed to analyze changes in drive behavior and relay S.M.A.R.T. alerts to the operator do not function when a computer system is configured for RAID. Under normal operating conditions the RAID subsystem usually exposes only logical volumes, so the computer is not permitted to 'see' (or directly access) the individual physical drives.
S.M.A.R.T. Attributes
Spinup retry count
Count of spin start retries. This attribute stores the total number of spin start attempts needed to reach fully operational speed, counted only when the first attempt was unsuccessful. An increase in this attribute value is a sign of problems in the hard disk mechanical subsystem. (A lower raw value is better.)
Note that the attribute values are always mapped to the range of 1 to 253 in a way that means higher values are better. For example, the "Reallocated Sectors Count" attribute value decreases as the number of reallocated sectors increases. In this case, the attribute's raw value will often indicate the actual number of sectors that were reallocated.
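The relationship between the normalized value and the raw value can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SmartAttribute` class and `is_failing` function are hypothetical names, and the rule that an attribute counts as failed once its normalized value drops to or below a vendor-set threshold is an assumption added here, not something stated in the text above. The sample numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    """One S.M.A.R.T. attribute as a tool might report it (hypothetical layout)."""
    attr_id: int    # decimal attribute ID, e.g. 5
    name: str       # human-readable name
    value: int      # normalized value, 1..253, higher is better
    threshold: int  # assumed vendor-set minimum acceptable normalized value
    raw: int        # vendor-specific raw counter, e.g. number of remapped sectors


def is_failing(attr: SmartAttribute) -> bool:
    """Return True when the normalized value has dropped to the threshold or below.

    Assumption: "value <= threshold" is treated as the failure condition.
    """
    if not 1 <= attr.value <= 253:
        raise ValueError(f"normalized value out of range: {attr.value}")
    return attr.value <= attr.threshold


# Illustrative, made-up readings for "Reallocated Sectors Count":
# the raw count goes up while the normalized value goes down.
healthy = SmartAttribute(5, "Reallocated Sectors Count", value=100, threshold=36, raw=0)
worn = SmartAttribute(5, "Reallocated Sectors Count", value=30, threshold=36, raw=2800)
```

With these sample readings, `is_failing(healthy)` is false and `is_failing(worn)` is true, even though it is the raw counter, not the normalized value, that grows as sectors are reallocated.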
Known S.M.A.R.T. attributes
Legend |
Higher value is better | Lower value is better |
Critical | Potential indicators of imminent electromechanical failure |
ID | Hex | Attribute name | Description | Better |
01 | 01 | Read Error Rate | Indicates the rate of hardware read errors that occurred when reading data from a disk surface. A non-zero value indicates a problem with either the disk surface or the read/write heads. | |
02 | 02 | Throughput Performance | Overall (general) throughput performance of a hard disk drive. If the value of this attribute is decreasing there is a high probability that there is a problem with the disk. | |
03 | 03 | Spin-Up Time | Average time of spindle spin up (from zero RPM to fully operational). | |
04 | 04 | Start/Stop Count | A tally of spindle start/stop cycles. | |
05 | 05 | Reallocated Sectors Count | Count of reallocated sectors. When the hard drive finds a read/write/verification error, it marks the sector as "reallocated" and transfers its data to a special reserved area (spare area). This process is also known as remapping, and "reallocated" sectors are called remaps. This is why, on modern hard disks, "bad blocks" cannot be found while testing the surface: all bad blocks are hidden in reallocated sectors. However, the more sectors that are reallocated, the more read/write speed will decrease. | |
06 | 06 | Read Channel Margin | Margin of a channel while reading data. The function of this attribute is not specified. | |
07 | 07 | Seek Error Rate | Rate of seek errors of the magnetic heads. Seek errors arise if there is a failure in the mechanical positioning system, servo damage, or thermal widening of the hard disk. More seek errors indicate a worsening condition of the disk surface and the mechanical subsystem. | |
08 | 08 | Seek Time Performance | Average performance of seek operations of the magnetic heads. If this attribute is decreasing, it is a sign of problems in the mechanical subsystem. | |
09 | 09 | Power-On Hours (POH) | Count of hours in power-on state. The raw value of this attribute shows the total count of hours (or minutes, or seconds, depending on manufacturer) in power-on state. | |
10 | 0A | Spin Retry Count | Count of spin start retries. This attribute stores the total number of spin start attempts needed to reach fully operational speed (under the condition that the first attempt was unsuccessful). An increase in this attribute value is a sign of problems in the hard disk mechanical subsystem. | |
11 | 0B | Recalibration Retries | This attribute indicates the number of times recalibration was requested (under the condition that the first attempt was unsuccessful). A decrease in this attribute value is a sign of problems in the hard disk mechanical subsystem. | |
12 | 0C | Device Power Cycle Count | This attribute indicates the count of full hard disk power on/off cycles. | |
13 | 0D | Soft Read Error Rate | Uncorrected read errors reported to the operating system. If the value is non-zero, you should back up your data. | |
190 | BE | Airflow Temperature (WDC) | Airflow temperature on Western Digital hard drives (same as Temperature (C2), but the current value is 50 less). | |
190 | BE | Temperature Difference from 100 | Value is equal to (100 - temp °C), allowing the manufacturer to set a minimum threshold which corresponds to a maximum temperature. Verified present on: Seagate ST910021AS; Seagate ST3802110A (2007-02-13); Seagate ST980825AS (2007-04-05); Seagate ST3320620AS (2007-04-23); Seagate ST3500641AS (2007-06-12); Seagate ST3250824AS (2007-08-07). | |
191 | BF | G-sense error rate | Frequency of errors resulting from impact loads. | |
192 | C0 | Power-off Retract Count | Number of times the heads are loaded off the media. Heads can be unloaded without actually powering off. Also reported as Emergency Retract Cycle Count (Fujitsu). | |
193 | C1 | Load/Unload Cycle | Count of load/unload cycles into head landing zone position. | |
194 | C2 | Temperature | Current internal temperature. | |
195 | C3 | Hardware ECC Recovered | Time between ECC-corrected errors. | |
196 | C4 | Reallocation Event Count | Count of remap operations. The raw value of this attribute shows the total number of attempts to transfer data from reallocated sectors to a spare area. Both successful and unsuccessful attempts are counted. | |
197 | C5 | Current Pending Sector Count | Number of "unstable" sectors (waiting to be remapped). If the unstable sector is subsequently written or read successfully, this value is decreased and the sector is not remapped. Read errors on the sector will not remap it; it will only be remapped on a failed write attempt. This can be problematic to test because cached writes will not remap the sector; only direct I/O writes to the disk will. | |
198 | C6 | Uncorrectable Sector Count | The total number of uncorrectable errors when reading/writing a sector. A rise in the value of this attribute indicates defects of the disk surface and/or problems in the mechanical subsystem. | |
199 | C7 | UltraDMA CRC Error Count | The number of errors in data transfer via the interface cable as determined by ICRC (Interface Cyclic Redundancy Check). | |
200 | C8 | Write Error Rate / Multi-Zone Error Rate | The total number of errors when writing a sector. | |
201 | C9 | Soft Read Error Rate | Number of off-track errors. If non-zero, make a backup. | |
202 | CA | Data Address Mark errors | Number of Data Address Mark errors (or vendor-specific). | |
203 | CB | Run Out Cancel | Number of ECC errors | |
204 | CC | Soft ECC Correction | Number of errors corrected by software ECC | |
205 | CD | Thermal Asperity Rate (TAR) | Number of thermal asperity errors. | |
206 | CE | Flying Height | ||
207 | CF | Spin High Current | Amount of high current used to spin up the drive. | |
208 | D0 | Spin Buzz | Number of buzz routines to spin up the drive | |
209 | D1 | Offline Seek Performance | Drive’s seek performance during offline operations | |
220 | DC | Disk Shift | Distance the disk has shifted relative to the spindle (usually due to shock). Unit of measure is unknown. | |
221 | DD | G-Sense Error Rate | The number of errors resulting from externally induced shock and vibration. | |
222 | DE | Loaded Hours | Time spent operating under data load (movement of magnetic head armature) | |
223 | DF | Load/Unload Retry Count | Number of times head changes position. | |
224 | E0 | Load Friction | Resistance caused by friction in mechanical parts while operating. | |
225 | E1 | Load/Unload Cycle Count | Total number of load cycles | |
226 | E2 | Load 'In'-time | Total time of loading on the magnetic heads actuator (time not spent in parking area). | |
227 | E3 | Torque Amplification Count | Number of attempts to compensate for platter speed variations | |
228 | E4 | Power-Off Retract Cycle | The number of times the magnetic armature was retracted automatically as a result of cutting power. | |
230 | E6 | GMR Head Amplitude | Amplitude of "thrashing" (distance of repetitive forward/reverse head motion) | |
231 | E7 | Temperature | Drive Temperature | |
240 | F0 | Head Flying Hours | Time spent while the head is positioning. | |
250 | FA | Read Error Retry Rate | Number of errors while reading from a disk |
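Two pieces of arithmetic in the table above lend themselves to a short Python sketch: the ID and Hex columns are the same number in decimal and hexadecimal, and the "Temperature Difference from 100" entry for attribute 190 stores 100 minus the temperature in °C. The function names below are hypothetical, chosen only for this illustration, and the sample readings are made up.

```python
def format_ids(attr_id: int) -> str:
    """Render an attribute ID the way the table's first two columns do:
    decimal padded to at least two digits, then two-digit uppercase hex."""
    if not 0 <= attr_id <= 255:
        raise ValueError(f"attribute ID out of range: {attr_id}")
    return f"{attr_id:02d} | {attr_id:02X}"


def temperature_from_difference(value: int) -> int:
    """Recover the temperature in °C from an attribute 190 value,
    assuming the drive reports value = 100 - temperature."""
    return 100 - value


def max_temperature_for_threshold(threshold: int) -> int:
    """A minimum threshold on the value corresponds to a maximum temperature,
    because the value falls as the temperature rises."""
    return 100 - threshold
```

For example, `format_ids(190)` yields `190 | BE`, a reported value of 62 corresponds to 38 °C, and a minimum threshold of 45 corresponds to a maximum temperature of 55 °C.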