Bienvenido! - Willkommen! - Welcome!

Bitácora Técnica de Tux&Cía., Santa Cruz de la Sierra, BO
May the source be with you!

Friday, February 1, 2013

smartctl

Analysis of a faulty hard disk with smartctl
On Linux, smartctl lets you read the SMART values of hard disks. This example shows the analysis of a defective disk: it can no longer read several sectors, so it is faulty and has to be replaced.
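As a rough illustration of reading those values programmatically, the following Python sketch parses the attribute table that `smartctl -A` prints. The function names `parse_attributes` and `read_attributes` are made up for this example, and the sample report is a hypothetical excerpt in smartctl's usual column layout, not output from the disk discussed here.

```python
import subprocess

# Hypothetical excerpt in the layout printed by `smartctl -A`; the numbers are
# illustrative and are not taken from the disk discussed in this post.
SAMPLE_REPORT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
"""


def _to_int(text):
    """Convert a numeric column to int, or None if the drive reports no number."""
    return int(text) if text.isdigit() else None


def parse_attributes(report):
    """Return {attribute_id: {"name", "value", "worst", "thresh", "raw"}}."""
    attributes = {}
    in_table = False
    for line in report.splitlines():
        if line.startswith("ID#"):
            in_table = True           # header row: data rows follow
            continue
        if not in_table:
            continue
        fields = line.split(None, 9)  # the raw value may itself contain spaces
        if len(fields) < 10 or not fields[0].isdigit():
            break                     # blank line or trailing text ends the table
        attributes[int(fields[0])] = {
            "name": fields[1],
            "value": _to_int(fields[3]),
            "worst": _to_int(fields[4]),
            "thresh": _to_int(fields[5]),
            "raw": fields[9].strip(),
        }
    return attributes


def read_attributes(device):
    """Run `smartctl -A` on a device such as /dev/sda (usually requires root)."""
    # smartctl's exit status is a bit mask, so a non-zero code does not by
    # itself mean the command failed; the output is parsed regardless.
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return parse_attributes(result.stdout)


# Parse the sample instead of a real device so the sketch runs anywhere.
for attr_id, row in parse_attributes(SAMPLE_REPORT).items():
    print(attr_id, row["name"], row["raw"])
```

On a real system one would call `read_attributes("/dev/sda")` with the appropriate device path. Raw values are kept as strings because some drives report composite values there.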
=========================== 
S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology; often written as SMART) is a monitoring system for computer hard disk drives to detect and report on various indicators of reliability, in the hope of anticipating failures.
When a failure is anticipated by S.M.A.R.T., the user may choose to replace the drive to avoid unexpected outage and data loss. The manufacturer may be able to use the S.M.A.R.T. data to discover where faults lie and prevent them from recurring in future drive designs.

Background

The purpose of S.M.A.R.T. is to warn a user of impending drive failure while there is still time to take action, such as copying the data to a replacement device.
Hard disk failures fall into one of two basic classes:
  • Predictable failures: These failures result from slow processes such as mechanical wear and gradual degradation of storage surfaces. Monitoring can determine when such failures are becoming more likely.
  • Unpredictable failures: These failures happen suddenly and without warning. They range from electronic components becoming defective to a sudden mechanical failure (perhaps due to improper handling).
Mechanical failures account for about 60% of all drive failures.[1] While the eventual failure may be catastrophic, most mechanical failures result from gradual wear and there are usually certain indications that failure is imminent. These may include increased heat output, increased noise level, problems with reading and writing of data, or an increase in the number of damaged disk sectors.
A Google study of more than 100,000 drives found correlations between certain S.M.A.R.T. information and actual failure rates. In the 60 days following the first off-line scan uncorrectable error on a drive (SMART attribute 0xC6, i.e. 198), the drive was, on average, 39 times more likely to fail than it would have been had no such error occurred. First errors in reallocations, offline reallocations (SMART attributes 0xC4 and 0x05, i.e. 196 and 5) and probational counts (SMART attribute 0xC5, i.e. 197) were also strongly correlated with higher probabilities of failure. Conversely, little correlation was found for increased temperature and no correlation for usage level. However, a large proportion (56%) of the failed drives failed without giving any S.M.A.R.T. warnings at all, meaning that S.M.A.R.T. data alone was of limited usefulness in anticipating failures.[2]
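A minimal sketch of how the four attributes named above might be watched, assuming the raw counts have already been read into a dictionary keyed by attribute ID. The name `warning_signs`, the labels (which follow smartctl's usual attribute names) and the "any non-zero count" rule are assumptions of this example, not something the study prescribes, and the sample counts are invented.

```python
# The four attribute IDs mentioned in the text, written in hex as given there.
# Labels follow the names smartctl usually prints for these IDs.
WATCHED = {
    0x05: "Reallocated_Sector_Ct",    # 5
    0xC4: "Reallocated_Event_Count",  # 196
    0xC5: "Current_Pending_Sector",   # 197
    0xC6: "Offline_Uncorrectable",    # 198
}


def warning_signs(raw_counts):
    """Return {attribute_id: count} for watched attributes with a non-zero count.

    Assumption of this sketch: the first non-zero raw count is treated as
    the warning sign, mirroring the "first errors" wording in the text.
    """
    return {
        attr_id: raw_counts[attr_id]
        for attr_id in sorted(WATCHED)
        if raw_counts.get(attr_id, 0) > 0
    }


# Invented raw counts for illustration; 194 (temperature) is ignored
# because it is not one of the watched attributes.
sample_counts = {5: 0, 196: 0, 197: 8, 198: 8, 194: 35}
for attr_id, count in warning_signs(sample_counts).items():
    print(f"{attr_id} (0x{attr_id:02X}) {WATCHED[attr_id]}: {count}")
```

An empty result should not be read as a clean bill of health: as noted above, 56% of the failed drives in the study gave no S.M.A.R.T. warning at all.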
PCTechGuide's page on S.M.A.R.T. (2003)[3] comments that the technology has gone through three phases:
"In its original incarnation SMART provided failure prediction by monitoring certain online hard drive activities. A subsequent version improved failure prediction by adding an automatic off-line read scan to monitor additional operations. The latest "SMART" technology not only monitors hard drive activities but adds failure prevention by attempting to detect and repair sector errors. Also, while earlier versions of the technology only monitored hard drive activity for data that was retrieved by the operating system, this latest SMART tests all data and all sectors of a drive by using "off-line data collection" to confirm the drive's health during periods of inactivity."
