Any tips for telling how much life a hard disk has left?

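It's really frustrating when a hard disk dies while you are using the computer, isn't it?
A while ago something went wrong with my laptop, and when I asked about it I was told the disk did not have much life left,
so these days I'm using it very, very carefully.
I'm planning to replace it soon …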

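But recently a problem came up.
The website of a church I had once set up a server for suddenly went down.
It turned out that the server's hard disk had failed.
I left that place a long time ago, so it isn't really my concern anymore,
but it stings because a complete beginner like me built that server and left it with them.
I had urged them again and again to back up the database and replace the hard disk…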

Come to think of it, it has probably been more than three years since I built that server.

So I suddenly got curious about a few things …

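  1. When you build a server from an ordinary desktop with a SATA hard disk, how long do you expect the server's hard disk to last?
    There is no fixed period, of course, but what replacement interval do you work with?

  2. With a laptop, there are signs you can pick up by feel just from listening to it, right?
    'At this rate it won't last a month!',
    'This one will die within a week!'
    That kind of feel. I'd like to hear about such cases. How do you judge it?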

Somehow my laptop seems to have gotten louder lately. (Not that I can even tell whether it's just the fan… ^^)

Well then …

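In my experience, with heavy use bad sectors usually start to appear after about three years.
A computer that isn't used much is still going strong after seven years ^^;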

Don't hard disks with SMART diagnose themselves? Most hard disks these days seem to come with SMART.

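A hard disk with SMART checks things such as whether the spin speed is dropping or whether the head needs several attempts to move into position, and reports them to the user. You can do SMART monitoring on Ubuntu too.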
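
As a rough illustration of that kind of monitoring, here is a minimal Python sketch that asks `smartctl` (from smartmontools) for the drive's overall health verdict. The function names and the default device `/dev/sda` are assumptions made for this example, and `smartctl` normally has to be run as root.

```python
import re
import subprocess


def parse_overall_health(report: str):
    """Return the overall-health verdict (e.g. 'PASSED') from smartctl text, or None."""
    match = re.search(r"self-assessment test result:\s*(\S+)", report)
    return match.group(1) if match else None


def check_device(device: str = "/dev/sda"):
    """Run 'smartctl -H' on the device (hypothetical default) and return its verdict."""
    # -H requests only the overall health self-assessment.
    result = subprocess.run(
        ["smartctl", "-H", device], capture_output=True, text=True
    )
    return parse_overall_health(result.stdout)


# Parsing a line of the kind smartctl prints, without touching any real disk:
sample = "SMART overall-health self-assessment test result: PASSED"
print(parse_overall_health(sample))
```

A PASSED verdict only means that no attribute has crossed its failure threshold, so it is still worth looking at the individual attributes and the error log.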


There was also a GUI storage device manager, but I don't remember its exact name.

Have a nice day :)

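Thank you so much for the great information. I had no idea there was a way to do this.

I opened the link above and, with my poor English, roughly(!!!) installed it on my laptop and ran it, and it printed the following.
(I'm not sure I did it correctly … )
What does it mean? Even a rough summary would help …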

choi@choi-laptop:~$ sudo smartctl -a /dev/sda1
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device Model: SAMSUNG HM100JI
Serial Number: S0EFJ1YL900101
Firmware Version: YH100-19
User Capacity: 100,030,242,816 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: ATA/ATAPI-7 T13 1532D revision 0
Local Time is: Mon May 17 14:36:15 2010 CEST

==> WARNING: May need -F samsung or -F samsung2 enabled; see manual for details.

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x05) Offline data collection activity
was aborted by an interrupting command from host.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (4372) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 72) minutes.
SCT capabilities: (0x003f) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
1 Raw_Read_Error_Rate 0x000f 253 100 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0007 253 253 025 Pre-fail Always - 2752
4 Start_Stop_Count 0x0032 092 092 000 Old_age Always - 8757
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 1
7 Seek_Error_Rate 0x000e 253 253 000 Old_age Always - 0
8 Seek_Time_Performance 0x0024 253 253 000 Old_age Offline - 0
9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 696496
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 3
11 Calibration_Retry_Count 0x0012 100 100 000 Old_age Always - 27
12 Power_Cycle_Count 0x0032 096 096 000 Old_age Always - 4378
190 Airflow_Temperature_Cel 0x0022 052 040 000 Old_age Always - 48 (Lifetime Min/Max 7/60)
191 G-Sense_Error_Rate 0x0012 089 089 000 Old_age Always - 111880
192 Power-Off_Retract_Count 0x0012 253 253 000 Old_age Always - 0
193 Load_Cycle_Count 0x0012 066 066 000 Old_age Always - 351995
194 Temperature_Celsius 0x0022 052 040 000 Old_age Always - 48 (Lifetime Min/Max 7/60)
195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 10556
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 1
197 Current_Pending_Sector 0x0012 253 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 253 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x000a 100 100 000 Old_age Always - 0
201 Soft_Read_Error_Rate 0x0012 253 253 000 Old_age Always - 0
223 Load_Retry_Count 0x0012 100 100 000 Old_age Always - 27
225 Load_Cycle_Count 0x0012 066 066 000 Old_age Always - 351995
255 Unknown_Attribute 0x000a 253 100 000 Old_age Always - 0

SMART Error Log Version: 1
ATA Error Count: 1436 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1436 occurred at disk power-on lifetime: 5679 hours (236 days + 15 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:

40 51 08 40 6f 8e e1 Error: UNC 8 sectors at LBA = 0x018e6f40 = 26111808

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name

c8 00 08 40 6f 8e e1 00 02:01:41.750 READ DMA
ec 00 00 00 00 00 a0 00 02:01:41.688 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 02:01:41.688 SET FEATURES [Set transfer mode]
ec 00 00 00 00 00 a0 00 02:01:41.688 IDENTIFY DEVICE

Error 1435 occurred at disk power-on lifetime: 5679 hours (236 days + 15 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:

40 51 08 40 6f 8e e1 Error: UNC 8 sectors at LBA = 0x018e6f40 = 26111808

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name

c8 00 08 40 6f 8e e1 00 02:01:39.250 READ DMA
ec 00 00 00 00 00 a0 00 02:01:39.250 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 02:01:39.250 SET FEATURES [Set transfer mode]
ec 00 00 00 00 00 a0 00 02:01:39.188 IDENTIFY DEVICE

Error 1434 occurred at disk power-on lifetime: 5679 hours (236 days + 15 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:

40 51 08 40 6f 8e e1 Error: UNC 8 sectors at LBA = 0x018e6f40 = 26111808

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name

c8 00 08 40 6f 8e e1 00 02:01:36.750 READ DMA
ec 00 00 00 00 00 a0 00 02:01:36.750 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 02:01:36.750 SET FEATURES [Set transfer mode]
ec 00 00 00 00 00 a0 00 02:01:36.750 IDENTIFY DEVICE

Error 1433 occurred at disk power-on lifetime: 5679 hours (236 days + 15 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:

40 51 08 40 6f 8e e1 Error: UNC 8 sectors at LBA = 0x018e6f40 = 26111808

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name

c8 00 08 40 6f 8e e1 00 02:01:34.313 READ DMA
ec 00 00 00 00 00 a0 00 02:01:34.313 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 02:01:34.250 SET FEATURES [Set transfer mode]
ec 00 00 00 00 00 a0 00 02:01:34.250 IDENTIFY DEVICE

Error 1432 occurred at disk power-on lifetime: 5679 hours (236 days + 15 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:

40 51 08 40 6f 8e e1 Error: UNC 8 sectors at LBA = 0x018e6f40 = 26111808

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name

c8 00 08 40 6f 8e e1 00 02:01:31.813 READ DMA
ec 00 00 00 00 00 a0 00 02:01:31.813 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 02:01:31.813 SET FEATURES [Set transfer mode]
ec 00 00 00 00 00 a0 00 02:01:31.813 IDENTIFY DEVICE

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective Self-Test Log Data Structure Revision Number (0) should be 1
SMART Selective self-test log data structure revision number 0
Warning: ATA Specification requires selective self-test log data structure revision number = 1
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
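
For anyone who wants a rough summary of a report like the one above, here is a minimal Python sketch that pulls out the attribute rows and the error count. The watch list of attribute IDs (5, 196, 197, 198) is an illustrative choice for this example, not an official rule, and the sample text is copied from the output above.

```python
import re

# Columns: ID, name, flag, value, worst, thresh, type, updated, when_failed, raw
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)

# Attribute IDs related to reallocated or unreadable sectors (assumed watch list).
WATCH = {5, 196, 197, 198}


def summarize(report: str):
    """Return short notes on watched attributes with a nonzero raw value."""
    findings = []
    for line in report.splitlines():
        m = ROW.match(line)
        if m:
            attr_id, name, raw = int(m.group(1)), m.group(2), int(m.group(6))
            if attr_id in WATCH and raw > 0:
                findings.append(f"{name} raw={raw}")
    # The error log header states how many ATA errors the drive has recorded.
    m = re.search(r"ATA Error Count:\s*(\d+)", report)
    if m and int(m.group(1)) > 0:
        findings.append(f"ATA errors logged={m.group(1)}")
    return findings


sample = """\
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 1
9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 696496
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 1
197 Current_Pending_Sector 0x0012 253 100 000 Old_age Always - 0
ATA Error Count: 1436 (device log contains only the most recent five errors)
"""
print(summarize(sample))
```

On this report the sketch would flag one reallocated sector and 1436 logged ATA errors. The five most recent errors shown are all UNC (uncorrectable) read errors at the same LBA, 26111808, even though the overall self-assessment says PASSED.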


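Ubuntu's System > Administration > Disk Utility checks the SMART data and tells you when there is a problem with the hard disk.
It once reported a problem with my hard disk, so I got the disk replaced.
It said there was a reallocation count or something like that…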
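
To give a feel for how a SMART attribute such as the reallocated sector count is judged, here is a small Python sketch of the usual rule described in the smartctl manual: an attribute counts as failed when its normalized value is less than or equal to its threshold. The function name is made up for this example, and Disk Utility may apply additional rules of its own that are not shown here.

```python
def attribute_status(value: int, thresh: int) -> str:
    """Compare a normalized SMART value with its threshold."""
    # A threshold of 0 means the attribute can never be reported as failed.
    if thresh > 0 and value <= thresh:
        return "FAILING"
    return "ok"


# Reallocated_Sector_Ct in the report above: VALUE 100, THRESH 010
print(attribute_status(100, 10))
```

Note that the normalized value can stay well above the threshold while the raw count of reallocated sectors is already nonzero, so both numbers are worth watching.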