NetBSD has supported S.M.A.R.T. for a long time, but the functionality is well hidden. You can enable S.M.A.R.T. and check a single disk like this:
# atactl wd0 smart enable
SMART supported, SMART enabled
# atactl wd0 smart status
SMART supported, SMART enabled
 id value thresh crit collect reliability description                raw
  1   200     51 yes  online  positive    Raw read error rate        0
  3   151     21 yes  online  positive    Spin-up time               9441
  4   100      0 no   online  positive    Start/stop count           16
  5   200    140 yes  online  positive    Reallocated sector count   0
  7   200      0 no   online  positive    Seek error rate            0
  9    89      0 no   online  positive    Power-on hours count       8477
 10   100      0 no   online  positive    Spin retry count           0
 11   100      0 no   online  positive    Calibration retry count    0
 12   100      0 no   online  positive    Device power cycle count   15
192   200      0 no   online  positive    Power-off retract count    4
193   134      0 no   online  positive    Load cycle count           199998
194   114      0 no   online  positive    Temperature                38
196   200      0 no   online  positive    Reallocated event count    0
197   200      0 no   online  positive    Current pending sector     0
198   100      0 no   offline positive    Offline uncorrectable      0
199   200      0 no   online  positive    Ultra DMA CRC error count  0
200   100      0 no   offline positive    Write error rate           0
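The value and thresh columns can also be read mechanically: by the usual S.M.A.R.T. convention, an attribute counts as failed once its normalized value drops to or below its threshold. Here is a minimal sketch of that rule, assuming the column layout of the atactl output shown here. The function name failing_attributes is made up for illustration and is not part of NetBSD.

```shell
# failing_attributes: read "atactl <disk> smart status" output on stdin and
# print every attribute whose normalized value is at or below its threshold.
# Assumed column layout: id value thresh crit collect reliability description raw
failing_attributes() {
    awk '
        # Attribute rows start with three numeric fields; this skips the
        # "SMART supported" status line and the column header.
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
            # A threshold of 0 is treated as "never trips" (an assumption).
            if ($3 + 0 > 0 && $2 + 0 <= $3 + 0) {
                # The description spans field 7 up to the one before the raw value.
                desc = ""
                for (i = 7; i < NF; i++)
                    desc = desc (desc ? " " : "") $i
                printf "%s (id %s): value %s <= threshold %s, critical: %s\n",
                    desc, $1, $2, $3, $4
            }
        }
    '
}
```

It would be used as a filter, for example:

    atactl wd0 smart status | failing_attributes

For a healthy disk such as the one in the listing, it prints nothing.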
While this is very useful for manual checks, it doesn’t provide automatic health reporting. The recent abrupt failure of the backup hard disk in a friend’s machine reminded me how important such monitoring is. I therefore decided to build an automated solution on top of NetBSD’s S.M.A.R.T. support.
The first step was to enable S.M.A.R.T. at system startup. I added the following lines to /etc/rc.local to make that happen:
echo "Turning on S.M.A.R.T.:"
for disk in $(sysctl -n hw.disknames | tr " " \\n | grep ^wd)
do
        echo -n "${disk}: "
        atactl $disk smart enable
done
Now I only needed something that checks the reported metrics every night. I therefore added the following snippet to /etc/daily.local:
found=
for disk in $(sysctl -n hw.disknames | tr " " \\n | grep ^wd)
do
        # Extract the raw value (last column) of the "Reallocated sector count" row.
        relocated=$(atactl $disk smart status | sed -n -e 's/.* Reallocated sector count[^0-9]*//p')
        # Default to 0 so the test does not break on disks that lack this attribute.
        if [ "${relocated:-0}" -gt 0 ]; then
                if [ -z "$found" ]; then
                        found=true
                        echo ""
                        echo "SMART checks:"
                fi
                echo "Disk $disk has $relocated relocated sectors."
        fi
done
unset disk found relocated
The above shell code reports any IDE or SATA hard disk (the wd devices) with relocated sectors. If a hard disk reports many relocated sectors, or their number grows quickly within a short time frame, the disk will probably fail soon.
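Spotting a growing count requires remembering the previous night's value. The following is only a sketch of how that could look, not something I have running: the function name report_growth and the state directory /var/db/smart-state are assumptions made up for this example.

```shell
# Hypothetical location for the per-disk state files; adjust to taste.
STATE_DIR=${STATE_DIR:-/var/db/smart-state}

# report_growth DISK COUNT
# Print a line if COUNT is higher than the count stored by the previous run,
# then store COUNT for the next run.
# Note: plain sh has no local variables, so these names are global.
report_growth() {
    disk=$1
    count=$2
    state_file="$STATE_DIR/$disk.reallocated"
    previous=0
    if [ -r "$state_file" ]; then
        previous=$(cat "$state_file")
    fi
    if [ "$count" -gt "${previous:-0}" ]; then
        echo "Disk $disk: reallocated sectors grew from ${previous:-0} to $count."
    fi
    # Remember the current count for the next comparison.
    mkdir -p "$STATE_DIR" && echo "$count" > "$state_file"
}
```

Inside the loop in /etc/daily.local it could be called with the disk name and the extracted count, for example:

    report_growth "$disk" "${relocated:-0}"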
Let’s hope that this way I will get an advance warning before the next major catastrophe.