In December 2009 I rebuilt my home file server to be a FreeBSD 8.x machine with six 1.5TByte Western Digital Caviar Green hard disk drives (HDDs) in a ZFS raidz2 configuration. In late January 2010 I discovered that my chosen HDDs have a known problem — they park the drive heads after 8 seconds of idle time, and then get woken up every minute or so by FreeBSD (apparently Linux has the same issue), which causes (a) a pause while the parked head loads again, and (b) abnormally high wear, since the head is loading/unloading far more times per hour than designed. This post summarises what I borrowed from other people to ‘solve’ the problem.
All six of my 1.5TB drives (/dev/ada0 through /dev/ada5) were affected. I have three WD15EADS-00P8B0 and three WD15EADS-00S2B0 drives, all but one with 01.00A01 firmware:
Jan 18 09:15:10 gjabkup2 kernel: ada0: <WDC WD15EADS-00P8B0 01.00A01> ATA-8 SATA 2.x device
Jan 18 09:15:10 gjabkup2 kernel: ada1: <WDC WD15EADS-00S2B0 04.05G04> ATA-8 SATA 2.x device
Jan 18 09:15:10 gjabkup2 kernel: ada2: <WDC WD15EADS-00S2B0 01.00A01> ATA-8 SATA 2.x device
Jan 18 09:15:10 gjabkup2 kernel: ada3: <WDC WD15EADS-00P8B0 01.00A01> ATA-8 SATA 2.x device
Jan 18 09:15:10 gjabkup2 kernel: ada4: <WDC WD15EADS-00P8B0 01.00A01> ATA-8 SATA 2.x device
Jan 18 09:15:10 gjabkup2 kernel: ada5: <WDC WD15EADS-00S2B0 01.00A01> ATA-8 SATA 2.x device
I first noticed a thread on the FreeBSD-stable mailing list in mid January (e.g. this and related posts) which alerted me to the potential problem. This sent me on a hunt, which turned up earlier posts in Linux forums (such as this one recommending against WD Caviar Green drives in home NASes). The problem reveals itself through the “Load_Cycle_Count” (LCC) SMART parameter, which counts how many times the drive head has been parked (unloaded) and then loaded again to resume regular drive read/write operations.
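As an aside, you can read the LCC yourself with smartmontools. A minimal sketch, assuming `smartctl` is installed and that (as on my system) the raw value is the last column of the attribute line in `smartctl -A` output — the function names here are my own, not from any of the linked posts:

```python
# Sketch: read Load_Cycle_Count from a drive via smartmontools.
# Assumes `smartctl` is on the PATH; parsing takes the last column
# (the RAW_VALUE) of the matching SMART attribute line.
import subprocess

def parse_load_cycle_count(smart_output: str) -> int:
    """Pull the raw Load_Cycle_Count value out of `smartctl -A` output."""
    for line in smart_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "Load_Cycle_Count":
            return int(fields[-1])  # RAW_VALUE is the last column
    raise ValueError("Load_Cycle_Count attribute not found")

def load_cycle_count(device: str) -> int:
    """e.g. load_cycle_count("/dev/ada0") — needs root to query the drive."""
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True, check=True).stdout
    return parse_load_cycle_count(out)
```

Running something like this periodically for each of `/dev/ada0` to `/dev/ada5` is how the snapshots below were gathered.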
If you’re running Windows, more than 8 seconds of no activity means an idle system. The same cannot be said if you’re running FreeBSD or Linux. The drives will park their heads, and then a minute or so later, FreeBSD will do something file-system related and cause the drive to un-park its head again. Wash, rinse, repeat.
Over a period of just over eleven minutes (the Unix timestamps below span 679 seconds), the LCC on each of my drives was growing noticeably:
Time        /dev/ada0  /dev/ada1  /dev/ada2  /dev/ada3  /dev/ada4  /dev/ada5
1264567481  16020      14970      15087      14464      14679      14971
1264567680  16028      14978      15095      14472      14687      14979
1264567921  16033      14983      15100      14477      14692      14984
1264568160  16036      14987      15104      14480      14695      14988
Over longer periods of time I was seeing on average 40 head park events per hour for each drive.
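For illustration (this calculation isn’t in the linked posts), the snapshots above convert directly into a rate — and during that particular window /dev/ada0 was parking even faster than the 40-per-hour long-run average:

```python
# Worked example: park-events-per-hour from two of the LCC snapshots above.
def parks_per_hour(t0: int, lcc0: int, t1: int, lcc1: int) -> float:
    """Head-park rate between two (Unix time, LCC) samples."""
    return (lcc1 - lcc0) * 3600.0 / (t1 - t0)

# First and last snapshots for /dev/ada0: 16 parks in 679 seconds,
# which works out to roughly 85 parks per hour during that stretch.
rate = parks_per_hour(1264567481, 16020, 1264568160, 16036)
```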
Yup. Looks like I’m a doofus for buying WD Caviar Green drives for my home server. *sigh*
However: there is hope! It turns out that if you tickle the drives every 5 seconds, you can stop them from parking their heads. See this post for the basic idea and credit. My server runs a gmirror across the first partition of all six drives for “/”, and a raidz2 zpool across the second partition of all six drives for all other file systems. I configured the Python script to tickle a file on the “/” file system every five seconds (so the gmirror would touch all six drives), and voilà! The LCC stopped rising on all drives.
The figure below shows a plot of LCC growth over time between January 27th and April 16th 2010 (with LCC for each drive normalised to start at zero on Jan 27th).
The initial growth occurred over the few days before I started the Python script tickling the drives every 5 seconds. Once the drives were being tickled regularly, LCC growth stopped completely. Then in early April I turned the script off for a brief period, and confirmed that rapid LCC growth does indeed resume if the drives are no longer being tickled frequently.