2024-04-30 - ZFS pool in degraded state, unable to wake sleeping drive
What went wrong?
The ZFS pool was degraded.
A single disk in vdev 2 was marked as 'FAULTED'.
Timeline
Time | Event |
---|---|
17:50 BST | Pool degraded |
17:55 BST | Page received from ZFSPoolNotOnline alert |
18:30 BST | Investigation started; /dev/sdh identified as the faulted drive |
18:40 BST | Ran a command to blink the drive's activity LED and find which bay sdh is in |
18:45 BST | Alert auto-resolved after the pool recovered |
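For future investigations, the faulted device can be picked out of `zpool status` output mechanically instead of by eye. This is a minimal sketch, assuming the standard `zpool status` config table (a `NAME STATE READ WRITE CKSUM` header followed by one row per pool, vdev and device, ended by a blank line). The function name `zpool_problem_devices` is hypothetical and is not the command used during this incident.

```sh
#!/bin/sh
# Hypothetical helper: list pools, vdevs and devices that are in a problem state.
# Reads `zpool status` output on stdin and prints "NAME STATE" for every row
# of a config table whose state is DEGRADED, FAULTED, OFFLINE, UNAVAIL or REMOVED.
zpool_problem_devices() {
  awk '
    # The config table starts after the "NAME STATE READ WRITE CKSUM" header.
    $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
    # A blank line ends the config table.
    in_config && NF == 0 { in_config = 0; next }
    # Print any row whose state column is not healthy.
    in_config && $2 ~ /^(DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ { print $1, $2 }
  '
}
```

Usage: `zpool status | zpool_problem_devices`. Because pool and vdev rows are printed as well as leaf devices, the output shows which vdev the faulted disk belongs to, not just the disk name.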
Root Cause
I think the sdh drive spun down after a period of inactivity.
When ZFS next tried to access it, ZFS was unable to wake the drive and marked it as 'FAULTED' because it was unresponsive.
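This theory can be checked the next time a drive drops out by reading each drive's power state, which `hdparm -C` reports without spinning the drive up. A drive showing `standby` while ZFS reports it as FAULTED would support the spin-down explanation. This sketch assumes SATA/ATA drives and the `hdparm` utility (SAS drives would need a different tool, such as `sdparm`); the function name `report_power_states` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: print "<device> <power state>" for each drive given.
# hdparm -C issues CHECK POWER MODE, which does not spin up a sleeping drive.
# Typical states are "active/idle", "standby" and "sleeping". Requires root.
report_power_states() {
  for dev in "$@"; do
    # hdparm prints a line like " drive state is:  standby"; keep the part after the colon.
    state=$(hdparm -C "$dev" 2>/dev/null | awk -F': *' '/drive state is/ { print $2 }')
    printf '%s %s\n' "$dev" "${state:-unknown}"
  done
}
```

Usage: `report_power_states /dev/sd?` run as root lists every drive's state in one pass.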
Resolution
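Drive sdh was woken up by forcing a read from it every second with fdisk (opening the device makes fdisk read its partition table, and the piped "q" makes it quit without writing any changes):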
watch -n 1 'echo "q" | fdisk /dev/sdh'
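The `watch` command above keeps running until it is stopped by hand. A bounded alternative is sketched below, assuming GNU coreutils `dd` (for `iflag=direct`, which bypasses the page cache so the read really reaches the drive) and a SATA drive that `hdparm -C` can query. The function name `wait_for_spinup` and its default of 30 attempts are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: wake a drive by reading from it once per second until
# hdparm reports it as active/idle, giving up after a number of attempts.
# Usage: wait_for_spinup /dev/sdX [attempts]. Requires root.
wait_for_spinup() {
  dev=$1
  tries=${2:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # A direct (uncached) read of one block forces the drive to service I/O.
    dd if="$dev" of=/dev/null bs=4096 count=1 iflag=direct 2>/dev/null
    # CHECK POWER MODE does not itself spin the drive up.
    if hdparm -C "$dev" 2>/dev/null | grep -q 'active/idle'; then
      echo "$dev is awake"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "$dev did not wake after $tries attempts" >&2
  return 1
}
```

The non-zero exit status on timeout makes the function usable in a script that escalates if the drive never responds.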
Corrective Actions
- Disable spin-down (the inactivity/standby timeout) on all drives
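One way to carry out this action on SATA drives is `hdparm -S 0`, which sets the standby (spin-down) timeout to "disabled". The sketch below assumes SATA/ATA drives and root access; the function name `disable_spindown` and the `DRY_RUN` variable are hypothetical, and by default the function only prints the commands so they can be reviewed first.

```sh
#!/bin/sh
# Hypothetical helper: disable the standby (spin-down) timeout on each drive.
# hdparm -S 0 sets the standby timeout to "disabled".
# DRY_RUN defaults to 1 (print the commands only); set DRY_RUN=0 to run them as root.
disable_spindown() {
  for dev in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "hdparm -S 0 $dev"
    else
      hdparm -S 0 "$dev"
    fi
  done
}
```

Settings applied with `hdparm` do not survive a power cycle, so they need to be reapplied at boot. On Debian-based systems this can be done with a per-drive stanza in `/etc/hdparm.conf` using `spindown_time = 0`. Referring to drives by their `/dev/disk/by-id/` paths there is safer than `sdX` names, which can change between boots.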