
2024-04-30 - ZFS Pool in Degraded state, unable to wake up sleeping drive

What went wrong?

The ZFS pool was degraded.

A single disk in VDev 2 was marked as 'FAULTED'.
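As a rough sketch of how the faulted disk could be picked out of `zpool status` output, the helper below filters the device table for rows whose STATE column is FAULTED. The function name `faulted_devices` is hypothetical, and it assumes the usual NAME / STATE / READ / WRITE / CKSUM column layout.

```shell
#!/bin/sh
# Hypothetical helper: print the names of devices whose STATE column is FAULTED.
# Reads `zpool status` output on stdin and assumes the usual
# NAME STATE READ WRITE CKSUM column layout for device rows.
faulted_devices() {
    # NF >= 5 skips the "state: FAULTED" summary line, which has only two fields.
    awk 'NF >= 5 && $2 == "FAULTED" { print $1 }'
}

# Usage on the affected host:
#   zpool status | faulted_devices
```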

Timeline

Time       Event
17:50 BST  Pool degraded
17:55 BST  Page received from ZFSPoolNotOnline alert
18:30 BST  Investigation of the issue started; /dev/sdh identified as the culprit
18:40 BST  Ran a command to blink the activity LED on the bay to find out which drive location sdh is in
18:45 BST  Alert auto-resolved after ZPool recovered

Root Cause

The most likely cause is that drive sdh spun down after a period of inactivity. ZFS was then unable to wake the drive and, because it was unresponsive, marked it as 'FAULTED'. This has not been confirmed.
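One way to test the spin-down theory in future would be to check the drive's power state with `hdparm -C`, which reports states such as active/idle or standby. The sketch below only parses that output; the function name `parse_drive_state` is hypothetical, and it assumes the usual "drive state is:" line format.

```shell
#!/bin/sh
# Hypothetical helper: extract the power state from `hdparm -C` output on stdin.
# Assumes the output contains a line of the form "drive state is:  <state>".
parse_drive_state() {
    sed -n 's/^[[:space:]]*drive state is:[[:space:]]*//p'
}

# Usage on the affected host (needs root; /dev/sdh as in this incident):
#   hdparm -C /dev/sdh | parse_drive_state
```

A state of standby at the time of a fault would support the theory above; an active/idle state would point elsewhere.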

Resolution

Drive sdh was woken up by running fdisk against it once per second, forcing repeated access to the device:

    watch -n 1 'echo "q" | fdisk /dev/sdh'
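The `watch` command above runs until interrupted. A bounded alternative, sketched below, retries a small read and stops at the first success. This swaps fdisk for `dd` and is not what was run during the incident; the function name `wake_drive` and the `WAKE_INTERVAL` variable are hypothetical. Note that a read may be served from the page cache without touching the disk, so this is an illustration of the retry loop rather than a guaranteed wake-up.

```shell
#!/bin/sh
# Hypothetical helper: read the first 512 bytes of a device until a read
# succeeds or the attempt limit is reached.
# Arguments: $1 = device path, $2 = maximum attempts (default 30).
# WAKE_INTERVAL sets the pause between attempts in seconds (default 1).
wake_drive() {
    dev=$1
    attempts=${2:-30}
    i=1
    while [ "$i" -le "$attempts" ]; do
        if dd if="$dev" of=/dev/null bs=512 count=1 2>/dev/null; then
            echo "read from $dev succeeded on attempt $i"
            return 0
        fi
        sleep "${WAKE_INTERVAL:-1}"
        i=$((i + 1))
    done
    echo "no successful read from $dev after $attempts attempts" >&2
    return 1
}

# Usage on the affected host (needs root):
#   wake_drive /dev/sdh 60
```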
            

Corrective Actions