Home Back to RCA

2024-04-15 - ZFS Pool Suspended

What went wrong?

The ZFS pool was suspended due to a loose connection.

After the pool was taken offline the transmission service crashed due to failing to write to the /hdd mountpoint.

Timeline

Time Event
05:01 BST Pool suspended
05:06 BST Page recieved after transmission service crashed due to I/O issues
05:07 BST 2 disks in "FAULTED" state
05:10 BST London-B restarted manually
05:13 BST 3 disks in "FAULTED" state
05:15 BST Firm push on drive bays to reseat failing drives
05:16 BST ZFS pool errors cleared using CLI
05:17 BST Pool listed as healthy. Subsequent scrub was initialized
05:30 BST 4 disks in "FAULTED" state
09:21 BST London-B taken out of rack, cables unplugged and reseated all drives
09:30 BST London-B plugged back in
09:32 BST VDev 0 reporting checksum errors. Rapidly increasing.
09:45 BST Swapped locations of drive bay of VDev 1 with VDev 0. London-B restarted
09:50 BST Pool healthy
09:55 BST ZFS scrub intialized to correct any corrupted data

Resolution

After reseating the drives and swapping the locations of the drive bays, the pool was healthy and a scrub was initialized to correct any corrupted data.

Corrective Actions

After the incident, the following actions were taken: