2024-04-15 - ZFS Pool Suspended
What went wrong?
The ZFS pool was suspended due to a loose connection.
Once the pool was offline, the transmission service crashed because it could no longer write to the /hdd mountpoint.
Timeline
| Time | Event | 
|---|---|
| 05:01 BST | Pool suspended | 
| 05:06 BST | Page received after transmission service crashed due to I/O issues | 
| 05:07 BST | 2 disks in "FAULTED" state | 
| 05:10 BST | London-B restarted manually | 
| 05:13 BST | 3 disks in "FAULTED" state | 
| 05:15 BST | Firm push on drive bays to reseat failing drives | 
| 05:16 BST | ZFS pool errors cleared using CLI | 
| 05:17 BST | Pool listed as healthy. Subsequent scrub was initialized | 
| 05:30 BST | 4 disks in "FAULTED" state | 
| 09:21 BST | London-B removed from rack, cables unplugged, and all drives reseated | 
| 09:30 BST | London-B plugged back in | 
| 09:32 BST | VDev 0 reporting rapidly increasing checksum errors | 
| 09:45 BST | Swapped the drive bay locations of VDev 0 and VDev 1. London-B restarted | 
| 09:50 BST | Pool healthy | 
| 09:55 BST | ZFS scrub initialized to correct any corrupted data | 
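
Device states and checksum counters like those in the timeline are what `zpool status` reports. As a minimal sketch of how the escalation (2, then 3, then 4 faulted disks) could be tallied automatically, the Python below parses status text and lists faulted devices and rows with non-zero checksum counters. The sample output, the pool name `tank`, and the device names are hypothetical placeholders, not data from London-B, and the parser assumes the usual NAME/STATE/READ/WRITE/CKSUM column layout.

```python
import re

# Hypothetical sample modelled on `zpool status` output; the pool name "tank"
# and the device names are placeholders, not taken from London-B.
SAMPLE_STATUS = """\
  pool: tank
 state: DEGRADED
config:

        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          raidz1-0  DEGRADED     0     0     0
            sda     ONLINE       0     0     0
            sdb     FAULTED      0     0     0  too many errors
          raidz1-1  DEGRADED     0     0    12
            sdc     FAULTED      0     0     0  too many errors
            sdd     ONLINE       0     0    12
"""

# An indented config row: name, state, then read/write/checksum counters.
ROW = re.compile(r"^\s+(\S+)\s+([A-Z]+)\s+(\S+)\s+(\S+)\s+(\S+)")


def summarize(status_text):
    """Return (faulted names, {name: checksum counter}) from status text.

    Counters are kept as strings because they may be abbreviated in
    real output; only rows with a counter other than "0" are returned.
    """
    faulted = []
    checksum_errors = {}
    for line in status_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue
        name, state, _read, _write, cksum = match.groups()
        if state == "STATE":  # skip the column header row
            continue
        if state == "FAULTED":
            faulted.append(name)
        if cksum != "0":
            checksum_errors[name] = cksum
    return faulted, checksum_errors


faulted, checksum_errors = summarize(SAMPLE_STATUS)
print(f"{len(faulted)} faulted: {faulted}")
print(f"checksum errors: {checksum_errors}")
```

On a live host the text could instead come from running `zpool status` through `subprocess.run(..., capture_output=True, text=True)`. Alerting on a rising faulted count is one possible use and was not part of the original response.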
Resolution
After reseating the drives and swapping the locations of the drive bays, the pool was healthy and a scrub was initialized to correct any corrupted data.
Corrective Actions
After the incident, the following actions were taken:
- Drives were reseated
- Drive bays were swapped
- Pool scrub was initialized