2024-04-15 - ZFS Pool Suspended
What went wrong?
The ZFS pool was suspended due to a loose connection.
After the pool went offline, the transmission service crashed because it could not write to the /hdd mountpoint.
Timeline
Time | Event |
---|---|
05:01 BST | Pool suspended |
05:06 BST | Page received after transmission service crashed due to I/O issues |
05:07 BST | 2 disks in "FAULTED" state |
05:10 BST | London-B restarted manually |
05:13 BST | 3 disks in "FAULTED" state |
05:15 BST | Firm push on drive bays to reseat failing drives |
05:16 BST | ZFS pool errors cleared using CLI |
05:17 BST | Pool listed as healthy. Subsequent scrub was initialized |
05:30 BST | 4 disks in "FAULTED" state |
09:21 BST | London-B taken out of rack, cables unplugged, and all drives reseated |
09:30 BST | London-B plugged back in |
09:32 BST | VDev 0 reporting rapidly increasing checksum errors |
09:45 BST | Drive bays of VDev 0 and VDev 1 swapped. London-B restarted |
09:50 BST | Pool healthy |
09:55 BST | ZFS scrub initialized to correct any corrupted data |
Resolution
After reseating the drives and swapping the locations of the drive bays, the pool was healthy and a scrub was initialized to correct any corrupted data.
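
The timeline tracks the number of disks in the "FAULTED" state by reading the pool status by hand. As a rough sketch of how that count could be derived from `zpool status` text, the Python below scans the device table for rows whose STATE column is FAULTED. The function name `faulted_devices`, the pool name `tank`, the device names, and the sample output are all illustrative assumptions; none of them were captured from London-B.

```python
def faulted_devices(status_text: str) -> list[str]:
    """Return the names of rows whose STATE column reads FAULTED.

    Assumes the usual `zpool status` layout: a "NAME STATE ..." header row
    followed by one row per pool, vdev, or disk. A vdev or pool that is itself
    FAULTED would also be returned, not only individual disks.
    """
    faulted = []
    in_device_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The device table begins right after the column header row.
        if fields[:2] == ["NAME", "STATE"]:
            in_device_table = True
            continue
        # The trailing "errors:" line marks the end of the device table.
        if fields[0] == "errors:":
            in_device_table = False
        if in_device_table and len(fields) >= 2 and fields[1] == "FAULTED":
            faulted.append(fields[0])
    return faulted


# Illustrative sample only (hypothetical pool and device names).
SAMPLE = """\
  pool: tank
 state: DEGRADED
config:

        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          raidz2-0  DEGRADED     0     0     0
            sda     ONLINE       0     0     0
            sdb     FAULTED      0     0     0  too many errors
            sdc     FAULTED      0     0     0  too many errors

errors: No known data errors
"""

print(faulted_devices(SAMPLE))
```

A check like this could feed an alert when the list grows between runs, which is the pattern the timeline shows between 05:07 and 05:30.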
Corrective Actions
After the incident, the following actions were taken:
- Drives were reseated
- Drive bays were swapped
- Pool scrub was initialized
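
The software side of these actions (clearing pool errors, starting a scrub, then checking health) maps onto the standard `zpool clear`, `zpool scrub`, and `zpool status -x` subcommands. The sketch below only assembles and optionally runs that sequence. The pool name `tank` and the helper names `recovery_commands` and `run_recovery` are hypothetical, and the incident notes do not record the exact commands that were typed.

```python
import subprocess


def recovery_commands(pool: str) -> list[list[str]]:
    """Build the clear, scrub, and health-check commands for one pool."""
    return [
        ["zpool", "clear", pool],         # reset error counters on the pool
        ["zpool", "scrub", pool],         # start a scrub to verify and repair data
        ["zpool", "status", "-x", pool],  # report only if the pool is unhealthy
    ]


def run_recovery(pool: str, dry_run: bool = True) -> None:
    """Print the commands, or execute them when dry_run is False."""
    for cmd in recovery_commands(pool):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if a step fails.
            subprocess.run(cmd, check=True)


# Dry run with a hypothetical pool name; nothing is executed.
run_recovery("tank")
```

Defaulting to a dry run keeps the sketch safe to execute on a machine without ZFS; pass `dry_run=False` only on the affected host and with the real pool name.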