We experienced a RAID failure on the s03 node. We are trying to recover the data, but it does not look good: a second disk failed while the pool was resilvering.
As always, make sure you take snapshots to protect against events like this, as well as others such as hacking, accidental deletion, or removing the wrong instance during housekeeping.
More information will be posted here as it becomes available.

Update 2:17
The resilvering is running and will continue for a few more hours. The second failed disk is not completely dead, but it has too many errors, so we are not confident the data will be consistent. We suggest you plan on restoring a snapshot when the node is back online in the morning.
When the resilvering is complete, we will post an update once we know how consistent the data really is; until then there is not much we can do or report.

Update 7:29
Resilvering completed with many (>3 million) errors. A consistency check of the filesystem shows I/O errors on every volume, so no data recovery is possible. Local storage is being rebuilt.

Update 9:43
Storage has been rebuilt and empty volumes have been created. You can now restore your snapshots, or reinstall with the reset VM function to use the same template.

Wednesday, September 27, 2017
