Yesterday at about 22:30 CST a major malfunction knocked our SAN offline.

The SAN is an expensive piece of hardware covered by a comprehensive service contract under which only Hitachi-authorized personnel can intervene.

They came in within a few hours, despite the weekend, and quickly restored 2 partitions that were lightly used and therefore had less severe errors.

The main partition, with many petabytes of data, took much longer, in part because they tracked down the cause of the corruption: a bug in the SAN's firmware. They sent all the logs to the central labs, which replicated the issue and created a patch in only a few hours, but the actual recovery and data integrity check took many more hours because of the sheer size of the partition.

This was a major malfunction and we are really sorry for the inconvenience, yet the manufacturer's response, on a weekend and a holiday, was quick.

I would especially like to apologize for our site being down during this time; we will investigate options for hosting it externally.

Our efforts are now focused on restarting the services; we will look at collateral issues once that is done.

In case your service is down:

  1. Shut it down and then start it. A simple restart MIGHT work, but in some situations, for example if you had an ISO mounted, that is unlikely. A full power down and power up is best.
  2. If step 1 does not work, please wait a while. If the service has a console, check it for a running fsck; that can take a long time even though, according to the support team, the data has been restored completely.
  3. If the control panel shows an error at start, please let us know. We are still starting the systems, as this must be done in a certain order. We will announce here when everything is up and running.

We will keep this space updated with anything we think our customers may want to know.



At this time all services have been restored.

This does not mean your service has been restarted automatically; there can be errors at start or a long fsck. Please check the console before restarting it.

If you had a few years of uptime with multiple updates applied without a restart, the start-up files may have been mangled long ago by an update gone wrong, without you knowing, because the service has not been restarted since. In that case, please open a ticket, but the response time might not be great. We are sorry for that.

It looks like there are no more reports of problems and everything is operating within normal parameters. If your service still has problems, we can now address individual cases.

Sunday, October 31, 2021
