Thursday, November 29, 2012

Possible hard drive failure

At 4am our primary server (hal, which is the hypervisor for most of our servers) reported two uncorrected read errors on one of its hard drives (part of a RAID 10 array). Uncorrected errors can indicate impending hard drive failure: the drive was unable to recover the data in those sectors, and it may be running out of spare sectors to remap them to.

We have a replacement hard drive on hand, but because our servers are in Eshleman (which is under lockdown), there may be a delay in getting physical access.

Update Dec 3: We have taken backups of important data stored on hal, but do not have physical access to our servers at this time.