Over the last couple of articles we have looked at the definition of RAID, the various configurations of RAID arrays, and the basics of software vs. hardware RAID systems. By now you may be thinking that this RAID stuff seems pretty neat: my data is protected because of all the redundancy features that RAID provides. It isn't my intention to deceive you, so this is the article in which I set everyone straight. RAID systems do in fact fail.
It's true. Many RAID users, myself included, have been led to believe that a RAID system should not fail. As a result we have assumed that our RAID's fault tolerance and automatic rebuild capabilities provide almost perfect data protection, so, with limited exceptions, why even perform complete backups of the RAID array?
As we saw last time, RAID can be implemented in either hardware or software. The typical user may not even know which kind their own system uses, beyond the fact that they either shelled out the cash for a RAID controller or they didn't, even if the computer is configured with a RAID array.
No doubt many popular manufacturers of servers and high-end workstations that use RAID, along with RAID controller manufacturers, do promote the idea of extended data availability and protection when a failed hard disk is detected. We are led to believe that our RAID should be able to recover our data no matter what the cause of failure.
This is how it is supposed to work. In a typical RAID 5 configuration, the RAID controller should be able to rebuild the data from a failed disk onto either a standby drive or a replacement drive. The only time it should fail to do so is when two disks fail at the same time, or a second disk fails before the rebuild finishes, which is almost an impossibility (or so we are led to believe), so we are supposedly safe to rely on our RAID. Accordingly it's easy to understand how someone, even myself, could be duped into thinking that RAID systems cannot fail. But the reality is that RAID systems do in fact fail, even to the extent that significant quantities of data may be lost.
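To make the rebuild idea concrete, here is a minimal sketch of the XOR parity arithmetic that RAID 5 relies on. It is an illustration only, not how any particular controller is implemented: the four-disk stripe, the block contents, and the helper names `xor_blocks` and `rebuild` are all my own assumptions, and a real array also rotates parity across the disks, which this ignores.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical 4-disk array: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
stripe = data + [parity]

def rebuild(stripe, failed_index):
    """Recompute the block of one failed disk from all the surviving blocks."""
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)

# Single failure: the missing block comes back exactly.
rebuilt = rebuild(stripe, 1)

# Double failure: XOR of the survivors only yields the XOR of the two lost
# blocks, so neither lost block can be recovered on its own.
two_lost = xor_blocks([stripe[0], stripe[3]])
```

With one disk missing, `rebuilt` equals the original block. With two missing, the survivors carry only the combined XOR of the lost pair, which is why a second failure during a rebuild is fatal to the array.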
In my own case an operating system failure not only corrupted one of the disks in the array, it also corrupted the RAID configuration settings in the RAID controller card. The result was a significant loss of data. The natural tendency is to follow the steps for a RAID-based recovery or rebuild, and that is absolutely the wrong thing to do in this scenario. Any attempt to rebuild would only have complicated the situation and may have resulted in the loss of even more data. Fortunately we took steps to secure all the disks of the array and attempted data recovery by a method other than RAID restoration.
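Why can a rebuild make things worse? A parity rebuild trusts every surviving block blindly. The sketch below is a simplified illustration under my own assumptions (a single stripe, made-up block contents, a hypothetical `xor_blocks` helper), not a reconstruction of what happened on my controller. It shows a rebuild that runs without any error and still writes wrong data.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Healthy stripe: three data blocks and the parity computed from them.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Assumption for illustration: a crash silently corrupts one byte on disk 0,
# and afterwards disk 1 is marked failed and rebuilt from the survivors.
corrupted_disk0 = b"A?AA"
rebuilt_disk1 = xor_blocks([corrupted_disk0, data[2], parity])

# The rebuild "succeeds", but the corruption has been copied into disk 1.
rebuild_is_wrong = rebuilt_disk1 != data[1]
```

Nothing in the parity arithmetic can tell which block was bad, so the rebuild overwrites the one place where recoverable data might still have been found. That is the reason for securing the disks first and leaving them untouched.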
The lesson I have learned is simple. To my way of thinking, the simplest RAID configurations really are the best. A set of mirrored drives provides outstanding protection without the sophisticated processes that are the hallmark of more complicated RAID configurations, which try to increase fault tolerance while at the same time reducing disk overhead.
I guess the old acronym KISS (keep it simple, stupid) sums up my thinking now. If you can't afford the disks required for a simple RAID, then you are probably better off buying no RAID at all. In either case, RAID or no RAID, the question shouldn't be whether you have one, but whether you still have a reliable backup of your data despite having a RAID.
I for one have learned my lesson. You are better safe than sorry: even a RAID is no substitute for a good system backup.