RAID is an acronym first defined at the University of California, Berkeley in 1987 to describe a redundant array of inexpensive disks, a technology that allowed computer users to achieve high levels of storage reliability from low-cost and less reliable PC-class disk-drive components, via the technique of arranging the devices into arrays for redundancy.
Redundancy is achieved by either writing the same data to multiple drives (known as mirroring), or writing extra data (known as parity data) across the array, calculated such that the failure of one (or possibly more, depending on the type of RAID) disks in the array will not result in loss of data. A failed disk may be replaced by a new one, and the lost data reconstructed from the remaining data and the parity data. Organizing disks into a redundant array decreases the usable storage capacity. For instance, a 2-disk RAID 1 array loses half of the total capacity that would have otherwise been available using both disks independently, and a RAID 5 array with several disks loses the capacity of one disk. Other types of RAID arrays are arranged so that they are faster to write to and read from than a single disk.
There are various combinations of these approaches giving different trade-offs of protection against data loss, capacity, and speed. RAID levels 0, 1, and 5 are the most commonly found, and cover most requirements.
- RAID 0 (striped disks) distributes data across several disks in a way that gives improved speed at any given instant. If one disk fails, however, all of the data on the array will be lost, as there is neither parity nor mirroring.
- RAID 1 mirrors the contents of the disks, making a form of 1:1 ratio realtime backup. The contents of each disk in the array are identical to that of every other disk in the array.
- RAID 5 (striped disks with parity) combines three or more disks in a way that protects data against loss of any one disk. The storage capacity of the array is reduced by one disk.
- RAID 6 (striped disks with dual parity) can recover from the loss of two disks.
- RAID 10 (or 1+0) uses both striping and mirroring. "01" or "0+1" is sometimes distinguished from "10" or "1+0": a striped set of mirrored subsets and a mirrored set of striped subsets are both valid, but distinct, configurations.
RAID can involve significant computation when reading and writing information. With traditional "real" RAID hardware, a separate controller does this computation. In other cases the operating system or simpler and less expensive controllers require the host computer's processor to do the computing, which reduces the computer's performance on processor-intensive tasks (see "Software RAID" and "Fake RAID" below). Simpler RAID controllers may provide only levels 0 and 1, which require less processing.
RAID systems with redundancy continue working without interruption when one (or possibly more, depending on the type of RAID) disks of the array fail, although they are then vulnerable to further failures. When the bad disk is replaced by a new one the array is rebuilt while the system continues to operate normally. Some systems have to be powered down when removing or adding a drive; others support hot swapping, allowing drives to be replaced without powering down. RAID with hot-swapping is often used in high availability systems, where it is important that the system remains running as much of the time as possible.
RAID is not a good alternative to backing up data. Data may become damaged or destroyed without harm to the drive(s) on which they are stored. For example, some of the data may be overwritten by a system malfunction; a file may be damaged or deleted by user error or malice and not noticed for days or weeks; and, of course, the entire array is at risk of physical damage.
RAID combines two or more physical hard disks into a single logical unit by using either special hardware or software. Hardware solutions often are designed to present themselves to the attached system as a single hard drive, so that the operating system would be unaware of the technical workings. For example, you might configure a 1TB RAID 5 array using three 500GB hard drives in hardware RAID, the operating system would simply be presented with a "single" 1TB volume. Software solutions are typically implemented in the operating system and would present the RAID drive as a single volume to applications running upon the operating system.
There are three key concepts in RAID:
mirroring, the copying of data to more than one disk;
striping, the splitting of data across more than one disk; and
error correction, where redundant data is stored to allow problems to be detected and possibly fixed (known as fault tolerance).
Different RAID levels use one or more of these techniques, depending on the system requirements. RAID's main aim can be either to improve reliability and availability of data, ensuring that important data is available more often than not (e.g. a database of customer orders), or merely to improve the access speed to files (e.g. for a system that delivers video on demand TV programs to many viewers).
"Striped set without parity" or "Striping". Provides improved performance and additional storage but no redundancy or fault tolerance. Any disk failure destroys the array, which has greater consequences with more disks in the array (at a minimum, catastrophic data loss is twice as severe compared to single drives without RAID). A single disk failure destroys the entire array because when data is written to a RAID 0 drive, the data is broken into fragments. The number of fragments is dictated by the number of disks in the array. The fragments are written to their respective disks simultaneously on the same sector. This allows smaller sections of the entire chunk of data to be read off the drive in parallel, increasing bandwidth. RAID 0 does not implement error checking so any error is unrecoverable. More disks in the array means higher bandwidth, but greater risk of data loss.
'Mirrored set without parity' or 'Mirroring'. Provides fault tolerance from disk errors and failure of all but one of the drives. Increased read performance occurs when using a multi-threaded operating system that supports split seeks, as well as a very small performance reduction when writing. Array continues to operate so long as at least one drive is functioning. Using RAID 1 with a separate controller for each disk is sometimes called duplexing.
Striped set with distributed parity or interleave parity. Distributed parity requires all drives but one to be present to operate; drive failure requires replacement, but the array is not destroyed by a single drive failure. Upon drive failure, any subsequent reads can be calculated from the distributed parity such that the drive failure is masked from the end user. The array will have data loss in the event of a second drive failure and is vulnerable until the data that was on the failed drive is rebuilt onto a replacement drive. A single drive failure in the set will result in reduced performance of the entire set until the failed drive has been replaced and rebuilt.