Bienvenido! - Willkommen! - Welcome!

Bitácora Técnica de Tux&Cía., Santa Cruz de la Sierra, BO
May the source be with you!

Wednesday, September 15, 2010

fixing a RAID superblock

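GRUB won't fix a bad superblock; the advice above only applies if you are using RAID on your / partition.
1: View the contents of /etc/raid.blah.conf or mdadm.conf
2: Use fsck for the partitions
4: Run mdadm -E -s (scan the superblocks on the member devices) and mdadm -D -s (scan the running arrays); the two modes cannot be combined in a single call
To reassemble the array from its existing superblocks if your .conf hasn't changed:
# mdadm -A -s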
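Steps 1 and 4 amount to comparing what the configuration file expects with what the superblocks on disk report. The following is a minimal sketch of that comparison, assuming the ARRAY line format shared by mdadm.conf and the output of mdadm -E -s. The helper name array_uuids is hypothetical, not part of mdadm.

```shell
#!/bin/sh
# Hypothetical helper: print "<md-device> <uuid>" for every ARRAY line on stdin.
# Intended for both mdadm.conf and the output of `mdadm -E -s`.
array_uuids() {
    awk '$1 == "ARRAY" {
        uuid = ""
        for (i = 3; i <= NF; i++)
            if (tolower($i) ~ /^uuid=/) uuid = substr($i, 6)
        print $2, uuid
    }'
}

# Possible usage (as root), not executed here:
#   array_uuids < /etc/mdadm.conf  > /tmp/conf.txt
#   mdadm -E -s | array_uuids      > /tmp/disk.txt
#   diff /tmp/conf.txt /tmp/disk.txt
```

The device name can differ between the two sources, so the UUID column is the more reliable key when comparing.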
Recovering from a real hardware failure
This process is similar to recovering from a "simulated failure":
To recover from a real hardware failure, do:
  • make sure that the partitions on the new device are the same as on the old one:
    • create them with fdisk (fdisk -l will tell you what partitions you have on a good disk; remember to set the same start/end blocks, and to set each partition's system ID to "Linux raid autodetect", type fd)
    • consult /etc/mdadm.conf file, which describes which partitions are used for md devices
  • add a new device to the array:
# mdadm /dev/md0 -a /dev/sda1
mdadm: hot added /dev/sda1
Then, you can consult mdadm --detail /dev/md0 and/or /proc/mdstat to see how long the reconstruction will take.
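For scripted monitoring, the progress figures can be pulled out of /proc/mdstat. This is a rough sketch, assuming the kernel's usual layout in which an array's detail lines are indented and a rebuild shows a "recovery = N% ... finish=...min" line. The function name rebuild_progress is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print "<percent> <estimated time left>" while the given
# array (e.g. md0) is rebuilding; print nothing when no rebuild is running.
# The second argument lets you point at a saved copy of /proc/mdstat.
rebuild_progress() {
    md=$1
    file=${2:-/proc/mdstat}
    awk -v md="$md" '
        # An unindented line starts a new section; remember whether it is ours.
        /^[^ \t]/ { in_md = ($1 == md); next }
        in_md && /(recovery|resync) *=/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /%$/)       pct = $i
                if ($i ~ /^finish=/) eta = substr($i, 8)
            }
            print pct, eta
        }
    ' "$file"
}

# Possible usage: rebuild_progress md0
```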
Make sure you run lilo when the reconstruction is complete - see below.
RAID boot CD-ROM
It's always a good idea to have a CD-ROM from which you can boot your system (in case lilo was removed, etc.).
It can be created with the mkbootdisk tool:
# mkbootdisk --iso --device /root/raid-boot.iso `uname -r`
Then, just burn the created ISO.

If everything fails...

If you've done a reinstall, you've probably lost your /etc/mdadm.conf. Try running mdadm --assemble --scan to see if it picks up the drives again.
It's highly unlikely that you've lost the superblocks on both drives, unless it was done intentionally. If the blind assemble doesn't work, try assembling with one drive. If that works, add the other drive as a hot spare and the array will automatically rebuild.
If you have lost the superblocks on both drives, you can try mdadm --build to bring up an array.
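The first two fallbacks can be strung together roughly as follows. This is only a sketch under stated assumptions: the function name recover_mirror and the MDADM override variable are hypothetical, the device names are examples, and the mdadm --build case is deliberately left out because it bypasses the superblocks entirely.

```shell
#!/bin/sh
# MDADM can be overridden (e.g. with a stub) to rehearse the sequence safely.
MDADM=${MDADM:-mdadm}

# Hypothetical helper: recover_mirror <md-device> <first-member> <second-member>
recover_mirror() {
    md=$1 first=$2 second=$3

    # Step 1: blind assemble from whatever superblocks can be found.
    if $MDADM --assemble --scan; then
        return 0
    fi

    # Step 2: start the array degraded from a single member.
    # --run asks mdadm to start even though a member is missing.
    $MDADM --assemble --run "$md" "$first" || return 1

    # Step 3: hot-add the other drive; the mirror then rebuilds onto it.
    $MDADM "$md" -a "$second"
}

# Possible usage (as root): recover_mirror /dev/md0 /dev/sda1 /dev/sdb1
```

Which drive to name first matters: the rebuild copies from the member the array was started on, so pick the one whose data you trust.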
After more searching on the internet, I found that one of the new features of Debian 5.0.5 (I had 5.0.4) was "experimental support for software raid...", which seemed to imply that support was not present in the version I was using.
So I installed 5.0.5 with software RAID-1, which seemed to require a mount point for the 2 SATA raid drives. RAID-1 was not working after reboot, so I tried the following command:
mdadm --create /dev/md0 --metadata=1 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
This did not work at first, but it did after unmounting the 2 drives,
and it gave the following result:
size=488383936K mtime=Thu Jul 22 01:21:15 2010
mdadm: /dev/sdb1 appears to contain an ext2fs file system
size=488383936K mtime=Thu Jul 22 01:21:15 2010
Continue creating array? y
mdadm: array /dev/md0 started.
Does this mean the array is now working, or do I still need to test whether it is?
Also, the 2 drives have an ext3 filesystem; should the drives be reformatted to ext2? How?
(It seems experimental support for raid.. is intended to work on ext2 filesystems only!)
cat /proc/mdstat gave the following result:
Personalities : [raid1]
md0 : active (auto-read-only) raid1 sdb1[1] sda1[0]
487307508 blocks super 1.0 [2/2] [UU]
unused devices: <none>
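As a hedged aid for reading output like the above: in /proc/mdstat each U inside the final brackets stands for a member that is up, and an underscore stands for a missing or failed one, so [UU] indicates that both halves of the mirror are present. The sketch below extracts that flag string. The function name mirror_state is hypothetical, and it assumes the kernel's normal layout in which the detail lines are indented.

```shell
#!/bin/sh
# Hypothetical helper: print "healthy <flags>" or "degraded <flags>" for an
# array such as md0, based on the [UU] / [U_] marker in /proc/mdstat.
mirror_state() {
    md=$1
    file=${2:-/proc/mdstat}
    awk -v md="$md" '
        # An unindented line starts a new section; remember whether it is ours.
        /^[^ \t]/ { in_md = ($1 == md); next }
        in_md && match($0, /\[[U_]+\]/) {
            flags = substr($0, RSTART + 1, RLENGTH - 2)
            state = (index(flags, "_") > 0) ? "degraded" : "healthy"
            print state, flags
            exit
        }
    ' "$file"
}

# Possible usage: mirror_state md0
```

The "(auto-read-only)" state is normally cleared by the first write to the array; mdadm --readwrite /dev/md0 switches it explicitly.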

4. Error Recovery

  1. Q: I have a RAID-1 (mirroring) setup, and lost power while there was disk activity. Now what do I do?
    A: The redundancy of RAID levels is designed to protect against a disk failure, not against a power failure. There are several ways to recover from this situation.
    • Method (1): Use the raid tools. These can be used to sync the raid arrays. They do not fix file-system damage; after the raid arrays are synced, the file system still has to be fixed with fsck. Raid arrays can be checked with ckraid /etc/raid1.conf (for RAID-1; otherwise /etc/raid5.conf, etc.). Calling ckraid /etc/raid1.conf --fix will pick one of the disks in the array (usually the first), use that as the master copy, and copy its blocks to the others in the mirror. To designate which of the disks should be used as the master, use the --force-source flag, for example: ckraid /etc/raid1.conf --fix --force-source /dev/hdc3. The ckraid command can be safely run without the --fix option to verify the inactive RAID array without making any changes. When you are comfortable with the proposed changes, supply the --fix option.
    • Method (2): Paranoid, time-consuming, and not much better than the first way. Let's assume a two-disk RAID-1 array consisting of partitions /dev/hda3 and /dev/hdc3. You can try the following:

      1. fsck /dev/hda3
      2. fsck /dev/hdc3
      3. Decide which of the two partitions had fewer errors, was more easily recovered, or recovered the data that you wanted. Pick one, either one, to be your new "master" copy. Say you picked /dev/hdc3.
      4. dd if=/dev/hdc3 of=/dev/hda3
      5. mkraid raid1.conf -f --only-superblock
      Instead of the last two steps, you can run ckraid /etc/raid1.conf --fix --force-source /dev/hdc3, which should be a bit faster.
    • Method (3): Lazy man's version of the above. If you don't want to wait for long fscks to complete, it is perfectly fine to skip the first three steps above and move directly to the last two steps. Just be sure to run fsck /dev/md0 after you are done. Method (3) is actually just method (1) in disguise.
    In any case, the above steps will only sync up the raid arrays. The file system probably needs fixing as well: for this, fsck needs to be run on the active, unmounted md device. With a three-disk RAID-1 array, there are more possibilities, such as using two disks to "vote" a majority answer. Tools to automate this do not currently (September 1997) exist.
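When comparing the two halves in Method (2), the exit status of fsck gives a quick first impression of how badly each one was damaged. The status is a bit mask documented in the fsck manual page. The sketch below decodes it; the function name describe_fsck_status is hypothetical, and the usage line assumes an ext2/ext3 file system, where -n requests a read-only check.

```shell
#!/bin/sh
# Hypothetical helper: turn an fsck exit status (a sum of bit flags) into text.
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    out=""
    for pair in "1:errors corrected" "2:reboot needed" \
                "4:errors left uncorrected" "8:operational error" \
                "16:usage error" "32:cancelled" "128:shared-library error"; do
        bit=${pair%%:*}
        text=${pair#*:}
        # Append the description of every bit that is set in the status.
        if [ $(( status & bit )) -ne 0 ]; then
            out="${out:+$out, }$text"
        fi
    done
    echo "$out"
}

# Possible usage (as root, read-only check of one half of the mirror):
#   fsck -n /dev/hda3; describe_fsck_status $?
```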
