How to recover a crashed Linux md RAID5 array?

Some time ago I had a RAID5 system at home. One of the 4 disks failed, but after removing it and putting it back it seemed to be OK, so I started a resync. When it finished I realized, to my horror, that 3 out of 4 disks had failed. However, I don't believe that's possible. There are multiple partitions on the disks, each part of a different RAID array.

  • md0 is a RAID1 array comprised of sda1, sdb1, sdc1 and sdd1.
  • md1 is a RAID5 array comprised of sda2, sdb2, sdc2 and sdd2.
  • md2 is a RAID0 array comprised of sda3, sdb3, sdc3 and sdd3.

md0 and md2 report all disks up while md1 reports 3 failed (sdb2, sdc2, sdd2). It's my understanding that when a hard drive fails, all of its partitions should be lost, not just the middle one.
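
(For reference, the per-member state of each array can be checked like this; a minimal sketch using the device names from the list above:)

# Overall view of all arrays and which members are up
cat /proc/mdstat

# Per-array detail, including which members are marked faulty
for md in /dev/md0 /dev/md1 /dev/md2; do
    mdadm --detail "$md"
done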

At that point I turned the computer off and unplugged the drives. Since then I've been using that computer with a smaller new disk.

Is there any hope of recovering the data? Can I somehow convince mdadm that my disks are in fact working? The only disk that may really have a problem is sdc, but even that one is reported up by the other arrays.

Update

I finally got a chance to connect the old disks and boot this machine from SystemRescueCd. Everything above was written from memory. Now I have some hard data. Here is the output of mdadm --examine /dev/sd*2:

/dev/sda2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 53eb7711:5b290125:db4a62ac:7770c5ea
  Creation Time : Sun May 30 21:48:55 2010
     Raid Level : raid5
  Used Dev Size : 625064960 (596.11 GiB 640.07 GB)
     Array Size : 1875194880 (1788.33 GiB 1920.20 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 1

    Update Time : Mon Aug 23 11:40:48 2010
          State : clean
 Active Devices : 3
Working Devices : 4
 Failed Devices : 1
  Spare Devices : 1
       Checksum : 68b48835 - correct
         Events : 53204

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8        2        0      active sync   /dev/sda2

   0     0       8        2        0      active sync   /dev/sda2
   1     1       8       18        1      active sync   /dev/sdb2
   2     2       8       34        2      active sync   /dev/sdc2
   3     3       0        0        3      faulty removed
   4     4       8       50        4      spare   /dev/sdd2
/dev/sdb2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 53eb7711:5b290125:db4a62ac:7770c5ea
  Creation Time : Sun May 30 21:48:55 2010
     Raid Level : raid5
  Used Dev Size : 625064960 (596.11 GiB 640.07 GB)
     Array Size : 1875194880 (1788.33 GiB 1920.20 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 1

    Update Time : Mon Aug 23 11:44:54 2010
          State : clean
 Active Devices : 2
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 1
       Checksum : 68b4894a - correct
         Events : 53205

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8       18        1      active sync   /dev/sdb2

   0     0       0        0        0      removed
   1     1       8       18        1      active sync   /dev/sdb2
   2     2       8       34        2      active sync   /dev/sdc2
   3     3       0        0        3      faulty removed
   4     4       8       50        4      spare   /dev/sdd2
/dev/sdc2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 53eb7711:5b290125:db4a62ac:7770c5ea
  Creation Time : Sun May 30 21:48:55 2010
     Raid Level : raid5
  Used Dev Size : 625064960 (596.11 GiB 640.07 GB)
     Array Size : 1875194880 (1788.33 GiB 1920.20 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 1

    Update Time : Mon Aug 23 11:44:54 2010
          State : clean
 Active Devices : 1
Working Devices : 2
 Failed Devices : 2
  Spare Devices : 1
       Checksum : 68b48975 - correct
         Events : 53210

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8       34        2      active sync   /dev/sdc2

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       8       34        2      active sync   /dev/sdc2
   3     3       0        0        3      faulty removed
   4     4       8       50        4      spare   /dev/sdd2
/dev/sdd2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 53eb7711:5b290125:db4a62ac:7770c5ea
  Creation Time : Sun May 30 21:48:55 2010
     Raid Level : raid5
  Used Dev Size : 625064960 (596.11 GiB 640.07 GB)
     Array Size : 1875194880 (1788.33 GiB 1920.20 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 1

    Update Time : Mon Aug 23 11:44:54 2010
          State : clean
 Active Devices : 1
Working Devices : 2
 Failed Devices : 2
  Spare Devices : 1
       Checksum : 68b48983 - correct
         Events : 53210

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     4       8       50        4      spare   /dev/sdd2

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       8       34        2      active sync   /dev/sdc2
   3     3       0        0        3      faulty removed
   4     4       8       50        4      spare   /dev/sdd2

It appears that things have changed since the last boot. If I'm reading this correctly, sda2, sdb2 and sdc2 are working and contain synchronized data, and sdd2 is a spare. I distinctly remember seeing 3 failed disks, but this is good news. Yet the array still isn't working:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md125 : inactive sda2[0](S) sdb2[1](S) sdc2[2](S)
      1875194880 blocks

md126 : inactive sdd2[4](S)
      625064960 blocks

md127 : active raid1 sda1[0] sdd1[3] sdc1[2] sdb1[1]
      64128 blocks [4/4] [UUUU]

unused devices: <none>

md0 appears to have been renamed to md127. md125 and md126 are very strange; they should be one array, not two (that used to be called md1). md2 is completely gone, but that was my swap so I don't care.
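
(The renumbering itself is harmless: auto-assembly just picked free minor numbers. Once the array is healthy again, the old name can be pinned in mdadm.conf; a sketch, with the UUID taken from the --examine output above:)

# /etc/mdadm/mdadm.conf (sketch) -- pin md1 to its UUID; lines for the
# other arrays can be generated with "mdadm --detail --scan"
ARRAY /dev/md1 UUID=53eb7711:5b290125:db4a62ac:7770c5ea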

I can understand the different names and it doesn’t really matter. But why is an array with 3 “active sync” disks unreadable? And what’s up with sdd2 being in a separate array?
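
(A quick way to see why mdadm refuses to assemble them into one running array is to compare the fields it actually cares about; a minimal sketch:)

# Same UUID but diverging Events counters and Update Times are what
# keep these members from being started together
for d in /dev/sd{a,b,c,d}2; do
    echo "== $d"
    mdadm --examine "$d" | grep -E 'UUID|Events|Update Time'
done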

Update

I tried the following after backing up the superblocks (one way to do that backup is sketched just below):
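
The backup can be as simple as saving the decoded --examine output of every member, which is enough to reconstruct the array parameters later; a minimal sketch with arbitrary paths:

mkdir -p /root/sb-backup
for d in /dev/sd{a,b,c,d}2; do
    # human-readable record of each member's superblock
    mdadm --examine "$d" > /root/sb-backup/$(basename "$d").examine
done
# partition tables too, just in case
for i in a b c d; do
    sfdisk -d /dev/sd$i > /root/sb-backup/sd$i.sfdisk
done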

root@sysresccd /root % mdadm --stop /dev/md125
mdadm: stopped /dev/md125
root@sysresccd /root % mdadm --stop /dev/md126
mdadm: stopped /dev/md126

So far so good. Since sdd2 is spare I don’t want to add it yet.

root@sysresccd /root % mdadm --assemble /dev/md1 /dev/sd{a,b,c}2 missing 
mdadm: cannot open device missing: No such file or directory
mdadm: missing has no superblock - assembly aborted

Apparently I can't do that; the missing keyword is only understood by --create, not by --assemble.

root@sysresccd /root % mdadm --assemble /dev/md1 /dev/sd{a,b,c}2        
mdadm: /dev/md1 assembled from 1 drive - not enough to start the array.
root@sysresccd /root % cat /proc/mdstat 
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md1 : inactive sdc2[2](S) sdb2[1](S) sda2[0](S)
      1875194880 blocks

md127 : active raid1 sda1[0] sdd1[3] sdc1[2] sdb1[1]
      64128 blocks [4/4] [UUUU]

unused devices: <none>

That didn’t work either. Let’s try with all the disks.

mdadm --stop /dev/md1
mdadm: stopped /dev/md1
root@sysresccd /root % mdadm --assemble /dev/md1 /dev/sd{a,b,c,d}2
mdadm: /dev/md1 assembled from 1 drive and 1 spare - not enough to start the array.
root@sysresccd /root % cat /proc/mdstat                           
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md1 : inactive sdc2[2](S) sdd2[4](S) sdb2[1](S) sda2[0](S)
      2500259840 blocks

md127 : active raid1 sda1[0] sdd1[3] sdc1[2] sdb1[1]
      64128 blocks [4/4] [UUUU]

unused devices: <none>

No luck. Based on this answer I’m planning to try:

mdadm --create /dev/md1 --assume-clean --metadata=0.90 --bitmap=/root/bitmapfile --level=5 --raid-devices=4 /dev/sd{a,b,c}2 missing
mdadm --add /dev/md1 /dev/sdd2

Is it safe?
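
(One sanity check before running any --create: the parameters passed to it must match the old superblocks exactly, or the data gets scrambled. A sketch that only reads a superblock:)

# Metadata version, level, device count, chunk size and layout must all
# match what --examine reports for the existing members
mdadm --examine /dev/sda2 | grep -E 'Version|Raid Level|Raid Devices|Chunk Size|Layout'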

Update

I've published the superblock parser script I used to make the table in my comment. Maybe someone will find it useful. Thanks for all your help.

First, check the disks by running a SMART self-test:

for i in a b c d; do
    smartctl -s on -t long /dev/sd$i
done

It might take a few hours to finish, but you can check each drive's test status every few minutes, e.g.:

smartctl -l selftest /dev/sda

If a disk's status reports "not completed" because of read errors, then that disk should be considered unsafe for md1 reassembly. After the self-tests finish, you can start trying to reassemble your array. Optionally, if you want to be extra cautious, move the disks to another machine before continuing (just in case of bad RAM/controller/etc.).
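
While the long tests run, a quick per-drive health summary can already hint at trouble; a sketch (attribute names may vary slightly between drive models):

for i in a b c d; do
    echo "== /dev/sd$i"
    smartctl -H -A /dev/sd$i | grep -E 'overall-health|Reallocated_Sector|Current_Pending_Sector|Offline_Uncorrectable'
done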

Recently, I had a case exactly like this one. One drive failed, I re-added it to the array, but during the rebuild 3 of the 4 drives failed altogether. The contents of /proc/mdstat were the same as yours (maybe not in the same order):

md1 : inactive sdc2[2](S) sdd2[4](S) sdb2[1](S) sda2[0](S)

But I was lucky and reassembled the array with this:

mdadm --assemble /dev/md1 --scan --force

Looking at the --examine output you provided, I can tell the following scenario happened: sdd2 failed, you removed it and re-added it, so it became a spare drive trying to rebuild. But while it was rebuilding, sda2 failed and then sdb2 failed. So the Events counter is bigger in sdc2 and sdd2, which are the last active drives in the array (although sdd2 didn't get the chance to rebuild and so it is the most outdated of all). Because of the differences in the event counters, --force will be necessary. So you could also try this:

mdadm --assemble /dev/md1 /dev/sd[abc]2 --force
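
If the forced assemble works, it is worth confirming the array state before touching the filesystem; a minimal sketch:

cat /proc/mdstat
mdadm --detail /dev/md1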

To conclude, I think that if the above command fails, you should try to recreate the array like this:

mdadm --create /dev/md1 --assume-clean -l5 -n4 -c64 /dev/sd[abc]2 missing

If you do the --create, the missing part is important: don't try to add a fourth drive to the array, because then reconstruction will begin and you will lose your data. Creating the array with a missing drive will not change its contents, and you'll have the chance to get a copy elsewhere (RAID5 doesn't work the same way as RAID1).
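
(A sketch of getting that copy while the array is still degraded: mount read-only so nothing triggers writes. The mount points are arbitrary, /mnt/backup is assumed to already point at a big enough destination, and mount is left to autodetect the filesystem type, which the question doesn't state:)

mkdir -p /mnt/md1
mount -o ro /dev/md1 /mnt/md1
# copy everything, preserving permissions, ownership and timestamps
rsync -a /mnt/md1/ /mnt/backup/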

If that fails to bring the array up, try this solution (a Perl script): Recreating an array.

If you finally manage to bring the array up, the filesystem will be unclean and probably corrupted. If one disk fails during a rebuild, the array is expected to stop and freeze, not doing any writes to the other disks. In this case two disks failed; maybe the system was performing write requests that it wasn't able to complete, so there is a small chance you lost some data, but also a chance that you will never notice it :-)
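
(Once the array is up, a read-only filesystem check is a sensible first step; a sketch, assuming an ext3 filesystem sits directly on md1, which the question doesn't actually say:)

# -n opens the filesystem read-only and answers "no" to every repair prompt
e2fsck -n /dev/md1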

edit: some clarification added.

I experienced many problems while using mdadm, but I never lost data.
You should avoid the --force option, or use it very carefully, because you can lose all of your data.
Please post your /etc/mdadm/mdadm.conf.

