Wednesday, March 5, 2008

Fileserver drama

It's been an exciting few days here in Geekland. I successfully installed my two new hard drives. That involved routing cables in an ugly manner, but was otherwise uneventful. I was not able to replace my north bridge fan, though - the wiring for the current one goes under the heatsink, but to remove that I think I need to remove the motherboard from the case. Anyway, at that point, I was all ready to grow my raid5 array.

But wait! How could this be? My 4-disk raid5 array is only running with 3 active disks [1]. It would appear that sometime on December 17, a power outage or similar caused a hard drive to be marked as failed. I should really set up some sort of notification. Well, I took the opportunity to learn all about recovering a dirty raid array. Good ol' mdadm was marvelous! [2]
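Since I mentioned wanting some sort of notification, here is a rough sketch of what a check could look like. It is not what I actually run; it only assumes that /proc/mdstat shows a status field such as [UUU_], where an underscore marks a missing member. The function name degraded_arrays is made up for this example.

```shell
# Sketch only: list md arrays whose member-status field (e.g. [UUU_])
# contains an underscore, meaning at least one member is missing.
# Reads /proc/mdstat-style text on stdin. "degraded_arrays" is a
# hypothetical helper name, not part of mdadm.
degraded_arrays() {
    awk '
        # Remember the array name from lines like "md0 : active raid5 ..."
        /^md[^ ]* :/ { name = $1 }
        # The status field contains only U and _ characters between brackets
        /\[[U_]+\]/ {
            if (match($0, /\[[U_]+\]/)) {
                status = substr($0, RSTART, RLENGTH)
                if (index(status, "_") > 0 && name != "") print name
            }
        }
    '
}
```

Something like `degraded_arrays < /proc/mdstat` run from cron, with any output mailed to me, would probably have caught the December failure. mdadm also has its own monitor mode (mdadm --monitor), which is likely the more sensible route.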

With a fully functioning, clean, 4-disk raid5 array, it was time to grow the array. I called upon mdadm once again [3] and got my two new drives added as hot spares. Just one more command [4] to grow the array --

What's this? Growing a raid5 array requires Linux 2.6.17 or later (2.6.19 according to some) and mdadm 2.4.1 or later? Surely Ubuntu, the most user-friendly Linux distribution available, will have a convenient upgrade mechanism - well, sort of. Upgrading reported a few errors, but I was running 2.6.20 and mdadm [4] wasn't throwing an error anymore. Huzzah, my raid array was growing!
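In hindsight, checking the versions before touching anything would have been smarter. A minimal sketch of such a check, using the minimums quoted above (kernel 2.6.17, mdadm 2.4.1); the helper names version_ge and can_grow_raid5 are invented for illustration.

```shell
# Sketch only: compare dot-separated version numbers field by field.
# version_ge A B succeeds when A >= B. Anything after the first "-"
# (e.g. "-16-generic" in a kernel release string) is ignored.
version_ge() {
    awk -v a="${1%%-*}" -v b="${2%%-*}" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            # Missing fields count as 0, so "2.6" compares like "2.6.0"
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }'
}

# Hypothetical wrapper: $1 = kernel version, $2 = mdadm version.
# The minimums are the ones mentioned in this post.
can_grow_raid5() {
    version_ge "$1" 2.6.17 && version_ge "$2" 2.4.1
}
```

The kernel side can be fed from `uname -r`; the mdadm version number would have to be picked out of whatever `mdadm --version` prints.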

Kernel Panic? Aiee! Oh god oh god oh god. 2TB of data lost! Please let this reboot erase this bad dream... Eep! /dev/md0 no longer exists! There has to be a way to fix this - it's Linux! Why, of course! I can always rely on mdadm. Why, once you reassemble the array [5], it goes right on growing! Kernel Panic, again? Screw this.

After tiring of kernel panics and screaming "Aiee!" I downloaded an installation CD for the latest Ubuntu distribution. I had to reformat /root and /boot (and opted to format /home while I was at it), but I had a clean installation. In fact, it made everything easier. Once I reinstalled the mdadm package, /dev/md0 magically reappeared and was growing once again. It's now 56.7% done growing. After that, I need to resize the ext3 file system [6], but I think that will go more smoothly.
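For watching the progress without squinting at the screen, something along these lines might do. It assumes the progress line in /proc/mdstat contains text of the form "reshape = 56.7%"; the function names are hypothetical.

```shell
# Sketch only: pull the reshape percentage out of /proc/mdstat-style
# text on stdin. Assumes a progress line containing "reshape = NN.N%".
# Prints nothing when no reshape is in progress.
reshape_progress() {
    sed -n 's/.*reshape *= *\([0-9.]*\)%.*/\1/p'
}

# Succeeds when the input shows no reshape in progress, which is the
# point at which resizing the file system makes sense.
reshape_finished() {
    [ -z "$(reshape_progress)" ]
}
```

With that, `reshape_progress < /proc/mdstat` prints just the number, and `reshape_finished < /proc/mdstat` can gate the resize2fs step.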

At least I know the data is still there (mostly, anyway). After a while spent reinstalling and configuring Samba and mounting /dev/md0, I have successfully watched an episode of TV. Indeed, I can still use my 2TB of file storage while it's growing into 3.5TB! If it weren't for a faulty upgrade, I probably wouldn't have had to reboot except for the hardware installation (and I admit even that can be avoided given proper cable planning in the case). I'm still amazed that mdadm can handle a bad disk, adding two disks, a faulty OS upgrade, kernel panics interrupting a reshape, reassembling unclean disks and making them clean again, and resuming an interrupted reshape operation from a different version, almost all while allowing the drive to remain accessible. Simply amazing.
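The capacity jump follows from how raid5 works: one disk's worth of space goes to parity, so usable space is (number of disks - 1) times the size of one disk. A tiny sketch of that arithmetic; the 750 GB per-disk figure in the usage note is purely a hypothetical value for illustration, since I never said above how big the drives are.

```shell
# Sketch only: usable raid5 capacity is (disks - 1) * per-disk size,
# because one disk's worth of space holds parity.
# $1 = number of member disks, $2 = size of each disk in GB.
# "raid5_usable_gb" is a made-up helper name.
raid5_usable_gb() {
    echo $(( ($1 - 1) * $2 ))
}
```

With a hypothetical 750 GB per disk, `raid5_usable_gb 4 750` prints 2250 and `raid5_usable_gb 6 750` prints 3750, so going from 4 disks to 6 takes the array from three disks' worth of space to five.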

Commands to remember:
  1. cat /proc/mdstat
  2. mdadm --add /dev/md0 /dev/sdd1
  3. mdadm --add /dev/md0 /dev/sde1 /dev/sdf1
  4. mdadm --grow /dev/md0 -n 6
  5. mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sde1 /dev/sdf1
  6. resize2fs /dev/md0 (theoretically)
