Linux software raid

From Bitpost wiki
Revision as of 18:12, 8 November 2019 by M

Linux software raid (md) is managed with the mdadm tool; on OpenRC systems the mdraid init service reassembles the arrays at boot.

Install

mdadm has not received the polish it needs to Just Work. It has serious flaws that, even after hours of learning, leave you unsure, hanging, and most likely bailing out of the entire process. But it is the best thing we have on the planet, so let’s distill it down to the essentials.

  • check S.M.A.R.T. data of drives – run tests and make sure they are completely healthy!
  • clean raid drives of superblock and partition data
   mdadm --misc --zero-superblock /dev/sdd && dd if=/dev/zero of=/dev/sdd bs=1M count=100 && mdadm --examine /dev/sdd
    mdadm: No md superblock detected on /dev/sdd.
  • use whole drives (not drive partitions) in a newly created raid
   mdadm --create --verbose /dev/md0 --level=mirror --raid-devices=2 /dev/sdd /dev/sde 
    mdadm: size set to 3906887488K 
    mdadm: automatically enabling write-intent bitmap on large array 
    Continue creating array? yes 
    mdadm: Defaulting to version 1.2 metadata 
    mdadm: array /dev/md0 started.
   watch -n 1 cat /proc/mdstat
   # wait 400 FUCKING MINUTES for a GODDAMNED EMPTY 4TB DRIVE to sync with ANOTHER empty 4TB drive, FUCKSAKE
   # IF YOU REBOOT BEFORE THAT, IT WILL BE AS IF YOU NEVER SET UP A RAID
  • save and reboot and make sure the mdraid service restores the raid
   mdadm --detail --scan >>/etc/mdadm.conf 
   rc-update add mdraid boot
   # start then stop then start the /etc/init.d/mdraid service, make sure this works to restore your raid (check /proc/mdstat) 
   # format /dev/md0 as ext4 and set up an auto mount point in /etc/fstab 
   # reboot and pray
  • hopefully much later, upon failure, to get at the data on a single surviving drive, assemble it as a (degraded) raid:
   mdadm -A --run /dev/md0 /dev/sdd # I THINK! it's all very iffy. Which SUCKS.
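
The long initial sync is easier to sit through with a number in front of you. Here is a minimal sketch that pulls the progress and ETA out of /proc/mdstat, assuming the usual layout (a "resync = N% ... finish=...min" line under each array). sync_status is a made-up helper name, not part of mdadm.

```shell
# Hypothetical helper: print "<array> <percent> eta <time>" for every array
# that is currently syncing, or a single "no sync in progress" line.
# Reads /proc/mdstat unless another file is given as $1.
sync_status() {
    awk '
        # Remember which array the following indented lines belong to.
        /^md[0-9]+ :/ { dev = $1 }
        # Progress lines mention resync/recovery/reshape/check followed by "=".
        /(resync|recovery|reshape|check) *=/ {
            pct = "?"; eta = "?"
            for (i = 1; i <= NF; i++) {
                if ($i ~ /%$/)       pct = $i
                if ($i ~ /^finish=/) eta = substr($i, 8)
            }
            printf "%s %s eta %s\n", dev, pct, eta
            found = 1
        }
        END { if (!found) print "no sync in progress" }
    ' "${1:-/proc/mdstat}"
}
```

Source the function and run sync_status with no arguments to read the live /proc/mdstat, or wrap it in the same watch command as above.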

Monitor

RAID is no use if you don't monitor it! So... monitor it! The horribleness of mdadm is confirmed again here in these monitoring notes. Do this on occasion:

# GREAT simple status of any raids
# bitpost should show: md0 : active raid1 sdd[0] sde[1]
cat /proc/mdstat    

# drive health
sudo smartctl -a /dev/sdd
sudo smartctl -a /dev/sde

# this is supposed to maybe email you on monitoring events... i'll believe it when i get one... 
# there's no way email works... and that notes page warns that it may just silently die on you...
# sigh
sudo mdadm -F -s --mail m@bitpost.com
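
Until the email route has proven itself, a dumber check that only reads /proc/mdstat may be worth having around. This is a sketch, assuming missing members show up as "_" in the [UU] status field (so [U_] or [_U] means degraded). degraded_arrays and check_raid are hypothetical helper names, not mdadm features.

```shell
# Hypothetical helper: print the name of every array whose status field
# contains "_" (a missing member). Reads /proc/mdstat unless $1 is given.
degraded_arrays() {
    awk '
        # Remember which array the following indented lines belong to.
        /^md[0-9]+ :/ { dev = $1 }
        # The status field holds only U and _ characters, e.g. [UU] or [U_].
        match($0, /\[[U_]+\]/) {
            if (substr($0, RSTART, RLENGTH) ~ /_/) print dev
        }
    ' "${1:-/proc/mdstat}"
}

# Hypothetical wrapper: one line of output, exit status 0 when healthy,
# 1 when at least one array is degraded, 2 when mdstat cannot be read.
check_raid() {
    bad=$(degraded_arrays "$@") || { echo "cannot read mdstat"; return 2; }
    if [ -n "$bad" ]; then
        echo "DEGRADED: $bad"
        return 1
    fi
    echo "all arrays healthy"
}
```

The nonzero exit status is there so check_raid can gate whatever alert you actually trust. To find out whether mdadm's own mail works at all, adding --oneshot --test to the monitor command above should send a test alert for each array and then exit.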

Also try the stuff here.

Pray

Honestly, you should spend more time on this... before you have to...