Set up a RAID 10 with mdadm

I have already blogged a few times about mdadm in the past. Today we have a short article about creating a RAID 10 with mdadm on new disks.

# lsblk 
NAME           MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINTS
sda              8:0    0 16.4T  0 disk  
sdb              8:16   0 16.4T  0 disk  
sdc              8:32   0  2.7T  0 disk  
├─sdc1           8:33   0    1M  0 part  
└─sdc2           8:34   0  2.7T  0 part  
  └─md1          9:1    0  2.7T  0 raid1 
sdd              8:48   0  2.7T  0 disk  
├─sdd1           8:49   0    1M  0 part  
└─sdd2           8:50   0  2.7T  0 part  
  └─md1          9:1    0  2.7T  0 raid1 
nvme0n1        259:0    0  1.8T  0 disk  
├─nvme0n1p1    259:1    0  953M  0 part  /boot
└─nvme0n1p2    259:2    0  1.8T  0 part  
  └─cryptroot  254:0    0  1.8T  0 crypt 
    └─vg1-root 254:1    0  750G  0 lvm   /

First we need to identify the two disks. lsblk always gives a good overview of all block devices. In this case I know that the new drives are sda and sdb because of their size and the absence of partitions. Since those drives are new, we should start a SMART self-test first:

for disk in /dev/sd[a,b]; do smartctl --test=long "${disk}"; done
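
Before waiting it can be useful to know how long the long test is expected to run. smartctl prints an estimate, which can be pulled out roughly like this (the exact field wording varies between drives and smartctl versions, so the grep pattern is only an assumption):

for disk in /dev/sd[a,b]; do smartctl --all "${disk}" | grep -A1 "Extended self-test"; done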

We now need to wait a few hours. The status of the self-test can be checked with:

for disk in /dev/sd[a,b]; do smartctl --all "${disk}"; done
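
Instead of checking manually, a small loop can poll until both tests are finished. A rough sketch, assuming the drives report "Self-test routine in progress" in their smartctl output (the exact wording may differ):

for disk in /dev/sd[a,b]; do
  while smartctl --all "${disk}" | grep -q "Self-test routine in progress"; do
    echo "${disk}: self-test still running"
    sleep 600
  done
  # print the self-test log once the test has finished
  smartctl --log=selftest "${disk}"
done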

Afterwards we can create the RAID. Keep the bathtub curve in mind: hard drives, when used correctly, tend to fail either within their first few hundred hours or after their estimated lifetime. The ideal setup would use drives that already survived their burn-in phase in older systems, with failed disks always being replaced by new ones. That isn't really viable for private workstations, though. I suggest filling the drives with dd a few times or keeping them idle for a few days. If they still work afterwards, they are usually good to go. If you feel confident about the disks we can continue; otherwise, a rough burn-in sketch follows below.
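
A burn-in sketch with dd, assuming sda and sdb really are the new, empty disks (this irrevocably overwrites everything on them):

for disk in /dev/sd[a,b]; do
  # one full write pass with zeros; repeat it or use /dev/urandom for additional passes
  dd if=/dev/zero of="${disk}" bs=4M status=progress oflag=direct
done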

If you have a system with legacy boot (no UEFI), you often have a BIOS/GRUB boot partition. This partition starts at sector 2048 and ends at sector 4095 (and needs a special flag). It is good practice to have this partition on all disks so you can install a bootloader on any of them, if ever required. Since that is not a lot of lost space, we create this partition on the RAID disks as well, followed by a second partition spanning the remaining space, which will be used for the RAID itself:

for disk in /dev/sd[a,b]; do
  parted "${disk}" --script mklabel gpt
  parted "${disk}" --script mkpart primary ext3 2048s 4095s
  parted "${disk}" --script set 1 bios_grub on
  parted "${disk}" --script mkpart primary ext3 4096s 100%
done
mdadm --verbose --create /dev/md/0 --level=10 --raid-devices=2 --metadata=1.2 /dev/sd[a,b]2
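
At this point a quick look at the partition tables and at /proc/mdstat shows whether everything came out as expected (purely optional):

for disk in /dev/sd[a,b]; do parted "${disk}" --script print; done
cat /proc/mdstat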

parted expects a filesystem type as a parameter but won't actually create a filesystem, so don't be confused by the ext3: there won't be any ext3. Now we need to wait until the RAID is initialized before we continue:

# raise the raid sync speed limits so the initial sync runs at full speed
echo 999999999 > /proc/sys/dev/raid/speed_limit_min
echo 999999999 > /proc/sys/dev/raid/speed_limit_max
while grep -q resync /proc/mdstat; do echo "sleeping for 2s"; sleep 2; done
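
Once the loop has finished, it doesn't hurt to restore the sync speed limits and double-check the array. A small optional sketch; 1000 and 200000 are assumed to be the kernel defaults here, so verify yours before relying on them:

# restore the default sync speed limits (assumed defaults, check your kernel)
echo 1000 > /proc/sys/dev/raid/speed_limit_min
echo 200000 > /proc/sys/dev/raid/speed_limit_max

# the array state should now be reported as clean/active
mdadm --detail /dev/md/0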

mdadm will read through both partitions and sync them. If you fully trust the disks, you could instead create the array with --assume-clean and skip the initial sync entirely, as sketched below.
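
A minimal sketch of that variant, assuming the disks were freshly zeroed (otherwise the two halves of the mirror can differ and a later consistency check would report mismatches):

mdadm --verbose --create /dev/md/0 --level=10 --raid-devices=2 --metadata=1.2 --assume-clean /dev/sd[a,b]2

Now we create a LUKS container and LVM on top: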

cryptsetup luksFormat -c aes-xts-plain64 -s 512 --hash sha512 --use-random -i 10000 /dev/md/0
cryptsetup luksOpen /dev/md/0 raid
pvcreate --verbose /dev/mapper/raid
vgcreate --verbose vg0 /dev/mapper/raid
lvcreate --verbose --name root --size 50G vg0
mkfs.ext4 -v  -E lazy_itable_init=0,lazy_journal_init=0 -m0 /dev/mapper/vg0-root
mount /dev/mapper/vg0-root /mnt

ext4, since kernel 2.6.37, uses lazy initialization for the inode table. That means the mkfs process is quite fast, but after the first mount the inode table is initialized in the background, which takes some time and slows down other writes. The -v option gives us some nice output, and -m0 tells mkfs to reserve zero blocks. The default is 5%, which is quite a lot on 18TB disks.

Additional information

We could parse the HDD temperature from smartctl, but that's a bit ugly. We can simply use hddtemp; Arch Linux has that tool packaged.

hddtemp /dev/sd?

And of course we can continuously watch it:

watch hddtemp /dev/sd?
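
If you'd rather not install an extra tool, roughly the same information can be pulled out of smartctl. The attribute name varies between vendors, so the grep pattern below is only an assumption:

for disk in /dev/sd[a,b]; do
  echo "${disk}:"
  smartctl --all "${disk}" | grep -i temperature
done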

We can also check the RAID status with:

cat /proc/mdstat

And that also works well with watch:

watch cat /proc/mdstat
