How-To Debootstrap

Posted on April 23, 2014 in System

For my infrastructure purposes I often need to install servers as fast as possible. Most of my servers come with 4 disks and one or more RAID cards.

I usually don't trust the RAID cards, so I always create one RAID 0 per disk in order to use every logical volume as if it were a real disk.

And I always use the following partition scheme:

mount   size
/boot   200M
/       *

hpacucli

# find your slot
slot=`hpacucli ctrl all show | grep -i slot | awk '{print $6}'`
# delete the existing logical drive
hpacucli ctrl slot=$slot ld 1 delete
# create one raid0 per physical disk
for phys in `hpacucli ctrl all show config | grep physicaldrive | awk '{print $2}'`;
do
  hpacucli controller slot=$slot create type=ld drives=$phys raid=0
done;
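
To double-check that every physical disk now shows up as its own logical volume, an optional sanity check is to list the controller configuration again:

# one RAID 0 logical drive should exist per physical disk
hpacucli ctrl slot=$slot ld all show
hpacucli ctrl slot=$slot pd all show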

Cleaning

If you are reusing an old server, you must do some cleaning first.

Let's start by zeroing the first 100MB of each disk in order to be sure to erase the partition table and the MBR:

for i in {a..d} ;
do
  dd if=/dev/zero of=/dev/sd$i count=100 bs=1M
done
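
Old filesystem or RAID signatures can survive a partial wipe; as an optional complement to the dd pass, a small sketch using wipefs (which removes the signatures it recognizes):

# optional: remove leftover filesystem/RAID/partition-table signatures
for i in {a..d} ;
do
  wipefs -a /dev/sd$i
done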

Afterwards, let's notify the kernel about the device changes:

partprobe

MSDOS partitions

for i in {a..d} ;
do
  parted /dev/sd$i --script -- mklabel msdos
  parted /dev/sd$i -a optimal --script -- unit MB mkpart primary 1 200
  parted /dev/sd$i -a optimal --script -- unit MB mkpart primary 200 -1
done;
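
You can then print the resulting layout to make sure the two partitions are where you expect them (purely a check, nothing is modified):

# optional look at the new partition tables
for i in {a..d} ;
do
  parted /dev/sd$i --script -- unit MB print
done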

GPT partitions

For GPT partitions you need to create a BIOS Boot partition, a small partition of at least 1MB.

for i in {a..d} ;
do
    parted /dev/sd$i --script -- mklabel gpt
    parted /dev/sd$i -a optimal --script -- unit MB mkpart grub fat32 1mb 2mb
    parted /dev/sd$i -a optimal --script -- unit MB set 1 bios_grub on
    parted /dev/sd$i -a optimal --script -- unit MB mkpart primary 2mb 200
    parted /dev/sd$i -a optimal --script -- unit MB mkpart primary 200 -1
done;

Installation

I prefer to use software RAID with mdadm. If you want to boot from an mdadm volume (here, /boot on md0), it needs to use the 0.90 metadata format. For your /, use the RAID level you want and don't pass any metadata parameter, so it takes the default 1.2 format.

/!\ If you use GPT partitions, be aware that /dev/sdx1 is the BIOS Boot partition, not your future /boot, so start at /dev/sdx2.

# for msdos partitions
mdadm --create /dev/md0 --metadata=0.90 --assume-clean --raid-devices=4 --level=1 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm --create /dev/md1 --assume-clean --raid-devices=4 --level=6 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2

# for gpt partitions
mdadm --create /dev/md0 --metadata=0.90 --assume-clean --raid-devices=4 --level=1 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
mdadm --create /dev/md1 --assume-clean --raid-devices=4 --level=6 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
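
Before formatting, it doesn't hurt to check that both arrays are assembled the way you expect:

# quick look at the arrays
cat /proc/mdstat
mdadm --detail /dev/md0
mdadm --detail /dev/md1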

Let's format the RAID volumes

mkfs.ext4 /dev/md0
mkfs.ext4 /dev/md1

Let's start the debootstrap session. I use a basic /etc/apt/sources.list generated with this convenient sources.list generator.

mkdir /mnt/root
mount /dev/md1 /mnt/root
apt-get update; apt-get install -y debootstrap
debootstrap trusty /mnt/root
mount -o bind /dev /mnt/root/dev
mount -o bind /proc /mnt/root/proc
mount -o bind /sys /mnt/root/sys

# basic fstab
echo "proc            /proc   proc    defaults                0       0
/dev/md1 /       ext4    errors=remount-ro       0       1
/dev/md0        /boot   ext4    defaults                0       2
" > /mnt/root/etc/fstab

echo "#############################################################
################### OFFICIAL UBUNTU REPOS ###################
#############################################################

###### Ubuntu Main Repos
deb http://fr.archive.ubuntu.com/ubuntu/ trusty main restricted universe multiverse
deb-src http://fr.archive.ubuntu.com/ubuntu/ trusty main restricted universe multiverse

###### Ubuntu Update Repos
deb http://fr.archive.ubuntu.com/ubuntu/ trusty-security main restricted universe multiverse
deb http://fr.archive.ubuntu.com/ubuntu/ trusty-updates main restricted universe multiverse
deb-src http://fr.archive.ubuntu.com/ubuntu/ trusty-security main restricted universe multiverse
deb-src http://fr.archive.ubuntu.com/ubuntu/ trusty-updates main restricted universe multiverse
" > /mnt/root/etc/apt/sources.list

Now we can go to the installed volume and prepare the OS:

cd /mnt/root
chroot .
# mount /boot for the future kernel installation
mount /boot
# generate a few locales
locale-gen fr_FR.UTF-8
locale-gen fr_FR
locale-gen en_US.UTF-8
locale-gen en_US
update-locale

apt-get update
# don't forget to install mdadm on the system so it can boot correctly
apt-get install -y mdadm lvm2
# install the required kernel
apt-get install -y linux-image-generic
# install an openssh-server so you can access the system remotely
apt-get install -y openssh-server
# change your root password!!
echo "root:changeme"|chpasswd

Stop the services that were started inside the chroot

/etc/init.d/ssh stop

Unmount everything, sync the last I/O and reboot

umount /boot
exit
umount /mnt/root/dev
umount /mnt/root/proc
umount /mnt/root/sys
sync
reboot

LVM

Work in progress

Rescue

Without LVM

If you happen to boot a rescue live-CD on one of these configurations, it will detect the RAID arrays but without the correct device names

mdadm -S /dev/md126
mdadm -S /dev/md127
mdadm --examine --scan /dev/sda{1..4} >> /etc/mdadm/mdadm.conf
mdadm --assemble --scan

Your /dev/md0 and /dev/md1 should come online
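
A quick look at /proc/mdstat confirms they are assembled under their usual names:

# md0 and md1 should now appear with their original names
cat /proc/mdstat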

mkdir -p /mnt/root
mount /dev/md1 /mnt/root
mount -o bind /dev /mnt/root/dev
mount -o bind /proc /mnt/root/proc
mount -o bind /sys /mnt/root/sys
chroot /mnt/root

Here you go!

Credits

Thanks to my friends Pierre Tourbeaux and Michael Kozma for all the advice and debugging over the years :)



ZFSonLinux

Posted on April 18, 2014 in System

At Online, we've been trying ZFS On Linux on a few services.

Here's a small how-to (and also a reminder) on how to install it and manage it:

Install

    $ apt-add-repository --yes ppa:zfs-native/stable
    $ apt-get update && apt-get install ubuntu-zfs
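
A quick, optional sanity check that the DKMS module built and loaded correctly:

    # the module should load, and zpool should report "no pools available" for now
    $ modprobe zfs
    $ zpool status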

ZFS comes with a software-RAID-like system:

RAID type   RAID-Z type   Disks you can lose   Min disks
RAID5       raidz         1 disk               3 disks
RAID6       raidz2        2 disks              4 disks
RAID7       raidz3        3 disks              5 disks

Now we're going to create a zpool called storage

    $ zpool create -f storage raidz2 c2d{1..5}

If we want to add MOAR disks

    $ zpool add -f storage raidz2 c2d{6..10}
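
zpool status shows the resulting layout, which should now be two raidz2 vdevs of 5 disks each:

    # check the vdev layout and the pool capacity
    $ zpool status storage
    $ zpool list storage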

Here are a few problems I've experienced:

ZFS Resilvering (replace a drive)

If you've got some spare disks, you should add them to your spare pool

    $ zpool add storage spare c2d11 c2d12

By doing so, if a disk fails, ZFS will automatically replace the failed one with a spare. Personally, I prefer to do it manually. Assuming c2d4 failed, to replace it with c2d11, let's do this:

    $ zpool replace storage c2d4 c2d11

The zpool will now resilver onto c2d11. Once the resilver ends, the failed disk is ejected from the zpool.

ZFS Scrubbing

ZFS has a scrub feature to detect and correct silent errors. You could compare it to ECC RAM (RAM with error recovery). The scrub feature checks every block of your pool against its checksum (fletcher4 by default, SHA-256 if you enabled it).

You can invoke a scrub yourself, or be forced to live through one when a disk fails and you have to replace it.
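
Starting (and, if needed, cancelling) a scrub by hand looks like this:

    # start a scrub on the storage pool
    $ zpool scrub storage
    # follow its progress
    $ zpool status storage
    # cancel it if it was a bad idea
    $ zpool scrub -s storage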

Recently, on a 200T system, I replaced a failed disk with a spare one. It scrubbed the whole 200T. The zpool status reported a duration of about 500 hours of scrubbing. Time to hang yourself.

Fortunately, there are some tunable settings in /sys/module/zfs/parameters

    # Prioritize resilvering by setting the delay to zero
    $ echo 0 > zfs_resilver_delay

    # Prioritize scrubs by setting the delay to zero
    $ echo 0 > zfs_scrub_delay

These changes take effect immediately and I haven't experienced any problems afterwards. Everything synced in 60 hours.

Here are a few other parameters to tune your scrub:

feature                    default value   description
zfs_top_maxinflight        32              maximum I/Os per top-level vdev
zfs_resilver_delay         2               number of ticks to delay resilver
zfs_scrub_delay            4               number of ticks to delay scrub
zfs_scan_idle              50              idle window in clock ticks
zfs_scan_min_time_ms       1000            min millisecs to scrub per txg
zfs_free_min_time_ms       1000            min millisecs to free per txg
zfs_resilver_min_time_ms   3000            min millisecs to resilver per txg
zfs_no_scrub_io            0 (bool)        set to disable scrub I/O
zfs_no_scrub_prefetch      0 (bool)        set to disable scrub prefetching
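
Before touching anything, it's worth noting the current values so they can be restored once the resilver or scrub is done; a small sketch:

    # dump the current values of the delay tunables
    $ grep . /sys/module/zfs/parameters/zfs_*delay /sys/module/zfs/parameters/zfs_scan_idle
    # put a value back afterwards, e.g. the default scrub delay
    $ echo 4 > /sys/module/zfs/parameters/zfs_scrub_delay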
