ZFSonLinux

Posted on avril 18, 2014 in System

At Online, we've been trying ZFS On Linux on a few services.

Here's a small how-to (and also a reminder) on how to install it and manage it:

Install

    $ apt-add-repository --yes ppa:zfs-native/stable
    $ apt-get update && apt-get install ubuntu-zfs

ZFS comes with a RAID soft like system

RAID type RAID-z type Loosable disks Min disks
RAID5 raidz 1 disks 3 disks
RAID6 raidz-2 2 disks 4 disks
RAID7 raidz-3 3 disks 5 disks

Now we're going to create a zpool called storage

    $ zpool create -f storage raidz2 c2d{1..5}

If we wan't to add MOAR disks

    $ zpool add -f storage raidz2 c2d{6..10}

Here are a few problems I've experienced:

ZFS Resilvering (replace a drive)

If you've got some spare disks, you should add them your spare pool

    $ zpool add storage spare c2d11 c2d12

By doing so, if a disk fails, ZFS will replace it automatically with the failed one. Personnaly, I prefer to do it manualy. Assuming c2d4 failed, to replace it by c2d11, let's do this:

    $ zpool replace c2d4 c2d11

You will now have c2d11 resilvering your entire zpool. Once the resilver ends, the failed disk is ejected from the zpool.

ZFS Scrubbing

ZFS has a scrub feature to detect and correct silently errors. You could assimilate this to ECC RAM (RAM with error recovery). ZFS scrub feature check every block of your pool against a SHA-256 checksum.

You can invoke a scrub or be forced to live the scrub when a disk fails and you have to replace it.

Recently, on a 200T system, I replaced a failed disk by a spare one. It scrubbed the 200T. The zpool status was mentionning a duration of about 500 hours of scrubbing. Time to hang yourself.

Fortunately, there is some tunnable settings in /sys/module/zfs/parameters

    # Prioritize resilvering by setting the delay to zero
    $ echo 0 > zfs_resilver_delay

    # Prioritize scrubs by setting the delay to zero
    $ echo 0 > zfs_scrub_delay

These changes takes effect immediatly and I haven't experienced any problems afterwards. Everything synced in 60 hours.

Attached a few other features to tune your scrub:

feature default value description
zfs_top_maxinflight 32 maximum I/Os per top-level
zfs_resilver_delay 2 number of ticks to delay resilver
zfs_scrub_delay 4 number of ticks to delay scrub
zfs_scan_idle 50 idle window in clock ticks
zfs_scan_min_time_ms 1000 min millisecs to scrub per txg
zfs_free_min_time_ms 1000 min millisecs to free per txg
zfs_resilver_min_time_ms 3000 min millisecs to resilver per txg
zfs_no_scrub_io 0 (bool) set to disable scrub i/o
zfs_no_scrub_prefetch 0 (bool) set to disable srub prefetching

Links