Replace RAID5 with striped mirrors using ZFS
If you are thinking about upgrading your NAS in 2018, I have some unsolicited advice for you: Do not go with RAID5. Do not even go with a traditional RAID setup.
RAID5 looks great for home setups, since you get some level of redundancy cheaply, but over time, and with much bigger disks on the market, its disadvantages are becoming more obvious.
It is starting to look like the RAID card from that antique shop may have to be paid for with your soul … data *maniacal laughter*.
You can only ever lose one disk at a time
If you just lost one of the disks in your RAID5, you now have to identify which disk you need to replace. This can be difficult, especially when the disk fails by reading and writing gibberish only some of the time. Once you have figured that out, you have to pray that all the other disks survive rebuilding the array.
Due to the nature of RAID5, you need to read all of the remaining disks in full in order to recover the data from the one that got away. This puts a lot of stress on those disks, bringing them closer to their own, possibly immediate, demise.
Embrace the striped-mirror
My advice would be to go for a mirrored setup like RAID1 instead. This already saves you from having to read n disks to recover a single one. Still, I would not urge anybody to get a traditional RAID1 setup in 2018, neither in hardware nor in software.
I would instead advise you to start building a set of striped mirrors with ZFS. Let me illustrate what a striped mirror setup is by showing how you could create one and adapt it to your needs over time:
1. Create your storage pool using one 4 TB disk for all your data
2. Add another 4 TB disk to your pool for redundancy. Now you have a mirror
3. Add another pair of same-sized disks to increase your storage with the same level of redundancy. This is a striped mirror: a stripe of n mirrors
4. If you have SATA ports and physical space left, go to step 3; otherwise go to step 5
5. Replace the smallest pair of disks with bigger ones. In place. No need to copy stuff around
6. Go to step 5
In the second half of this post I will also walk through this whole process using real ZFS commands and a few dummy files as disks, in a demo that you can follow along with on your own PC.
Embrace ZFS
ZFS can give you a bunch of benefits on top of that:
- Checksums for data and metadata
- Copy-On-Write - never overwrites data in-place
- Very cheap snapshots using zfs snap
- File-system-level replication and backups using zfs send and zfs recv (see the sketch after this list)
- Configurable compression and deduplication
- No downtime when a disk fails
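To give a rough idea of what the snapshot and replication workflow looks like, here is a minimal sketch. The snapshot name and the backuphost / backuppool targets are made up for illustration; the demo below only covers pool management.
# take a cheap, read-only snapshot of the pool's root dataset
zfs snapshot tank@before-upgrade
# list all snapshots
zfs list -t snapshot
# replicate the snapshot to another machine over ssh (hypothetical host and pool names)
zfs send tank@before-upgrade | ssh backuphost zfs recv backuppool/tank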
FH LUG lightning talk
This writeup is actually based on a lightning talk I did at the FH Linux User Group in Hagenberg in 2017. Here are the slides for that.
Demo to follow along
The second half of this post will be the demo I did for that 2017 talk. You can follow along on any system that has ZFS support, like FreeBSD or Ubuntu.
We will perform these ZFS shenanigans on some ordinary files inside the /tmp folder that we will use as disks.
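Depending on your system you may need to install or enable ZFS first. The exact steps vary, but something along these lines should work; adjust the package and module handling to your setup.
# Ubuntu: install the userland tools (pulls in the kernel module)
sudo apt install zfsutils-linux
# FreeBSD: load the ZFS kernel module if it is not loaded already
kldload zfs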
Set up a pool
First we will create a simple 100 MB file called disk0.
truncate -s 100M /tmp/disk0
We will use this file as the first ‘disk’ for ZFS to manage, so let’s create a pool from that one disk.
zpool create tank /tmp/disk0
Now let’s also make sure compression is enabled for that pool.
zfs set compression=lz4 tank
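If you want to double-check that the property took effect, you can query it with zfs get:
zfs get compression tank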
We can now use zpool status tank to query the status of our newly created pool.
root@freebsd-zfsdemo-test:/tmp # zpool status tank
pool: tank
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
/tmp/disk0 ONLINE 0 0 0
errors: No known data errors
We can also query information about the datasets in our pool. ZFS divides storage up into datasets. This is where the actual data goes. You can think of them as partitions, but more flexible.
root@freebsd-zfsdemo-test:/tmp # zfs list tank
NAME USED AVAIL REFER MOUNTPOINT
tank 75.5K 39.9M 23K /tank
We can see that due to some bookkeeping overhead not all of the 100 MB are available to us. We can also see that our tank dataset was automatically mounted at /tank. Let’s create a file full of zeroes there.
root@freebsd-zfsdemo-test:/tank # dd if=/dev/zero of=/tank/zeroes bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes transferred in 1.304338 secs (80391454 bytes/sec)
As you can see, thanks to compression we were able to write a 100 MB file full of zeroes to a pool with only 40 MB of available space.
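As an aside on the datasets mentioned above: you are not limited to the pool’s root dataset. A small, purely illustrative example (the tank/photos name is made up) of carving out a child dataset with its own properties could look like this:
# create a child dataset; it gets mounted at /tank/photos by default
zfs create tank/photos
# properties are per dataset, e.g. skip compression for data that is already compressed
zfs set compression=off tank/photos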
Add a mirror
Now let’s create another disk and add it as a mirror to create some redundancy in our setup.
root@freebsd-zfsdemo-test:/tank # truncate -s 100M /tmp/disk1
root@freebsd-zfsdemo-test:/tank # zpool attach tank /tmp/disk0 /tmp/disk1
root@freebsd-zfsdemo-test:/tank # zpool status tank
pool: tank
state: ONLINE
scan: resilvered 102K in 0h0m with 0 errors on Wed May 2 16:47:04 2018
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
/tmp/disk0 ONLINE 0 0 0
/tmp/disk1 ONLINE 0 0 0
Stripe some mirrors
Now let’s add another two disks to increase the amount of available storage space.
root@freebsd-zfsdemo-test:/tank # truncate -s 100M /tmp/disk2
root@freebsd-zfsdemo-test:/tank # truncate -s 100M /tmp/disk3
root@freebsd-zfsdemo-test:/tank # zpool add tank mirror /tmp/disk2 /tmp/disk3
root@freebsd-zfsdemo-test:/tank # zpool status tank
pool: tank
state: ONLINE
scan: resilvered 102K in 0h0m with 0 errors on Wed May 2 16:47:04 2018
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
/tmp/disk0 ONLINE 0 0 0
/tmp/disk1 ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
/tmp/disk2 ONLINE 0 0 0
/tmp/disk3 ONLINE 0 0 0
errors: No known data errors
root@freebsd-zfsdemo-test:/tank # zfs list tank
NAME USED AVAIL REFER MOUNTPOINT
tank 106K 79.9M 23K /tank
The output of zfs list tank confirms that the amount of available space increased.
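If you also want a per-vdev view of how the stripe is laid out and how much space each mirror contributes, zpool list can show that:
zpool list -v tank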
Try corrupting data
Next we will write some gibberish to one of our disks to simulate the kind of nasty failure where a disk goes bad without telling us. I already pointed out that traditional RAID setups have some problems with that. We will overwrite 1/4 of the disk.
root@freebsd-zfsdemo-test:/tank # dd if=/dev/urandom of=/tmp/disk0 bs=1M count=25
25+0 records in
25+0 records out
26214400 bytes transferred in 17.696901 secs (1481299 bytes/sec)
Now zpool scrub will make ZFS look for errors, and as we can see our pool is now listed as DEGRADED. This means we can still read from and write to the pool: as long as one copy of all the data is available, we can continue normal operation, just without the usual redundancy, until we are able to fix things.
root@freebsd-zfsdemo-test:/tank # zpool scrub tank
root@freebsd-zfsdemo-test:/tank # zpool status tank
pool: tank
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://illumos.org/msg/ZFS-8000-4J
scan: scrub repaired 0 in 0h0m with 0 errors on Wed May 2 16:57:37 2018
config:
NAME STATE READ WRITE CKSUM
tank DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
16262305164111969708 UNAVAIL 0 0 0 was /tmp/disk0
/tmp/disk1 ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
/tmp/disk2 ONLINE 0 0 0
/tmp/disk3 ONLINE 0 0 0
errors: No known data errors
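A small aside: if a device had only suffered transient errors (a loose cable, for example) rather than real corruption, resetting its error counters with zpool clear might be enough. In our case the on-disk label is gone, so the disk has to be replaced.
# only for transient errors; not applicable to our deliberately trashed disk0
zpool clear tank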
Replace faulty disks
Of course we should fix things as early as possible, so let’s quickly get some larger disks to replace disk0 and disk1.
root@freebsd-zfsdemo-test:/tank # truncate -s 200M /tmp/disk5
root@freebsd-zfsdemo-test:/tank # truncate -s 200M /tmp/disk6
First we replace the failed disk and wait until zpool status indicates that the replacement is complete.
root@freebsd-zfsdemo-test:/tank # sudo zpool replace tank /tmp/disk0 /tmp/disk5
root@freebsd-zfsdemo-test:/tank # zpool status tank
pool: tank
state: ONLINE
scan: resilvered 96K in 0h0m with 0 errors on Wed May 2 17:04:53 2018
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
/tmp/disk5 ONLINE 0 0 0
/tmp/disk1 ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
/tmp/disk2 ONLINE 0 0 0
/tmp/disk3 ONLINE 0 0 0
errors: No known data errors
root@freebsd-zfsdemo-test:/tank # zfs list tank
NAME USED AVAIL REFER MOUNTPOINT
tank 113K 79.9M 23K /tank
Extend storage space to fill a larger disk
Now we can replace the second, non-faulty disk as well and extend our pool to utilize the additional storage the new pair of disks provides, using zpool online -e.
root@freebsd-zfsdemo-test:/tank # sudo zpool replace tank /tmp/disk1 /tmp/disk6
root@freebsd-zfsdemo-test:/tank # zpool status tank
pool: tank
state: ONLINE
scan: resilvered 96K in 0h0m with 0 errors on Wed May 2 17:07:43 2018
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
/tmp/disk5 ONLINE 0 0 0
/tmp/disk6 ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
/tmp/disk2 ONLINE 0 0 0
/tmp/disk3 ONLINE 0 0 0
errors: No known data errors
root@freebsd-zfsdemo-test:/tank # zfs list tank
NAME USED AVAIL REFER MOUNTPOINT
tank 113K 79.9M 23K /tank
root@freebsd-zfsdemo-test:/tank # zpool online -e tank /tmp/disk5
root@freebsd-zfsdemo-test:/tank # zfs list tank
NAME USED AVAIL REFER MOUNTPOINT
tank 113K 144M 23K /tank
As you can see, we were able to extend our storage in place and on the fly, without any downtime.
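If you do not want to run zpool online -e by hand every time you swap in bigger disks, you can set the pool’s autoexpand property once; the pool then grows onto the extra space automatically after all disks of a vdev have been replaced with larger ones.
zpool set autoexpand=on tank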
View history
Another nice feature of ZFS is that it logs all of the ZFS commands you have executed, so you can retrace your steps when something goes wrong.
root@freebsd-zfsdemo-test:/tank # zpool history tank
History for 'tank':
2018-05-02.16:43:07 zpool create tank /tmp/disk0
2018-05-02.16:43:14 zfs set compression=lz4 tank
2018-05-02.16:47:09 zpool attach tank /tmp/disk0 /tmp/disk1
2018-05-02.16:50:26 zpool add tank mirror /tmp/disk2 /tmp/disk3
2018-05-02.16:57:42 zpool scrub tank
2018-05-02.17:04:55 zpool replace tank /tmp/disk0 /tmp/disk5
2018-05-02.17:07:48 zpool replace tank /tmp/disk1 /tmp/disk6
2018-05-02.17:19:40 zpool online -e tank /tmp/disk5
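If the plain history is not enough, there are flags for more detail: -i also shows internally logged events and -l adds the user, hostname and zone to each entry.
zpool history -il tank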
Destroy your data
When you don’t like having your data anymore, for example because you have reached the end of some tutorial, you can destroy the pool together with your data using the following command.
sudo zpool destroy -f tank
Thank you for reading.