When it comes to storage, there is a good chance your mind whirls a bit at the many options and the tonne of terminology that crowds the arena. Why can’t we just plug a disk into the host and call it a day? That was one of my frustrations until I came to see the essence of all the technologies in place. The problems that storage presents to you as a system administrator or engineer will make you appreciate the various technologies that have been developed to mitigate and solve them.
In this brief article, we are going to look at RAID, Logical Volume Manager (LVM) and ZFS. We shall investigate what each does best, how they are implemented, and how they differ. Welcome and stay tuned.
RAID stands for Redundant Array of Independent Disks. It was developed to let you combine many small, inexpensive disks into an array in order to achieve redundancy, something a single huge disk drive plugged into your project cannot give you. Even though the array is made up of multiple disks, the computer “sees” it as one drive, a single logical storage unit, which is quite amazing.
Using techniques such as disk striping (RAID Level 0), disk mirroring (RAID Level 1), and disk striping with parity (RAID Level 5), RAID is capable of achieving redundancy, lower latency, increased bandwidth, and maximized ability to recover from hard disk crashes.
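As a toy illustration of the idea behind striping with parity, the XOR relationship that RAID Level 5 relies on can be sketched in shell arithmetic (hypothetical byte values, not a real RAID driver):

```shell
#!/bin/sh
# RAID 5 keeps, for each stripe, a parity block equal to the XOR of the
# data blocks. If one disk dies, its block is recomputed by XOR-ing the
# surviving blocks with the parity.
d1=$(( 0x41 )); d2=$(( 0x42 )); d3=$(( 0x43 ))  # data bytes on three disks
parity=$(( d1 ^ d2 ^ d3 ))                      # stored on the parity block

# Disk 2 fails: rebuild its byte from the survivors plus parity.
rebuilt=$(( d1 ^ d3 ^ parity ))
printf 'rebuilt=0x%02X\n' "$rebuilt"
```

Because XOR is its own inverse, recomputing any one missing block is just another XOR over what remains — which is also why RAID 5 survives exactly one disk failure.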
Primary reasons you should consider deploying RAID in your projects that manage large amounts of data include the following:
- Better read and write speeds
- Increased storage capacity presented as a single virtual disk
- Reduced impact of disk failure: depending on your RAID level, the redundancy in the array can save you when incidents of data loss occur.
This RAID technology comes in three flavors: Firmware RAID, Hardware RAID and Software RAID. Hardware RAID handles its arrays independently from the host and still presents the host with a single disk per RAID array. It uses a hardware RAID controller card that handles the RAID tasks transparently to the operating system. Software RAID, on the other hand, implements the various RAID levels in the kernel block device code and offers the cheapest possible solution, as expensive disk controller cards or hot-swap chassis are not required. With the fast CPUs of the current era, software RAID performance can rival, and often exceed, that of hardware RAID.
Cardinal Features of Software RAID (source: access.redhat.com)
- Portability of arrays between Linux machines without reconstruction
- Backgrounded array reconstruction using idle system resources
- Hot-swappable drive support
- Automatic CPU detection to take advantage of certain CPU features such as streaming SIMD support
- Automatic correction of bad sectors on disks in an array
- Regular consistency checks of RAID data to ensure the health of the array
- Proactive monitoring of arrays with email alerts sent to a designated email address on important events
- Write-intent bitmaps which drastically increase the speed of resync events by allowing the kernel to know precisely which portions of a disk need to be resynced instead of having to resync the entire array
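The software RAID features above are provided by the Linux md driver, which is managed with mdadm. A minimal sketch of creating a RAID 5 array might look like this — /dev/sdb, /dev/sdc and /dev/sdd are placeholder device names, and these commands destroy existing data, so adjust them to your own hardware:

```shell
# Create a three-disk software RAID 5 array as /dev/md0.
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd

# Watch the background reconstruction and check array health.
cat /proc/mdstat
mdadm --detail /dev/md0

# Put a file system on the array and persist the layout across reboots.
mkfs.ext4 /dev/md0
mdadm --detail --scan >> /etc/mdadm.conf
```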
Here comes the pretty Logical Volume Manager. What LVM beautifully does is abstract away the idea of individual disk drives, allowing you as the administrator to carve out “pieces” of space to use as drives. It lets you plug as many physical drives into your system as you need and then flexibly grow and shrink your logical volumes on the live host. You can add more physical drives in the future and extend your space without reformatting, stopping applications, unmounting file systems, or shutting down the host. This kind of flexibility makes working with LVM such a smooth process.
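The basic LVM stack goes physical volumes → volume group → logical volume. A sketch of setting one up, with placeholder device names and hypothetical volume names (data_vg, app_lv):

```shell
# Mark two disks as LVM physical volumes, pool them, and carve out space.
pvcreate /dev/sdb /dev/sdc          # initialize disks for LVM use
vgcreate data_vg /dev/sdb /dev/sdc  # aggregate them into one volume group
lvcreate -n app_lv -L 50G data_vg   # carve out a 50 GiB logical volume

# Use the logical volume like any block device.
mkfs.xfs /dev/data_vg/app_lv
mount /dev/data_vg/app_lv /srv/app
```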
Advantages of LVM over physical partitions (source: access.redhat.com)
Flexible capacity
When using logical volumes, file systems can extend across multiple disks, since you can aggregate disks and partitions into a single logical volume.
Resizeable storage pools
You can extend logical volumes or reduce logical volumes in size with simple software commands, without reformatting and repartitioning the underlying disk devices.
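A sketch of growing the hypothetical volume from earlier while it stays mounted (the --resizefs flag asks lvextend to grow the file system along with the volume):

```shell
# Grow a mounted logical volume and its file system in one step.
lvextend -L +20G --resizefs /dev/data_vg/app_lv

# Shrinking is also possible with lvreduce, but the file system must
# support it (ext4 does, XFS does not) and should be unmounted first:
# umount /srv/app
# lvreduce -L -10G --resizefs /dev/data_vg/app_lv
```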
Online data relocation
To deploy newer, faster, or more resilient storage subsystems, you can move data while your system is active. Data can be rearranged on disks while the disks are in use. For example, you can empty a hot-swappable disk before removing it.
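The live relocation described above is done with pvmove. A sketch of retiring a placeholder disk /dev/sdb in favor of a new /dev/sdd, all while the volume group stays in use:

```shell
vgextend data_vg /dev/sdd   # add the new disk to the volume group
pvmove /dev/sdb /dev/sdd    # migrate all extents off the old disk, live
vgreduce data_vg /dev/sdb   # drop the emptied disk from the group
pvremove /dev/sdb           # wipe its LVM label; safe to pull it now
```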
Convenient device naming
Logical storage volumes can be managed in user-defined and custom-named groups.
Striped volumes
You can create a logical volume that stripes data across two or more disks, which can dramatically increase throughput. The stripe configuration is specified when creating the logical volume with lvcreate.
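A sketch of such a striped volume, assuming the two-disk volume group from earlier; -i sets the number of stripes and -I the stripe size in KiB:

```shell
# Stripe a 100 GiB logical volume across two physical volumes
# with a 64 KiB stripe size.
lvcreate -n fast_lv -L 100G -i 2 -I 64 data_vg
```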
Mirrored volumes
Logical volumes provide a convenient way to configure a mirror for your data. Although LVM did not support this natively in the past, recent versions provide it.
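In recent LVM versions the mirror is built on the kernel's md raid1 code. A sketch, again using the hypothetical data_vg group; -m 1 asks for one extra copy, so the data lives on two disks:

```shell
# Create a mirrored logical volume and watch its initial sync.
lvcreate --type raid1 -m 1 -n safe_lv -L 50G data_vg
lvs -a -o name,copy_percent,devices data_vg
```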
Volume snapshots
Using logical volumes, you can take device snapshots for consistent backups, or test the effect of changes without affecting the real data.
A key difference between RAID and plain LVM is that classic LVM does not by itself provide the redundancy or parity options that RAID provides; it is primarily about flexible allocation. (Recent LVM releases can create RAID-backed logical volumes, but they do so by delegating to the kernel's md RAID code.)
ZFS was originally developed by Sun Microsystems (since acquired by Oracle) for Solaris, but has been ported to Linux.
ZFS is fundamentally different in this arena because it is more than just a file system. ZFS combines the roles of a file system and volume manager, enabling additional storage devices to be added to a live system and having the new space available on all of the existing file systems in that pool immediately. It does what LVM and RAID do in one package. Therefore, ZFS is able to overcome previous limitations that prevented RAID groups from being able to grow. Combining the traditionally separate roles of volume manager and file system provides ZFS with a unique set of advantages.
Traditionally, file systems could be created on a single disk at a time. This means that if there were two disks, then two file systems would have to be created. RAID avoided this problem by presenting the operating system with a single logical disk made up of the space provided by the combination of many physical disks. The operating system then placed a file system on top. But with ZFS, the file system is aware of the underlying disk structure. This awareness makes automatic growth of the existing file systems possible when additional disks are added to the pool. Moreover, in ZFS, a number of different properties can be applied to each file system, hence the ability to create a number of different file systems and datasets rather than a single monolithic file system.
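A sketch of that pool-centric model — device names are placeholders, and the pool name tank is the conventional example from the ZFS documentation:

```shell
# Create a pool, carve out datasets, then grow the pool on the fly.
zpool create tank /dev/sdb   # pool plus a default root dataset
zfs create tank/projects     # datasets share the pool's free space
zpool add tank /dev/sdc      # add a second disk to the pool
zfs list                     # every dataset immediately sees the new space
```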
Features of ZFS
RAID-Z
ZFS implements RAID-Z, a variation on standard RAID-5 that offers better distribution of parity and eliminates the “RAID-5 write hole” in which the data and parity information become inconsistent in case of power loss.
Redundancy is possible in ZFS because it supports three levels of RAID-Z. The types are named RAID-Z1 through RAID-Z3 based on the number of parity devices in the array and the number of disks that can fail while the pool remains operational.
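The three levels differ only in parity count. Each line below is an alternative pool layout (not meant to run together, since the placeholder disks overlap):

```shell
zpool create tank raidz1 sdb sdc sdd          # 1 parity disk, survives 1 failure
zpool create tank raidz2 sdb sdc sdd sde      # 2 parity disks, survives 2 failures
zpool create tank raidz3 sdb sdc sdd sde sdf  # 3 parity disks, survives 3 failures
```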
Hot spares
ZFS has a special pseudo-vdev type for keeping track of available hot spares. Note that installed hot spares are not deployed automatically; they must be manually configured to replace the failed device using zpool replace.
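A sketch of registering a spare and swapping it in by hand after a failure (placeholder devices, tank pool assumed from earlier):

```shell
zpool add tank spare /dev/sde          # track sde as an available spare
zpool status tank                      # the spare is listed as AVAIL
zpool replace tank /dev/sdc /dev/sde   # replace the failed sdc with the spare
```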
L2ARC
This is the second level of the ZFS caching system. The primary Adaptive Replacement Cache (ARC) is stored in RAM; since the amount of available RAM is often limited, ZFS can also use cache vdevs (a single disk or a group of disks). Solid State Disks (SSDs) are often used as these cache devices due to their higher speed and lower latency.
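A sketch of attaching an SSD (placeholder device /dev/nvme0n1) as an L2ARC cache vdev, so reads that fall out of the RAM-based ARC can be served from the SSD instead of spinning disks:

```shell
zpool add tank cache /dev/nvme0n1
zpool iostat -v tank   # the cache device appears in its own section
```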
Mirror
A mirror is made up of two or more devices, and all data is written to all member devices. A mirror vdev can only hold as much data as its smallest member, and it can withstand the failure of all but one of its members without losing any data.
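A sketch of a two-way mirror with placeholder devices, and how it can later be widened in place:

```shell
# Every block is written to both disks; the pool survives as long as
# one member is still healthy.
zpool create tank mirror /dev/sdb /dev/sdc
zpool attach tank /dev/sdb /dev/sdd   # grow it to a three-way mirror later
```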
SSD Hybrid Storage Pools
High-performing SSDs can be added to the ZFS storage pool to create a hybrid kind of pool. These SSDs can be configured as a cache to hold frequently accessed data in order to increase performance.
Copy on Write
ZFS uses the Copy on Write technique to keep the data on its disks consistent: instead of overwriting a block in place, it writes the modified data to a new block and only then updates the pointers to it, so a crash mid-write never leaves a half-updated block behind.
Every block that is allocated is checksummed using the per-dataset checksum property (fletcher2, fletcher4, or sha256). The checksum of each block is transparently validated as it is read, allowing ZFS to detect silent corruption. If the data read does not match the expected checksum, ZFS tries to recover the data from any configured redundancy, such as a mirror or RAID-Z.
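The principle can be illustrated outside ZFS with ordinary tools (a toy sketch, not ZFS itself): store a checksum at write time, then verify it at read time to catch silent corruption.

```shell
#!/bin/sh
# Write a "block" and record its checksum.
printf 'important data' > block.bin
sum_written=$(sha256sum block.bin | cut -d' ' -f1)

# Simulate silent corruption: overwrite one byte in place.
printf 'X' | dd of=block.bin bs=1 seek=3 conv=notrunc 2>/dev/null

# On read, the checksum no longer matches.
sum_read=$(sha256sum block.bin | cut -d' ' -f1)
if [ "$sum_written" != "$sum_read" ]; then
    echo "corruption detected"  # ZFS would now repair from a mirror/RAID-Z copy
fi
```

The difference in ZFS is that this check happens transparently on every read, and repair from redundancy is automatic.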
Find out more about ZFS: https://www.freebsd.org/doc/handbook/zfs-term.html
There is much more out there about ZFS, RAID, and LVM. I hope this has given you a good foundation in those three technologies so that you can choose the one that best fits your project. Thank you for reading through.