
RAID, the classic drive redundancy solution, has been around since 1989 and, honestly, it is getting a bit dated. Usually, people go with RAID 1 if they have 2 drives, or RAID 5 if they want more usable space out of 3 or more disks. RAID 6 comes in when you have a larger array (minimum 4 drives) and want protection against a two-drive failure. LizardFS, however, is a more modern solution for a modern problem.

[Images 1-3: Credit Cburnett]

All of these RAID solutions are great for protecting your data from drive failure. However, you are stuck with a single redundancy scheme for the entire drive array. If you have 8 drives, all of your files get the same level of protection (single parity for RAID 5, double parity for RAID 6, and so on), and the array has one fixed amount of usable space.


What if there were a solution that lets you set a custom level of protection per file instead of per drive array? With LizardFS, you can set each file’s protection level individually. Less important files can have a lower level of protection, while more important files can have a higher level of redundancy. This means different files can survive a different number of drive failures, depending on each file’s redundancy setting.
Do note that file-level does not mean you have to configure every file by hand. You can simply set a folder’s goal, and all the files in that folder will follow it.

[Image 4: Credit LizardFS]

LizardFS was, however, designed for server-level redundancy rather than redundancy across individual drives in a single machine. A single server plays the role of a single drive in a traditional RAID array, and an array of servers plays the role of the whole RAID array. Out of the box, LizardFS does not support drive-level redundancy. In this tutorial, I will show you how to do just that, for those who have a single machine and want to take advantage of LizardFS’s features.

The best part of using LizardFS is that you can freely add more drives into the array without having to even match drive sizes at any time. LizardFS will then work to rebalance your files according to each file’s set redundancy and protection level.

[Image 5: Credit LizardFS]

Overall LizardFS Plan

A LizardFS system consists of a master server, chunk servers, a metalogger, and optionally shadow (backup master) servers.

So here is our plan:

  1. Configure a master server.
  2. Configure multiple chunk servers (according to how many drives you have).
  3. Configure the metalogger server.

The trick here is to configure all of these on a single machine, each listening on a different port, instead of on separate servers with different IPs. The master server acts as the redundancy controller, while each chunk server handles an individual drive. The exception is the shadow server: you can get by without one, as will be explained later.

Make sure you stick with it to the very end! If you don’t finish this tutorial completely, your setup might appear to run just fine, but when a disk fails or the OS disk dies, you could be left with a bunch of data that you cannot recover or rebuild without professional enterprise support from LizardFS.

Following all the steps to the end will ensure that your setup is proper!

Tutorial note:

In this tutorial, I will use 4 empty drives. Typically, this is where people would just set up RAID 5 or RAID 6 and be done with it, accepting a fixed level of drive-failure protection and a fixed amount of usable storage.

If you follow this guide, make sure your drives are empty, because formatting them will destroy all the data they contain.

Ubuntu 18.04.3 LTS is used in this tutorial. It can be any Linux distro of your liking but do note that some paths might be different.

Some commands might require root permission.

If you are required to perform multiple of the same command, I will use a capital X as a variable. For example, sda, sdb, sde, and sdf will be represented by sdX.

 

Format Drives & Mount Drives

List the disks available in your system:

lsblk

Take note of the drives you want to use. In our case, it will be:

/dev/sda
/dev/sdb
/dev/sde
/dev/sdf

First, create a GPT partition table with a single partition on each drive:

fdisk /dev/sdX

In this utility, we mostly go with the default values since we are going to use the entire drive space. If a drive does not already use GPT, create a new GPT partition table first with the g command, then create the partition with n. If there are any signature warnings, just overwrite the signature.

Welcome to fdisk (util-linux 2.31.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Command (m for help): n
Partition number (1-128, default 1):
First sector (2048-2097118, default 2048):
Last sector, +sectors or +size{K,M,G,T,P} (2048-2097118, default 2097118):

Created a new partition 1 of type 'Linux filesystem' and of size 1023 MiB.
Partition #1 contains a xfs signature.

Do you want to remove the signature? [Y]es/[N]o: y

The signature will be removed by a write command.

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

Repeat for all drives.

The second step is to create an XFS filesystem on each partition. You can also use ext4; it does not matter much, but XFS might give you slightly better performance.

mkfs.xfs /dev/sdX1

Repeat on all drives. We format partition number 1, since that is the partition we created with fdisk in the previous step.
Note: If mkfs.xfs is not found, install the XFS utilities first:

apt install xfsprogs
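
If you prefer, a small shell loop can format all four partitions in one go. This is only a sketch; it assumes the target drives are sda, sdb, sde, and sdf as listed earlier, so double-check the names before running it:

# format partition 1 on each of the four data drives (destructive!)
for d in sda sdb sde sdf; do
    mkfs.xfs /dev/${d}1
done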

Now we should see four drives with one partition each that has been formatted to xfs.

lsblk -f
NAME FSTYPE LABEL UUID MOUNTPOINT
sda
└─sda1 xfs 2b1175c1-7948-4085-8aac-abfaee561b4d
sdb
└─sdb1 xfs 91c673fb-e354-43f9-ba14-cfd17c3a728c
sdc
└─sdc1 xfs 00d19473-f111-4b51-95b7-f15d64b7390c
sdd
└─sdd1 xfs bd103d36-ab6b-4d65-8002-fcc5bc76d0fa

Take note of each partition’s UUID and save it somewhere, because we will use it to mount our drives later. Otherwise, your drives will not be remounted when the machine reboots.
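
If you need the UUIDs again later, blkid can print them for any partition (replace sdX1 with each of your partitions):

blkid -s UUID /dev/sdX1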

Let’s try mounting all of the drives:

mkdir /mnt/diskX
mount /dev/sdX1 /mnt/diskX

If everything was mounted fine, it means your drives are ready to be registered for automatic mounting using the fstab configuration file.

Edit your fstab file and add these 4 lines at the bottom. Don’t delete any existing lines or your machine will not boot!
Use the UUIDs we took note of earlier for the appropriate drive numbers.
In my case, sda1 is disk1, sdb1 is disk2, and so on.

/etc/fstab

UUID=1b0fcaf6-2b60-4b59-8cc0-e88ef2d9d4d1 /mnt/disk1 xfs rw,noexec,nodev,noatime,nodiratime,largeio,inode64 0 0
UUID=e3f3e452-c5a1-4262-974f-6653a3f636f5 /mnt/disk2 xfs rw,noexec,nodev,noatime,nodiratime,largeio,inode64 0 0
UUID=19b166d8-9a91-4c8a-8d3a-8332136dd155 /mnt/disk3 xfs rw,noexec,nodev,noatime,nodiratime,largeio,inode64 0 0
UUID=d21756fc-5f73-43a0-80ab-864827395a56 /mnt/disk4 xfs rw,noexec,nodev,noatime,nodiratime,largeio,inode64 0 0

According to the official documentation, these are the recommended mount options, which is what we used above:

rw,noexec,nodev,noatime,nodiratime,largeio,inode64,barrier=0

However, this recommendation applies only if the drives are formatted as XFS. (We left out barrier=0 above, as the barrier options are deprecated on recent kernels.)

At this point, it is a good idea to reboot your system to see if the drives are properly mounted automatically.
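
If you would rather not reboot yet, you can get a quick sanity check of the fstab entries by unmounting a drive and letting mount read them back (repeat for each disk number):

umount /mnt/diskX
mount -a
findmnt /mnt/diskX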

Since LizardFS will be accessing the drives, make sure to give the correct permission on each drive (the lizardfs user is created when the LizardFS packages are installed in the next section, so run this afterwards if the user does not exist yet):

chown lizardfs /mnt/diskX

Install LizardFS

In the most basic setup, we are going to need to install these four components of LizardFS:

apt install lizardfs-master lizardfs-chunkserver lizardfs-client lizardfs-cgiserv

You will also have to use a static IP for your machine. For some reason, LizardFS just doesn’t let you use localhost or 127.0.0.1, so we will map your machine’s IP address to the name mfsmaster.

/etc/hosts

127.0.0.1 localhost
127.0.1.1 hostname
192.168.0.173 mfsmaster
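
You can confirm that the name resolves to the address you just set:

getent hosts mfsmaster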

Configure The Master Server

Copy the metadata database into the default location to create a new database.

cp /var/lib/lizardfs/metadata.mfs.empty /var/lib/lizardfs/metadata.mfs

This will be our database file. All of the file tracking is done in this file. Don’t worry, it will start to make sense later on, during the last steps.

For the master server, we will need three files:

mfsmaster.cfg
mfsexports.cfg
mfsgoals.cfg

These are to be located in:

/etc/lizardfs/

Here are the contents of each file:

/etc/lizardfs/mfsmaster.cfg

PERSONALITY = master
NO_ATIME = 1
AUTO_RECOVERY = 1
CHUNKS_LOOP_MAX_CPU = 80
ENDANGERED_CHUNKS_PRIORITY = 1
CHUNKS_LOOP_MIN_TIME = 15

You may change how much CPU LizardFS is allowed to use (CHUNKS_LOOP_MAX_CPU), depending on your situation.
I set ENDANGERED_CHUNKS_PRIORITY to 1 since we are doing all of this on a single machine.

More information about mfsmaster.cfg can be read here.

/etc/lizardfs/mfsexports.cfg

* / rw

/etc/lizardfs/mfsgoals.cfg

1 single : _
2 redundant : $xor3
3 critical : $ec(3,2)
4 double : _ _

This is the important part that you should pay attention to. Goals are the replication targets that files can be assigned. Here, you are specifying the levels of redundancy available in your particular LizardFS setup.

Do note that a goal name can be any name. I just named mine single, redundant, and so on.

For the sake of this tutorial, I will explain the modes briefly but I recommend that you read up further about the modes and configurations in detail to fully understand it.

The official documentation regarding replication goals can be read here.

_ a single underscore means your file will be written to a single hard drive in the pool.

_ _ a double underscore means that your data will be written on two drives, duplicated. This is pretty much like RAID 1.

$xor3 means that the file is split into 3 data chunks plus one parity chunk. This is similar to how RAID 5 works, which gives single-drive failure protection.

$ec(3,2) means 3 data chunks plus 2 parity chunks (erasure coding). Just like RAID 6, this can tolerate a two-drive failure. Keep in mind that each part is placed on a separate chunk server, so $ec(3,2) needs at least 5 chunk servers (drives, in our setup) before it can be fully satisfied; with only four drives, a goal such as $ec(2,2) is a better fit.

The combinations are very flexible; you can do $xor7 or $ec(4,3) if you want. This is why I recommend you read up on this part in the official documentation to understand how it really works.
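
As a rough worked example of the space cost of each goal, assume a 3 GB file and ignore chunk-size rounding:

_        : 3 GB on disk (no redundancy)
_ _      : 6 GB on disk (a full copy on a second drive)
$xor3    : 4 GB on disk (3 data parts + 1 parity part, a 4/3 overhead)
$ec(3,2) : 5 GB on disk (3 data parts + 2 parity parts, a 5/3 overhead)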

Later on in this tutorial, we will see how you can set a file’s goal or a folder’s goal.

Now with the master configuration done, we can go ahead and enable the LizardFS master server. Using the enable command will register our service as an autostart on boot.

systemctl enable lizardfs-master
systemctl start lizardfs-master
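
As a quick sanity check, you can confirm the master process is up and listening. Assuming you have not changed the defaults, 9421 is the client port, 9420 the chunk server port, and 9419 the metalogger port:

systemctl status lizardfs-master
ss -tlnp | grep mfsmaster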

Configure The Chunk Servers

Alright, so as we established earlier, each chunk server is a drive. Since LizardFS is designed for server-level data redundancy, we will simply treat each drive in our machine as a chunk server.

We will configure four chunk servers, one for each drive. The configuration is identical except for the IP ports.

My preferred way of doing this is to store the configuration on the OS drive that the master server sits on. This makes it easier to rebuild the array when we have to swap out a disk after a failure.

Make the directories as such:

mkdir /etc/lizardfs/chunkservers
mkdir /etc/lizardfs/chunkservers/diskX

Inside the chunkservers directory, there should be four folders:

ls -l /etc/lizardfs/chunkservers
total 20
drwxr-xr-x 2 root root 4096 Nov 10 22:37 disk1
drwxr-xr-x 2 root root 4096 Nov 10 22:37 disk2
drwxr-xr-x 2 root root 4096 Nov 10 22:36 disk3
drwxr-xr-x 2 root root 4096 Nov 10 22:53 disk4

For each chunk server, we will need two configuration files:

mfschunkserver.cfg
mfshdd.cfg

/etc/lizardfs/chunkservers/diskX/mfschunkserver.cfg

SYSLOG_IDENT = mfschunkserver-diskX
DATA_PATH = /mnt/diskX
CSSERV_LISTEN_PORT = 5000X
HDD_CONF_FILENAME = /etc/lizardfs/chunkservers/diskX/mfshdd.cfg
HDD_LEAVE_SPACE_DEFAULT = 0

Make sure to change the X with the drive number.

/etc/lizardfs/chunkservers/diskX/mfshdd.cfg

/mnt/diskX
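
If you would rather not create the four sets of files by hand, a short shell loop can generate them. This is only a sketch of the same layout described above:

# generate mfschunkserver.cfg and mfshdd.cfg for disk1..disk4
for i in 1 2 3 4; do
    d=/etc/lizardfs/chunkservers/disk$i
    mkdir -p $d
    cat > $d/mfschunkserver.cfg <<EOF
SYSLOG_IDENT = mfschunkserver-disk$i
DATA_PATH = /mnt/disk$i
CSSERV_LISTEN_PORT = 5000$i
HDD_CONF_FILENAME = $d/mfshdd.cfg
HDD_LEAVE_SPACE_DEFAULT = 0
EOF
    echo /mnt/disk$i > $d/mfshdd.cfg
done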

Alright, everything seems to be good to go. The next step is to configure how to actually run the server individually for each disk. To achieve this, we will use systemd.

Create a service file

/etc/systemd/system/lizardfs-chunkserver@.service

[Unit]
Description=LizardFS chunkserver daemon
Documentation=man:mfschunkserver
After=local-fs.target network.target lizardfs-master.service
Wants=network-online.target

[Service]
Type=notify
ExecStart=/usr/sbin/mfschunkserver -d start -c /etc/lizardfs/chunkservers/%i/mfschunkserver.cfg
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-abort
OOMScoreAdjust=-999
IOAccounting=true
IOWeight=250
StartupIOWeight=100

[Install]
WantedBy=multi-user.target

As we can see, the instance name passed when starting the chunk server service (the %i in ExecStart) is used to load the appropriate mfschunkserver.cfg from the corresponding diskX directory.

Enable all of the chunk server services:

systemctl enable lizardfs-chunkserver@diskX
systemctl start lizardfs-chunkserver@diskX

Remember to replace X with your drive numbers. The part after the @ is the instance name that gets passed into the systemd unit we created earlier and substituted into the ExecStart path ( %i ) to select the appropriate configuration file for each disk.

By now, you will have the master server running and four chunk servers.
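
A quick way to confirm that all four instances actually came up:

for i in 1 2 3 4; do systemctl is-active lizardfs-chunkserver@disk$i; done

Each one should report active.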

Finalizing The Array For Practical Use

We can check the status of our array by using the web GUI. Enable the webserver:

systemctl enable lizardfs-cgiserv
systemctl start lizardfs-cgiserv

Visit this address in your web browser:

http://[machine’s IP address]:9425/mfs.cgi

You would want to pay attention to these three tabs:

Servers
Disks
Config

[Image 6: LizardFS web GUI]

As we can see, all 4 of our drives are showing up. Yours will show 0% used disk space.
Make sure everything looks okay, including the port numbers matching up with the disk numbers.

Lastly, make sure the goal definitions appear as you set them in mfsgoals.cfg earlier. We will use these later on to set the redundancy level of individual files and folders.

It is time to mount the array and start doing some write tests!

mkdir /mnt/lizardfs
chown lizardfs /mnt/lizardfs
mfsmount /mnt/lizardfs

The mount point can be any name you like, such as /mnt/array.
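
As a quick write test, you can drop a throwaway file onto the array and check the space usage (testfile is just a placeholder name):

dd if=/dev/zero of=/mnt/lizardfs/testfile bs=1M count=100
df -h /mnt/lizardfs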

(optional) There are cases where you would want to make this mount point permanent, such as when you are setting up your LizardFS array as a NAS running SMB, or simply when you are using the LizardFS array on your main daily computer.

Insert this into the end of your fstab

/etc/fstab

mfsmount /mnt/lizardfs fuse rw,mfsdelayedinit,nodev,noexec,nosuid,nofail 0 0

nodev = don’t allow device nodes (you can’t have something like /dev/sda on this filesystem)
noexec = don’t allow executable files (you can’t run programs stored on this filesystem)
nosuid = ignore setuid/setgid bits (programs can’t use this filesystem to elevate to root)
nofail = don’t fail the boot sequence if this mount fails

Now your LizardFS array will show up as a single drive, just as if you had mounted a RAID array.
If you write to this array now, the default redundancy goal is the single-disk mode _.

This means your data is written to one of the drives without any redundancy. To make life easier, we will create some folders to help us manage the file redundancy modes.

Configuring Redundancy Modes

For our case, we configured four redundancy modes:

1 single : _
2 redundant : $xor3
3 critical : $ec(3,2)
4 double : _ _

By default, everything written to the root will be treated as single. Create four folders, one for each mode:

mkdir /mnt/lizardfs/single
mkdir /mnt/lizardfs/redundant
mkdir /mnt/lizardfs/critical
mkdir /mnt/lizardfs/double

Then set the goals of each folder:

lizardfs setgoal -r single /mnt/lizardfs/single
lizardfs setgoal -r redundant /mnt/lizardfs/redundant
lizardfs setgoal -r critical /mnt/lizardfs/critical
lizardfs setgoal -r double /mnt/lizardfs/double

Now check if our replication goals were set properly:

lizardfs getgoal -r /mnt/lizardfs/single
lizardfs getgoal -r /mnt/lizardfs/redundant
lizardfs getgoal -r /mnt/lizardfs/critical
lizardfs getgoal -r /mnt/lizardfs/double

We should see something like this for each folder, with that folder’s goal name in place of single:

/mnt/lizardfs/single/:
files with goal single : 1
directories with goal single : 1

So, when you write files into the single folder, you will use less space in the pool, but you will have no redundancy.
If you write into the redundant folder, you will use about as much space as you would in a RAID 5 setup, and the data can survive a single-disk failure.
If you write your files into the critical folder, you get better protection thanks to the two parity chunks, meaning the data can survive a two-disk failure.

That is the main idea: your array can be as space-efficient as you want it to be by balancing redundancy against recoverability, folder by folder, according to the goals you defined earlier in mfsgoals.cfg.
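
You can also override the goal of one specific file, regardless of the folder it sits in. The file name here is just a placeholder:

lizardfs setgoal critical /mnt/lizardfs/single/importantfile.dat
lizardfs getgoal /mnt/lizardfs/single/importantfile.dat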

Final Step – Ensuring Recoverability and Security

RECOVERABILITY

So, the way LizardFS keeps track of which pieces of which files go to which chunk server (in our case, which drive) is by keeping a database. Earlier, we set up the master server, which keeps this database on the OS drive.

It is critical that we make this database redundant, so that we can still access, recover, or rebuild our LizardFS array in case the OS drive goes bad or anything else happens to it.

The official way of keeping the LizardFS database safe is to configure the LizardFS metalogger. The metalogger records every change made to the database, so the database can be rebuilt from the metalogger files if it is ever lost.

By default, the metalogger files are generated and stored in /var/lib/lizardfs/
We should see a bunch of files as such:

[Image 7: metalogger files in /var/lib/lizardfs]

Our plan for making the metalogger files redundant is to write them onto every drive, bypassing LizardFS and writing directly to the drives (/mnt/diskX/).

Let’s create four config files, one for each metalogger, so we can tell each one where to write its files:

/etc/lizardfs/chunkservers/diskX/mfsmetalogger.cfg

DATA_PATH = /mnt/diskX/metalogger

Create a folder directly on each disk:

mkdir /mnt/diskX/metalogger
chown lizardfs -R /mnt/diskX/metalogger

Just like how we configured chunk servers, we are going to have 4 metaloggers, one writing to each drive by using systemd.

/etc/systemd/system/lizardfs-metalogger@.service

[Unit]
Description=LizardFS metalogger daemon
Documentation=man:mfsmetalogger
After=local-fs.target network.target lizardfs-master.service
Wants=network-online.target

[Service]
Type=notify
ExecStart=/usr/sbin/mfsmetalogger -d -c /etc/lizardfs/chunkservers/%i/mfsmetalogger.cfg
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-abort
OOMScoreAdjust=-999
IOAccounting=true
IOWeight=250
StartupIOWeight=100

[Install]
WantedBy=multi-user.target

Now we can enable all four metalogger instances:

systemctl enable lizardfs-metalogger@diskX
systemctl start lizardfs-metalogger@diskX

Now that we have ensured all of our data can be recovered in the event of an OS drive failure, let's look at making the setup secure.

SECURITY

By default, any new chunk server can simply connect to the LizardFS master server. This is ideal in a datacenter, where people can set up a new physical chunk server and have data automatically rebalanced onto it from all the other chunk servers.

In a home or office environment, we do not want this to happen; we want to restrict everything to our single machine. A random person plugging into the network and running their own LizardFS chunk server could be a disaster.

Imagine your setup is a 4-disk system (4 chunk servers). An attacker comes along and runs their own LizardFS chunk server on their computer. The LizardFS master will gladly accept this connection and start distributing chunks of data to the attacker’s chunk server. This can result in potentially irrecoverable data loss, not to mention data leakage.

To protect against this, we need to stop anything outside the machine from contacting our LizardFS master server and keep everything internal to our single machine. There are many ways of doing this, including advanced firewalling and network policies such as VLANs or IPsec; here, we will simply set up a firewall on the machine itself.

Set up the firewall on our machine. We need these ports open:

9425 – LizardFS Monitor GUI
SSH – If you are using SSH
Samba – If you are planning to set up this as a NAS

If you have other services that need their ports open for incoming connections, take note of them and add them to the allowed list:

ufw allow SSH
ufw allow Samba
ufw allow 9425

Once you are confident that everything is properly set, enable the firewall:

ufw enable

Your setup is now secured! All incoming connections will be blocked from reaching your server except on the ports we allowed.
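
You can review the active rules at any time with:

ufw status verbose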

Conclusion

You should now have a proper single-machine LizardFS setup running on localhost, with each chunk server backed by a single drive. File-level redundancy can now be enjoyed to its fullest, and you won’t have to rack your brain over “should I really put these junk files inside my RAID 6 array and waste all that space on redundancy I don’t need?”

By now, you should have these running on your local machine:

  1. LizardFS Master
  2. Four LizardFS Chunk Servers
  3. Four LizardFS Metaloggers
  4. LizardFS Web GUI

If you are going to set this up as a NAS, I recommend finding good software to manage users and the mount points you expose. Alternatively, you can set up SMB manually and create a share pointing at your LizardFS mount point.
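
For reference, a minimal Samba share pointing at the LizardFS mount could look something like this. This is only a sketch; it assumes Samba is installed and that yourusername is an existing Samba user:

/etc/samba/smb.conf

[lizardfs]
   path = /mnt/lizardfs
   browseable = yes
   read only = no
   valid users = yourusername

Restart Samba after editing (systemctl restart smbd) and the share will appear on the network.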

Special thanks to @Volvagia356 for the systemd config files and the idea to set up LizardFS in such manner. 

