GlusterFS

This article describes the process of deploying GlusterFS on Debian 10 (Buster) from an experimental point of view, keeping the steps simple enough for one person to complete quickly. To that end, it uses official pre-built packages instead of compiling from source, and uses Ansible to execute commands uniformly across the nodes.

Before you start

What is Gluster FS

An excerpt from the official GlusterFS website.

Gluster is a free and open-source scalable network filesystem

Gluster is a scalable network filesystem. Using common off-the-shelf hardware, you can create large, distributed storage solutions for media streaming, data analysis, and other data- and bandwidth-intensive tasks. Gluster is free.

Gluster | Storage for your cloud

In short, GlusterFS is a distributed network file system suited to data- and bandwidth-intensive applications.

I think GlusterFS is also suitable for HPC cluster scenarios in addition to those described above. Compared to traditional file systems in the HPC space like Lustre and BeeGFS, GlusterFS has the following advantages.

  • Continuous maintenance and updates (I think this is the most important point). For infrastructure as widely used in HPC as Lustre, the maintainer has surprisingly changed hands several times and the community is rather chaotic; although updates still appear, a project in that state can hardly be called healthy.
  • Excellent ecosystem. The large amount of attention paid to GlusterFS by the open source community has produced plenty of documentation and articles, as well as official pre-compiled packages for all major Linux distributions.
  • Relatively good performance. Although I have not done comparative testing, GlusterFS has shown very good performance in other application scenarios, and I believe it should also perform well in compute-intensive scenarios like HPC.
  • FUSE-based mounts. Another feature I consider advantageous is that GlusterFS is mounted via FUSE, i.e. in user space. Compared to Lustre, which requires recompiling the kernel or using dkms, GlusterFS's mounting method provides a much higher degree of decoupling and is more conducive to failure recovery and the like.

Basic concepts

  • Brick: The basic unit of storage, similar to the concept of block device on Linux
  • Volume: A volume is a logical collection of bricks, similar to a volume in LVM
  • GFID: Each file or directory in GlusterFS has a 128-bit numeric identifier called GFID
  • glusterd: Gluster’s management daemon, which needs to run on all servers that provide Volume
  • glusterfsd: the per-brick daemon that serves the data of a Volume
  • Namespace: Namespace is an abstract container or environment that is created to hold unique identifier numbers
  • Quorum: Sets the maximum number of failed host nodes in a trusted storage pool
  • Quota: Allows you to set limits on disk space usage by directory or volume
  • Distributed: Distributed volumes
  • Replicated: Replica volumes
  • Distributed Replicated: Distributed replicated volumes
  • Geo-Replication: Offsite backup provides a continuous asynchronous and incremental replication service for sites over LANs, WANs, and the Internet
  • Metadata: data that describes other data; GlusterFS does not maintain a dedicated metadata store
  • Extended Attributes: Extended Attributes are a feature of the file system; GlusterFS uses them to store per-file bookkeeping such as the GFID (see the example after this list)
  • FUSE: Filesystem in Userspace is a loadable kernel module for Unix-like operating systems that allows unprivileged users to create their own file systems without editing kernel code, i.e. the filesystem code runs in user space.
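
As a quick illustration of GFID and extended attributes, the GFID assigned to a file can be read directly from a brick's backing directory as the trusted.gfid xattr. This is only a hedged example, reusing the /data/gluster brick path that appears later in this article; the file name is a placeholder and getfattr comes from the attr package.

sudo getfattr -n trusted.gfid -e hex /data/gluster/<some file>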

Ansible

This installation uses Ansible, a powerful automation tool; see the Ansible website for details, as I won't go into it here.
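
For reference, a minimal inventory sketch matching the hostnames used later in this article; the IP addresses are placeholders, and with an inventory like this, ansible all targets all four machines.

# /etc/ansible/hosts (addresses are hypothetical)
[gluster]
main  ansible_host=192.168.1.10
node1 ansible_host=192.168.1.11
node2 ansible_host=192.168.1.12
node3 ansible_host=192.168.1.13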

Install GlusterFS

Following the official tutorial, the first step is to add the public key for the APT source.

wget -O - https://download.gluster.org/pub/gluster/glusterfs/LATEST/rsa.pub | apt-key add -

But in practice this LATEST needs to be replaced with the actual latest major version number; at the time of writing that is 8, so the command actually executed is:

ansible all -m shell -a "wget -O - https://download.gluster.org/pub/gluster/glusterfs/8/rsa.pub | apt-key add -" --become

The next step is to add the APT source.

ansible all -m shell -a "echo deb https://download.gluster.org/pub/gluster/glusterfs/LATEST/Debian/10/amd64/apt buster main > /etc/apt/sources.list.d/gluster.list" --become

Then everything is ready for installation. Since in my case all four machines act as both server and client, I simply installed the packages on all of them.

ansible all -m shell -a "apt update; apt-get install -y glusterfs-server glusterfs-cli" --become

In my environment, after the installation completes, glusterd is started automatically and enabled in systemd, so all the software is in place. Next comes the configuration.
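
To double-check this (a quick sanity check, not part of the official steps), systemd can be queried on every node:

ansible all -m shell -a "systemctl is-active glusterd" --become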

Build the cluster

Run the following command on any node.

sudo gluster peer probe <another node hostname>

This peer probe is theoretically bi-directional, i.e. it does not need to be repeated on the other node. One thing to note, however, is that the nodes need their hostnames and hosts files configured in advance so that they can all reach each other by hostname. Configuring DNS would also work, but that is outside the scope of this article.
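
For completeness, a minimal /etc/hosts sketch shared by every node might look like this; the addresses are placeholders for illustration.

192.168.1.10 main
192.168.1.11 node1
192.168.1.12 node2
192.168.1.13 node3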

Use gluster peer status to see the status of the cluster at this point. Here's what it looks like in my environment. Note that Connected only indicates a one-way connection; if you have not configured the hostnames in advance, make sure the other machines are reachable, otherwise creating the Volume will fail.

easton@main:~$ sudo gluster peer status
Number of Peers: 3

Hostname: node3
Uuid: 24460b94-03a2-48ea-ab29-ec2cbdd2a008
State: Peer in Cluster (Connected)

Hostname: node1
Uuid: e079d8e3-5972-4801-9525-04f8050b9b6f
State: Peer in Cluster (Connected)

Hostname: node2
Uuid: 84074def-a277-469b-942c-19963113b0c4
State: Peer in Cluster (Connected)

Create Volume

A Volume in GlusterFS supports many combinations of Bricks, similar to the RAID0, RAID1, RAID10, etc. concepts in RAID, but it does not support RAID5 or ZFS-style RAIDZ layouts.

Here we directly create a mirror volume with 4 copies.

sudo gluster volume create <volume name> replica 4 node{1..3}:/data/gluster main:/data/gluster

Brace expansion of the form {n..m} can be used to shorten the brick list. If a Volume with the same name was created before but the attempt failed, it cannot simply be recreated; adding the force parameter works around this, as shown below.
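
For example, with the placeholders used above, the forced re-creation would look like the following; this is only a sketch, and force merely overrides the safety check, so make sure the leftover bricks really are stale before using it.

sudo gluster volume create <volume name> replica 4 node{1..3}:/data/gluster main:/data/gluster force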

The created volume needs to be started before it can be mounted.

sudo gluster volume start <volume name>

Use gluster volume info to view information about Volume.

easton@main:~$ sudo gluster volume info

Volume Name: home
Type: Replicate
Volume ID: 73fa38da-a3f9-4a97-838b-352fd7b7ff9a
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: node1:/data/gluster
Brick2: node2:/data/gluster
Brick3: node3:/data/gluster
Brick4: main:/data/gluster
Options Reconfigured:
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

Mount GlusterFS

Mounting GlusterFS is very simple.

sudo mount -t glusterfs <server hostname>:/<volume name> <mount point>

For me it’s like this.

sudo mount -t glusterfs main:/home /mnt/home
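
To make the mount persistent across reboots, a corresponding /etc/fstab entry can be added; this is a sketch, with _netdev telling the system to wait for the network before mounting.

main:/home /mnt/home glusterfs defaults,_netdev 0 0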

GlusterFS also supports NFS mounts. However, since NFS does not support RDMA technologies such as InfiniBand, it is rarely used in the HPC field, so NFS mounting is not recommended here.

At this point we can introduce some failures to test the reliability of GlusterFS. For example, after shutting down the main server, node3 can still access the files normally, and changes to the files are synchronized back to main after it comes online again.
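
While a node is down, and again after it comes back, the self-heal state can be inspected with the following command (assuming the volume is named home as above):

sudo gluster volume heal home info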

One thing worth noting is that it took quite a while (5-10 seconds) after the main server was shut down for the three remaining nodes to confirm that it was offline and elect a new one. This would be a real problem for latency-sensitive business applications, but I don't think it is a particularly big issue in HPC scenarios.
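
If this delay matters, one option that may help is network.ping-timeout, which controls how long clients wait before declaring a server dead; this is only a sketch, and setting the value too low can cause spurious disconnects and expensive reconnects.

sudo gluster volume set home network.ping-timeout 10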

Postscript

The reason for installing GlusterFS is that I needed to build a test cluster in my lab; I researched distributed file systems beforehand and finally chose GlusterFS.

The installation of GlusterFS was quite easy: including the preliminary research and installing the operating system, it took 4-5 hours of my time in total. The traditional Lustre approach, by contrast, can only use CentOS as the host operating system, because other distributions dropped Lustre; it was removed from the mainline kernel from 4.xx onwards, with "bad code" cited as the reason, which is one of the reasons why I abandoned Lustre.

Postscript 2

In use, I found that GlusterFS's design does not separate metadata, which makes all kinds of directory operations extremely time consuming, and small-file performance is also very poor; running ./configure on a GlusterFS mount may never finish. Later I replaced the cluster home directory file system with BeeGFS, which solved this architectural problem.