I recently needed to make the group’s cluster computing environment available to a third party that was not fully trusted. This called for some isolation (most notably of user data under /home), while still providing a normal operating environment (GPU, InfiniBand, SLURM job submission, toolchain management, and so on). After some thought, none of the obvious options seemed to fit well:

  • POSIX permissions: switching the individual folders under /home to 0750 would be a quick way to stop other users from reading and writing them. However, people in the group frequently need to access each other’s directories, so changing this would be disruptive.
  • Container isolation: containers usually isolate most things well, but they are more awkward in an HPC environment and need extra configuration to expose the GPUs and IB NICs. On top of that, we submit jobs through SLURM, which makes the container setup even more involved.
  • SELinux: it can enforce very fine-grained control, but I had never deployed SELinux on an HPC cluster, had no idea what problems I would run into, and could not find any articles about it.

After further thought, I realized that for users without root privileges, simply hiding other people’s home directories from them is enough for this kind of HPC scenario. So I came up with the idea of using chroot to implement this, and found libpam-chroot, a PAM module that automatically chroots users into a configured directory when they log in.

Debian Configuration

In Debian bullseye, the configuration is as follows:

First prepare a user and an environment for isolation:

user=foo
useradd -m "$user"
mkdir -p /home/jailed/rootfs
cd /home/jailed/rootfs

# host directories to expose inside the chroot
binds="bin dev etc lib lib32 lib64 opt proc run sbin srv sys tmp usr var home/$user home/spack home/intel"

for d in $binds; do
  mkdir -p "$d"
  # skip directories that are already bind-mounted
  if mount | grep -q "$(realpath "$d")"; then
    echo "$d already mounted"
  else
    mount --bind "/$d" "$d" && echo "$d mounted"
  fi
done

Here binds lists all directories that will be visible to that user. It can be adjusted as needed, for example by bind-mounting the home directories of additional users (spack and intel above are shared software environments), or by mounting a separate empty tmpfs instead.
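
For example, if the jailed user should get a private /tmp rather than sharing the host’s, a small tmpfs can be mounted in its place. This is only a sketch under the rootfs path used above; the size is arbitrary:

# replace the bind mount of /tmp (if the loop above created one) with a private tmpfs
umount /home/jailed/rootfs/tmp 2>/dev/null
mount -t tmpfs -o size=4G,mode=1777 tmpfs /home/jailed/rootfs/tmp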

Thereafter, install libpam-chroot:

apt install libpam-chroot
ln -v -s -r -t /usr/lib/x86_64-linux-gnu/security/ /usr/lib/x86_64-linux-gnu/pam_chroot.so

The second line works around a bug in the Debian packaging (see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=980047 and https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=991113): the module is currently installed in the wrong location and has to be linked into place manually, otherwise PAM cannot find it.
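
To double-check that PAM will actually find the module, it may help to look at where the package installed the shared object and confirm that the symlink ended up under security/ (the paths below assume amd64; adjust for your architecture):

dpkg -L libpam-chroot | grep pam_chroot.so
ls -l /usr/lib/x86_64-linux-gnu/security/pam_chroot.so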

Then modify the configuration of PAM itself and of the module (keep the existing content and append the following at the end):

--- /etc/pam.d/common-session
+++ /etc/pam.d/common-session
+session optional pam_chroot.so

--- /etc/chroot.conf
+++ /etc/chroot.conf
+foo /home/jailed/rootfs

Now log in as this user: /home contains only the three directories mounted by the script, while the cluster’s hardware and software remain fully usable.
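
A quick sanity check is to compare the view of /home from inside and outside the jail. Assuming a login node reachable as login01 (a made-up hostname), something along these lines:

ssh foo@login01 ls /home   # from inside the chroot: only foo, spack and intel
ls /home                   # on the host as an admin: all home directories are still there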

SLURM support

To apply the same isolation on the SLURM compute nodes, the following configuration should be added on all nodes, in addition to everything above.

--- /etc/slurm/slurm.conf
+++ /etc/slurm/slurm.conf
-UsePAM=0
+UsePAM=1

--- /dev/null
+++ /etc/pam.d/slurm
+session required pam_permit.so
+session optional pam_chroot.so

The slurm PAM service above is a minimal example; other entries can be added as you see fit, for instance as in the sketch below.
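
For example, if you also want the limits from /etc/security/limits.conf applied to job steps, pam_limits could be appended to the same service file; this is only a sketch, and whether you want it depends on your site policy:

--- /etc/pam.d/slurm
+++ /etc/pam.d/slurm
 session required pam_permit.so
+session required pam_limits.so
 session optional pam_chroot.so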

In addition, if SLURM uses cgroups to manage tasks, /sys/fs/cgroup and /sys/fs/cgroup/freezer also have to be bind-mounted into the rootfs; otherwise slurmd gets stuck when starting a job.
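
Concretely, that means two more bind mounts in the same spirit as the script above (a sketch assuming the cgroup v1 layout used by SLURM on bullseye; plain bind mounts are not recursive, which is why the freezer hierarchy needs its own entry):

mount --bind /sys/fs/cgroup /home/jailed/rootfs/sys/fs/cgroup
mount --bind /sys/fs/cgroup/freezer /home/jailed/rootfs/sys/fs/cgroup/freezer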

Notes

OpenSSH can also chroot a user directly via sshd_config:

Match User foo
    ChrootDirectory /home/jailed/rootfs

However, I don’t recommend this, as PAM is the more general approach: it also applies when other local users switch to these identities with su, giving a consistent experience. Another important reason is that SLURM does not use SSH at all (jobs are forked and exec’ed directly by its own daemon), so the sshd_config approach has no effect on compute nodes.

There are several different modules named libpam-chroot. Debian ships gpjt/pam-chroot, which reads the chroot.conf configuration file. FreeBSD also has a module of the same name, which lets you configure the chroot root and working directory via the home directory field in passwd, which feels a bit more convenient.

Finally, with this scheme, multiple users on the same cluster that need isolation can share a single rootfs, as long as their home directories have appropriate permissions so that they cannot access each other. The entries in chroot.conf can use wildcards and regular expressions, which allows more complex and powerful setups (such as a different rootfs directory for each user).
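
As a rough illustration of such a shared configuration (the account names are made up, and the exact pattern syntax should be checked against the chroot.conf documentation shipped with the Debian package):

--- /etc/chroot.conf
+++ /etc/chroot.conf
+# every external collaborator account ext-* is confined to the same shared rootfs
+ext-.*    /home/jailed/rootfs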