Introduction
The data you need to work with is usually stored outside of the cluster. The major disadvantage of synchronizing (i.e. copying) your data is that it occupies storage space on the cluster, which is limited and might be expensive.
In many cases, it is better to "mount" the remote storage into your home folder.
Info
What is mounting?
On Linux (and also on macOS), there are no "drives" like on Windows (C:, D:, etc.). Instead, one can "mount" a storage (a local device or a remote storage) into a folder. So, if we have an empty folder /home/user/mnt/data, we can "mount" a remote storage into that folder. After that, when we access /home/user/mnt/data, we are actually accessing the remote storage.
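For instance, you can inspect the mounts that are active on a machine with findmnt (a standard Linux tool):
# List all active mounts in a tree view
findmnt
# Show the mount at a specific folder (prints nothing if it is not a mount point)
findmnt ~/mnt/data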
Understanding user mounts
Normally, a user cannot mount devices; only the administrator (root) can do that. However, Linux comes with a special filesystem interface called FUSE (Filesystem in Userspace) which allows users to create mounts. This system is quite powerful and lots of FUSE modules exist.
The one that we are going to use here is called sshfs, which allows you to mount a remote filesystem over SSH. This means that you can mount any server that you can access via SSH.
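For illustration, this is what a manual (one-off) sshfs mount and unmount look like; the server and paths are placeholders, and the mount point must already exist:
# Mount a remote folder into ~/mnt/data
sshfs user@ssh-gateway.plus.ac.at:/mnt/data/project_X ~/mnt/data
# ... work with the files as if they were local ...
# Unmount again
fusermount -u ~/mnt/data
We will not mount manually like this, though, because of the problem described next.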
Understanding the problem
If you only worked on one machine, you could simply mount the remote storage. However, the SCC being a cluster, your jobs are going to run on any of the compute nodes.
And although you can access the same files in your home folder on all compute nodes, a mount only exists on the machine on which you issue the mount command. That is, if you mount the storage on the login node, your jobs on the compute nodes are going to see an empty folder.
The solution: systemd mount services
systemd is the service manager on Linux: among other things, it takes care of starting and stopping services. We are going to exploit two of its features here:
- systemd can also take care of mounting filesystems
- Users can create their own systemd "units", which are started when the user logs in (or when a user's jobs start running)
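If you are curious, you can look at the user units that already exist in your account; these are standard systemctl commands:
# List the unit files that systemd knows about for your user
systemctl --user list-unit-files --type=service
# User unit files live in this folder (it may not exist yet)
ls ~/.config/systemd/user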
What you need
To be able to mount a remote storage, you need the following:
- A server that you can access via SSH. In this example, we are going to use ssh-gateway.plus.ac.at.
- This means you also need an account on that server.
- You need to know the folder on the server that you want to access from the SCC. For instance /mnt/data/project_X.
Warning
The hostname of the server and the folders are just examples. You need to replace them with the actual server and folder that you want to access.
Step by step guide
0. Log in to the SCC
Obviously, you need to log in to the SCC first.
1. Create an SSH keypair for the remote server
We cannot use a password to mount the remote server, because the mount needs to happen automatically, without any interaction. Instead, we are going to use so-called SSH keys.
It is very important to:
- Use a dedicated keypair for this mount. Do NOT use your regular SSH keys.
- NOT use a passphrase for the key.
Remember: if anyone gets access to this key, they effectively are you on the remote server.
To create a keypair, issue the following command:
ssh-keygen -N "" -f ~/.ssh/mount_data
This creates a keypair in the files ~/.ssh/mount_data (private key) and ~/.ssh/mount_data.pub (public key).
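You can verify that the keypair was created and inspect its fingerprint:
# Print the fingerprint of the new public key
ssh-keygen -l -f ~/.ssh/mount_data.pub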
2. Register the public key on the remote server
Now we need to register the public key on the remote server.
Warning
It is very important not to use ssh-copy-id here, because we need to restrict what can be done with this passphrase-less key.
- Copy the content of the public key to the clipboard:
cat ~/.ssh/mount_data.pub
- Log in to the remote server:
ssh user@ssh-gateway.plus.ac.at
- Edit the file ~/.ssh/authorized_keys:
nano ~/.ssh/authorized_keys
- Add the following line at the beginning of the file:
command="internal-sftp",restrict ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCy... user@local
where ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCy... user@local is the content of your public key.
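Alternatively, you can append the restricted key in a single command from the SCC; this sketch assumes your remote account is user on ssh-gateway.plus.ac.at. It appends at the end of the file rather than the beginning, which works just as well for a new key:
printf 'command="internal-sftp",restrict %s\n' "$(cat ~/.ssh/mount_data.pub)" \
  | ssh user@ssh-gateway.plus.ac.at 'cat >> ~/.ssh/authorized_keys'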
3. Test the SSH connection
First, try to use a normal ssh connection with the new key:
ssh -i ~/.ssh/mount_data user@ssh-gateway.plus.ac.at
This should give you a message like:
This service allows sftp connections only.
Connection to ssh-gateway.plus.ac.at closed.
Next, try to use sftp:
sftp -i ~/.ssh/mount_data user@ssh-gateway.plus.ac.at
This should succeed.
If this does not work, you need to fix this before proceeding.
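If you prefer a non-interactive check, you can run a single sftp command using its standard batch mode (-b - reads commands from standard input):
# Run a single "ls" over sftp and exit
echo ls | sftp -b - -i ~/.ssh/mount_data user@ssh-gateway.plus.ac.at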
4. Create the mount point
Now, create the folder where you want to mount the remote storage. For instance:
mkdir -p ~/mnt/data
5. Create the systemd mount service
Now we need to create a systemd unit file which takes care of the mount.
Create the folder ~/.config/systemd/user if it does not exist yet:
mkdir -p ~/.config/systemd/user
Now create the file ~/.config/systemd/user/mnt-data.service:
nano ~/.config/systemd/user/mnt-data.service
And add the following content (replace user, ssh-gateway.plus.ac.at and /mnt/data/project_X with your actual username, server and folder):
[Unit]
Description=Mount remote storage via sshfs
# Replace 'user' with your actual username in the two lines below.
# Note: comments in unit files must be on their own lines.
Requires=home-user.mount
After=home-user.mount

[Service]
# Replace 'user', 'ssh-gateway.plus.ac.at' and '/mnt/data/project_X' with your actual username, server and folder
ExecStart=/usr/bin/sshfs -f -o IdentityFile=%h/.ssh/mount_data -o default_permissions,reconnect,ServerAliveInterval=15,ServerAliveCountMax=3 user@ssh-gateway.plus.ac.at:/mnt/data/project_X %h/mnt/data
ExecStop=/usr/bin/fusermount -u %h/mnt/data

[Install]
WantedBy=default.target
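A note on the unit file: %h is a systemd specifier that expands to your home directory, and the -f flag keeps sshfs running in the foreground so that systemd can supervise the process and stop it cleanly.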
6. Reload the systemd user daemon
To make systemd aware of the new unit, issue:
systemctl --user daemon-reload
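You can verify that systemd has picked up the new unit:
systemctl --user list-unit-files mnt-data.service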
7. Start the mount
To start the mount, issue:
systemctl --user start mnt-data
You can check the status of the mount with:
systemctl --user status mnt-data
If everything worked, you should now be able to access the remote storage in the folder ~/mnt/data.
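To confirm that the mount is really in place, you can check (the file listing will of course depend on your remote folder):
# Show the mount and list the remote files
findmnt ~/mnt/data
ls ~/mnt/data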
8. Enable automatic mounting at login
To make sure that the mount is automatically started when you login (or when your jobs start running), issue:
systemctl --user enable mnt-data
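For completeness, the corresponding commands to stop the mount or to disable the automatic start again are:
systemctl --user stop mnt-data
systemctl --user disable mnt-data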