The data you need to work with is usually stored in one of several locations. This could be:
- The NetApp storage of the PLUS
- GFS
- PFS
- SFS
- A Server you can reach via SSH
- A gitlab repository:
  - https://gitlab.com
  - https://data.anc.plus.ac.at
  - Or some other gitlab instance
- Cloud filesystem:
  - Microsoft OneDrive
  - Google Drive
  - Dropbox
  - MyFiles
  - ...
NetApp
How you access the NetApp storage of the PLUS depends on the type of share you have. There are two types:
- If you can use it directly on MS Windows, you have what is called a "CIFS" or "SMB" share.
- If you use it on a Linux machine (like the bombers or the previous clusters), you have an "NFS" share.
CIFS/SMB
As far as we know, there is currently no way to access these.
NFS
You need to use a server within the PLUS networks that acts as a gateway. This server should have the share mounted and you can access it via SSH.
If you have used the "acsc" cluster before, you can just use one of its login nodes. If you are a member of the SBDL group, you can use your bomber.
You can use any method that allows you to access your files via SSH/SFTP:
- plus_sync (see below)
- filezilla
- rsync
- rclone
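As a rough sketch of the rsync option, the snippet below assembles a pull of one project folder through the gateway. The host name, user name and paths are placeholders chosen for illustration, not real values. The command is only printed; remove the `echo` to actually run it.

```shell
# Placeholder values (assumptions): replace with your own gateway, user and paths.
GATEWAY="gateway.example.org"
GATEWAY_USER="your_username"
REMOTE_DIR="/path/to/share/my_project"
LOCAL_DIR="./my_project"

# Trailing slashes copy the directory contents rather than the directory itself.
RSYNC_SOURCE="${GATEWAY_USER}@${GATEWAY}:${REMOTE_DIR}/"

# -a: archive mode (recursive, keeps timestamps and permissions)
# -v: verbose, -z: compress during transfer, -n: dry run (nothing is copied)
echo rsync -avzn "$RSYNC_SOURCE" "${LOCAL_DIR}/"
```

Dropping `-n` turns the dry run into a real transfer, so it is worth checking the printed file list first.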
gitlab and Cloud filesystems
You can use any tool you like to access your data. However, there is one tool that tries to make your life a lot easier and that was specifically developed for this purpose: plus_sync
Using plus_sync to sync your data
plus_sync is a command line tool that lets you sync your data with the same commands from different platforms.
You can either install it in your home folder or as a python package in the environment of your project. Detailed instructions can be found here.
The tool is very versatile. As an example, I'm going to show here how to sync MEG data that you would have accessed on the acsc cluster or the bombers using the /mnt/sinuhe mount point. But you can also sync data that you have on your OneDrive or in a gitlab repository (like the ANC, for example).
We're going to use the sftp remote type, so we need a server that acts as a gateway. Let's say you want to use your bomber. We then make the following assumptions:
Assumptions
- Your gateway server is called: obob-bomber-mmustermann.hpc.sbg.ac.at
- Your username on the gateway is: b1234567
  USERNAME is the username on the SCC Pilot
- The MEG data is mounted at /mnt/sinuhe and you are looking for the project called my_project
Steps
First, open a terminal and navigate to the folder where your scripts should be located. If you do not already have a folder, create it. If you have not installed plus_sync "globally", do so now, or create a new Python environment and install it there. In any case, you should be able to run plus_sync from the terminal.
❯ plus_sync
Usage: plus_sync [OPTIONS] COMMAND [ARGS]...
Sync data between Gitlab and SinuheMEG or anything else that can be reached
via gitlab, SFTP or rsync.
Enter plus_sync init to get started.
╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --config-file TEXT The configuration file to use. │
│ [env var: PLUS_SYNC_CONFIG_FILE] │
│ [default: plus_sync.toml] │
│ --install-completion Install completion for the current shell. │
│ --show-completion Show completion for the current shell, to │
│ copy it or customize the installation. │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────╮
│ add Add new synchronisation items. │
│ init Initialize a new configuration file. │
│ list-remotes List the available remotes. │
│ list-subjects List the subjects in a sync endpoint. │
│ ls List the files that are available. │
│ sync Sync the data. │
╰──────────────────────────────────────────────────────────────────────────────╯
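If you are unsure whether the installation worked, a quick check along these lines can help. It only tests whether an executable named plus_sync is on your PATH; the variable name `PLUS_SYNC_STATUS` is made up for this sketch.

```shell
# Check whether the plus_sync executable can be found on the PATH.
if command -v plus_sync >/dev/null 2>&1; then
    PLUS_SYNC_STATUS="available"
else
    PLUS_SYNC_STATUS="missing"
fi
echo "plus_sync is ${PLUS_SYNC_STATUS}"
```

If this reports "missing" although you installed the tool into a Python environment, that environment is probably not activated in the current terminal.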
If you have not done so already, you need to initialize the configuration file. This is done by issuing
plus_sync init
This will create a file called plus_sync.toml in the current directory. This is where we keep the configuration for plus_sync.
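To see which configuration file applies in the current folder, something like the following sketch may be useful. It assumes, based on the help text shown above, that the PLUS_SYNC_CONFIG_FILE environment variable takes precedence over the default plus_sync.toml; the variable names `CONFIG_FILE` and `CONFIG_STATE` are only illustrative.

```shell
# Resolve the configuration file as the help text suggests (an assumption):
# PLUS_SYNC_CONFIG_FILE if set, otherwise plus_sync.toml in the current directory.
CONFIG_FILE="${PLUS_SYNC_CONFIG_FILE:-plus_sync.toml}"

if [ -f "$CONFIG_FILE" ]; then
    CONFIG_STATE="present"
else
    CONFIG_STATE="absent"
fi
echo "Configuration file ${CONFIG_FILE}: ${CONFIG_STATE}"
```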
Now, we use plus_sync add to add the remote for the MEG data:
plus_sync add sftp sinuhe obob-bomber-mmustermann.hpc.sbg.ac.at b1234567 /mnt/sinuhe/data_raw/my_project
This will add the remote to the configuration file. You can check this by issuing
plus_sync list-remotes
plus_sync ls sinuhe
plus_sync list-subjects sinuhe
Now, you can sync the data by issuing
plus_sync sync sinuhe
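The steps above could be collected into a small script, sketched here under the assumptions of this example (gateway, username and path as listed earlier). The `run` helper and the `DRY_RUN` switch are not part of plus_sync; they are added only so that the commands are printed by default instead of executed. The sketch does not check whether the remote has already been added.

```shell
# Values taken from the assumptions of this example; adjust them to your setup.
GATEWAY="obob-bomber-mmustermann.hpc.sbg.ac.at"
GATEWAY_USER="b1234567"
REMOTE_PATH="/mnt/sinuhe/data_raw/my_project"
REMOTE_NAME="sinuhe"

# Hypothetical helper: print commands unless DRY_RUN is set to 0.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create the configuration file only if it does not exist yet.
[ -f plus_sync.toml ] || run plus_sync init

# Register the SFTP remote, inspect it, then sync.
run plus_sync add sftp "$REMOTE_NAME" "$GATEWAY" "$GATEWAY_USER" "$REMOTE_PATH"
run plus_sync list-remotes
run plus_sync ls "$REMOTE_NAME"
run plus_sync sync "$REMOTE_NAME"
```

With the default settings the script only prints the plus_sync commands it would issue. Setting `DRY_RUN=0` in the environment before running it executes them for real.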