Sync Data

The data you need to work with is usually stored in different locations. These could be:

  1. The NetApp storage of the PLUS
    • GFS
    • PFS
    • SFS
  2. A server you can reach via SSH
  3. A GitLab repository
  4. A cloud filesystem:
    • Microsoft OneDrive
    • Google Drive
    • Dropbox
    • MyFiles
    • ...

NetApp

How you can access the NetApp storage of the PLUS depends on the type of share you have. There are two types:

  1. If you can use it directly on MS Windows, you have what is called a "CIFS" or "SMB" share.
  2. If you use it on a Linux machine (such as the bombers or the previous clusters), you have an "NFS" share.

CIFS/SMB

As far as we know, there is currently no way to access this type of share.

NFS

You need to use a server within the PLUS networks that acts as a gateway. This server should have the share mounted and you can access it via SSH.

If you have used the "acsc" cluster before, you can just use one of its login nodes. If you are a member of the SBDL group, you can use your bomber.

You can use any method that allows you to access your files via SSH/SFTP:

  1. plus_sync (see below)
  2. filezilla
  3. rsync
  4. rclone
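As an illustration of the gateway approach, here is a minimal rsync sketch. The username, hostname, and paths are placeholders (assumptions, not real values); replace them with your own gateway and the location where the share is mounted on it. Nothing is transferred until you call the function.

```shell
#!/bin/sh
# All values below are placeholders (assumptions); replace them with your own.
GATEWAY_USER="b1234567"                  # your username on the gateway
GATEWAY_HOST="gateway.example.org"       # hypothetical gateway hostname
REMOTE_DIR="/path/to/share/my_project"   # hypothetical mount path on the gateway
LOCAL_DIR="./my_project"                 # local target folder

# Build the rsync source in user@host:path/ form.
# The trailing slash copies the folder's contents, not the folder itself.
remote_spec() {
    printf '%s@%s:%s/\n' "$GATEWAY_USER" "$GATEWAY_HOST" "$REMOTE_DIR"
}

# Copy from the gateway to the local folder over SSH.
# Extra arguments (for example --dry-run) are passed through to rsync.
pull_share() {
    mkdir -p "$LOCAL_DIR"
    rsync -av "$@" -e ssh "$(remote_spec)" "$LOCAL_DIR/"
}
```

Calling pull_share --dry-run first lists what would be transferred without copying anything; calling pull_share without arguments performs the copy.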

GitLab and cloud filesystems

You can use any tool you like to access your data. However, there is one tool that tries to make your life a lot easier and that was specifically developed for this purpose: plus_sync

Using plus_sync to sync your data

plus_sync is a command line tool that lets you sync your data with the same commands from different platforms.

You can either install it in your home folder or as a Python package in the environment of your project. Detailed instructions can be found here.

The tool is very versatile. As an example, I'm going to show here how to sync MEG data that you would have accessed on the acsc cluster or the bombers using the /mnt/sinuhe mount point. But you can also sync data that you have on your OneDrive or in a GitLab repository (the ANC, for example).

We're going to use the sftp remote type, so we need a server that acts as a gateway. Let's say you want to use your bomber. The rest of this example then relies on the following assumptions.

Assumptions

  • Your gateway server is called: obob-bomber-mmustermann.hpc.sbg.ac.at
  • Your username on the gateway is: b1234567
  • USERNAME is the username on the SCC Pilot
  • The MEG data is mounted at /mnt/sinuhe and you are looking for the project called my_project
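Before configuring anything, it can help to confirm that these assumptions hold. The following is a rough sketch, not part of plus_sync: it uses the example hostname, username, and project path from this page, and the function name check_gateway is made up for illustration. It only defines the check; nothing connects until you call it.

```shell
#!/bin/sh
# Example values from this page; replace them with your own.
GATEWAY_HOST="obob-bomber-mmustermann.hpc.sbg.ac.at"
GATEWAY_USER="b1234567"
PROJECT_DIR="/mnt/sinuhe/data_raw/my_project"

# Print the login string in user@host form.
gateway_login() {
    printf '%s@%s\n' "$GATEWAY_USER" "$GATEWAY_HOST"
}

# Succeeds only if the SSH login works and the project folder
# exists on the gateway (ssh returns the remote exit status).
check_gateway() {
    ssh -o ConnectTimeout=10 "$(gateway_login)" test -d "$PROJECT_DIR"
}
```

Running check_gateway && echo "gateway and project folder look fine" afterwards reports success only when both the login and the folder check pass.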

Steps

First, open a terminal and navigate to the folder where your scripts should be located. If you do not already have a folder, create it. If you have not installed plus_sync "globally", do so now, or create a new Python environment and install it there.

In any case, you should be able to run plus_sync from the terminal.

 plus_sync

 Usage: plus_sync [OPTIONS] COMMAND [ARGS]...                                   

 Sync data between Gitlab and SinuheMEG or anything else that can be reached    
 via gitlab, SFTP or rsync.                                                     
 Enter plus_sync init to get started.                                           

╭─ Options ────────────────────────────────────────────────────────────────────╮
 --config-file               TEXT  The configuration file to use.             
                                   [env var: PLUS_SYNC_CONFIG_FILE]           
                                   [default: plus_sync.toml]                  
 --install-completion              Install completion for the current shell.  
 --show-completion                 Show completion for the current shell, to  
                                   copy it or customize the installation.     
 --help                            Show this message and exit.                
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────╮
 add             Add new synchronisation items.                               
 init            Initialize a new configuration file.                         
 list-remotes    List the available remotes.                                  
 list-subjects   List the subjects in a sync endpoint.                        
 ls              List the files that are available.                           
 sync            Sync the data.                                               
╰──────────────────────────────────────────────────────────────────────────────╯
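If you get a "command not found" error instead of this help text, plus_sync is probably not on your PATH, for instance because the environment it was installed in is not active. A small check like the following can confirm that; the helper name have_command is only illustrative.

```shell
#!/bin/sh
# Return 0 if the named command can be found on PATH, non-zero otherwise.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

if have_command plus_sync; then
    echo "plus_sync found at: $(command -v plus_sync)"
else
    echo "plus_sync not found; activate the environment you installed it in" >&2
fi
```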

If you have not done so already, you need to initialize the configuration file. This is done by issuing

plus_sync init

This will create a file called plus_sync.toml in the current directory. This is where we keep the configuration for plus_sync.
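By default, plus_sync looks for plus_sync.toml in the current directory. The help output above also lists a --config-file option and a PLUS_SYNC_CONFIG_FILE environment variable, which should let you run the tool from another folder. The following is a sketch based only on that help text; the project location is a hypothetical example.

```shell
#!/bin/sh
# Hypothetical project location; adjust to where you ran "plus_sync init".
PROJECT_DIR="$HOME/my_project"

# Variant 1: set the environment variable for the current shell session.
export PLUS_SYNC_CONFIG_FILE="$PROJECT_DIR/plus_sync.toml"

# Variant 2: pass the file explicitly for a single call.
# --config-file is listed under the top-level options, so it goes
# before the subcommand. The call is skipped if plus_sync or the
# configuration file is missing.
if command -v plus_sync >/dev/null 2>&1 && [ -f "$PLUS_SYNC_CONFIG_FILE" ]; then
    plus_sync --config-file "$PROJECT_DIR/plus_sync.toml" list-remotes
fi
```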

Now, we use plus_sync add to add the remote for the MEG data:

plus_sync add sftp --name sinuhe --host obob-bomber-mmustermann.hpc.sbg.ac.at --username b1234567 --path /mnt/sinuhe/data_raw/my_project

This will add the remote to the configuration file. You can check this by issuing

plus_sync list-remotes
plus_sync ls sinuhe
plus_sync list-subjects sinuhe

Now, you can sync the data by issuing

plus_sync sync sinuhe