
Transferring data to the IKIM cluster

Nota bene: The cluster can accommodate only de-identified data; no directly patient-related data may be uploaded. All personally identifying information (PII) has to be removed prior to upload.

The cluster provides a number of storage facilities, described here.

Introduction to data transfer

Data transfer across the network offers a great way to avoid the use of sneakernet (i.e. carrying hard drives around campus to move data from A to B). We strongly advise reading the info on the cluster storage before moving forward here.

Larger scale data transfer requires some degree of familiarity with the technologies available and the sending and receiving systems.

We provide three different means of data transfer. For larger transfers, the speed of the storage device that holds the data on the remote system makes a difference.

Using ssh / scp to move data into the cluster

In short, on the remote system, execute:

tar -cpf - | ssh -J "tar -xpf -"

Read this for more details.
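The tar-over-pipe pattern can be tried locally before involving the network. The following is a minimal sketch, not the cluster's official procedure: it pipes one tar process into another between two temporary directories, which stand in for the remote system and the cluster. The directory names, `JUMP_HOST`, `TARGET_HOST` and `/target/dir` are placeholders introduced for illustration only.

```shell
#!/bin/sh
# Local dry run of the tar-over-pipe pattern; no network is involved.
set -eu

src=$(mktemp -d)   # stands in for the directory on the remote system
dst=$(mktemp -d)   # stands in for the target directory on the cluster

mkdir -p "$src/project/run1"
printf 'sample\n' > "$src/project/run1/data.txt"
chmod 640 "$src/project/run1/data.txt"

# Sender:   -c create an archive, -f - write it to stdout.
# Receiver: -x extract, -p restore the stored permissions, -f - read from stdin.
# -C changes into the given directory before archiving or extracting.
tar -cpf - -C "$src" project | tar -xpf - -C "$dst"

# For a real transfer, the receiving tar would run through ssh instead, e.g.
#   tar -cpf - -C "$src" project | ssh -J JUMP_HOST TARGET_HOST "tar -xpf - -C /target/dir"
# JUMP_HOST, TARGET_HOST and /target/dir are hypothetical placeholders.

ls -l "$dst/project/run1/data.txt"
```

Because the archive is streamed, no intermediate tar file has to fit on either disk.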

Using NC to move data into the cluster

In short:

  • ensure nc (netcat) is installed on the remote system
  • you need to execute commands on both the sending and the receiving system

On the receiving end, use this command:

nc -vl 44444 | tar zxv

On the sending end, use:

tar czp /path/to/directory/to/send | nc -N 44444

Read this for more details.
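A plain nc stream carries no end-to-end integrity check of its own, so it may be worth verifying the files after they arrive. The sketch below is one possible approach, assuming GNU coreutils `sha256sum` is available on both systems: it writes a checksum manifest on the sending side, ships it along with the data, and checks it on the receiving side. The file name `MANIFEST.sha256` and the temporary directories are made up for this example, and a local tar pipe stands in for the nc transfer.

```shell
#!/bin/sh
# Checksum manifest before transfer, verification after; local stand-in for nc.
set -eu

src=$(mktemp -d)   # stands in for the sending system
dst=$(mktemp -d)   # stands in for the receiving system
printf 'a\n' > "$src/a.txt"
mkdir "$src/sub"
printf 'b\n' > "$src/sub/b.txt"

# Sending side: record a SHA-256 checksum for every file except the manifest itself.
( cd "$src" && find . -type f ! -name MANIFEST.sha256 -exec sha256sum {} + > MANIFEST.sha256 )

# Stand-in for the transfer; with nc, each side of this pipe runs on a different host.
tar -czpf - -C "$src" . | tar -xzpf - -C "$dst"

# Receiving side: recompute the checksums and compare them with the manifest.
# sha256sum -c exits non-zero if any file is missing or differs.
( cd "$dst" && sha256sum -c MANIFEST.sha256 )
echo "all files verified"
```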

How to pick a path here

Depending on your needs and the systems involved, your technology choices may vary. The table below might help pick the right path.

approach   size limit   number of files   comment
browser    500 GB       <100              easy to use
ssh/scp    5 TB         unlimited         use tar to group files
nc         unlimited    unlimited         complicated, use zip or tar to group files
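To apply the table, it helps to know the total size and the number of files before starting. The following sketch measures a directory and maps the result onto the thresholds above. The function name `suggest_approach` is hypothetical, and the script assumes decimal units (1 GB = 10^9 bytes), which the table does not specify.

```shell
#!/bin/sh
# Measure a directory and suggest a transfer approach based on the table's limits.
set -eu

# suggest_approach SIZE_BYTES FILE_COUNT
# Assumption: 1 GB = 10^9 bytes; the table does not say decimal or binary.
suggest_approach() {
    size=$1
    count=$2
    gb=$((1000 * 1000 * 1000))
    if [ "$size" -le $((500 * gb)) ] && [ "$count" -lt 100 ]; then
        echo "browser"
    elif [ "$size" -le $((5000 * gb)) ]; then
        echo "ssh/scp"
    else
        echo "nc"
    fi
}

dir=${1:-.}                                         # directory to inspect, default: current
kb=$(du -sk "$dir" | cut -f1)                       # disk usage in KiB
bytes=$((kb * 1024))
files=$(find "$dir" -type f | wc -l | tr -d ' ')    # number of regular files

echo "$dir: $bytes bytes in $files files -> $(suggest_approach "$bytes" "$files")"
```

Note that `du` reports disk usage rather than the exact sum of file sizes, so the result is an estimate; near a threshold, the more conservative approach is the safer choice.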

Miscellaneous comments

The local storage on each node typically consists of a system partition and a data partition.

Uploading data to the European Genome-phenome Archive (EGA)

The European Genome-phenome Archive (EGA) is a service for permanent archiving and sharing of personally identifiable genetic, phenotypic, and clinical data generated for the purposes of biomedical research projects or in the context of research-focused healthcare systems.

Thus, you can archive patient-related research data that you use in publications, and provide it to other scientists. Importantly, you can control access and bind other users of the data to any conditions necessary to conform to the patient consent you originally obtained.

There are different paths to upload data to different EGA server locations, and each can be used from different interfaces to then provide the project and sample metadata. Please refer to the EGA submission documentation for up-to-date details on the different pathways. Here, we only document working ways of doing the data upload:

SFTP upload to the EGA Inbox

The SFTP upload to the EGA Inbox should work as described in the EGA submission documentation. However, this restricts metadata submission to the Submitter Portal, which is not documented beyond the obvious features. So if anything doesn't work there, you cannot finish your submission and might have to wait weeks for the HelpDesk to respond.

FTP upload to EGA

For this pathway, make sure to first encrypt your data with EGACryptor, as described in the EGA docs. The only tool we got working for FTP upload is LFTP, but not in the way the EGA docs describe. Instead, the following set of commands should establish a working FTP connection:

lftp # this just starts the tool and sends you to an lftp prompt, all the following commands are within lftp
set ftp:ssl-allow 0
USER <ega_user_name>

This should ask for your password and after successful login you should be able to use all the standard lftp commands, for example ls to query the remote directory or mput to upload multiple files. With this upload route, you should then be able to use the programmatic metadata submission via Webin.