Storage on the IKIM cluster

The cluster offers several options for retrieving and storing data. They differ vastly in performance, and your choice can greatly influence the time required to complete your computational analyses.

Classes of storage

Not all storage locations are alike, and it is worth your while to understand their specific properties.

Local storage on the system partition

The local storage on each node typically consists of a system partition and a data partition. The system partition holds the operating system, the configuration, swap files and pre-installed software. Most directories on the system partition are read-only to users.

location	purpose	user read-write status	comment
/etc/	configuration	read-only
/var/	temporary files	read-only
/var/tmp	user-generated temporary files	read-write	local disk
/tmp/	user-generated temporary files, deleted on reboot	read-write	local disk
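To check the read-write status from the table on the node you are logged into, a short script can ask the operating system directly. This is a minimal sketch; the list of paths is copied from the table, and the helper name `writable_status` is hypothetical. Note that `os.access` reports the permissions of the running user, so the root user will see most locations as writable unless they are mounted read-only.

```python
import os

# Locations copied from the table above; adjust as needed.
LOCATIONS = ["/etc/", "/var/", "/var/tmp/", "/tmp/"]


def writable_status(paths):
    """Map each existing directory to True if the current user may write to it."""
    status = {}
    for path in paths:
        if os.path.isdir(path):
            # os.access checks the real UID/GID of this process and also
            # reports False for writes on a read-only mounted file system.
            status[path] = os.access(path, os.W_OK)
    return status


if __name__ == "__main__":
    for path, ok in writable_status(LOCATIONS).items():
        print(f"{path:12} {'read-write' if ok else 'read-only'}")
```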

Local storage on the data partition

For some operations, the network file system (NFS) adds unnecessary overhead. Therefore, the path /local/work is available for creating files and directories that reside on the data partition of the current host. This location should only be used for quick testing, preliminary experimentation and intermediate output. As soon as you need to keep your files, move them to /projects or /groups. Local-only files are not backed up and can be deleted without notice.

Here are tips on writing programs, scripts, containers, etc. that make good use of network resources:

Read inputs from and write the final results to /projects or /groups.
Write intermediate output to /local/work.
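The two tips above can be combined into one job layout. The following is a minimal sketch, not a prescribed workflow: the function `run_job` and its "processing step" (upper-casing each line) are hypothetical placeholders for your own analysis. It reads its input from shared storage, keeps intermediate files in a private scratch directory under /local/work, publishes the final result to shared storage, and removes the scratch directory afterwards.

```python
import os
import shutil
import tempfile


def run_job(input_path, output_path, scratch_root="/local/work"):
    """Read from shared storage, stage intermediates locally, publish the result."""
    # A private scratch directory on the node-local data partition.
    scratch = tempfile.mkdtemp(prefix="job-", dir=scratch_root)
    try:
        # Hypothetical processing step: upper-case every line of the input.
        intermediate = os.path.join(scratch, "intermediate.txt")
        with open(input_path) as src, open(intermediate, "w") as tmp:
            for line in src:
                tmp.write(line.upper())
        # Copy to a temporary name next to the destination, then rename,
        # so other readers never see a half-written result file.
        partial = output_path + ".partial"
        shutil.copyfile(intermediate, partial)
        os.replace(partial, output_path)
    finally:
        # Local-only files are not backed up; clean them up when done.
        shutil.rmtree(scratch, ignore_errors=True)
```

A call might look like `run_job("/projects/abc/input.txt", "/projects/abc/result.txt")`, where the file names are illustrative. Writing the result under a temporary name and renaming it is a common precaution on shared file systems, since the rename happens within one directory.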

NFS storage

Read operations on network storage (/projects, /groups) are cached transparently on local storage in the data partition. Generally speaking, your first access to a dataset will be slightly slower than usual, because the data has to be fetched from the file server and cached, but any subsequent access will be served from local storage.
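To observe the difference between a first and a repeated read yourself, you can time several sequential passes over the same file. This is a rough sketch with a hypothetical helper `timed_reads`; the results are only indicative, because the kernel's in-memory page cache also speeds up repeated reads and cannot be separated from the local disk cache this way.

```python
import time


def timed_reads(path, repeats=2, chunk_size=8 * 1024 * 1024):
    """Read a file sequentially several times; return the duration of each pass."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        with open(path, "rb") as handle:
            # Stream in large chunks; the data is discarded, only timing matters.
            while handle.read(chunk_size):
                pass
        durations.append(time.perf_counter() - start)
    return durations
```

For example, `timed_reads("/projects/abc/dataset.bin")` (a hypothetical file) returns one duration per pass; a noticeably slower first pass is consistent with the caching described above.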

The file server is connected to the entire cluster by a 10 Gbit/s (10 gigabits per second) link. This bandwidth is shared by all nodes, so each node can use only a fraction of it, in the worst case a tiny fraction. Streaming large files is nevertheless fast: at the full link speed of 1.25 GB/s, a 250 MB (megabyte) file transfers from the server to the client in about 0.2 seconds. These favorable numbers change drastically once random I/O (as opposed to streaming large files), write-locking files and similar operations enter the equation. Such complex operations are best left to the local disk.
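The transfer-time arithmetic can be written out as a small function. This is an idealized sketch: it assumes the 10 Gbit/s link is split evenly among the nodes that are transferring at the same time and ignores protocol overhead, so real transfers will be somewhat slower. The function name and the `active_nodes` parameter are hypothetical.

```python
def transfer_seconds(size_bytes, link_bits_per_second=10e9, active_nodes=1):
    """Idealized streaming transfer time over a link shared evenly by active_nodes."""
    # Assumption: even bandwidth split, no protocol or disk overhead.
    share = link_bits_per_second / active_nodes
    return size_bytes * 8 / share


print(transfer_seconds(250e6))                   # 0.2 seconds on an idle link
print(transfer_seconds(250e6, active_nodes=20))  # 4.0 seconds with 20 nodes busy
```

The second call shows why the per-node share matters: the same file takes twenty times longer when twenty nodes stream concurrently, and random I/O would widen the gap further.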

Consequently, working on local or cached files is the best way to ensure good runtime performance.

Three different storage locations exist on the file server:

location	purpose	user read-write status	comment
/projects/	project data	read-write	not listable
/groups/	group files	read-write	not listable
/homes	user home directory	read-write	not cached

Each user has a private home directory whose contents are accessible only to that user. Data relevant to any other user, to a project or to your PI should therefore not be stored here.

The /projects directory provides project-specific storage, typically associated with a Linux group shared by all members of the project. Thus /projects/abc is shared only by members of the project abc. You can list all the groups you belong to with the id command. The contents of /projects are cached on the local disk, so read access to data in /projects typically does not place much of a burden on the file server. Executing e.g. ls /projects/ will not list all project directories, because they are mounted on demand by the automounter. You can request a /projects directory by contacting us on Mattermost, or have your PI request one.
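Because listing /projects/ is incomplete, it is more reliable to check a project directory by its full path, which with a typical automounter setup triggers the mount on access. The sketch below does this from Python and also lists your group names, similar to the id command. The helper names are hypothetical, and the group database lookup assumes your groups are resolvable on the node.

```python
import grp
import os


def my_groups():
    """Return the names of all groups of the current process, similar to `id`."""
    gids = set(os.getgroups()) | {os.getegid()}
    names = set()
    for gid in gids:
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            # The GID has no entry in the group database; keep the number.
            names.add(str(gid))
    return names


def project_accessible(path):
    """Check a project directory by its full path instead of listing /projects/."""
    # Accessing the full path lets the automounter mount it on demand.
    return os.path.isdir(path) and os.access(path, os.R_OK | os.X_OK)
```

For example, `"abc" in my_groups()` and `project_accessible("/projects/abc")` should both be true for a member of the example project abc, assuming the Linux group is named after the project.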

The /groups directory uses the same technology as /projects. However, every group in the organization has its own subdirectory.