Migration between HPC systems
There are three typical scenarios when migrating to a new HPC system. These are:
1. First-time access from a workplace PC to an HPC cluster,
2. a transition between two HPC systems of similar scale within the HPC Performance Pyramid, e.g., when changing institutions, and
3. a transition from one HPC system to a larger system, e.g., to carry out a research project on a larger scale.
In each of these cases, questions arise regarding the adaptation of the workflow, such as access, the batch system, available software, or data transfers.
If you are using an HPC system for the first time, you can find more information about the structure and usage of HPC systems in our introductory courses and through our consulting services.
Frequently Asked Questions
Below, we answer frequently asked questions for the thirteen HPC.NRW computing sites in North Rhine-Westphalia.
These points are also summarized in the table of HPC.NRW computing sites.
- Table of HPC.NRW computing sites
- How do I get an account?
- Where can I find the cluster documentation and how do I contact the responsible local support team?
- Are there any technical specifics for access?
- What am I allowed to do on the login node and what belongs in the batch system?
- Which batch system is used?
- What hardware is available on the target system?
- Which filesystems are used?
- What software is available?
- Can I use containers?
- Which services are available (JupyterHub, Remote Desktop, ...)?
- How do I transfer data between two HPC centers?
Table of HPC.NRW computing sites
For access to the individual HPC.NRW computing sites, please refer to the Quick Reference Cards on the computing time request page.
This table summarizes the specific characteristics of each target system.
For each of the thirteen sites, it lists the batch system, the access method, the available data-transfer services, the software offering (in most cases provided via environment modules), the availability of EESSI, a JupyterHub with connection to the HPC system, container support, and a link to the local documentation.
In our support section, you will find the contact options for the respective local support teams.
How do I get an account?
The process varies between locations and typically differs in scope depending on the amount of computing time requested.
Here, we answer all questions regarding the application for computing time.
Where can I find the cluster documentation and how can I contact the responsible local support team?
You can find the links to the documentation of your HPC system in the table of HPC.NRW computing sites.
In our support section, you will also find contact information for the respective local support teams.
Are there any technical specifics regarding access?
HPC systems are attractive targets for hackers and are therefore often subject to various access restrictions to enhance security. Local users can often only access the system from within the university network. Alternatively, a VPN connection may be required to access the system from outside the local network. In some cases, parts of the system are completely disconnected from the internet.
Password-based login may be entirely disabled for external connections — or even in general. In such cases, a public key from an SSH key pair must be registered in order to log in. A two-factor authentication system may also be a prerequisite for access.
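As a rough illustration of the SSH-key-based workflow, the following commands generate a key pair and add a host entry to the SSH client configuration. The host name, user name, and key file name are placeholders; how and where the public key must be registered is described in each site's documentation.

```bash
# Generate a new key pair (ed25519 is widely accepted); protect it with a passphrase.
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519_hpc

# Optional: add a host entry so that "ssh mycluster" works directly.
# Host name and user name are placeholders for your target system.
cat >> ~/.ssh/config <<'EOF'
Host mycluster
    HostName login.hpc.example-university.de
    User jdoe
    IdentityFile ~/.ssh/id_ed25519_hpc
EOF

# After the public key (~/.ssh/id_ed25519_hpc.pub) has been registered with the
# HPC center, e.g., via an identity-management portal, you can log in:
ssh mycluster
```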
For details specific to each location, please refer to the table of HPC.NRW computing sites.
What am I allowed to do on the login node, and what belongs in the batch system?
Users who are new to HPC systems often wonder which tasks can be performed interactively on the login node and which tasks should be handled by the so-called batch system.
Login nodes are typically the main entry point into the HPC system and are shared by many users. This means that any tasks that place a noticeable load on the login node should be avoided. Examples include:
- Running calculations or simulations
- Compiling programs in preparation for computations
- Large-scale file transfers that stress the file systems or network connection of the login node
In general, computations and other resource-intensive tasks should be submitted to the batch system, which manages all users’ jobs according to fair scheduling policies and has access to significantly more resources.
More specific usage guidelines may vary, as some HPC systems are designed with larger login nodes to support certain types of interactive work. These guidelines can usually be found in the system’s documentation.
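As a minimal sketch, a resource-intensive task such as compiling and running a program can be wrapped in a batch script instead of being executed on the login node. The partition name, module name, and resource values below are placeholders and must be adapted to the target system.

```bash
#!/bin/bash
#SBATCH --job-name=build-and-run     # job name shown in the queue
#SBATCH --partition=normal           # placeholder: use a partition of your system
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --time=00:30:00              # wall-clock time limit

module load GCC                      # placeholder module name

# Compile and run on a compute node instead of the login node.
gcc -O2 -fopenmp -o my_app my_app.c
OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK ./my_app
```

The script is then submitted with sbatch, and the batch system decides when and on which node it runs.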
Which batch system is used?
The batch system ensures a fair distribution of available computing resources and takes into account many parameters that can influence job priority.
At eleven of the thirteen HPC.NRW sites, the batch system is currently implemented using Slurm. In Düsseldorf, PBS Pro is used.
You should check which partitions or queues the HPC system provides and how to use them. Partitions or queues typically represent a group of compute nodes with a specific hardware configuration. For example, there might be one partition/queue for CPU-based computations and another for GPU-based workloads.
Partitions and queues can also be used to assign exclusive access rights to specific user groups or to represent specific configurations — such as high-performance network connections between compute nodes, which may be important for simulations using MPI.
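With Slurm, the available partitions and their limits can be inspected and selected roughly as follows; the partition name gpu and the GPU request syntax are only examples and differ between systems.

```bash
# List all partitions with their time limits and node states.
sinfo

# Show the limits and defaults of a single partition in detail.
scontrol show partition gpu

# Request a specific partition (and, for example, one GPU) at submission time.
sbatch --partition=gpu --gres=gpu:1 job.sh
```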
Details on the batch system and the available partitions or queues can be found in the table of HPC.NRW computing sites.
What hardware is available on the target system?
It is important to gather information about the available hardware. The details can be found in the respective HPC system documentation.
A common first question is how many compute nodes are available and how many CPU cores each node has. How much RAM does each node have, and how much RAM per CPU core is available? These factors influence the number of processes and threads used in parallel computations.
The network between compute nodes is relevant for data transfers and calculations using MPI and similar technologies.
For optimizations, it is also important to know whether x86 CPUs (AMD or Intel) or ARM CPUs are used. Compilers and program code should take this into account in order to ensure that calculations are performed as efficiently as possible.
Another common question is about available accelerators, usually GPUs (Nvidia, AMD, Intel), or even FPGAs.
Lastly, understanding the shared file systems is important. Where are work data, software, and configuration files stored? What is available on the login node, and which file systems are also accessible to the compute nodes?
The answers to all these questions may vary between HPC systems, but there are often comparable conventions. More specific information should be obtained from the documentation of the target system whenever switching.
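Some of this information can also be queried directly on the system. The commands below assume a Slurm-based cluster and, for the GPU query, an NVIDIA node; they are meant as a starting point, not a replacement for the documentation.

```bash
# Nodes per partition with CPU count, memory, and generic resources (Slurm).
sinfo -o "%P %D %c %m %G"

# CPU model, core count, and NUMA layout of the node you are logged in to.
lscpu

# Memory of the current node.
free -h

# GPUs on a compute node (NVIDIA only; run inside a job on a GPU node).
nvidia-smi
```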
Which file systems are used?
At least one shared file system is used to distribute data between the nodes of the HPC system. There may also be multiple shared file systems, and it is important to find out what each is intended for. For example, is a large distributed computation allowed to frequently access input data or write output data to the file system?
Furthermore, you should know how much storage space is available to you (quota) and whether the file system has regular backups.
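How quotas are queried depends on the file system in use; the following commands are common possibilities, and the path /scratch is a placeholder for the work file system of the target cluster.

```bash
# Show the mounted file systems and their overall capacity.
df -h $HOME /scratch

# Classic quota tool (often used for NFS-based home directories).
quota -s

# Lustre-based work/scratch file systems typically use lfs quota.
lfs quota -h -u $USER /scratch
```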
Which software is available?
HPC systems often provide a module system, such as Lmod, through which software is offered in various versions. Users can load these modules in interactive sessions or in batch jobs.
The installed software and versions may vary between HPC centers and are often based on the previous needs of the users. Specific details about the available software can also be found in the table of HPC.NRW computing sites.
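Typical Lmod commands for discovering and loading software look like this; the module names and versions are examples and differ between systems.

```bash
module avail                 # list all modules visible in the current module path
module spider Python         # search for a module, including hidden hierarchies
module load GCC/13.2.0       # placeholder version: load a specific module
module list                  # show the currently loaded modules
module purge                 # unload everything, e.g., at the start of a batch script
```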
In the case of licensed software, it is important to clarify under what conditions existing licenses can be used or whether access to existing license servers can be provided. This matter is best addressed with the relevant support team of the HPC system.
On any system that provides a client for the CVMFS file system, there is also the option to use software modules from the EESSI project. Whether EESSI is available can easily be checked with the command ls /cvmfs/software.eessi.io. In this way, you can use the same software environment across different HPC systems or even on your laptop.
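If the CVMFS client is present, the EESSI environment can be checked and initialized roughly as follows; the version directory 2023.06 and the GROMACS module are examples, so check which versions and modules are actually provided.

```bash
# Check whether the EESSI repository is mounted via CVMFS.
ls /cvmfs/software.eessi.io

# Initialize the EESSI environment for the current shell
# (the version "2023.06" is an example).
source /cvmfs/software.eessi.io/versions/2023.06/init/bash

# Afterwards, EESSI-provided modules can be used as usual.
module avail
module load GROMACS
```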
If CVMFS is not available on the HPC system, EESSI modules through containers may be an option.
Can I use containers?
For technical reasons, Docker is often not available on HPC systems. The most common alternative is Apptainer (formerly Singularity), which is generally used to run containerized workloads on HPC systems without elevated privileges.
However, some features are dependent on specific settings in the kernel of the host operating system. A common example is user or network namespaces, which may not always be available, as they might need to be temporarily disabled in case a security vulnerability is discovered.
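A typical Apptainer workflow, sketched under the assumption that Apptainer is installed on the target system; the image name and script are placeholders.

```bash
# Download a Docker image and convert it into a local Apptainer image file (SIF).
apptainer pull python.sif docker://python:3.12-slim

# Run a command inside the container; the home directory is mounted by default.
apptainer exec python.sif python3 --version

# Inside a Slurm job, the same call is simply placed in the batch script.
srun apptainer exec python.sif python3 my_script.py
```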
For more details on container usage, please refer to the table of HPC.NRW computing sites.
Which services are available (JupyterHub, Remote Desktop, ...)?
In addition to direct access to the command line via SSH, HPC systems also offer several web-based services.
There is often some form of monitoring available, which allows users to observe the current load of the entire cluster and even the performance of individual batch jobs – at varying levels of detail.
There may also be specialized services for data transfer. These services simplify usage and enable better performance by automatically implementing best practices.
If a JupyterHub instance is available, users can access the resources of the HPC system to interactively develop and execute scripts in languages like Python, Julia, or R.
How do I transfer data between two HPC centers?
The HPC wiki provides an overview of common file transfer methods.
However, transferring non-trivial amounts of data between two HPC systems depends on several factors. If in doubt, please contact the support team of one of the involved HPC systems to clarify any questions.
This overview can serve as a guide:
- Is a direct SSH connection to the target system possible?
  - Then use tools such as scp, rsync, rclone, etc. (see the sketch after this list).
- Is a VPN connection to the target system required, and are you transferring only small amounts of data (e.g., < 1 GB)?
  - Activate the VPN on your laptop or workstation.
  - The source system should be directly accessible, e.g., by being connected to the university network.
  - In this case, you can also use scp, rsync, rclone, etc. to transfer the data.
- Transferring large amounts of data requires a direct connection between the participating HPC systems.
  - This is due to network throughput limitations and the duration of the transfer process.
  - In such cases, please follow the instructions provided in the documentation of the HPC systems or contact one of the local support teams.
  - There may be a dedicated transfer service, such as Globus Connect, or other solutions designed for this purpose.
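For the simple cases (a direct SSH connection, or small amounts of data over VPN), a transfer could look like the following sketch; the host names, user name, and paths are placeholders.

```bash
# Transfer a results directory from the source to the target cluster with rsync over SSH
# (run on the source system, assuming it can reach the target system directly).
rsync -avP ~/project/results/ jdoe@login.target-hpc.example.de:/scratch/jdoe/results/

# A single file can also be copied with scp.
scp ~/project/results/summary.csv jdoe@login.target-hpc.example.de:/scratch/jdoe/

# If the transfer is interrupted, re-running the rsync command resumes it:
# only missing or changed files are sent again.
```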