Running jobs¶
Compute nodes are accessed through Slurm. Work runs as jobs submitted to a partition (queue). See Hardware for the full list of nodes and which partition each belongs to.
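To see the partitions and the current state of their nodes from a login node, the standard Slurm query commands can be used; a quick sketch (exact output format depends on the Slurm version):
sinfo                 # list partitions and node states
squeue -u $USER       # show your own pending and running jobs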
Interactive jobs¶
salloc reserves resources and gives you an interactive shell on the
allocated node:
# anatomy of a salloc call:
salloc [-w hostname] [--gres gpu:vendor:count] [--exclusive] [-n number-of-cores] [-t duration] [-p partition]
# any available node, 1 CPU core, no GPU, 1 hour (default)
# the CPU limit is not enforced, but no GPU is visible without --gres
salloc
# specific node, 2h, single AMD GPU, develop partition
salloc -w gpu-amd -t 02:00:00 --gres gpu:amd:1 -p develop
# allocate an entire node with 2 GPUs for 30 minutes
# please only do this if you actually need to run benchmarks!
salloc --exclusive --gres gpu:nvidia:2 -t 00:30:00
# allocate 24 tasks (= cores) on any node
# the core count is not enforced, but reserving it prevents others from allocating these cores
salloc -n 24
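Once the shell opens you are on the allocated compute node; leaving the shell releases the allocation. A minimal sketch of checking what was granted (the environment variables are standard Slurm, the commands are only examples):
hostname                                  # confirm you are on the allocated node
echo $SLURM_JOB_ID $SLURM_CPUS_ON_NODE    # job ID and cores granted on this node
exit                                      # releases the allocation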
Batch jobs¶
For non-interactive runs, write a job script and submit it with sbatch. The LRZ Slurm documentation covers job script syntax in detail.
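As a minimal sketch of such a script (job name, partition, GRES string, core count, and the program to run are placeholders; adapt them to your job and the Hardware page):
#!/bin/bash
#SBATCH -J my-job              # job name (placeholder)
#SBATCH -p develop             # partition, see Hardware
#SBATCH --gres=gpu:amd:1       # one AMD GPU; omit for CPU-only jobs
#SBATCH -n 4                   # number of cores/tasks
#SBATCH -t 01:00:00            # time limit; always set this explicitly
#SBATCH -o slurm-%j.out        # output file, %j expands to the job ID

srun ./my_program --input data.txt   # my_program and its arguments are placeholders
Submit it with sbatch job.sh; Slurm prints the job ID and writes the program's output to slurm-<jobid>.out.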
Tip
Always set -t (time limit) explicitly. The default is short, and an allocation that is longer than needed blocks other users unnecessarily.