Selecting Hardware
Node Features & Constraints
On Fluid Numerics systems, you can use Slurm constraints to select the compute node your job runs on based on the features each node provides.
You can view the available features for each node using sinfo with appropriate --format options. We recommend using the following:
sinfo --format="%20N %5c %10m %f"
The format statement provided to sinfo in the above example has the following meaning:
%20N - Lists the node hostname using at most 20 characters
%5c - Lists the number of CPUs per node using at most 5 characters
%10m - Lists the amount of memory per node (in MB) using at most 10 characters
%f - Lists the available features
See the sinfo format documentation for more details.
An example listing is shown below
$ sinfo --format="%20N %5c %10m %f"
NODELIST             CPUS  MEMORY     AVAIL_FEATURES
oram                 24    54630      epyc,v100,hpc
nicholson            192   513791     epyc,mi300a,hpc
noether              64    240000     epyc,mi210,hpc
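With a feature name from the listing, you can pass it to the --constraint flag when requesting resources. The following one-liner is a minimal sketch that requests an interactive allocation on an MI210 node; the time limit, CPU count, and GPU count are illustrative, not required values:
salloc --time=1:00:00 -N1 -c8 --constraint=mi210 --gpus-per-node=1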
Generic resources
On Fluid Numerics systems, we use Generic Resources (GRES) to define the GPU type and quantity available on each compute node. By specifying the GRES via the --gres flag, you can request the GPU type and count (per node) that you want access to in your job.
You can view the available GRES for each node using sinfo with appropriate --format options. We recommend using the following:
sinfo --format="%20N %10c %10m %20G"
The format statement provided to sinfo in the above example has the following meaning:
%20N - Node hostname (up to 20 characters)
%10c - Number of CPUs
%10m - Memory in MB
%20G - GRES (Generic Resources)
An example listing is shown below
$ sinfo --format="%20N %10c %10m %20G"
NODELIST             CPUS       MEMORY     GRES
oram                 24         54630      gpu:v100:4
nicholson            192        513791     gpu:mi300a:4
noether              64         240000     gpu:mi210:4
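Using the GRES names above, you can request a specific GPU type and count directly when allocating resources. A minimal sketch (the time limit and CPU count here are illustrative):
salloc --time=1:00:00 -N1 -c8 --gres=gpu:mi210:1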
Keep in mind that on each server, some memory is set aside for the operating system, so the memory you can request for jobs is less than the totals listed above.
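If you want to see the memory figures Slurm uses for a particular node, you can query the node directly. A quick check, using one of the node names from the listing above (the exact fields shown depend on the site configuration):
scontrol show node noether | grep -i mem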
Getting an exclusive interactive node allocation
A common workflow on the Galapagos cluster is to have a completely interactive session on a compute node with access to all of its CPUs, GPUs, and available memory. This section provides a quick one-line salloc call for each of our systems that will grant you such an allocation (when resources are available).
Oram (V100)
For a single process/task on the compute node
salloc --time=1:00:00 -N1 -c24 --mem=45G --gres=gpu:v100:4 --exclusive
For one task per GPU
salloc --time=1:00:00 -N1 -n4 -c6 --mem=45G --gres=gpu:v100:4 --exclusive
Noether (MI210)
For a single process/task on the compute node
salloc --time=1:00:00 -N1 -c64 --mem=226G --gres=gpu:mi210:4 --exclusive
For one task per GPU
salloc --time=1:00:00 -N1 -n4 -c16 --mem=226G --gres=gpu:mi210:4 --exclusive
Nicholson (MI300A)
For a single process/task on the compute node
salloc --time=1:00:00 -N1 -c192 --mem=493G --gres=gpu:mi300a:4 --exclusive
For one task per socket
salloc --time=1:00:00 -N1 -n4 -c48 --mem=493G --gres=gpu:mi300a:4 --exclusive
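Once an allocation is granted, you can confirm that the GPUs are visible by running a quick check inside it. A minimal sketch, assuming the vendor monitoring tools are on the default path:
srun -n1 nvidia-smi   # on oram (V100)
srun -n1 rocm-smi     # on noether (MI210) or nicholson (MI300A)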
Examples
Selecting MI210 GPUs with constraints
This example allocates a job on a compute node with 1 MI210 GPU using Slurm constraints
#!/bin/bash
#SBATCH --partition=economy
#SBATCH --gpus-per-node=1 # Request one gpu per node
#SBATCH --ntasks-per-node=1 # Indicate you are running one task per node
#SBATCH --cpus-per-task=12 # Request 12 CPUs (6 cores) for your task
#SBATCH --constraint=mi210
Selecting MI210 GPUs with GRES
This example allocates a job on a compute node with 1 MI210 GPU using GRES
#!/bin/bash
#SBATCH --partition=economy
#SBATCH --gres=gpu:mi210:1 # Request one MI210 gpu per node
#SBATCH --ntasks-per-node=1 # Indicate you are running one task per node
#SBATCH --cpus-per-task=12 # Request 12 CPUs (6 cores) for your task
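Either batch script can be saved to a file (job.sh below is just a placeholder name) and submitted with sbatch; squeue then shows the status of your job:
sbatch job.sh
squeue -u $USER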