This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101017529.
Copernicus - eoSC AnaLytics Engine
C-SCALE tutorial: Slurm
Sebastian Luna-Valero, EGI Foundation
sebastian.luna.valero@egi.eu
C-SCALE tutorial: Slurm | 24th February 2023 | Online
Outline
• Why use batch schedulers?
• Why Slurm?
• Get to know your computational jobs
• Get to know your computational cluster
• Running workloads with Slurm
2
C-SCALE tutorial: Slurm | 24th February 2023 | Online
Why use batch schedulers?
3
C-SCALE tutorial: Slurm | 24th February 2023 | Online
[Diagram: workload and computing resources: PC (CPU, RAM)]
Why use batch schedulers?
4
C-SCALE tutorial: Slurm | 24th February 2023 | Online
[Diagram: workload and computing resources: PC (CPU, RAM)]
Why use batch schedulers?
5
C-SCALE tutorial: Slurm | 24th February 2023 | Online
[Diagram: workload and computing resources: PC (CPU, RAM)]
Why use batch schedulers?
6
C-SCALE tutorial: Slurm | 24th February 2023 | Online
[Diagram: workload and computing resources: PC (CPU, RAM)]
Why use batch schedulers?
7
C-SCALE tutorial: Slurm | 24th February 2023 | Online
[Diagram: workload and computing resources: Mainframe (CPU, RAM)]
Why use batch schedulers?
8
C-SCALE tutorial: Slurm | 24th February 2023 | Online
[Diagram: workload and computing resources: Mainframe (CPU, RAM)]
Why use batch schedulers?
9
C-SCALE tutorial: Slurm | 24th February 2023 | Online
[Diagram: workload and computing resources: Mainframe (CPU, RAM)]
Why use batch schedulers?
10
C-SCALE tutorial: Slurm | 24th February 2023 | Online
[Diagram: workload and computing resources: Mainframe (CPU, RAM)]
Why use batch schedulers?
11
C-SCALE tutorial: Slurm | 24th February 2023 | Online
[Diagram: workload and computing resources: HTC/HPC cluster (CPU, RAM)]
Why use batch schedulers?
12
C-SCALE tutorial: Slurm | 24th February 2023 | Online
[Diagram: workload and computing resources: HTC/HPC cluster (CPU, RAM)]
Why use batch schedulers?
13
C-SCALE tutorial: Slurm | 24th February 2023 | Online
[Diagram: workload and computing resources: HTC/HPC cluster (CPU, RAM)]
Why use batch schedulers?
14
C-SCALE tutorial: Slurm | 24th February 2023 | Online
[Diagram: workload and computing resources: HTC/HPC cluster (CPU, RAM)]
Why use batch schedulers?
How much can I parallelize my workload?
• Amdahl’s law
15
C-SCALE tutorial: Slurm | 24th February 2023 | Online
The overall performance improvement gained by optimizing a single part of a system is limited by the fraction of time that the improved part is actually used.
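For reference, Amdahl's law is commonly written as (a standard formulation, added here for clarity):
S(N) = \frac{1}{(1 - p) + p/N}
where p is the fraction of the workload that can be parallelized and N is the number of parallel workers. For example, with p = 0.95 and N = 8 the speedup is 1 / (0.05 + 0.95/8) ≈ 5.9, and even with unlimited workers it can never exceed 1 / (1 - p) = 20.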
Why use batch schedulers?
How much can I parallelize my workload?
• In theory: Amdahl’s law
• In practice:
• It depends on the problem you are solving
• e.g. by region
• Parallelising code can be complex
• We think sequentially
• Dependencies among subtasks are difficult to debug
• Focus today: embarrassingly parallel workloads
• Worth it when time(subtask) >> time(overheads)
16
C-SCALE tutorial: Slurm | 24th February 2023 | Online
Why use batch schedulers?
Motivations
• Resources required to execute your workload >> Resources available on your PC
• Symptoms when running code on the PC
• It takes ages to finish
• Executing the code freezes the PC
• Executing the code eats all the memory or disk on your PC
17
C-SCALE tutorial: Slurm | 24th February 2023 | Online
Why use batch schedulers?
18
C-SCALE tutorial: Slurm | 24th February 2023 | Online
Why use batch schedulers?
Disclaimer
• Using a batch scheduler comes with its own issues!
19
C-SCALE tutorial: Slurm | 24th February 2023 | Online
Why Slurm?
Reasons to use Slurm
• Developed in the open since 2000
• Commercial support available
• Widely used at government laboratories, universities, and companies worldwide; it performs workload management for over half of the top 10 systems in the TOP500
• The job scheduler of choice across C-SCALE HTC/HPC clusters
20
C-SCALE tutorial: Slurm | 24th February 2023 | Online
Why Slurm?
Slurm Architecture
21
C-SCALE tutorial: Slurm | 24th February 2023 | Online
Why Slurm?
Goals
• It allocates exclusive and/or non-exclusive access to resources (compute nodes) to users
for some duration of time so they can perform work
• It provides a framework for starting, executing, and monitoring work (normally a parallel
job) on the set of allocated nodes
• It arbitrates contention for resources by managing a queue of pending work.
22
C-SCALE tutorial: Slurm | 24th February 2023 | Online
Get to know your computational jobs
Questions:
• How many threads (CPU/cores)?
• How much memory (RAM)?
• How much disk (HDD)?
• What’s the estimated runtime?
• How many jobs per experiment?
• Can it be easily broken down into subtasks?
23
C-SCALE tutorial: Slurm | 24th February 2023 | Online
Get to know your computational jobs
Profiling
• Iterative process
• First run a smaller instance of the experiment
• Quantify computational resources: time, CPU used, RAM used, disk used.
• Increase the size of the experiment and repeat
• Basic tools to start with:
• Linux: use top to see your program in action
• Linux: /usr/bin/time -o profile.out -v ./code
• Slurm: sacct -j <id> --format=JobID,JobName,AveCPU,MaxRSS,Elapsed
• Check language-specific tools (e.g. https://pypi.org/project/py-spy/, thanks Bernhard!)
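As a minimal sketch combining the tools above (profile.out, the program ./code and the job ID 12345 are placeholders):
• /usr/bin/time -o profile.out -v ./code # with GNU time, -v reports wall time, CPU usage and maximum resident set size
• grep "Maximum resident set size" profile.out # peak RAM of the run, in kB
• sacct -j 12345 --format=JobID,JobName,AveCPU,MaxRSS,Elapsed # the same questions answered from Slurm accounting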
24
C-SCALE tutorial: Slurm | 24th February 2023 | Online
Definitions
• nodes: the compute resource in SLURM
• partitions (queues): node groups
• jobs: allocations of resources
• job steps: sets of tasks within a job.
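To illustrate jobs versus job steps, each srun call inside a batch script launches one step within the job's allocation (a minimal sketch; ./part_one and ./part_two are placeholders):
#!/bin/bash -l
#SBATCH --ntasks=4
srun --ntasks=4 ./part_one # job step 0
srun --ntasks=4 ./part_two # job step 1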
Amount of computational resources available? https://slurm.schedmd.com/sinfo.html
• sinfo
Get to know your computational cluster
25
C-SCALE tutorial: Slurm | 24th February 2023 | Online
Amount of computational resources available? https://slurm.schedmd.com/sinfo.html
• sinfo
Get to know your computational cluster
26
C-SCALE tutorial: Slurm | 24th February 2023 | Online
Amount of computational resources available? https://slurm.schedmd.com/sinfo.html
• sinfo --summarize
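sinfo also takes an output format string if you only want specific columns (a sketch; pick the fields you care about):
• sinfo -N -l # one line per node, long format (CPUs, memory, state)
• sinfo -o "%P %a %l %D %c %m %t" # partition, availability, time limit, node count, CPUs, memory per node, state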
Get to know your computational cluster
27
C-SCALE tutorial: Slurm | 24th February 2023 | Online
What partitions can I access? https://slurm.schedmd.com/sacctmgr.html
• sacctmgr show user --association
Note: depending on the cluster configuration, you may need to specify an account and partition when submitting jobs.
Get to know your computational cluster
28
C-SCALE tutorial: Slurm | 24th February 2023 | Online
Amount of resources available in a specific partition? https://slurm.schedmd.com/sinfo.html
• sinfo --partition=el7taskp
• scontrol show partition el7taskp
• https://slurm.schedmd.com/scontrol.html
Get to know your computational cluster
29
C-SCALE tutorial: Slurm | 24th February 2023 | Online
Amount of resources available in a specific partition? https://slurm.schedmd.com/sinfo.html
• sinfo --partition=normal
• scontrol show partition normal
• https://slurm.schedmd.com/scontrol.html
Get to know your computational cluster
30
C-SCALE tutorial: Slurm | 24th February 2023 | Online
Amount of resources available on a specific node? https://slurm.schedmd.com/scontrol.html
• scontrol show node wn-01
• scontrol show node fat[41-44]
Get to know your computational cluster
31
C-SCALE tutorial: Slurm | 24th February 2023 | Online
Is the cluster empty or busy? https://slurm.schedmd.com/sinfo.html
• sinfo --partition=el7taskp --summarize
Get to know your computational cluster
32
C-SCALE tutorial: Slurm | 24th February 2023 | Online
Is the cluster empty or busy? https://slurm.schedmd.com/squeue.html
• squeue
• squeue -u $USER
• squeue --partition=el7taskp
• squeue --nodelist=fat[41-44]
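squeue output can be tailored the same way (a sketch; adjust the field widths to your site):
• squeue -u $USER -o "%.10i %.12P %.25j %.8T %.10M %.6D %R" # job ID, partition, name, state, elapsed time, nodes, reason/nodelist
• squeue -u $USER --start # estimated start times of my pending jobs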
Get to know your computational cluster
33
C-SCALE tutorial: Slurm | 24th February 2023 | Online
Running workloads with Slurm
34
C-SCALE tutorial: Slurm | 24th February 2023 | Online
[Diagram: workload and computing resources: HTC/HPC cluster (CPU, RAM)]
Let’s submit jobs! https://slurm.schedmd.com/sbatch.html
• sbatch --output=<output.file> # by default stderr and stdout go together to “slurm-<jobid>.out”
• sbatch --error=<error.file> # by default stderr and stdout go together to “slurm-<jobid>.out”
• sbatch --job-name=<job-name> # set job name
• sbatch --account=<account> # get your account with “sacctmgr show user --association”
• sbatch --partition=<partition> # get your partitions with “sacctmgr show user --association”
• sbatch --dependency=afterok:<ID> # start this job after job <ID> has completed successfully
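To chain jobs, one option is to capture the first job ID with --parsable and feed it to the dependency option (a sketch; pre.sh and post.sh are placeholder scripts):
JOBID=$(sbatch --parsable pre.sh) # --parsable prints only the job ID
sbatch --dependency=afterok:${JOBID} post.sh # starts only if pre.sh completed successfully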
Running workloads with Slurm
35
C-SCALE tutorial: Slurm | 24th February 2023 | Online
Let’s submit jobs! https://slurm.schedmd.com/sbatch.html
• sbatch --ntasks=<number> # number of tasks
• sbatch --nodes=<number> # number of nodes requested
• sbatch --ntasks-per-node=<ntasks> # number of tasks per node
• sbatch --cpus-per-task=<number> # threads per task
• sbatch --mem=<size>[units] # amount of memory per node. Units: [K|M(default)|G|T]
The following examples are based on: https://doc.aris.grnet.gr/run/job_submission/
• Check out example MPI and OpenMP jobs
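As a rough illustration of how these options combine for a multithreaded (OpenMP-style) program, one task with several CPUs could be requested like this (a sketch; ./omp_code is a placeholder):
#!/bin/bash -l
#SBATCH --job-name=omp_example
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8 # 8 threads for the single task
#SBATCH --mem=16G # 16 GB of memory on the node
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK} # match the thread count to the allocation
./omp_code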
Running workloads with Slurm
36
C-SCALE tutorial: Slurm | 24th February 2023 | Online
Let’s submit jobs! https://slurm.schedmd.com/sbatch.html
• vi script.sh
• sbatch script.sh
Running workloads with Slurm
37
C-SCALE tutorial: Slurm | 24th February 2023 | Online
#!/bin/bash -l
#-----------------------------------------------------------------
# Serial job, requesting 1 core, 2800 MB of memory per node
#-----------------------------------------------------------------
#SBATCH --job-name=serialjob      # Job name
#SBATCH --output=serialjob.%j.out # Stdout (%j expands to jobId)
#SBATCH --error=serialjob.%j.err # Stderr (%j expands to jobId)
#SBATCH --ntasks=1 # Total number of tasks
#SBATCH --nodes=1 # Total number of nodes requested
#SBATCH --ntasks-per-node=1 # Tasks per node
#SBATCH --cpus-per-task=1 # Threads per task
#SBATCH --mem=2800 # Memory per node in MB
#SBATCH --partition=el7taskp # Submit queue
#SBATCH --account=hisea # Accounting project
# Launch script
./script
Let’s submit jobs! https://slurm.schedmd.com/sbatch.html
• vi script.sh
• sbatch --array=1-100 script.sh
Running workloads with Slurm
38
C-SCALE tutorial: Slurm | 24th February 2023 | Online
#!/bin/bash -l
#-----------------------------------------------------------------
# Serial job, requesting 1 core, 2800 MB of memory per job
#-----------------------------------------------------------------
#SBATCH --job-name=serialjob      # Job name
#SBATCH --output=serialjob.%j.out # Stdout (%j expands to jobId)
#SBATCH --error=serialjob.%j.err # Stderr (%j expands to jobId)
#SBATCH --ntasks=1 # Total number of tasks
#SBATCH --nodes=1 # Total number of nodes requested
#SBATCH --ntasks-per-node=1 # Tasks per node
#SBATCH --cpus-per-task=1 # Threads per task
#SBATCH --mem=2800 # Memory per job in MB
#SBATCH --partition=el7taskp # Submit queue
#SBATCH --account=hisea # Accounting project
echo " SLURM_ARRAY_JOB_ID = " $SLURM_ARRAY_JOB_ID
echo " SLURM_ARRAY_TASK_ID = " $SLURM_ARRAY_TASK_ID
echo " SLURM_JOB_ID = " $SLURM_JOB_ID
# Launch script
./script
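A common pattern is to use SLURM_ARRAY_TASK_ID to pick one input per array task, e.g. one line of a file listing inputs; the last lines of the script above could become (a sketch; inputs.txt is a placeholder and ./script is assumed to take the input as an argument):
INPUT=$(sed -n "${SLURM_ARRAY_TASK_ID}p" inputs.txt) # line N of inputs.txt for array task N
./script "${INPUT}"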
Monitoring recent jobs:
• scontrol show jobid <ID> # show information about a specific job
• squeue --job <ID> # show information about a specific job
• squeue --user $USER # show all my pending, running, and completing jobs
Monitoring recent/older jobs:
• sacct --job <ID>
• sacct --job <ID> --format=JobID,JobName,AveCPU,MaxRSS,Elapsed
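For jobs that have already left the queue, sacct can also filter by user and time window (a sketch; the date is a placeholder):
• sacct --user=$USER --starttime=<YYYY-MM-DD> --format=JobID,JobName,State,Elapsed,MaxRSS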
Running workloads with Slurm
39
C-SCALE tutorial: Slurm | 24th February 2023 | Online
Canceling jobs: https://slurm.schedmd.com/scancel.html
• scancel <ID> # ask Slurm to cancel job with <ID>
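scancel can also target groups of jobs (a sketch, assuming the standard scancel filters):
• scancel --user=$USER --state=PENDING # cancel all of my pending jobs
• scancel --name=<job-name> # cancel jobs with a given name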
Running workloads with Slurm
40
C-SCALE tutorial: Slurm | 24th February 2023 | Online
Interactive, you said?
• srun --pty bash # get an interactive shell on a worker node; not available on GRNET
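If the site allows it, the interactive allocation can carry explicit resource requests too (a sketch; adjust the partition and limits to your cluster):
• srun --partition=<partition> --ntasks=1 --cpus-per-task=4 --mem=8G --time=01:00:00 --pty bash -i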
Running interactive workloads with Slurm
41
C-SCALE tutorial: Slurm | 24th February 2023 | Online
Thank you for your attention.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101017529.
Copernicus - eoSC AnaLytics Engine
contact@c-scale.eu
https://c-scale.eu
@C_SCALE_EU
C-SCALE tutorial: Slurm | 24th February 2023 | Online
Sebastian Luna-Valero, EGI Foundation
sebastian.luna.valero@egi.eu
Further material
• Slurm: Scheduling jobs on ARCHER2
Bonus
43
C-SCALE tutorial: Slurm | 24th February 2023 | Online
