Introduction to SLURM
Ismael Fernández Pavón, Cristian Gomollón Escribano
08 / 10 / 2019
What is SLURM?
• Allocates access to resources for some duration of time.
• Provides a framework for starting, executing, and
monitoring work (normally a parallel job).
• Arbitrates contention for resources by managing
a queue of pending work.
Cluster manager and job scheduler system for large and small Linux clusters.
[Diagram: resource managers and schedulers, e.g. LoadLeveler (IBM), ALPS (Cray)]
✓ Open source
✓ Fault-tolerant
✓ Highly scalable
SLURM: Resource Management
Nodes:
• Baseboards, sockets, cores, threads
• CPUs (Core or thread)
• Memory size
• Generic resources
• Features
• State
− Idle − Completing
− Mix − Drain / Draining
− Alloc − Down
Partitions:
• Associated with a specific set of nodes
• Nodes can be in more
than one partition
• Job size and time limits
• Access control list
• State information
− Up
− Drain
− Down
Jobs:
• ID (a number)
• Name
• Time limit
• Size specification
• Node features required
• Dependencies on other jobs
• Quality of Service (QoS)
• State (Pending, Running, Suspended, Canceled, Failed, etc.)
Job steps:
• ID (a number)
• Name
• Time limit (maximum)
• Size specification
• Node features required in allocation
SLURM: Job Scheduling
Scheduling: the process of determining which job runs next and on which resources.
FIFO Scheduler
Backfill Scheduler
Backfill Scheduler:
• Based on the job request, resources available, and
policy limits imposed.
• Starts with job priority.
• Results in a resource allocation over a period.
• Starts with job priority.
Job_priority = site_factor +
(PriorityWeightAge) * (age_factor) +
(PriorityWeightAssoc) * (assoc_factor) +
(PriorityWeightFairshare) * (fair-share_factor) +
(PriorityWeightJobSize) * (job_size_factor) +
(PriorityWeightPartition) * (partition_factor) +
(PriorityWeightQOS) * (QOS_factor) +
SUM(TRES_weight_cpu * TRES_factor_cpu,
TRES_weight_<type> * TRES_factor_<type>,
...) - nice_factor
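To make the formula concrete, here is a minimal sketch that evaluates a simplified version of it in the shell. It assumes only the age, fair-share and QOS terms are in use; the `job_priority` helper and every weight and factor passed to it are hypothetical values chosen for illustration, not settings from any real cluster.

```shell
#!/bin/sh
# Simplified form of the formula: only the age, fair-share and QOS terms are
# included, and all weights and factors passed in below are made-up values.
job_priority() {
  # Arguments: site_factor age_weight age_factor fs_weight fs_factor
  #            qos_weight qos_factor nice_factor
  awk -v site="$1" -v w_age="$2" -v f_age="$3" \
      -v w_fs="$4" -v f_fs="$5" \
      -v w_qos="$6" -v f_qos="$7" -v nice="$8" \
      'BEGIN { printf "%d\n", site + w_age * f_age + w_fs * f_fs + w_qos * f_qos - nice }'
}

# 0 + 1000*0.5 + 10000*0.25 + 2000*1.0 - 0
priority=$(job_priority 0 1000 0.5 10000 0.25 2000 1.0 0)
echo "Job priority: $priority"
```

With these made-up numbers the fair-share term dominates, which is the usual reason sites give it the largest weight.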
SLURM: Commands
• sbatch – Submit a batch script to SLURM.
• salloc – Request resources from SLURM for an interactive job.
• srun – Start a new job step.
• scancel – Cancel a job.
• sinfo – Report system status (nodes, queues, etc.).
rest up infinite 3 idle~ pirineusgpu[1-2],pirineusknl1
rest up infinite 1 idle canigo2
std* up infinite 11 idle~ pirineus[14,19-20,23,25-26,29-30,33-34,40]
std* up infinite 18 mix pirineus[13,15-16,18,21-22,27-28,35,38-39,41-45,48-49]
std* up infinite 7 alloc pirineus[17,24,31,36-37,46-47]
gpu up infinite 2 alloc pirineusgpu[3-4]
knl up infinite 3 idle~ pirineusknl[2-4]
mem up infinite 1 mix canigo1
class_a up infinite 8 mix canigo1,pirineus[1-7]
class_a up infinite 1 alloc pirineus8
class_b up infinite 8 mix canigo1,pirineus[1-7]
class_b up infinite 1 alloc pirineus8
class_c up infinite 8 mix canigo1,pirineus[1-7]
class_c up infinite 1 alloc pirineus8
std_curs up infinite 5 idle~ pirineus[9-12,50]
gpu_curs up infinite 2 idle~ pirineusgpu[1-2]
sinfo -Np class_a -O
canigo1 class_a mixed 113/79/0/192 3094521 976571
pirineus1 class_a mixed 20/28/0/48 191904 120275
pirineus2 class_a mixed 24/24/0/48 191904 185499
pirineus3 class_a mixed 46/2/0/48 191904 54232
pirineus4 class_a mixed 38/10/0/48 191904 58249
pirineus5 class_a mixed 38/10/0/48 191904 58551
pirineus6 class_a mixed 36/12/0/48 191904 114986
pirineus7 class_a mixed 38/10/0/48 191904 58622
pirineus8 class_a allocated 48/0/0/48 191904 165682
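A listing like the one above can be post-processed with standard tools. The sketch below takes one line copied from the listing and extracts the idle and total CPU counts, assuming the fourth column is in the allocated/idle/other/total form; the column meaning is an assumption, since the header line is not shown.

```shell
#!/bin/sh
# One line copied from the listing above. The fourth field is assumed to be
# the CPU count in allocated/idle/other/total form.
line="pirineus3 class_a mixed 46/2/0/48 191904 54232"

# Split the fourth field on "/" and report the idle and total CPU counts.
summary=$(printf '%s\n' "$line" |
  awk '{ split($4, c, "/"); printf "%s: %s of %s CPUs idle\n", $1, c[2], c[4] }')
echo "$summary"
```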
• squeue – Report job and job step status.
1193936 std g09d1 upceqt04 PD 0:00 1 16 32G (Priority)
1195916 gpu A2B2_APO_n ubator01 PD 0:00 1 24 3900M (Priority)
1195915 gpu A2B2_APO_n ubator01 PD 0:00 1 24 3900M (Priority)
1195920 gpu A2B2_APO_n ubator01 PD 0:00 1 24 3900M (Priority)
1195927 gpu uncleaved_ ubator02 PD 0:00 1 24 3900M (Priority)
1195928 gpu uncleaved_ ubator02 PD 0:00 1 24 3900M (Priority)
1195929 gpu cleaved_wt ubator02 PD 0:00 1 24 3900M (Priority)
1138005 std U98-CuONN1 imoreira PD 0:00 1 12 3998M (Priority)
1195531 std g09d1 upceqt04 PD 0:00 1 16 32G (Priority)
1195532 std g09d1 upceqt04 PD 0:00 1 16 32G (Priority)
1195533 std g09d1 upceqt04 PD 0:00 1 16 32G (Priority)
1195536 std g09d1 upceqt04 PD 0:00 1 16 32G (Priority)
1195597 std sh gomollon R 20:04:04 4 24 6000M pirineus[31,38,44,47]
1195579 class_a rice crag49366 R 6:44:45 1 8 3998M pirineus5
1195576 class_a rice crag49366 R 6:36:48 1 8 3998M pirineus2
1195578 class_a rice crag49366 R 6:37:53 1 8 3998M pirineus4
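As a rough illustration of reading this output, the sketch below counts pending and running jobs in a few lines copied from the squeue listing. It assumes the fifth column is the job state (PD for pending, R for running), which matches the rows shown but is not confirmed by a header line in the slides.

```shell
#!/bin/sh
# A few lines copied from the squeue listing; the fifth field is assumed to
# be the job state (PD = pending, R = running).
listing='1193936 std g09d1 upceqt04 PD 0:00 1 16 32G (Priority)
1195916 gpu A2B2_APO_n ubator01 PD 0:00 1 24 3900M (Priority)
1195927 gpu uncleaved_ ubator02 PD 0:00 1 24 3900M (Priority)
1195597 std sh gomollon R 20:04:04 4 24 6000M pirineus[31,38,44,47]
1195579 class_a rice crag49366 R 6:44:45 1 8 3998M pirineus5'

# Count lines per state; "n + 0" prints 0 when no line matches.
pending=$(printf '%s\n' "$listing" | awk '$5 == "PD" { n++ } END { print n + 0 }')
running=$(printf '%s\n' "$listing" | awk '$5 == "R" { n++ } END { print n + 0 }')
echo "pending=$pending running=$running"
```

On a live system, `squeue -u $USER` or `squeue -t PENDING` narrows the listing before any post-processing.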
• scontrol – Administrator tool to view and/or update
system, job, step, partition or reservation status.
scontrol hold <jobid>
scontrol release <jobid>
scontrol show job <jobid>
JobId=1195597 JobName=sh
UserId=gomollon(80128) GroupId=csuc(10000) MCS_label=N/A
Priority=100176 Nice=0 Account=csuc QOS=test
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
RunTime=20:09:58 TimeLimit=5-00:00:00 TimeMin=N/A
SubmitTime=2019-10-07T12:21:29 EligibleTime=2019-10-07T12:21:29
StartTime=2019-10-07T12:21:29 EndTime=2019-10-12T12:21:30 Deadline=N/A
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Partition=std AllocNode:Sid=login2:20262
ReqNodeList=(null) ExcNodeList=(null)
NumNodes=4 NumCPUs=24 NumTasks=24 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryCPU=6000M MinTmpDiskNode=0
Features=(null) Gres=(null) Reservation=(null)
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
State Information
Enjoy SLURM!
How to launch jobs?
Login on CSUC infrastructure
• Login
ssh -p 2122
• Transfer files
scp -P 2122 local_file[path to your folder]
sftp -oPort=2122
• Useful paths
Name                      Variable  Availability        Quota/project   Time limit  Backup
/home/$user               $HOME     global              4 GB            unlimited   Yes
/scratch/$user            $SCRATCH  global              unlimited       30 days     No
/scratch/$user/tmp/jobid  $TMPDIR   Local to each node  job file limit  1 week      No
/tmp/$user/jobid          $TMPDIR   Local to each node  job file limit  1 week      No
• Get HC consumption
consum -a 'year' (group consumption)
consum -a 'year' -u 'user_name' (user consumption)
Batch job submission: Default settings
• 4 GB/core (except on the mem partition).
• 24 GB/core on the mem partition.
• 1 core on the std and mem partitions.
• 24 cores on the gpu partition.
• The whole node on the knl partition.
• Non-exclusive, multinode job.
• Scratch and output directories default to the submit directory.
Batch job submission
• Basic Linux commands:
Description              Command  Example
List files               ls       ls /home/user
Make folder              mkdir    mkdir /home/prova
Change folder            cd       cd /home/prova
Copy file                cp       cp nom_arxiu1 nom_arxiu2
Move file                mv       mv /home/prova.txt /cescascratch/prova.txt
Delete file              rm       rm filename
Print file content       cat      cat filename
Find string in files     grep     grep 'word' filename
Show last lines of file  tail     tail filename
• Text editors: vim, nano, emacs, etc.
• More detailed info and options about the commands:
'command' --help
man 'command'
Scheduler directives/Options
• -c, --cpus-per-task=ncpus number of cpus required per task
• --gres=list required generic resources
• -J, --job-name=jobname name of job
• -n, --ntasks=ntasks number of tasks to run
• --ntasks-per-node=n number of tasks to invoke on each node
• -N, --nodes=N number of nodes on which to run (N = min[-max])
• -o, --output=out file for batch script's standard output
• -p, --partition=partition partition requested
• -t, --time=minutes time limit (format: dd-hh:mm)
• -C, --constraint=list specify a list of constraints (mem, vnc, ...)
• --mem=MB minimum amount of total real memory
• --reservation=name allocate resources from named reservation
• -w, --nodelist=hosts... request a specific list of hosts
• --mem-per-cpu=MB amount of real memory per allocated core
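As an illustration of how the options above are used, the sketch below writes a small batch script whose `#SBATCH` lines use the long option names from the list, then checks its syntax. The file name `example_job.slm`, the job name, the resource values and the `./my_app` binary are hypothetical; only the `std` partition name comes from the slides.

```shell
#!/bin/sh
# Write a hypothetical batch script; every value below is illustrative only.
cat > example_job.slm <<'EOF'
#!/bin/bash
#SBATCH --job-name=example_job
#SBATCH --output=example_job_%j.log
#SBATCH --error=example_job_%j.err
#SBATCH --partition=std
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=1
#SBATCH --time=0-01:00
#SBATCH --mem-per-cpu=4000

# Launch a placeholder application as a job step.
srun ./my_app
EOF

# Check the shell syntax without running anything, then count the directives.
sh -n example_job.slm
directives=$(grep -c '^#SBATCH' example_job.slm)
echo "example_job.slm contains $directives #SBATCH directives"
```

On a cluster the file would then be submitted with `sbatch example_job.slm`.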
Batch job submission
#!/bin/bash
#SBATCH -o treball_prova.log
#SBATCH -e treball_prova.err
#SBATCH -p std
#SBATCH -n 48
# Setting up the environment
module load mpi/intel/openmpi/3.1.0
# Move the input files to the working directory
cp -r $input $SCRATCH
# Launch the application (similar to mpirun)
# Create the output folder and move the outputs
mkdir -p $OUTPUT_DIR
cp -r * $OUTPUT_DIR
Gaussian 16 Example
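#!/bin/bash
#SBATCH -J gau16_test
#SBATCH -o gau_test_%j.log
#SBATCH -e gau_test_%j.err
#SBATCH -p std
#SBATCH -c 16
# Set up the environment
module load gaussian/g16b1
# Copy the input files to the working directory
cp -r $INPUT_DIR/* .
# Run Gaussian 16
g16 < input.gau > output.out
# Create the output folder and copy the results
mkdir -p $OUTPUT_DIR
cp -r * $OUTPUT_DIR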
Vasp 5.4.4 Example
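#!/bin/bash
#SBATCH -J vasp_test
#SBATCH -o vasp_test_%j.log
#SBATCH -e vasp_test_%j.err
#SBATCH -p std
#SBATCH -n 24
# Set up the environment
module load vasp/5.4.4
# Copy the input files to the working directory
cp -r $INPUT_DIR/* .
# Launch VASP as a job step on the 24 tasks
srun `which vasp_std`
# Create the output folder and copy the results
mkdir -p $OUTPUT_DIR
cp -r * $OUTPUT_DIR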
Gromacs Example
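#!/bin/bash
#SBATCH -n 24
#SBATCH -p gpu
# Set up the environment
module load gromacs/2018.4_mpi
# Copy the input files to the working directory
cp -r $HOME/SLMs/gromacs/CASE/* .
# Launch GROMACS mdrun as a job step
srun `which gmx_mpi` mdrun -v -deffnm input_system -ntomp $SLURM_CPUS_PER_TASK -nb gpu -npme 12 -dlb yes -pin on -gpu_id 01
# Copy the results to the output folder
cp -r * /scratch/$USER/gromacs/CASE/output/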
ANSYS Fluent Example
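#!/bin/bash
#SBATCH -J truck.cas
#SBATCH -o truck.log
#SBATCH -e truck.err
#SBATCH -p std
#SBATCH -n 16
# Set up the environment
module load toolchains/gcc_mkl_ompi
# Copy the input files to the working directory
cp -r $INPUT_DIR/* .
# Run Fluent (3D, double precision) in batch mode with the journal file
/prod/ANSYS16/v162/fluent/bin/fluent 3ddp -t $SLURM_NCPUS -mpi=hp -g -i input1_50.txt
# Create the output folder and copy the results
mkdir -p $OUTPUT_DIR
cp -r * $OUTPUT_DIR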
Best Practices
• Use $SCRATCH as the working directory.
• Move only the necessary files (not all files in the folder each time).
• Try to keep only important files in $HOME.
• Try to choose the partition and resources that best fit your job.

Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024
Vadym Kazulkin
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
ScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking ReplicationScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking Replication
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
Enterprise Knowledge
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Pitangent Analytics & Technology Solutions Pvt. Ltd
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
Safe Software

Recently uploaded (20)

Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
ScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking ReplicationScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking Replication
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters

Introduction to SLURM

  • 1. Introduction to SLURM. Ismael Fernández Pavón, Cristian Gomollón Escribano. 08/10/2019
  • 3. What is SLURM? Cluster manager and job scheduler system for large and small Linux clusters. • Allocates access to resources for some duration of time. • Provides a framework for starting, executing, and monitoring work (normally a parallel job). • Arbitrates contention for resources by managing a queue of pending work.
  • 4. What is SLURM? (Diagram of resource managers and schedulers: LoadLeveler (IBM), LSF, SLURM, PBS Pro, ALPS (Cray), Torque, Maui, Moab.)
  • 5. What is SLURM? ✓ Open source ✓ Fault-tolerant ✓ Highly scalable
  • 7. SLURM: Resource Management. Nodes: • Baseboards, sockets, cores, threads • CPUs (core or thread) • Memory size • Generic resources • Features • State: Idle, Mix, Alloc, Completing, Drain/Draining, Down
  • 8. SLURM: Resource Management. Partitions: • Associated with a specific set of nodes • Nodes can be in more than one partition • Job size and time limits • Access control list • State information: Up, Drain, Down
  • 9. SLURM: Resource Management. Jobs (allocated cores and allocated memory): • ID (a number) • Name • Time limit • Size specification • Node features required • Dependencies on other jobs • Quality of Service (QoS) • State (Pending, Running, Suspended, Canceled, Failed, etc.)
  • 10. SLURM: Resource Management. Job steps (cores used and memory used): • ID (a number) • Name • Time limit (maximum) • Size specification • Node features required in allocation
  • 11. SLURM: Resource Management FULL CLUSTER! ✓ Job scheduling
  • 12. SLURM: Job Scheduling. Scheduling: the process of determining the next job to run and on which resources. (Figure: FIFO scheduler versus backfill scheduler, plotted as resources against time.)
  • 13. SLURM: Job Scheduling. Scheduling: the process of determining the next job to run and on which resources. Backfill Scheduler: • Based on the job request, the resources available, and the policy limits imposed. • Starts with job priority. • Results in a resource allocation over a period of time.
  • 14. SLURM: Job Scheduling. Backfill Scheduler: • Starts with job priority.
      Job_priority = site_factor +
          (PriorityWeightAge) * (age_factor) +
          (PriorityWeightAssoc) * (assoc_factor) +
          (PriorityWeightFairshare) * (fair-share_factor) +
          (PriorityWeightJobSize) * (job_size_factor) +
          (PriorityWeightPartition) * (partition_factor) +
          (PriorityWeightQOS) * (QOS_factor) +
          SUM(TRES_weight_cpu * TRES_factor_cpu,
              TRES_weight_<type> * TRES_factor_<type>, ...)
          - nice_factor
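
To make the weighted sum concrete, here is a minimal sketch that evaluates it for a single job. The weights and factor values are hypothetical, chosen only for illustration (a real site sets its own weights in its Slurm configuration), and the site_factor, assoc and TRES terms are left out for brevity. The function name `job_priority` is also an assumption, not a Slurm command.

```shell
#!/bin/bash
# Hypothetical weights; a real cluster defines its own values.
WEIGHT_AGE=1000
WEIGHT_FAIRSHARE=10000
WEIGHT_JOBSIZE=500
WEIGHT_PARTITION=1000
WEIGHT_QOS=2000

# job_priority AGE FAIRSHARE JOBSIZE PARTITION QOS NICE
# The first five arguments are factors between 0 and 1;
# NICE is subtracted from the weighted sum.
job_priority() {
  awk -v wa="$WEIGHT_AGE" -v wf="$WEIGHT_FAIRSHARE" -v wj="$WEIGHT_JOBSIZE" \
      -v wp="$WEIGHT_PARTITION" -v wq="$WEIGHT_QOS" \
      -v a="$1" -v f="$2" -v j="$3" -v p="$4" -v q="$5" -v n="$6" \
      'BEGIN { printf "%d\n", wa*a + wf*f + wj*j + wp*p + wq*q - n }'
}

# 500 + 2500 + 250 + 1000 + 1000 - 0
job_priority 0.5 0.25 0.5 1.0 0.5 0   # prints 5250
```

With these made-up weights the fair-share term dominates, which is a common way for a site to favour users who have consumed less than their share.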
  • 15. SLURM: Commands • sbatch: Submit a batch script to SLURM. • salloc: Request resources from SLURM for an interactive job. • srun: Start a new job step. • scancel: Cancel a job.
  • 16. SLURM: Commands • sinfo: Report system status (nodes, queues, etc.).
      PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
      rest      up    infinite      3 idle~ pirineusgpu[1-2],pirineusknl1
      rest      up    infinite      1 idle  canigo2
      std*      up    infinite     11 idle~ pirineus[14,19-20,23,25-26,29-30,33-34,40]
      std*      up    infinite     18 mix   pirineus[13,15-16,18,21-22,27-28,35,38-39,41-45,48-49]
      std*      up    infinite      7 alloc pirineus[17,24,31,36-37,46-47]
      gpu       up    infinite      2 alloc pirineusgpu[3-4]
      knl       up    infinite      3 idle~ pirineusknl[2-4]
      mem       up    infinite      1 mix   canigo1
      class_a   up    infinite      8 mix   canigo1,pirineus[1-7]
      class_a   up    infinite      1 alloc pirineus8
      class_b   up    infinite      8 mix   canigo1,pirineus[1-7]
      class_b   up    infinite      1 alloc pirineus8
      class_c   up    infinite      8 mix   canigo1,pirineus[1-7]
      class_c   up    infinite      1 alloc pirineus8
      std_curs  up    infinite      5 idle~ pirineus[9-12,50]
      gpu_curs  up    infinite      2 idle~ pirineusgpu[1-2]
  • 17. SLURM: Commands • sinfo: Report system status (nodes, queues, etc.).
      sinfo -Np class_a -O "Nodelist,Partition,StateLong,CpusState,Memory,Freemem"
      NODELIST   PARTITION  STATE      CPUS(A/I/O/T)  MEMORY   FREE_MEM
      canigo1    class_a    mixed      113/79/0/192   3094521  976571
      pirineus1  class_a    mixed      20/28/0/48     191904   120275
      pirineus2  class_a    mixed      24/24/0/48     191904   185499
      pirineus3  class_a    mixed      46/2/0/48      191904   54232
      pirineus4  class_a    mixed      38/10/0/48     191904   58249
      pirineus5  class_a    mixed      38/10/0/48     191904   58551
      pirineus6  class_a    mixed      36/12/0/48     191904   114986
      pirineus7  class_a    mixed      38/10/0/48     191904   58622
      pirineus8  class_a    allocated  48/0/0/48      191904   165682
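
The CPUS(A/I/O/T) column reports allocated/idle/other/total CPUs per node, so the free capacity of a partition can be read off by summing the second number. The sketch below does that with awk. The helper name `idle_cpus` is hypothetical, and it assumes the column order shown on the slide (the CPU state in the fourth column, with one header line).

```shell
#!/bin/bash
# idle_cpus: read sinfo node-oriented output on stdin and print the
# total number of idle CPUs. Assumes column 4 is CPUS(A/I/O/T) and
# that the first line is the header.
idle_cpus() {
  awk 'NR > 1 { split($4, c, "/"); idle += c[2] } END { print idle + 0 }'
}

# Only query the cluster when the Slurm client tools are installed.
if command -v sinfo >/dev/null 2>&1; then
  sinfo -Np class_a -O "Nodelist,Partition,StateLong,CpusState,Memory,Freemem" | idle_cpus
fi
```

Applied to the nine class_a rows on the slide, the idle counts add up to 175 CPUs.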
  • 18. SLURM: Commands • squeue: Report job and job step status.
      1193936 std     g09d1      upceqt04  PD     0:00 1 16 32G   (Priority)
      1195916 gpu     A2B2_APO_n ubator01  PD     0:00 1 24 3900M (Priority)
      1195915 gpu     A2B2_APO_n ubator01  PD     0:00 1 24 3900M (Priority)
      1195920 gpu     A2B2_APO_n ubator01  PD     0:00 1 24 3900M (Priority)
      1195927 gpu     uncleaved_ ubator02  PD     0:00 1 24 3900M (Priority)
      1195928 gpu     uncleaved_ ubator02  PD     0:00 1 24 3900M (Priority)
      1195929 gpu     cleaved_wt ubator02  PD     0:00 1 24 3900M (Priority)
      1138005 std     U98-CuONN1 imoreira  PD     0:00 1 12 3998M (Priority)
      1195531 std     g09d1      upceqt04  PD     0:00 1 16 32G   (Priority)
      1195532 std     g09d1      upceqt04  PD     0:00 1 16 32G   (Priority)
      1195533 std     g09d1      upceqt04  PD     0:00 1 16 32G   (Priority)
      1195536 std     g09d1      upceqt04  PD     0:00 1 16 32G   (Priority)
      1195597 std     sh         gomollon  R  20:04:04 4 24 6000M pirineus[31,38,44,47]
      1195579 class_a rice       crag49366 R   6:44:45 1  8 3998M pirineus5
      1195576 class_a rice       crag49366 R   6:36:48 1  8 3998M pirineus2
      1195578 class_a rice       crag49366 R   6:37:53 1  8 3998M pirineus4
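
A listing like this one is easy to summarise with standard text tools. As a rough sketch, the helper below counts pending (PD) jobs per user. The name `pending_by_user` is hypothetical, and the code assumes the column order on the slide (user in the fourth column, state in the fifth) and job names without spaces.

```shell
#!/bin/bash
# pending_by_user: read header-less squeue output on stdin and print
# "user count" for every user with jobs in the PD (pending) state.
pending_by_user() {
  awk '$5 == "PD" { n[$4]++ } END { for (u in n) print u, n[u] }' | LC_ALL=C sort
}

# Only query the cluster when the Slurm client tools are installed.
# -h suppresses the header line.
if command -v squeue >/dev/null 2>&1; then
  squeue -h | pending_by_user
fi
```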
  • 19. SLURM: Commands • scontrol: Administrator tool to view and/or update system, job, step, partition or reservation status.
      scontrol hold <jobid>
      scontrol release <jobid>
      scontrol show job <jobid>
  • 20. SLURM: Commands (output of scontrol show job)
      JobId=1195597 JobName=sh
         UserId=gomollon(80128) GroupId=csuc(10000) MCS_label=N/A
         Priority=100176 Nice=0 Account=csuc QOS=test
         JobState=RUNNING Reason=None Dependency=(null)
         Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
         RunTime=20:09:58 TimeLimit=5-00:00:00 TimeMin=N/A
         SubmitTime=2019-10-07T12:21:29 EligibleTime=2019-10-07T12:21:29
         StartTime=2019-10-07T12:21:29 EndTime=2019-10-12T12:21:30 Deadline=N/A
         PreemptTime=None SuspendTime=None SecsPreSuspend=0
         Partition=std AllocNode:Sid=login2:20262
         ReqNodeList=(null) ExcNodeList=(null)
         NodeList=pirineus[31,38,44,47]
         BatchHost=pirineus31
         NumNodes=4 NumCPUs=24 NumTasks=24 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
         TRES=cpu=24,mem=144000M,node=4
         Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
         MinCPUsNode=1 MinMemoryCPU=6000M MinTmpDiskNode=0
         Features=(null) Gres=(null) Reservation=(null)
         OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
         Command=(null)
         WorkDir=/home/gomollon
         Power=
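
Because this output is a flat list of Key=Value pairs, a single field can be pulled out with a few lines of shell. The following is a minimal sketch; `job_field` is a hypothetical helper, and it assumes that values contain no spaces (true for the fields on the slide, but not guaranteed for every job).

```shell
#!/bin/bash
# job_field KEY: read `scontrol show job` output on stdin and print
# the value stored under KEY (first match only).
job_field() {
  # Put every Key=Value pair on its own line, then match the key exactly.
  tr -s ' ' '\n' |
    awk -F= -v key="$1" '$1 == key { sub(/^[^=]*=/, ""); print; exit }'
}

# Typical use on a cluster (the job ID is a placeholder):
#   scontrol show job <jobid> | job_field JobState
#   scontrol show job <jobid> | job_field TRES
```

For the job on the slide, `job_field JobState` would give RUNNING and `job_field TRES` would give cpu=24,mem=144000M,node=4, which matches 24 CPUs at MinMemoryCPU=6000M each.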
  • 23. How to launch jobs?
  • 24. Login on CSUC infrastructure
      • Login: ssh -p 2122
      • Transfer files: scp -P 2122 local_file [path to your folder]
                        sftp -oPort=2122
      • Useful paths:
        Name                      Variable  Availability        Quota/project   Time limit  Backup
        /home/$user               $HOME     global              4 GB            unlimited   Yes
        /scratch/$user            $SCRATCH  global              unlimited       30 days     No
        /scratch/$user/tmp/jobid  $TMPDIR   Local to each node  job file limit  1 week      No
        /tmp/$user/jobid          $TMPDIR   Local to each node  job file limit  1 week      No
      • Get HC consumption:
        consum -a 'year' (group consumption)
        consum -a 'year' -u 'username' (user consumption)
  • 25. Batch job submission: Default settings • 4 GB/core (except on the mem partition). • 24 GB/core on the mem partition. • 1 core on the std and mem partitions. • 24 cores on the gpu partition. • The whole node on the KNL partition. • Non-exclusive, multinode job. • The scratch and output directories are the submit directory.
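
The per-core memory defaults determine how much memory a job receives when it does not request any explicitly. The sketch below works that out from the figures on the slide; `default_mem_gb` is a hypothetical helper for illustration, not a command provided by the cluster.

```shell
#!/bin/bash
# default_mem_gb PARTITION CORES: memory in GB that a job would get
# from the defaults above (24 GB/core on mem, 4 GB/core elsewhere).
default_mem_gb() {
  local partition=$1 cores=$2 per_core=4
  if [ "$partition" = "mem" ]; then
    per_core=24
  fi
  echo $(( cores * per_core ))
}

default_mem_gb std 16   # prints 64
default_mem_gb mem 2    # prints 48
```

A job that needs more or less than this would state it with --mem or --mem-per-cpu instead of relying on the default.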
  • 26. Batch job submission
      • Basic Linux commands:
        Description                       Command  Example
        List files                        ls       ls /home/user
        Make a folder                     mkdir    mkdir /home/prova
        Change folder                     cd       cd /home/prova
        Copy files                        cp       cp nom_arxiu1 nom_arxiu2
        Move a file                       mv       mv /home/prova.txt /cescascratch/prova.txt
        Delete a file                     rm       rm filename
        Print file content                cat      cat filename
        Find a string in files            grep     grep 'word' filename
        Show the last lines of a file     tail     tail filename
      • Text editors: vim, nano, emacs, etc.
      • More detailed information and options for a command: 'command' --help or man 'command'
  • 27. Scheduler directives/Options
      • -c, --cpus-per-task=ncpus     number of CPUs required per task
      • --gres=list                   required generic resources
      • -J, --job-name=jobname        name of job
      • -n, --ntasks=ntasks           number of tasks to run
      • --ntasks-per-node=n           number of tasks to invoke on each node
      • -N, --nodes=N                 number of nodes on which to run (N = min[-max])
      • -o, --output=out              file for batch script's standard output
      • -p, --partition=partition     partition requested
      • -t, --time=minutes            time limit (format: dd-hh:mm)
  • 28. Scheduler directives/Options
      • -C, --constraint=list         specify a list of constraints (mem, vnc, ...)
      • --mem=MB                      minimum amount of total real memory
      • --reservation=name            allocate resources from named reservation
      • -w, --nodelist=hosts...       request a specific list of hosts
      • --mem-per-cpu=MB              amount of real memory per allocated core
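
As a small illustration of how these options combine, the sketch below is a job script that uses several of them and then reports what it was given. The job name, file name and resource values are hypothetical. `SLURM_JOB_ID`, `SLURM_NTASKS` and `SLURM_CPUS_PER_TASK` are environment variables that Slurm sets inside a job, and the fallbacks after `:-` only apply when the script is run outside Slurm.

```shell
#!/bin/bash
#SBATCH -J options_demo
#SBATCH -o options_demo_%j.log
#SBATCH -p std
#SBATCH -n 4
#SBATCH -c 2
#SBATCH --mem-per-cpu=2000
#SBATCH -t 00:10:00

# Print the allocation as seen from inside the job.
describe_allocation() {
  echo "job=${SLURM_JOB_ID:-local} tasks=${SLURM_NTASKS:-1} cpus_per_task=${SLURM_CPUS_PER_TASK:-1}"
}

describe_allocation
```

Submitted with sbatch, this asks for 4 tasks with 2 CPUs each (8 CPUs in total) and 2000 MB per CPU for ten minutes on the std partition. The same options can also be given on the sbatch command line, where they take precedence over the #SBATCH lines.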
  • 29. Batch job submission
      #!/bin/bash
      # Scheduler directives
      #SBATCH -J treball_prova
      #SBATCH -o treball_prova.log
      #SBATCH -e treball_prova.err
      #SBATCH -p std
      #SBATCH -n 48
      # Set up the environment
      module load mpi/intel/openmpi/3.1.0
      # Move the input files to the working directory
      cp -r $INPUT_DIR $SCRATCH
      cd $SCRATCH
      # Launch the application (similar to mpirun)
      srun $APPLICATION
      # Create the output folder and move the outputs
      mkdir -p $OUTPUT_DIR
      cp -r * $OUTPUT_DIR
  • 30. Gaussian 16 Example
      #!/bin/bash
      #SBATCH -J gau16_test
      #SBATCH -o gau_test_%j.log
      #SBATCH -e gau_test_%j.err
      #SBATCH -p std
      #SBATCH -n 1
      #SBATCH -c 16
      module load gaussian/g16b1
      INPUT_DIR=$HOME/gaussian_test/inputs
      OUTPUT_DIR=$HOME/gaussian_test/outputs
      cd $SCRATCH
      cp -r $INPUT_DIR/* .
      g16 < input.gau > output.out
      mkdir -p $OUTPUT_DIR
      cp -r * $OUTPUT_DIR
  • 31. VASP 5.4.4 Example
      #!/bin/bash
      #SBATCH -J vasp_test
      #SBATCH -o vasp_test_%j.log
      #SBATCH -e vasp_test_%j.err
      #SBATCH -p std
      #SBATCH -n 24
      module load vasp/5.4.4
      INPUT_DIR=$HOME/vasp_test/inputs
      OUTPUT_DIR=$HOME/vasp_test/outputs
      cd $SCRATCH
      cp -r $INPUT_DIR/* .
      srun `which vasp_std`
      mkdir -p $OUTPUT_DIR
      cp -r * $OUTPUT_DIR
  • 32. Gromacs Example
      #!/bin/bash
      #SBATCH --job-name=gromacs
      #SBATCH --output=gromacs_%j.out
      #SBATCH --error=gromacs_%j.err
      #SBATCH -n 24
      #SBATCH --gres=gpu:2
      #SBATCH -N 1
      #SBATCH -p gpu
      #SBATCH -c 2
      #SBATCH --time=00:30:00
      module load gromacs/2018.4_mpi
      cd $SHAREDSCRATCH
      cp -r $HOME/SLMs/gromacs/CASE/* .
      srun `which gmx_mpi` mdrun -v -deffnm input_system -ntomp $SLURM_CPUS_PER_TASK -nb gpu -npme 12 -dlb yes -pin on -gpu_id 01
      cp -r * /scratch/$USER/gromacs/CASE/output/
  • 33. ANSYS Fluent Example
      #!/bin/bash
      #SBATCH -J truck.cas
      #SBATCH -o truck.log
      #SBATCH -e truck.err
      #SBATCH -p std
      #SBATCH -n 16
      module load toolchains/gcc_mkl_ompi
      INPUT_DIR=$HOME/FLUENT/inputs
      OUTPUT_DIR=$HOME/FLUENT/outputs
      cd $SCRATCH
      cp -r $INPUT_DIR/* .
      /prod/ANSYS16/v162/fluent/bin/fluent 3ddp -t $SLURM_NCPUS -mpi=hp -g -i input1_50.txt
      mkdir -p $OUTPUT_DIR
      cp -r * $OUTPUT_DIR
  • 34. Best Practices • Use $SCRATCH as the working directory. • Move only the necessary files (not all the files in the folder each time). • Try to keep important files only in $HOME. • Try to choose the partition and resources that best fit your job.
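
The first two recommendations can be wrapped into a pair of small helpers for a job script. This is only a sketch under stated assumptions: `stage_in` and `stage_out` are hypothetical names, and the directory and file names in the usage comment are placeholders rather than real paths on the cluster.

```shell
#!/bin/bash
# stage_in SRC_DIR WORK_DIR FILE...: copy only the listed files from
# SRC_DIR into WORK_DIR (created if missing).
stage_in() {
  local src=$1 work=$2 f
  shift 2
  mkdir -p "$work" || return 1
  for f in "$@"; do
    cp -r "$src/$f" "$work/" || return 1
  done
}

# stage_out WORK_DIR OUT_DIR: copy everything in WORK_DIR back to
# OUT_DIR (created if missing).
stage_out() {
  local work=$1 out=$2
  mkdir -p "$out" || return 1
  cp -r "$work"/. "$out"/
}

# Typical use inside a job script (paths are placeholders):
#   stage_in "$HOME/case/inputs" "$SCRATCH/case" input.gau
#   cd "$SCRATCH/case" && g16 < input.gau > output.out
#   stage_out "$SCRATCH/case" "$HOME/case/outputs"
```

Naming the input files explicitly avoids copying a whole folder to scratch on every run, and copying results back at the end keeps the important files in $HOME, which the storage table lists as backed up.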