SlideShare a Scribd company logo
Introduction to SLURM
Ismael Fernández Pavón Cristian Gomollón Escribano
08 / 10 / 2019
What is SLURM?
What is SLURM?
• Allocates access to resources for some duration of time.
• Provides a framework for starting, executing, and
monitoring work (normally a parallel job).
• Arbitrates contention for resources by managing
a queue of pending work.
Cluster manager and job scheduler
system for large and small Linux
clusters.
LoadLeveler (IBM)
LSF
SLURM
PBS Pro
Resource Managers Scheduler
What is SLURM?
ALPS (Cray)
Torque
Maui
Moab
✓ Open source
✓ Fault-tolerant
✓ Highly scalable
LoadLeveler (IBM)
LSF
SLURM
PBS Pro
Resource Managers Scheduler
What is SLURM?
ALPS (Cray)
Torque
Maui
Moab
SLURM: Resource Management
Node
CPU
(Core)
CPU
(Thread)
SLURM: Resource Management
Nodes:
• Baseboards, Sockets,
Cores, Threads
• CPUs (Core or thread)
• Memory size
• Generic resources
• Features
• State
− Idle − Completing
− Mix − Drain / ing
− Alloc − Down
SLURM: Resource Management
Partitions:
• Associatedwith specific
set of nodes
• Nodes can be in more
than one partition
• Job size and time limits
• Access control list
• State information
− Up
− Drain
− Down
Partitions
Allocated
cores
SLURM: Resource Management
Allocated
memory
Jobs:
• ID (a number)
• Name
• Time limit
• Size specification
• Node features required
• Other Jobs Dependency
• Quality Of Service (QoS)
• State (Pending, Running,
Suspended, Canceled,
Failed, etc.)
Core
used
SLURM: Resource Management
Memory
used
Jobs Step:
• ID (a number)
• Name
• Time limit (maximum)
• Size specification
• Node features required
in allocation
SLURM: Resource Management
FULL CLUSTER!
✓ Job scheduling
SLURM: Job Scheduling
Scheduling: The process of determining next job to run and
on which resources.
FIFO Scheduler
Backfill Scheduler
Resources
Time
SLURM: Job Scheduling
Scheduling: The process of determining next job to run and
on which resources.
Backfill Scheduler:
• Based on the job request, resources available, and
policy limits imposed.
• Starts with job priority.
• Results in a resource allocation over a period.
SLURM: Job Scheduling
Backfill Scheduler:
• Starts with job priority.
Job_priority = site_factor +
(PriorityWeightAge) * (age_factor) +
(PriorityWeightAssoc) * (assoc_factor) +
(PriorityWeightFairshare) * (fair-share_factor) +
(PriorityWeightJobSize) * (job_size_factor) +
(PriorityWeightPartition) * (partition_factor) +
(PriorityWeightQOS) * (QOS_factor) +
SUM(TRES_weight_cpu * TRES_factor_cpu,
TRES_weight_<type> * TRES_factor_<type>,
...) - nice_factor
•sbatch – Submit a batch script to Slurm.
•salloc – Request resources to SLURM for an interactive
job.
•srun – Start a new job step.
•scancel – Cancel a job.
SLURM: Commands
• sinfo – Report system status (nodes, queues, etc.).
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
rest up infinite 3 idle~ pirineusgpu[1-2],pirineusknl1
rest up infinite 1 idle canigo2
std* up infinite 11 idle~ pirineus[14,19-20,23,25-26,29-30,33-34,40]
std* up infinite 18 mix pirineus[13,15-16,18,21-22,27-28,35,38-39,41-45,48-49]
std* up infinite 7 alloc pirineus[17,24,31,36-37,46-47]
gpu up infinite 2 alloc pirineusgpu[3-4]
knl up infinite 3 idle~ pirineusknl[2-4]
mem up infinite 1 mix canigo1
class_a up infinite 8 mix canigo1,pirineus[1-7]
class_a up infinite 1 alloc pirineus8
class_b up infinite 8 mix canigo1,pirineus[1-7]
class_b up infinite 1 alloc pirineus8
class_c up infinite 8 mix canigo1,pirineus[1-7]
class_c up infinite 1 alloc pirineus8
std_curs up infinite 5 idle~ pirineus[9-12,50]
gpu_curs up infinite 2 idle~ pirineusgpu[1-2]
SLURM: Commands
• sinfo – Report system status (nodes, queues, etc.).
sinfo -Np class_a -O
"Nodelist,Partition,StateLong,CpusState,Memory,Freemem"
NODELIST PARTITION STATE CPUS(A/I/O/T) MEMORY FREE_MEM
canigo1 class_a mixed 113/79/0/192 3094521 976571
pirineus1 class_a mixed 20/28/0/48 191904 120275
pirineus2 class_a mixed 24/24/0/48 191904 185499
pirineus3 class_a mixed 46/2/0/48 191904 54232
pirineus4 class_a mixed 38/10/0/48 191904 58249
pirineus5 class_a mixed 38/10/0/48 191904 58551
pirineus6 class_a mixed 36/12/0/48 191904 114986
pirineus7 class_a mixed 38/10/0/48 191904 58622
pirineus8 class_a allocated 48/0/0/48 191904 165682
SLURM: Commands
1193936 std g09d1 upceqt04 PD 0:00 1 16 32G (Priority)
1195916 gpu A2B2_APO_n ubator01 PD 0:00 1 24 3900M (Priority)
1195915 gpu A2B2_APO_n ubator01 PD 0:00 1 24 3900M (Priority)
1195920 gpu A2B2_APO_n ubator01 PD 0:00 1 24 3900M (Priority)
1195927 gpu uncleaved_ ubator02 PD 0:00 1 24 3900M (Priority)
1195928 gpu uncleaved_ ubator02 PD 0:00 1 24 3900M (Priority)
1195929 gpu cleaved_wt ubator02 PD 0:00 1 24 3900M (Priority)
1138005 std U98-CuONN1 imoreira PD 0:00 1 12 3998M (Priority)
1195531 std g09d1 upceqt04 PD 0:00 1 16 32G (Priority)
1195532 std g09d1 upceqt04 PD 0:00 1 16 32G (Priority)
1195533 std g09d1 upceqt04 PD 0:00 1 16 32G (Priority)
1195536 std g09d1 upceqt04 PD 0:00 1 16 32G (Priority)
1195597 std sh gomollon R 20:04:04 4 24 6000M pirineus[31,38,44,47]
1195579 class_a rice crag49366 R 6:44:45 1 8 3998M pirineus5
1195576 class_a rice crag49366 R 6:36:48 1 8 3998M pirineus2
1195578 class_a rice crag49366 R 6:37:53 1 8 3998M pirineus4
• squeue – Report job and job step status.
SLURM: Commands
• scontrol – Administrator tool to view and/or update
system, job, step, partition or reservation status.
scontrol hold <jobid>
scontrol release <jobid>
scontrol show job <jobid>
SLURM: Commands
JobId=1195597 JobName=sh
UserId=gomollon(80128) GroupId=csuc(10000) MCS_label=N/A
Priority=100176 Nice=0 Account=csuc QOS=test
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
RunTime=20:09:58 TimeLimit=5-00:00:00 TimeMin=N/A
SubmitTime=2019-10-07T12:21:29 EligibleTime=2019-10-07T12:21:29
StartTime=2019-10-07T12:21:29 EndTime=2019-10-12T12:21:30 Deadline=N/A
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Partition=std AllocNode:Sid=login2:20262
ReqNodeList=(null) ExcNodeList=(null)
NodeList=pirineus[31,38,44,47]
BatchHost=pirineus31
NumNodes=4 NumCPUs=24 NumTasks=24 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=24,mem=144000M,node=4
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryCPU=6000M MinTmpDiskNode=0
Features=(null) Gres=(null) Reservation=(null)
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=(null)
WorkDir=/home/gomollon
Power=
SLURM: Commands
Jobs:
State Information
Enjoy SLURM!
How to launch jobs?
Login on CSUC infrastructure
• Login
ssh –p 2122 username@hpc.csuc.cat
• Transferfiles
scp -P 2122 local_file username@hpc.csuc.cat:[path to your folder]
sftp -oPort=2122 username@hpc.csuc.cat
• Useful paths
Name Variable Availability Quote/project Time limit Backup
/home/$user $HOME global 4 GB unlimited Yes
/scratch/$user $SCRATCH global unlimited 30 days No
/scratch/$user/tmp/jobid $TMPDIR Local to each node job file limit 1 week No
/tmp/$user/jobid $TMPDIR Local to each node job file limit 1 week No
• Get HC consumption
consum -a ‘any’ (group consumption)
consum -a ‘any’ -u ‘nom_usuari’ (user consumption)
Batch job submission: Default settings
• 4Gb/core (excepting on mem partition).
• 24Gb/core on mem partition.
• 1 core on std and mem partitions.
• 24 cores on gpu partition
• The whole node on KNL partition
• Non-exclusive, multinode job.
• Scratch and Output directory are the submit directory.
Batch job submission
• Basic Linux commands:
Description Command Exemple
List files ls ls /home/user
Making folders mkdir mkdir /home/prova
Changing folder cd cd /home/prova
Copy files cp cp nom_arxiu1 nom_arxiu2
Move file mv mv /home/prova.txt /cescascratch/prova.txt
Delete file rm rm filename
Print file content cat cat filename
Find string into files grep grep ‘word’ filename
List last lines on file tail tail filename
• Text editors : vim, nano, emacs,etc.
• More detailed info and options about the commands:
‘command’ –help
man ‘command’
Scheduler directives/Options
• -c, --cpus-per-task=ncpus number of cpus required per task
• --gres=list required generic resources
• -J, --job-name=jobname name of job
• -n, --ntasks=ntasks number of tasks to run
• --ntasks-per-node=n number of tasks to invoke on each node
• -N, --nodes=N number of nodes on which to run (N = min[-max])
• -o, --output=out file for batch script's standard output
• -p, --partition=partition partition requested
• -t, --time=minutes time limit (format: dd-hh:mm)
• -C, --constraint=list specify a list of constraints(mem, vnc , ....)
• --mem=MB minimum amount of total real memory
• --reservation=name allocate resources from named reservation
• -w, --nodelist=hosts... request a specific list of hosts
• --mem-per-cpu=MB amount of real memory per allocated core
Scheduler directives/Options
#!/bin/bash
#SBATCH–jtreball_prova
#SBATCH-o treball_prova.log
#SBATCH-e treball_prova.err
#SBATCH-p std
#SBATCH-n 48
module load mpi/intel/openmpi/3.1.0
cp –r $input $SCRATCH
Cd $SCRATCH
srun $APPLICATION
mkdir -p $OUTPUT_DIR
cp -r * $output
Batch job submission
Schedulerdirectives
Setting up the environment
Move the input files to the working directory
Launch the application(similar to mpirun)
Create the output folderand move the outputs
Gaussian 16 Example
#!/bin/bash
#SBATCH-j gau16_test
#SBATCH-o gau_test_%j.log
#SBATCH-e gau_test_%j.err
#SBATCH-p std
#SBATCH-n 1
#SBATCH-c 16
module load gaussian/g16b1
INPUT_DIR=/$HOME/gaussian_test/inputs
OUTPUT_DIR=$HOME/gaussian_test/outputs
cd $SCRATCH
cp -r $INPUT_DIR/*.
g16 < input.gau > output.out
mkdir -p $OUTPUT_DIR
cp -r * $output
Vasp 5.4.4 Example
#!/bin/bash
#SBATCH-j vasp_test_%j
#SBATCH-o vasp_test_%j.log
#SBATCH–e vasp_test_%j.err
#SBATCH-p std
#SBATCH-n 24
module load vasp/5.4.4
INPUT_DIR=/$HOME/vasp_test/inputs
OUTPUT_DIR=$HOME/vasp_test/outputs
cd $SCRATCH
cp -r $INPUT_DIR/*.
srun `which vasp_std`
mkdir -p $OUTPUT_DIR
cp -r * $output
Gromacs Example
#!/bin/bash
#SBATCH--job-name=gromacs
#SBATCH--output=gromacs_%j.out
#SBATCH--error=gromacs_%j.err
#SBATCH-n 24
#SBATCH--gres=gpu:2
#SBATCH-N 1
#SBATCH-p gpu
#SBATCH-c 2
#SBATCH--time=00:30:00
module load gromacs/2018.4_mpi
cd $SHAREDSCRATCH
cp -r $HOME/SLMs/gromacs/CASE/*.
srun `which gmx_mpi`mdrun -v -deffnm input_system -ntomp $SLURM_CPUS_PER_TASK -nb
gpu -npme 12 -dlb yes -pin on –gpu_id 01
cp –r * /scratch/$USER/gromacs/CASE/output/
ANSYS Fluent Example
#!/bin/bash
#SBATCH-j truck.cas
#SBATCH-o truck.log
#SBATCH-e truck.err
#SBATCH-p std
#SBATCH-n 16
module load toolchains/gcc_mkl_ompi
INPUT_DIR=$HOME/FLUENT/inputs
OUTPUT_DIR=$HOME/FLUENT/outputs
cd $SCRATCH
cp -r $INPUT_DIR/*.
`/prod/ANSYS16/v162/fluent/bin/fluent3ddp –t $SLURM_NCPUS -mpi=hp -g -i input1_50.txt
mkdir -p $OUTPUT_DIR
cp -r * $output
Best Practices
• Use $SCRATCHas workingdirectory.
• Move only the necessaryfiles(notall files in the folder each time).
• Try to keep importantfiles only at $HOME
• Try to choose the partition and resoruces whose mostfit to your job.

More Related Content

What's hot

Spack - A Package Manager for HPC
Spack - A Package Manager for HPCSpack - A Package Manager for HPC
Spack - A Package Manager for HPC
inside-BigData.com
 
P99 Pursuit: 8 Years of Battling P99 Latency
P99 Pursuit: 8 Years of Battling P99 LatencyP99 Pursuit: 8 Years of Battling P99 Latency
P99 Pursuit: 8 Years of Battling P99 Latency
ScyllaDB
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controller
confluent
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
Flink Forward
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.
Alexey Lesovsky
 
Performance Profiling in Rust
Performance Profiling in RustPerformance Profiling in Rust
Performance Profiling in Rust
InfluxData
 
Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016
Brendan Gregg
 
Message Passing Interface (MPI)-A means of machine communication
Message Passing Interface (MPI)-A means of machine communicationMessage Passing Interface (MPI)-A means of machine communication
Message Passing Interface (MPI)-A means of machine communication
Himanshi Kathuria
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old Secrets
Brendan Gregg
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performance
PostgreSQL-Consulting
 
Container Performance Analysis
Container Performance AnalysisContainer Performance Analysis
Container Performance Analysis
Brendan Gregg
 
Apache kafka 확장과 응용
Apache kafka 확장과 응용Apache kafka 확장과 응용
Apache kafka 확장과 응용
JANGWONSEO4
 
LBNL Node Health Check Update
LBNL Node Health Check UpdateLBNL Node Health Check Update
LBNL Node Health Check Update
inside-BigData.com
 
Linux Networking Explained
Linux Networking ExplainedLinux Networking Explained
Linux Networking Explained
Thomas Graf
 
Improved alerting with Prometheus and Alertmanager
Improved alerting with Prometheus and AlertmanagerImproved alerting with Prometheus and Alertmanager
Improved alerting with Prometheus and Alertmanager
Julien Pivotto
 
PostGreSQL Performance Tuning
PostGreSQL Performance TuningPostGreSQL Performance Tuning
PostGreSQL Performance Tuning
Maven Logix
 
Aggregator Leaf Tailer: Bringing Data to Your Users with Ultra Low Latency
Aggregator Leaf Tailer: Bringing Data to Your Users with Ultra Low LatencyAggregator Leaf Tailer: Bringing Data to Your Users with Ultra Low Latency
Aggregator Leaf Tailer: Bringing Data to Your Users with Ultra Low Latency
ScyllaDB
 
Futex Scaling for Multi-core Systems
Futex Scaling for Multi-core SystemsFutex Scaling for Multi-core Systems
Futex Scaling for Multi-core Systems
Davidlohr Bueso
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
Flink Forward
 
5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance
Command Prompt., Inc
 

What's hot (20)

Spack - A Package Manager for HPC
Spack - A Package Manager for HPCSpack - A Package Manager for HPC
Spack - A Package Manager for HPC
 
P99 Pursuit: 8 Years of Battling P99 Latency
P99 Pursuit: 8 Years of Battling P99 LatencyP99 Pursuit: 8 Years of Battling P99 Latency
P99 Pursuit: 8 Years of Battling P99 Latency
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controller
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.
 
Performance Profiling in Rust
Performance Profiling in RustPerformance Profiling in Rust
Performance Profiling in Rust
 
Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016
 
Message Passing Interface (MPI)-A means of machine communication
Message Passing Interface (MPI)-A means of machine communicationMessage Passing Interface (MPI)-A means of machine communication
Message Passing Interface (MPI)-A means of machine communication
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old Secrets
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performance
 
Container Performance Analysis
Container Performance AnalysisContainer Performance Analysis
Container Performance Analysis
 
Apache kafka 확장과 응용
Apache kafka 확장과 응용Apache kafka 확장과 응용
Apache kafka 확장과 응용
 
LBNL Node Health Check Update
LBNL Node Health Check UpdateLBNL Node Health Check Update
LBNL Node Health Check Update
 
Linux Networking Explained
Linux Networking ExplainedLinux Networking Explained
Linux Networking Explained
 
Improved alerting with Prometheus and Alertmanager
Improved alerting with Prometheus and AlertmanagerImproved alerting with Prometheus and Alertmanager
Improved alerting with Prometheus and Alertmanager
 
PostGreSQL Performance Tuning
PostGreSQL Performance TuningPostGreSQL Performance Tuning
PostGreSQL Performance Tuning
 
Aggregator Leaf Tailer: Bringing Data to Your Users with Ultra Low Latency
Aggregator Leaf Tailer: Bringing Data to Your Users with Ultra Low LatencyAggregator Leaf Tailer: Bringing Data to Your Users with Ultra Low Latency
Aggregator Leaf Tailer: Bringing Data to Your Users with Ultra Low Latency
 
Futex Scaling for Multi-core Systems
Futex Scaling for Multi-core SystemsFutex Scaling for Multi-core Systems
Futex Scaling for Multi-core Systems
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance
 

Similar to Introduction to SLURM

Introduction to SLURM
Introduction to SLURMIntroduction to SLURM
Debugging Ruby Systems
Debugging Ruby SystemsDebugging Ruby Systems
Debugging Ruby Systems
Engine Yard
 
Lec7
Lec7Lec7
Performance tweaks and tools for Linux (Joe Damato)
Performance tweaks and tools for Linux (Joe Damato)Performance tweaks and tools for Linux (Joe Damato)
Performance tweaks and tools for Linux (Joe Damato)
Ontico
 
Linux Du Jour
Linux Du JourLinux Du Jour
Linux Du Jour
mwedgwood
 
Solr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approachSolr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approach
Alexandre Rafalovitch
 
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Lucidworks
 
Debugging Ruby
Debugging RubyDebugging Ruby
Debugging Ruby
Aman Gupta
 
Linux Performance Tools 2014
Linux Performance Tools 2014Linux Performance Tools 2014
Linux Performance Tools 2014
Brendan Gregg
 
Summit demystifying systemd1
Summit demystifying systemd1Summit demystifying systemd1
Summit demystifying systemd1
Susant Sahani
 
pg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQLpg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQL
Mark Wong
 
pg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQLpg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQL
Command Prompt., Inc
 
Os2
Os2Os2
Basics of unix
Basics of unixBasics of unix
Basics of unix
Deepak Singhal
 
Infrastructure review - Shining a light on the Black Box
Infrastructure review - Shining a light on the Black BoxInfrastructure review - Shining a light on the Black Box
Infrastructure review - Shining a light on the Black Box
Miklos Szel
 
Training Slides: 104 - Basics - Working With Command Line Tools
Training Slides: 104 - Basics - Working With Command Line ToolsTraining Slides: 104 - Basics - Working With Command Line Tools
Training Slides: 104 - Basics - Working With Command Line Tools
Continuent
 
Unit 10 investigating and managing
Unit 10 investigating and managingUnit 10 investigating and managing
Unit 10 investigating and managing
root_fibo
 
Designing Tracing Tools
Designing Tracing ToolsDesigning Tracing Tools
Designing Tracing Tools
Brendan Gregg
 
Designing Tracing Tools
Designing Tracing ToolsDesigning Tracing Tools
Designing Tracing Tools
Sysdig
 
When the OS gets in the way
When the OS gets in the wayWhen the OS gets in the way
When the OS gets in the way
Mark Price
 

Similar to Introduction to SLURM (20)

Introduction to SLURM
Introduction to SLURMIntroduction to SLURM
Introduction to SLURM
 
Debugging Ruby Systems
Debugging Ruby SystemsDebugging Ruby Systems
Debugging Ruby Systems
 
Lec7
Lec7Lec7
Lec7
 
Performance tweaks and tools for Linux (Joe Damato)
Performance tweaks and tools for Linux (Joe Damato)Performance tweaks and tools for Linux (Joe Damato)
Performance tweaks and tools for Linux (Joe Damato)
 
Linux Du Jour
Linux Du JourLinux Du Jour
Linux Du Jour
 
Solr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approachSolr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approach
 
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
 
Debugging Ruby
Debugging RubyDebugging Ruby
Debugging Ruby
 
Linux Performance Tools 2014
Linux Performance Tools 2014Linux Performance Tools 2014
Linux Performance Tools 2014
 
Summit demystifying systemd1
Summit demystifying systemd1Summit demystifying systemd1
Summit demystifying systemd1
 
pg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQLpg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQL
 
pg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQLpg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQL
 
Os2
Os2Os2
Os2
 
Basics of unix
Basics of unixBasics of unix
Basics of unix
 
Infrastructure review - Shining a light on the Black Box
Infrastructure review - Shining a light on the Black BoxInfrastructure review - Shining a light on the Black Box
Infrastructure review - Shining a light on the Black Box
 
Training Slides: 104 - Basics - Working With Command Line Tools
Training Slides: 104 - Basics - Working With Command Line ToolsTraining Slides: 104 - Basics - Working With Command Line Tools
Training Slides: 104 - Basics - Working With Command Line Tools
 
Unit 10 investigating and managing
Unit 10 investigating and managingUnit 10 investigating and managing
Unit 10 investigating and managing
 
Designing Tracing Tools
Designing Tracing ToolsDesigning Tracing Tools
Designing Tracing Tools
 
Designing Tracing Tools
Designing Tracing ToolsDesigning Tracing Tools
Designing Tracing Tools
 
When the OS gets in the way
When the OS gets in the wayWhen the OS gets in the way
When the OS gets in the way
 

More from CSUC - Consorci de Serveis Universitaris de Catalunya

Novetats a l'Anella Científica, presentació a la TAC24
Novetats a l'Anella Científica, presentació a la TAC24Novetats a l'Anella Científica, presentació a la TAC24
Novetats a l'Anella Científica, presentació a la TAC24
CSUC - Consorci de Serveis Universitaris de Catalunya
 
Aprenent a automatitzar amb la Network eAcademy
Aprenent a automatitzar amb la Network eAcademyAprenent a automatitzar amb la Network eAcademy
Aprenent a automatitzar amb la Network eAcademy
CSUC - Consorci de Serveis Universitaris de Catalunya
 
Cap a l'eficiència total: DevSecOps i entorns basats en contenidors per a una...
Cap a l'eficiència total: DevSecOps i entorns basats en contenidors per a una...Cap a l'eficiència total: DevSecOps i entorns basats en contenidors per a una...
Cap a l'eficiència total: DevSecOps i entorns basats en contenidors per a una...
CSUC - Consorci de Serveis Universitaris de Catalunya
 
Accelera la innovació amb Copilot per Power Platform
Accelera la innovació amb Copilot per Power PlatformAccelera la innovació amb Copilot per Power Platform
Accelera la innovació amb Copilot per Power Platform
CSUC - Consorci de Serveis Universitaris de Catalunya
 
Tendències i futur de l'automatització, IA i IAGen
Tendències i futur de l'automatització, IA i IAGenTendències i futur de l'automatització, IA i IAGen
Tendències i futur de l'automatització, IA i IAGen
CSUC - Consorci de Serveis Universitaris de Catalunya
 
Futurs imaginats i la paradoxa de l'automatització
Futurs imaginats i la paradoxa de l'automatitzacióFuturs imaginats i la paradoxa de l'automatització
Futurs imaginats i la paradoxa de l'automatització
CSUC - Consorci de Serveis Universitaris de Catalunya
 
Tendencias en herramientas de monitorización de redes y modelo de madurez en ...
Tendencias en herramientas de monitorización de redes y modelo de madurez en ...Tendencias en herramientas de monitorización de redes y modelo de madurez en ...
Tendencias en herramientas de monitorización de redes y modelo de madurez en ...
CSUC - Consorci de Serveis Universitaris de Catalunya
 
Quantum Computing Master Class 2024 (Quantum Day)
Quantum Computing Master Class 2024 (Quantum Day)Quantum Computing Master Class 2024 (Quantum Day)
Quantum Computing Master Class 2024 (Quantum Day)
CSUC - Consorci de Serveis Universitaris de Catalunya
 
Publicar dades de recerca amb el Repositori de Dades de Recerca
Publicar dades de recerca amb el Repositori de Dades de RecercaPublicar dades de recerca amb el Repositori de Dades de Recerca
Publicar dades de recerca amb el Repositori de Dades de Recerca
CSUC - Consorci de Serveis Universitaris de Catalunya
 
In sharing we trust. Taking advantage of a diverse consortium to build a tran...
In sharing we trust. Taking advantage of a diverse consortium to build a tran...In sharing we trust. Taking advantage of a diverse consortium to build a tran...
In sharing we trust. Taking advantage of a diverse consortium to build a tran...
CSUC - Consorci de Serveis Universitaris de Catalunya
 
Formació RDM: com fer un pla de gestió de dades amb l’eiNa DMP?
Formació RDM: com fer un pla de gestió de dades amb l’eiNa DMP?Formació RDM: com fer un pla de gestió de dades amb l’eiNa DMP?
Formació RDM: com fer un pla de gestió de dades amb l’eiNa DMP?
CSUC - Consorci de Serveis Universitaris de Catalunya
 
Com pot ajudar la gestió de les dades de recerca a posar en pràctica la ciènc...
Com pot ajudar la gestió de les dades de recerca a posar en pràctica la ciènc...Com pot ajudar la gestió de les dades de recerca a posar en pràctica la ciènc...
Com pot ajudar la gestió de les dades de recerca a posar en pràctica la ciènc...
CSUC - Consorci de Serveis Universitaris de Catalunya
 
Security Human Factor Sustainable Outputs: The Network eAcademy
Security Human Factor Sustainable Outputs: The Network eAcademySecurity Human Factor Sustainable Outputs: The Network eAcademy
Security Human Factor Sustainable Outputs: The Network eAcademy
CSUC - Consorci de Serveis Universitaris de Catalunya
 
The Research Portal of Catalonia: Growing more (information) & more (services)
The Research Portal of Catalonia: Growing more (information) & more (services)The Research Portal of Catalonia: Growing more (information) & more (services)
The Research Portal of Catalonia: Growing more (information) & more (services)
CSUC - Consorci de Serveis Universitaris de Catalunya
 
Facilitar la gestión, visibilidad y reutilización de los datos de investigaci...
Facilitar la gestión, visibilidad y reutilización de los datos de investigaci...Facilitar la gestión, visibilidad y reutilización de los datos de investigaci...
Facilitar la gestión, visibilidad y reutilización de los datos de investigaci...
CSUC - Consorci de Serveis Universitaris de Catalunya
 
La gestión de datos de investigación en las bibliotecas universitarias españolas
La gestión de datos de investigación en las bibliotecas universitarias españolasLa gestión de datos de investigación en las bibliotecas universitarias españolas
La gestión de datos de investigación en las bibliotecas universitarias españolas
CSUC - Consorci de Serveis Universitaris de Catalunya
 
Disposes de recursos il·limitats? Prioritza estratègicament els teus projecte...
Disposes de recursos il·limitats? Prioritza estratègicament els teus projecte...Disposes de recursos il·limitats? Prioritza estratègicament els teus projecte...
Disposes de recursos il·limitats? Prioritza estratègicament els teus projecte...
CSUC - Consorci de Serveis Universitaris de Catalunya
 
Les persones i les seves capacitats en el nucli de la transformació digital. ...
Les persones i les seves capacitats en el nucli de la transformació digital. ...Les persones i les seves capacitats en el nucli de la transformació digital. ...
Les persones i les seves capacitats en el nucli de la transformació digital. ...
CSUC - Consorci de Serveis Universitaris de Catalunya
 
Enginyeria Informàtica: una cursa de fons
Enginyeria Informàtica: una cursa de fonsEnginyeria Informàtica: una cursa de fons
Enginyeria Informàtica: una cursa de fons
CSUC - Consorci de Serveis Universitaris de Catalunya
 
Transformació de rols i habilitats en un món ple d'IA
Transformació de rols i habilitats en un món ple d'IATransformació de rols i habilitats en un món ple d'IA
Transformació de rols i habilitats en un món ple d'IA
CSUC - Consorci de Serveis Universitaris de Catalunya
 

More from CSUC - Consorci de Serveis Universitaris de Catalunya (20)

Novetats a l'Anella Científica, presentació a la TAC24
Novetats a l'Anella Científica, presentació a la TAC24Novetats a l'Anella Científica, presentació a la TAC24
Novetats a l'Anella Científica, presentació a la TAC24
 
Aprenent a automatitzar amb la Network eAcademy
Aprenent a automatitzar amb la Network eAcademyAprenent a automatitzar amb la Network eAcademy
Aprenent a automatitzar amb la Network eAcademy
 
Cap a l'eficiència total: DevSecOps i entorns basats en contenidors per a una...
Cap a l'eficiència total: DevSecOps i entorns basats en contenidors per a una...Cap a l'eficiència total: DevSecOps i entorns basats en contenidors per a una...
Cap a l'eficiència total: DevSecOps i entorns basats en contenidors per a una...
 
Accelera la innovació amb Copilot per Power Platform
Accelera la innovació amb Copilot per Power PlatformAccelera la innovació amb Copilot per Power Platform
Accelera la innovació amb Copilot per Power Platform
 
Tendències i futur de l'automatització, IA i IAGen
Tendències i futur de l'automatització, IA i IAGenTendències i futur de l'automatització, IA i IAGen
Tendències i futur de l'automatització, IA i IAGen
 
Futurs imaginats i la paradoxa de l'automatització
Futurs imaginats i la paradoxa de l'automatitzacióFuturs imaginats i la paradoxa de l'automatització
Futurs imaginats i la paradoxa de l'automatització
 
Tendencias en herramientas de monitorización de redes y modelo de madurez en ...
Tendencias en herramientas de monitorización de redes y modelo de madurez en ...Tendencias en herramientas de monitorización de redes y modelo de madurez en ...
Tendencias en herramientas de monitorización de redes y modelo de madurez en ...
 
Quantum Computing Master Class 2024 (Quantum Day)
Quantum Computing Master Class 2024 (Quantum Day)Quantum Computing Master Class 2024 (Quantum Day)
Quantum Computing Master Class 2024 (Quantum Day)
 
Publicar dades de recerca amb el Repositori de Dades de Recerca
Publicar dades de recerca amb el Repositori de Dades de RecercaPublicar dades de recerca amb el Repositori de Dades de Recerca
Publicar dades de recerca amb el Repositori de Dades de Recerca
 
In sharing we trust. Taking advantage of a diverse consortium to build a tran...
In sharing we trust. Taking advantage of a diverse consortium to build a tran...In sharing we trust. Taking advantage of a diverse consortium to build a tran...
In sharing we trust. Taking advantage of a diverse consortium to build a tran...
 
Formació RDM: com fer un pla de gestió de dades amb l’eiNa DMP?
Formació RDM: com fer un pla de gestió de dades amb l’eiNa DMP?Formació RDM: com fer un pla de gestió de dades amb l’eiNa DMP?
Formació RDM: com fer un pla de gestió de dades amb l’eiNa DMP?
 
Com pot ajudar la gestió de les dades de recerca a posar en pràctica la ciènc...
Com pot ajudar la gestió de les dades de recerca a posar en pràctica la ciènc...Com pot ajudar la gestió de les dades de recerca a posar en pràctica la ciènc...
Com pot ajudar la gestió de les dades de recerca a posar en pràctica la ciènc...
 
Security Human Factor Sustainable Outputs: The Network eAcademy
Security Human Factor Sustainable Outputs: The Network eAcademySecurity Human Factor Sustainable Outputs: The Network eAcademy
Security Human Factor Sustainable Outputs: The Network eAcademy
 
The Research Portal of Catalonia: Growing more (information) & more (services)
The Research Portal of Catalonia: Growing more (information) & more (services)The Research Portal of Catalonia: Growing more (information) & more (services)
The Research Portal of Catalonia: Growing more (information) & more (services)
 
Facilitar la gestión, visibilidad y reutilización de los datos de investigaci...
Facilitar la gestión, visibilidad y reutilización de los datos de investigaci...Facilitar la gestión, visibilidad y reutilización de los datos de investigaci...
Facilitar la gestión, visibilidad y reutilización de los datos de investigaci...
 
La gestión de datos de investigación en las bibliotecas universitarias españolas
La gestión de datos de investigación en las bibliotecas universitarias españolasLa gestión de datos de investigación en las bibliotecas universitarias españolas
La gestión de datos de investigación en las bibliotecas universitarias españolas
 
Disposes de recursos il·limitats? Prioritza estratègicament els teus projecte...
Disposes de recursos il·limitats? Prioritza estratègicament els teus projecte...Disposes de recursos il·limitats? Prioritza estratègicament els teus projecte...
Disposes de recursos il·limitats? Prioritza estratègicament els teus projecte...
 
Les persones i les seves capacitats en el nucli de la transformació digital. ...
Les persones i les seves capacitats en el nucli de la transformació digital. ...Les persones i les seves capacitats en el nucli de la transformació digital. ...
Les persones i les seves capacitats en el nucli de la transformació digital. ...
 
Enginyeria Informàtica: una cursa de fons
Enginyeria Informàtica: una cursa de fonsEnginyeria Informàtica: una cursa de fons
Enginyeria Informàtica: una cursa de fons
 
Transformació de rols i habilitats en un món ple d'IA
Transformació de rols i habilitats en un món ple d'IATransformació de rols i habilitats en un món ple d'IA
Transformació de rols i habilitats en un món ple d'IA
 

Recently uploaded

Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
Pablo Gómez Abajo
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
christinelarrosa
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
Fwdays
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
AstuteBusiness
 
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
zjhamm304
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
Neo4j
 
High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024
Vadym Kazulkin
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
operationspcvita
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
ScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking ReplicationScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking Replication
ScyllaDB
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
Enterprise Knowledge
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
Fwdays
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Pitangent Analytics & Technology Solutions Pvt. Ltd
 
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
Fwdays
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
Safe Software
 

Recently uploaded (20)

Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
 
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
 
High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
ScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking ReplicationScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking Replication
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
 
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
 

Introduction to SLURM

  • 1. Introduction to SLURM Ismael Fernández Pavón Cristian Gomollón Escribano 08 / 10 / 2019
  • 3. What is SLURM? • Allocates access to resources for some duration of time. • Provides a framework for starting, executing, and monitoring work (normally a parallel job). • Arbitrates contention for resources by managing a queue of pending work. Cluster manager and job scheduler system for large and small Linux clusters.
  • 4. LoadLeveler (IBM) LSF SLURM PBS Pro Resource Managers Scheduler What is SLURM? ALPS (Cray) Torque Maui Moab
  • 5. ✓ Open source ✓ Fault-tolerant ✓ Highly scalable LoadLeveler (IBM) LSF SLURM PBS Pro Resource Managers Scheduler What is SLURM? ALPS (Cray) Torque Maui Moab
  • 7. Node CPU (Core) CPU (Thread) SLURM: Resource Management Nodes: • Baseboards, Sockets, Cores, Threads • CPUs (Core or thread) • Memory size • Generic resources • Features • State − Idle − Completing − Mix − Drain / ing − Alloc − Down
  • 8. SLURM: Resource Management Partitions: • Associatedwith specific set of nodes • Nodes can be in more than one partition • Job size and time limits • Access control list • State information − Up − Drain − Down Partitions
  • 9. Allocated cores SLURM: Resource Management Allocated memory Jobs: • ID (a number) • Name • Time limit • Size specification • Node features required • Other Jobs Dependency • Quality Of Service (QoS) • State (Pending, Running, Suspended, Canceled, Failed, etc.)
  • 10. Core used SLURM: Resource Management Memory used Jobs Step: • ID (a number) • Name • Time limit (maximum) • Size specification • Node features required in allocation
  • 11. SLURM: Resource Management FULL CLUSTER! ✓ Job scheduling
  • 12. SLURM: Job Scheduling Scheduling: The process of determining next job to run and on which resources. FIFO Scheduler Backfill Scheduler Resources Time
  • 13. SLURM: Job Scheduling Scheduling: The process of determining next job to run and on which resources. Backfill Scheduler: • Based on the job request, resources available, and policy limits imposed. • Starts with job priority. • Results in a resource allocation over a period.
  • 14. SLURM: Job Scheduling Backfill Scheduler: • Starts with job priority. Job_priority = site_factor + (PriorityWeightAge) * (age_factor) + (PriorityWeightAssoc) * (assoc_factor) + (PriorityWeightFairshare) * (fair-share_factor) + (PriorityWeightJobSize) * (job_size_factor) + (PriorityWeightPartition) * (partition_factor) + (PriorityWeightQOS) * (QOS_factor) + SUM(TRES_weight_cpu * TRES_factor_cpu, TRES_weight_<type> * TRES_factor_<type>, ...) - nice_factor
  • 15. •sbatch – Submit a batch script to Slurm. •salloc – Request resources to SLURM for an interactive job. •srun – Start a new job step. •scancel – Cancel a job. SLURM: Commands
  • 16. • sinfo – Report system status (nodes, queues, etc.). PARTITION AVAIL TIMELIMIT NODES STATE NODELIST rest up infinite 3 idle~ pirineusgpu[1-2],pirineusknl1 rest up infinite 1 idle canigo2 std* up infinite 11 idle~ pirineus[14,19-20,23,25-26,29-30,33-34,40] std* up infinite 18 mix pirineus[13,15-16,18,21-22,27-28,35,38-39,41-45,48-49] std* up infinite 7 alloc pirineus[17,24,31,36-37,46-47] gpu up infinite 2 alloc pirineusgpu[3-4] knl up infinite 3 idle~ pirineusknl[2-4] mem up infinite 1 mix canigo1 class_a up infinite 8 mix canigo1,pirineus[1-7] class_a up infinite 1 alloc pirineus8 class_b up infinite 8 mix canigo1,pirineus[1-7] class_b up infinite 1 alloc pirineus8 class_c up infinite 8 mix canigo1,pirineus[1-7] class_c up infinite 1 alloc pirineus8 std_curs up infinite 5 idle~ pirineus[9-12,50] gpu_curs up infinite 2 idle~ pirineusgpu[1-2] SLURM: Commands
  • 17. • sinfo – Report system status (nodes, queues, etc.). sinfo -Np class_a -O "Nodelist,Partition,StateLong,CpusState,Memory,Freemem" NODELIST PARTITION STATE CPUS(A/I/O/T) MEMORY FREE_MEM canigo1 class_a mixed 113/79/0/192 3094521 976571 pirineus1 class_a mixed 20/28/0/48 191904 120275 pirineus2 class_a mixed 24/24/0/48 191904 185499 pirineus3 class_a mixed 46/2/0/48 191904 54232 pirineus4 class_a mixed 38/10/0/48 191904 58249 pirineus5 class_a mixed 38/10/0/48 191904 58551 pirineus6 class_a mixed 36/12/0/48 191904 114986 pirineus7 class_a mixed 38/10/0/48 191904 58622 pirineus8 class_a allocated 48/0/0/48 191904 165682 SLURM: Commands
  • 18. 1193936 std g09d1 upceqt04 PD 0:00 1 16 32G (Priority) 1195916 gpu A2B2_APO_n ubator01 PD 0:00 1 24 3900M (Priority) 1195915 gpu A2B2_APO_n ubator01 PD 0:00 1 24 3900M (Priority) 1195920 gpu A2B2_APO_n ubator01 PD 0:00 1 24 3900M (Priority) 1195927 gpu uncleaved_ ubator02 PD 0:00 1 24 3900M (Priority) 1195928 gpu uncleaved_ ubator02 PD 0:00 1 24 3900M (Priority) 1195929 gpu cleaved_wt ubator02 PD 0:00 1 24 3900M (Priority) 1138005 std U98-CuONN1 imoreira PD 0:00 1 12 3998M (Priority) 1195531 std g09d1 upceqt04 PD 0:00 1 16 32G (Priority) 1195532 std g09d1 upceqt04 PD 0:00 1 16 32G (Priority) 1195533 std g09d1 upceqt04 PD 0:00 1 16 32G (Priority) 1195536 std g09d1 upceqt04 PD 0:00 1 16 32G (Priority) 1195597 std sh gomollon R 20:04:04 4 24 6000M pirineus[31,38,44,47] 1195579 class_a rice crag49366 R 6:44:45 1 8 3998M pirineus5 1195576 class_a rice crag49366 R 6:36:48 1 8 3998M pirineus2 1195578 class_a rice crag49366 R 6:37:53 1 8 3998M pirineus4 • squeue – Report job and job step status. SLURM: Commands
  • 19. • scontrol – Administrator tool to view and/or update system, job, step, partition or reservation status. scontrol hold <jobid> scontrol release <jobid> scontrol show job <jobid> SLURM: Commands
  • 20. JobId=1195597 JobName=sh UserId=gomollon(80128) GroupId=csuc(10000) MCS_label=N/A Priority=100176 Nice=0 Account=csuc QOS=test JobState=RUNNING Reason=None Dependency=(null) Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0 RunTime=20:09:58 TimeLimit=5-00:00:00 TimeMin=N/A SubmitTime=2019-10-07T12:21:29 EligibleTime=2019-10-07T12:21:29 StartTime=2019-10-07T12:21:29 EndTime=2019-10-12T12:21:30 Deadline=N/A PreemptTime=None SuspendTime=None SecsPreSuspend=0 Partition=std AllocNode:Sid=login2:20262 ReqNodeList=(null) ExcNodeList=(null) NodeList=pirineus[31,38,44,47] BatchHost=pirineus31 NumNodes=4 NumCPUs=24 NumTasks=24 CPUs/Task=1 ReqB:S:C:T=0:0:*:* TRES=cpu=24,mem=144000M,node=4 Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=* MinCPUsNode=1 MinMemoryCPU=6000M MinTmpDiskNode=0 Features=(null) Gres=(null) Reservation=(null) OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null) Command=(null) WorkDir=/home/gomollon Power= SLURM: Commands
  • 23. How to launch jobs?
  • 24. Login on CSUC infrastructure • Login ssh –p 2122 username@hpc.csuc.cat • Transferfiles scp -P 2122 local_file username@hpc.csuc.cat:[path to your folder] sftp -oPort=2122 username@hpc.csuc.cat • Useful paths Name Variable Availability Quote/project Time limit Backup /home/$user $HOME global 4 GB unlimited Yes /scratch/$user $SCRATCH global unlimited 30 days No /scratch/$user/tmp/jobid $TMPDIR Local to each node job file limit 1 week No /tmp/$user/jobid $TMPDIR Local to each node job file limit 1 week No • Get HC consumption consum -a ‘any’ (group consumption) consum -a ‘any’ -u ‘nom_usuari’ (user consumption)
  • 25. Batch job submission: Default settings • 4Gb/core (excepting on mem partition). • 24Gb/core on mem partition. • 1 core on std and mem partitions. • 24 cores on gpu partition • The whole node on KNL partition • Non-exclusive, multinode job. • Scratch and Output directory are the submit directory.
  • 26. Batch job submission • Basic Linux commands: Description Command Exemple List files ls ls /home/user Making folders mkdir mkdir /home/prova Changing folder cd cd /home/prova Copy files cp cp nom_arxiu1 nom_arxiu2 Move file mv mv /home/prova.txt /cescascratch/prova.txt Delete file rm rm filename Print file content cat cat filename Find string into files grep grep ‘word’ filename List last lines on file tail tail filename • Text editors : vim, nano, emacs,etc. • More detailed info and options about the commands: ‘command’ –help man ‘command’
  • 27. Scheduler directives/Options • -c, --cpus-per-task=ncpus number of cpus required per task • --gres=list required generic resources • -J, --job-name=jobname name of job • -n, --ntasks=ntasks number of tasks to run • --ntasks-per-node=n number of tasks to invoke on each node • -N, --nodes=N number of nodes on which to run (N = min[-max]) • -o, --output=out file for batch script's standard output • -p, --partition=partition partition requested • -t, --time=minutes time limit (format: dd-hh:mm)
  • 28. • -C, --constraint=list specify a list of constraints(mem, vnc , ....) • --mem=MB minimum amount of total real memory • --reservation=name allocate resources from named reservation • -w, --nodelist=hosts... request a specific list of hosts • --mem-per-cpu=MB amount of real memory per allocated core Scheduler directives/Options
  • 29. #!/bin/bash #SBATCH–jtreball_prova #SBATCH-o treball_prova.log #SBATCH-e treball_prova.err #SBATCH-p std #SBATCH-n 48 module load mpi/intel/openmpi/3.1.0 cp –r $input $SCRATCH Cd $SCRATCH srun $APPLICATION mkdir -p $OUTPUT_DIR cp -r * $output Batch job submission Schedulerdirectives Setting up the environment Move the input files to the working directory Launch the application(similar to mpirun) Create the output folderand move the outputs
  • 30. Gaussian 16 Example #!/bin/bash #SBATCH-j gau16_test #SBATCH-o gau_test_%j.log #SBATCH-e gau_test_%j.err #SBATCH-p std #SBATCH-n 1 #SBATCH-c 16 module load gaussian/g16b1 INPUT_DIR=/$HOME/gaussian_test/inputs OUTPUT_DIR=$HOME/gaussian_test/outputs cd $SCRATCH cp -r $INPUT_DIR/*. g16 < input.gau > output.out mkdir -p $OUTPUT_DIR cp -r * $output
  • 31. Vasp 5.4.4 Example #!/bin/bash #SBATCH-j vasp_test_%j #SBATCH-o vasp_test_%j.log #SBATCH–e vasp_test_%j.err #SBATCH-p std #SBATCH-n 24 module load vasp/5.4.4 INPUT_DIR=/$HOME/vasp_test/inputs OUTPUT_DIR=$HOME/vasp_test/outputs cd $SCRATCH cp -r $INPUT_DIR/*. srun `which vasp_std` mkdir -p $OUTPUT_DIR cp -r * $output
  • 32. Gromacs Example #!/bin/bash #SBATCH--job-name=gromacs #SBATCH--output=gromacs_%j.out #SBATCH--error=gromacs_%j.err #SBATCH-n 24 #SBATCH--gres=gpu:2 #SBATCH-N 1 #SBATCH-p gpu #SBATCH-c 2 #SBATCH--time=00:30:00 module load gromacs/2018.4_mpi cd $SHAREDSCRATCH cp -r $HOME/SLMs/gromacs/CASE/*. srun `which gmx_mpi`mdrun -v -deffnm input_system -ntomp $SLURM_CPUS_PER_TASK -nb gpu -npme 12 -dlb yes -pin on –gpu_id 01 cp –r * /scratch/$USER/gromacs/CASE/output/
  • 33. ANSYS Fluent Example #!/bin/bash #SBATCH-j truck.cas #SBATCH-o truck.log #SBATCH-e truck.err #SBATCH-p std #SBATCH-n 16 module load toolchains/gcc_mkl_ompi INPUT_DIR=$HOME/FLUENT/inputs OUTPUT_DIR=$HOME/FLUENT/outputs cd $SCRATCH cp -r $INPUT_DIR/*. `/prod/ANSYS16/v162/fluent/bin/fluent3ddp –t $SLURM_NCPUS -mpi=hp -g -i input1_50.txt mkdir -p $OUTPUT_DIR cp -r * $output
  • 34. Best Practices • Use $SCRATCHas workingdirectory. • Move only the necessaryfiles(notall files in the folder each time). • Try to keep importantfiles only at $HOME • Try to choose the partition and resoruces whose mostfit to your job.