Introduction to HPC
Interacting with High Performance Computing Systems
6/7/18
Presented by: Virginia Trueheart, MSIS
Texas Advanced Computing Center
vtrueheart@tacc.utexas.edu
An Overview of HPC: What is it?
High Performance Computing
• Parallel processing for advanced computation
• A “Supercomputer”, “Large Scale System”, or “Cluster”
The same parts as your laptop
• Processors, coprocessors, memory, operating system, etc.
• Specialized for scale & efficiency
Scale and Speed
• Thousands of nodes
• High bandwidth, low latency network for large scale I/O
An Overview of HPC: Stampede2
• Peak performance: 18 PF, rank 12 in Top 500 (2017)
• 4,200 68-core Knights Landing (KNL) nodes
• 1,736 48-core Skylake (SKX) nodes
• 368,928 cores and 736,512GB memory in total
• Interconnect: Intel’s Omni-Path Fabric Network
• Three Lustre Filesystems
• Funded by NSF through grant #ACI-1134872
An Overview of HPC: Architecture
(Architecture diagram: users connect over the Internet via ssh to a Stampede2 login node; jobs reach the knl- and skx- compute nodes through sbatch or idev across the Omni-Path interconnect; the $HOME, $WORK, and $SCRATCH filesystems are visible system-wide.)
Ex: SKX Compute Node
Model: Intel Xeon Platinum 8160 ("Skylake")
Cores per Node: 48 cores on two sockets (24 cores/socket)
Hardware Threads per Core: 2
Hardware Threads per Node: 96
Clock Rate: 2.1 GHz
RAM: 192 GB
Cache: 57 MB per socket
Local Storage: 144 GB /tmp partition
Ex: Physical Layout
(Diagrams: KNL node, 68 cores per node; SKX node, 24 cores/socket × 2 sockets.)
c455-012[knl](1001)$ less /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 87
model name : Intel(R) Xeon Phi(TM) CPU 7250 @ 1.40GHz
stepping : 1
microcode : 0x1ac
cpu MHz : 1496.140
cache size : 1024 KB
physical id : 0
siblings : 272
core id : 2
cpu cores : 68
apicid : 8
initial apicid : 8
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
c455-012[knl](1001)$ numactl -H
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52
53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104
105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124
125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164
165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184
185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204
205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224
225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244
245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264
265 266 267 268 269 270 271
node 0 size: 98207 MB
node 0 free: 91727 MB
node distances:
node 0
0: 10
An Overview of HPC: Using a System
Why would I use HPC resources?
• Large scale problems
• Parallelization and Efficiency
• Collaboration
How do I find an HPC resource?
• Check with your institution
• Check with national scientific groups (NSF in the US)
Overview: What Can I Find on a System?
Modules and Software
• Basic compilers and libraries
• Popular packages
• Licensed software
Build Your Own!
• github or other direct sources
• pip, wget, curl, etc
• You won’t have sudo access
What Do I Need to Get Started?
• User Account
• Allocation & Project
• Two Factor Authentication
Please See Your Handouts!
• Username
• Password
• Temporary TFA Key
SSH Protocols
Secure Shell
• Encrypted network protocol to access a secure system over an
unsecured network
• Automatically generated public-private key pairs
• From your Wi-Fi to Stampede2 or another secure machine
• File transfers (scp & rsync)
Options
• .ssh/config
• Host & Username
• Make connecting easier
• Passwordless Login
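Passwordless login is usually set up with an ssh key pair. A minimal sketch of key generation (the key path and directory below are illustrative; on a real workstation the key lives in ~/.ssh and you would normally set a passphrase):

```shell
# Demo in a throwaway directory so this is safe to run anywhere;
# in practice use ~/.ssh and omit -N '' to be prompted for a passphrase.
keydir="$(mktemp -d)"
ssh-keygen -q -t rsa -b 2048 -f "$keydir/id_rsa_demo" -N ''
ls "$keydir"   # id_rsa_demo (private key), id_rsa_demo.pub (public key)
# To enable passwordless login, append the public key to
# ~/.ssh/authorized_keys on the remote machine, for example with:
#   ssh-copy-id -i "$keydir/id_rsa_demo.pub" username@stampede2.tacc.utexas.edu
```

The private key never leaves your machine; only the .pub file is copied to the server.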
Logging In (Mac Terminal)
$ ssh <username>@stampede2.tacc.utexas.edu
To access the system:
1) If not using ssh-keys, please enter your TACC password at the password prompt
2) At the TACC Token prompt, enter your 6-digit code followed by <return>.
Password:
TACC Token Code:
Logging in (PuTTY, pt. 1 & 2)
(Screenshots: PuTTY session configuration and login windows.)
Welcome to Stampede2, *please* read these important system notes:
--> Stampede2, Phase 2 Skylake nodes are now available for jobs
--> Stampede2 user documentation is available at:
https://portal.tacc.utexas.edu/user-guides/stampede2
----------------------- Project balances for user vtrue -----------------------
| Name Avail SUs Expires | |
| A-ccsc 189624 2018-12-31 | |
------------------------- Disk quotas for user vtrue --------------------------
| Disk Usage (GB) Limit %Used File Usage Limit %Used |
| /home1 1.9 10.0 19.43 39181 200000 19.59 |
| /work 311.8 1024.0 30.45 225008 3000000 7.50 |
| /scratch 0.0 0.0 0.00 4 0 0.00 |
-------------------------------------------------------------------------------
Where am I?
Login Nodes
• Manage files
• Build software
• Submit, monitor and manage jobs
Compute Nodes
• Running jobs
• Testing applications
Allocations
An active project with a principal investigator (PI) attached
Service Units (SUs)
• SUs billed (node-hours) = (# nodes) × (wall clock hours) × (charge rate per node-hour)
Shared Systems
• Be a good citizen
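The billing formula is plain arithmetic. For example, a 4-node job that runs 2 hours on a queue charging 1 SU per node-hour:

```shell
# SUs billed = nodes x wall-clock hours x charge rate per node-hour
nodes=4; hours=2; rate=1
sus=$(( nodes * hours * rate ))
echo "${sus} SUs"   # prints "8 SUs"
```

Note that you are billed for whole nodes for the full wall-clock time, whether or not your code keeps every core busy.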
Filesystems
Division of Labor
• Lustre presents many disks spread across the cluster as a single filesystem
• Lots of small I/O operations are hard on the system
• Large data is striped across Object Storage Targets (OSTs); metadata is handled by the Metadata Server (MDS)
Partitions
• $HOME: 10GB, $WORK: 1TB, $SCRATCH: Unlimited
• Shared system
Filesystems: Cont.
Where am I?
• pwd – print working directory
• cd – change directory
• cd .. – move up one directory
New Files
• mkdir – make directory
• Editors – vi(m), nano, emacs
• mv – move a file to another location
Create a File
login1.stampede2$ cd $WORK
login1.stampede2$ pwd
/work/03658/vtrue/stampede2
login1.stampede2$ nano helloWorld.py
#!/usr/bin/env python
"""
Hello World
"""
import datetime as DT
today = DT.datetime.today()
print "Hello World! Today is:"
print today.strftime("%d %b %Y")
A Very Small File
Run an Interactive Job
idev
• Interactive development queue access command
• Watch your code run live
• Test things in real time
• idev --help for options
idev will drop you directly into the knl development queue so be
aware of your location on the system.
helloWorld.py
staff.stampede2(1005)$ idev
-> Checking on the status of development queue. OK
-> Defaults file : ~/.idevrc
-> System : stampede2
-> Queue : development (idev default )
[...]
c455-012[knl](1019)$
helloWorld.py
staff.stampede2(1005)$ idev
-> Checking on the status of development queue. OK
-> Defaults file : ~/.idevrc
-> System : stampede2
-> Queue : development (idev default )
[...]
c455-012[knl](1019)$ python helloWorld.py
Hello World! Today is:
17 Jun 2018
c455-012[knl](1020)$
Types of Code
Serial Code
• Single tasks, one after the other
• Single node/single core
• Our helloWorld.py is one (albeit a very, very small one)
Parallel Code
• Array or “embarrassingly parallel” jobs
• Many node/many core
• Uses MPI
• Hybrid codes
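An "embarrassingly parallel" workload is just many independent tasks with no communication between them. A toy shell sketch of the pattern (on a real system each task would be a full program, farmed out by a job launcher rather than `&`):

```shell
# Launch four independent tasks in the background, then wait for all of them.
results=$(
    for i in 1 2 3 4; do
        echo "task $i finished" &
    done
    wait
)
echo "$results"   # four lines, in whatever order the tasks completed
```

Because the tasks never talk to each other, this style scales almost perfectly; codes that must exchange data mid-run need MPI instead.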
Message Passing Interface
ibrun is TACC-specific
• A "wrapper" for mpirun
• Executes serial and parallel jobs across all allocated nodes
MPI Functions
• Allows communication between all cores and all nodes
• Move data between parts of the job that need it
• Point-to-Point or Collective Communication
Ex: MPI Communication
(Diagram: in point-to-point communication, messages pair individual processes, e.g. proc 0 with proc 3; in collective communication, all processes, e.g. procs 0–4, participate in a single operation.)
#!/usr/bin/env python
"""
Parallel Hello World
"""
from mpi4py import MPI
import sys
size = MPI.COMM_WORLD.Get_size()
rank = MPI.COMM_WORLD.Get_rank()
name = MPI.Get_processor_name()
sys.stdout.write(
    "Hello, World! I am process %d of %d on %s.\n"
    % (rank, size, name))
Parallel helloWorld.py
c455-012[knl](1019)$ ibrun python helloParallel.py
TACC: Starting up job 1595632
TACC: Starting parallel tasks...
Hello, World! I am process 1 of 68 on c456-042.stampede2.tacc.utexas.edu.
Hello, World! I am process 49 of 68 on c456-042.stampede2.tacc.utexas.edu.
Hello, World! I am process 66 of 68 on c456-042.stampede2.tacc.utexas.edu.
Hello, World! I am process 67 of 68 on c456-042.stampede2.tacc.utexas.edu.
Hello, World! I am process 64 of 68 on c456-042.stampede2.tacc.utexas.edu.
...
TACC: Shutdown complete. Exiting.
Submitting a Job
Why submit?
• Larger jobs, more nodes
• You don’t have to watch it in real time
• Run multiple jobs simultaneously
Queues
• Pick the queues that suit your needs
• Don’t request more resources than you need
• Remember this is a shared resource
Never run on a login node!
Queue Name     Node Type       Max Nodes per Job   Max Duration   Max Jobs in Queue   Charge Rate
development    KNL cache-quad  16 nodes            2 hrs          1                   1 SU
normal         KNL cache-quad  256 nodes           48 hrs         50                  1 SU
large**        KNL cache-quad  2048 nodes          48 hrs         5                   1 SU
long           KNL cache-quad  32 nodes            96 hrs         2                   1 SU
flat-quadrant  KNL flat-quad   24 nodes            48 hrs         2                   1 SU
skx-dev        SKX             4 nodes             2 hrs          1                   1 SU
skx-normal     SKX             128 nodes           48 hrs         25                  1 SU
skx-large**    SKX             868 nodes           48 hrs         3                   1 SU
Submitting a Job cont.
sbatch
• The job submission command for SLURM (Simple Linux Utility for Resource Management), the Linux/Unix workload manager
• Allocates resources
• Executes and monitors jobs
• Evaluates and manages pending jobs
Using a Scheduler
• Gets you off of the login nodes (shared resource)
• Means you can walk away and do other things
Submission Options
Option   Argument        Comments
-p       queue_name      Submit to the queue (partition) designated by queue_name
-J       job_name        Job name
-N       total_nodes     Required. Define the resources you need by specifying either: (1) "-N" and "-n"; or (2) "-N" and "--ntasks-per-node"
-n       total_tasks     Total MPI tasks in this job. In a non-MPI job, it is usually best to set it to the same value as "-N"
-t       hh:mm:ss        Required. Wall clock time limit for the job
-o       output_file     Direct job standard output to output_file (without the -e option, error output goes to this file too)
-e       error_file      Direct job error output to error_file
-d       afterok:jobid   Dependency: this job will start only after the specified job finishes successfully
-A       projectnumber   Charge the job to the specified project/allocation number
Parallel Job
#!/bin/bash
#SBATCH -J myJob # Job name
#SBATCH -o myJob.o%j # Name of stdout output file
#SBATCH -e myJob.e%j # Name of stderr error file
#SBATCH -p development # Queue (partition) name
#SBATCH -N 1 # Total # of nodes
#SBATCH -n 68 # Total # of mpi tasks
#SBATCH -t 00:05:00 # Run time (hh:mm:ss)
#SBATCH -A myproject # Allocation name (req'd if you have more than 1)
#SBATCH --mail-user=hkang@austin.utexas.edu
#SBATCH --mail-type=all # Send email at begin and end of job
# Other commands must follow all #SBATCH directives...
module list
pwd
date
# Launch code...
ibrun python helloParallel.py
Managing Your Jobs
qlimits – show the limits for all queues
sinfo – monitor queues in real time
squeue – monitor jobs in real time
showq – similar output to squeue
scancel – manually cancel a job
scontrol – detailed information about the configuration of a job
sacct – accounting data about your jobs
staff.stampede2(1009)$ squeue -u vtrue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1604426 development idv20717 vtrue R 16:57 1 c455-001
staff.stampede2(1010)$ scontrol show job=1604426
JobId=1604426 JobName=idv20717
UserId=vtrue(829572) GroupId=G-815499(815499) MCS_label=N/A
Priority=400 Nice=0 Account=A-ccsc QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:18:08 TimeLimit=00:30:00 TimeMin=N/A
SubmitTime=2018-06-09T21:27:33 EligibleTime=2018-06-09T21:27:33
StartTime=2018-06-09T21:27:36 EndTime=2018-06-09T21:57:36 Deadline=N/A
PreemptTime=None SuspendTime=None SecsPreSuspend=0
LastSchedEval=2018-06-09T21:27:36
...
Accessing a Compute Node
No Job Running
staff(1003)$ ssh c455-001
Access denied: user vtrue
(uid=829572) has no active jobs
on this node.
Authentication failed.
Job Running
staff(1002)$ ssh c455-032
Last login: Fri Jun 15 15:46:04
2018 from
staff.stampede2.tacc.utexas.edu
TACC Stampede2 System
Provisioned on 24-May-2017 at
11:49
c455-032[knl](1001)$
On Node Monitoring
cat /proc/cpuinfo
• Read out the CPU info for the node
top
• See all running processes and which are consuming the most resources
free -g
• Basic printout of memory consumption
Remora
• An open source tool developed by TACC that tracks memory, CPU usage, I/O activity, and more
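For a quick core count from the same data, count the processor stanzas (this assumes a Linux node, as on Stampede2):

```shell
# Each logical CPU gets one "processor :" stanza in /proc/cpuinfo,
# so on a KNL node this would report 272 (68 cores x 4 hardware threads).
grep -c '^processor' /proc/cpuinfo
```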
What else is on the Login Node?
Modules
• Find software, compilers, dependent packages
Environments
• Paths, personalizations, licenses
Building Software
• Install what you need, compile, update
Modules
Modules
• TACC uses a tool called Lmod
• Add, remove, and swap software packages
• Saves you from having to build your own
Commands
• module spider <package> - search for a package
• module list – see currently loaded modules
• module avail – list all available packages
• module load <package> - load a specific package or version
staff.stampede2(1073)$ module list
Currently Loaded Modules:
  1) intel/17.0.4   2) impi/17.0.3   3) git/2.9.0   4) autotools/1.1
  5) xalt/1.7.7     6) TACC          7) python2/2.7.14
staff.stampede2(1078)$ module spider python
----------------------------------------------------------------------------------------
python:
----------------------------------------------------------------------------------------
Versions:
python/2.7.13
Other possible modules matches:
python2 python3
----------------------------------------------------------------------------------------
To find other possible module matches execute:
$ module -r spider '.*python.*'
staff.stampede2(1073)$ module spider python3/3.6.4
-------------------------------------------------------------------------------------
python3: python3/3.6.4
-------------------------------------------------------------------------------------
Description:
scientific scripting package
You will need to load all module(s) on any one of the lines below before the
"python3/3.6.4" module is available to load.
intel/17.0.4
Help:
This is the Python3 package built on March 01, 2018.
You can install your own modules (choose one method):
1. python3 setup.py install --user
2. python3 setup.py install --home=<dir>
3. pip3 install --user module-name
Version 3.6.4
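When you install with `pip3 install --user`, packages land under your per-user base directory; you can ask Python where that is (the exact path varies by system and Python version):

```shell
# Print the user base; --user packages go under its lib/pythonX.Y/site-packages,
# and any scripts they install go under its bin/ (add that to PATH if needed).
python3 -m site --user-base
```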
Environment Management
env – Read out of all the environment variables set
Look for something specific:
staff.stampede2(1017)$ env | grep GIT
TACC_GIT_BIN=/opt/apps/git/2.9.0/bin
TACC_GIT_DIR=/opt/apps/git/2.9.0
TACC_GIT_LIB=/opt/apps/git/2.9.0/lib
GIT_TEMPLATE_DIR=/opt/apps/git/2.9.0/share/git-core/templates
GIT_EXEC_PATH=/opt/apps/git/2.9.0/libexec/git-core
Environment Paths
$ echo $PATH
/opt/apps/xalt/1.7.7/bin:/opt/apps/intel17/python/2.7.13/bin:/opt/apps/autotools/1.1/bin:/opt/apps/git/2.9.0/bin:/tmprpm/intel17/impi/17.0.3/bin:/opt/intel/compilers_and_libraries_2017.4.196/linux/mpi/intel64/bin:/opt/intel/compilers_and_libraries_2017.4.196/linux/bin/intel64:/opt/apps/gcc/5.4.0/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/opt/dell/srvadmin/bin:.
$ echo $LD_LIBRARY_PATH
/opt/apps/intel17/python/2.7.13/lib:/opt/intel/compilers_and_libraries_2017.4.196/linux/mpi/intel64/lib:/opt/intel/debugger_2017/libipt/intel64/lib:/opt/intel/debugger_2017/iga/lib:/opt/intel/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.4:/opt/intel/compilers_and_libraries_2017.4.196/linux/daal/lib/intel64_lin:/opt/intel/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64/gcc4.7:/opt/intel/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin:/opt/intel/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin:/opt/intel/compilers_and_libraries_2017.4.196/linux/ipp/lib/intel64:/opt/intel/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64:/opt/apps/gcc/5.4.0/lib64:/opt/apps/gcc/5.4.0/lib
Diagnosing Your Environment
$ module load sanitytool
$ sanitycheck
Sanity Tool Version: 1.3
1: Check SSH permissions:
Passed
2: Check SSH keys:
Passed
3: Check environment variables (e.g. HOME, WORK, SCRATCH) and
file system access:
Passed
...
.bashrc (or the appropriate startup script for your shell)
• Set default modules
• Set custom paths permanently
• Change command line prompt
• Set aliases
• Change Umask settings for sharing files
• Enable startup script tracking
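A minimal sketch of such settings (the values and alias name below are illustrative, not TACC defaults):

```shell
# Candidate lines for ~/.bashrc:
umask 0027                              # new files readable by your group
export PATH="$HOME/Tools/bin:$PATH"     # pick up your own installs first
alias myq='squeue -u $USER'             # shortcut for checking your own jobs
```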
alias priority='squeue -t pending -o "%Q %.7i %.9P %.8j %.2t %.10M %.6D %r %B %S" --sort="-p"'
alias qalloc='squeue -A UT-2015-05-18 -o "%.18i %.9P %.9G %.16a %.6D %.20S %.8M %.10L %.10Q"'
alias qalloct='squeue -i 5 -A UT-2015-05-18 -o "%.18i %.9P %.9G %.16a %.8u %.6D %.20S %.8M %.10L %.10Q"'
alias tacc_jobs='cat /scratch/projects/tacc_stats/accounting/tacc_jobs_completed | grep'
alias qstat='squeue -o "%.18i %.12P %.9u %.9G %.16a %.6D %.20S %.10M %.10L %.10Q %.25V"'
alias nstat='echo A/I/O/T: Allocated/Idle/Other/Total; sinfo -o "%20P %5a %.10l %16F"'
Building and Installing Software
Python
• pip install packageName --user
GitHub
• module load git
• git clone https://full/file/path
Direct Sources
• wget https://path/to/tarball.tar.gz
• tar -xvf tarball.tar.gz
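The tar flags round-trip like this (using a throwaway directory and a made-up tool name so the sketch is safe to run; a real session would wget the tarball instead of creating it):

```shell
# Simulate a downloaded source tarball, then unpack it.
workdir="$(mktemp -d)"; cd "$workdir"
mkdir mytool && echo "source files" > mytool/README
tar -czf tarball.tar.gz mytool   # what a real download would contain
rm -r mytool
tar -xzf tarball.tar.gz          # -x extract, -z gunzip, -f archive file
ls mytool                        # prints "README"
```

Add -v to either tar command to list files as they are archived or extracted.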
Ex: vcftools
$ cdw
$ mkdir vcftools
$ cd vcftools
$ git clone https://github.com/vcftools/vcftools.git
$ cd vcftools
$ ./autogen.sh
$ ./configure --prefix=$WORK/Tools
$ make
$ make install
$ export PATH=$PATH:$WORK/Tools/bin
$ vcftools
VCFtools (0.1.15)
© Adam Auton and Anthony Marcketta 2009
What Else?
Any Software
• Build from source; get as complicated as you want
Customize login
• Modify .ssh/config on your local machine to meet your needs
Customize Editors
• Bring in outside configuration files (colors, layout, etc)
.ssh/config
Host s.s2
HostName staff.stampede2.tacc.utexas.edu
User vtrue
ServerAliveInterval 60
ForwardX11 yes
Host s.ls5
HostName staff.ls5.tacc.utexas.edu
User vtrue
ServerAliveInterval 60
ForwardX11 yes
Q&A
Questions Answered and Demonstrations Provided
Further Information
Main Website: www.tacc.utexas.edu
User Portal: portal.tacc.utexas.edu
Email: info@tacc.utexas.edu
kumardaparthi1024
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Tatiana Kojar
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
Wouter Lemaire
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
Dinusha Kumarasiri
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
alexjohnson7307
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
LucaBarbaro3
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
GDSC PJATK
 

Recently uploaded (20)

Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
 

Full PPT Stack

  • 1. PRESENTED BY: Introduction to HPC Interacting with High Performance Computing Systems 6/7/18 1 Virginia Trueheart, MSIS Texas Advanced Computing Center vtrueheart@tacc.utexas.edu
  • 2. An Overview of HPC: What is it? High Performance Computing • Parallel processing for advanced computation • A “Supercomputer”, “Large Scale System”, or “Cluster” The same parts as your laptop • Processors, coprocessors, memory, operating system, etc. • Specialized for scale & efficiency Scale and Speed • Thousands of nodes • High bandwidth, low latency network for large scale I/O 6/7/18 2
  • 3. An Overview of HPC: Stampede2 • Peak performance: 18 PF, rank 12 in Top 500 (2017) • 4,200 68-core Knights Landing (KNL) nodes • 1,736 48-core Skylake (SKX) nodes • 368,928 cores and 736,512GB memory in total • Interconnect: Intel’s Omni-Path Fabric Network • Three Lustre Filesystems • Funded by NSF through grant #ACI-1134872 6/7/18 3
  • 4. An Overview of HPC: Architecture 6/7/18 4 idev Internet ssh login node knl- nodes skx- nodes sbatch omnipath STAMPEDE 2 $HOME $WORK $SCRATCH
  • 5. Ex: SKX Compute Node 6/7/18 5 Model Intel Xeon Platinum 8160 ("Skylake") Cores Per Node 48 cores on two sockets (24 cores/socket) Hardware Threads per Core 2 Hardware Threads per Node 96 Clock Rate 2.1Ghz RAM 192GB Cache 57MB per socket Local Storage 144GB /tmp partition
  • 6. Ex: Physical Layout KNL Node (68 cores per node) SKX Node (24 cores/socket * 2) 6/7/18 6
  • 7. 6/7/18 7 c455-012[knl](1001)$less /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 87 model name : Intel(R) Xeon Phi(TM) CPU 7250 @ 1.40GHz stepping : 1 microcode : 0x1ac cpu MHz : 1496.140 cache size : 1024 KB physical id : 0 siblings : 272 core id : 2 cpu cores : 68 apicid : 8 initial apicid : 8 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes
  • 8. 6/7/18 8 c455-012[knl](1001)$ numactl -H available: 1 nodes (0) node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 node 0 size: 98207 MB node 0 free: 91727 MB node distances: node 0 0: 10
  • 9. An Overview of HPC: Using a System Why would I use HPC resources? • Large scale problems • Parallelization and Efficiency • Collaboration How do I find an HPC resource? • Check with your institution • Check with national scientific groups (NSF in the US) 6/7/18 9
  • 10. Overview: What Can I Find on a System Modules and Software • Basic compilers and libraries • Popular packages • Licensed software Build Your Own! • github or other direct sources • pip, wget, curl, etc • You won’t have sudo access 6/7/18 10
  • 11. What Do I Need to Get Started? • User Account • Allocation & Project • Two Factor Authentication 6/7/18 11
  • 12. Please See Your Handouts! • Username • Password • Temporary TFA Key 6/7/18 12
  • 13. SSH Protocols Secure Shell • Encrypted network protocol to access a secure system over an unsecured network • Automatically generated public-private key pairs • From your Wi-Fi to Stampede2 or another secure machine • File transfers (scp & rsync) Options • .ssh/config • Host & Username • Make connecting easier • Passwordless Login 6/7/18 13
  • 14. Logging In (Mac Terminal) $ ssh <username>@stampede2.tacc.utexas.edu To access the system: 1) If not using ssh-keys, please enter your TACC password at the password prompt 2) At the TACC Token prompt, enter your 6-digit code followed by <return>. Password: TACC Token Code: 6/7/18 14
  • 15. Logging in (PuTTY pt. 1) 6/7/18 15
  • 16. Logging in (PuTTY pt. 2) 6/7/18 16
  • 17. Welcome to Stampede2, *please* read these important system notes: --> Stampede2, Phase 2 Skylake nodes are now available for jobs --> Stampede2 user documentation is available at: https://portal.tacc.utexas.edu/user-guides/stampede2 ----------------------- Project balances for user vtrue ----------------------- | Name Avail SUs Expires | | | A-ccsc 189624 2018-12-31 | | ------------------------- Disk quotas for user vtrue -------------------------- | Disk Usage (GB) Limit %Used File Usage Limit %Used | | /home1 1.9 10.0 19.43 39181 200000 19.59 | | /work 311.8 1024.0 30.45 225008 3000000 7.50 | | /scratch 0.0 0.0 0.00 4 0 0.00 | ------------------------------------------------------------------------------- 6/7/18 17
  • 18. Where am I? Login Nodes • Manage files • Build software • Submit, monitor and manage jobs Compute Nodes • Running jobs • Testing applications 6/7/18 18
  • 19. Welcome to Stampede2, *please* read these important system notes: --> Stampede2, Phase 2 Skylake nodes are now available for jobs --> Stampede2 user documentation is available at: https://portal.tacc.utexas.edu/user-guides/stampede2 ----------------------- Project balances for user vtrue ----------------------- | Name Avail SUs Expires | | | A-ccsc 189624 2018-12-31 | | ------------------------- Disk quotas for user vtrue -------------------------- | Disk Usage (GB) Limit %Used File Usage Limit %Used | | /home1 1.9 10.0 19.43 39181 200000 19.59 | | /work 311.8 1024.0 30.45 225008 3000000 7.50 | | /scratch 0.0 0.0 0.00 4 0 0.00 | ------------------------------------------------------------------------------- 6/7/18 19
  • 20. Allocations Active Project with a Project Instructor attached Service Units (SUs) • SUs billed (node-hrs) = ( # nodes ) x (wall clock hours ) x ( charge rate per node-hour ) Shared Systems • Be a good citizen 6/7/18 20
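The SU arithmetic on this slide is simple enough to sketch as a tiny helper (a hypothetical illustration; the function name is made up, not a TACC utility):

```python
def sus_billed(nodes, wall_clock_hours, charge_rate=1.0):
    """SUs billed (node-hrs) = ( # nodes ) x ( wall clock hours ) x ( charge rate per node-hour )."""
    return nodes * wall_clock_hours * charge_rate

# e.g. a 4-node job that runs 12 hours in a 1 SU/node-hour queue
cost = sus_billed(4, 12)  # 48.0 SUs
```

Note that running the same job on half the nodes for twice the time bills the same number of SUs, which is why the slide stresses being a good citizen and requesting only what you need.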
  • 21. Welcome to Stampede2, *please* read these important system notes: --> Stampede2, Phase 2 Skylake nodes are now available for jobs --> Stampede2 user documentation is available at: https://portal.tacc.utexas.edu/user-guides/stampede2 ----------------------- Project balances for user vtrue ----------------------- | Name Avail SUs Expires | | | A-ccsc 189624 2018-12-31 | | ------------------------- Disk quotas for user vtrue -------------------------- | Disk Usage (GB) Limit %Used File Usage Limit %Used | | /home1 1.9 10.0 19.43 39181 200000 19.59 | | /work 311.8 1024.0 30.45 225008 3000000 7.50 | | /scratch 0.0 0.0 0.00 4 0 0.00 | ------------------------------------------------------------------------------- 6/7/18 21
  • 22. Filesystems Division of Labor • Lustre: a Linux cluster filesystem that makes many disks look like a single storage space • Small I/O is hard on the system • Stripe large data across targets (OST, MDS) Partitions • $HOME: 10GB, $WORK: 1TB, $SCRATCH: Unlimited • Shared system 6/7/18 22
  • 24. Filesystems: Cont. Where am I? • pwd – print working directory • cd – change directory • cd .. – move up one directory New Files • mkdir – make directory • Editors – vi(m), nano, emacs • mv – move a file to another location 6/7/18 24
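The same navigation and file-management steps can be scripted. Here is a Python-side equivalent of pwd / cd / mkdir / mv, confined to a throwaway directory so it is safe to run anywhere (the directory names are illustrative):

```python
import os
import tempfile

# work inside a scratch directory so the sketch touches nothing real
root = tempfile.mkdtemp()
os.chdir(root)                                # cd $WORK
print(os.getcwd())                            # pwd

os.makedirs("project/data")                   # mkdir -p project/data
os.rename("project/data", "project/inputs")   # mv data inputs
print(sorted(os.listdir("project")))          # ['inputs']
```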
  • 25. Create a File

      login1.stampede2$ cd $WORK
      login1.stampede2$ pwd
      /work/03658/vtrue/stampede2
      login1.stampede2$ nano helloWorld.py

      6/7/18 25
  • 26. A Very Small File

      #!/usr/bin/env python
      """ Hello World """
      import datetime as DT
      today = DT.datetime.today()
      print "Hello World! Today is:"
      print today.strftime("%d %b %Y")

      6/7/18 26
  • 27. Run an Interactive Job idev • Interactive development queue access command • Watch your code run live • Test things in real time • idev -help for options idev will drop you directly into the knl development queue, so be aware of your location on the system. 6/7/18 27
  • 28. helloWorld.py staff.stampede2(1005)$ idev -> Checking on the status of development queue. OK -> Defaults file : ~/.idevrc -> System : stampede2 -> Queue : development (idev default ) [...] c455-012[knl](1019)$ 6/7/18 28
  • 29. helloWorld.py staff.stampede2(1005)$ idev -> Checking on the status of development queue. OK -> Defaults file : ~/.idevrc -> System : stampede2 -> Queue : development (idev default ) [...] c455-012[knl](1019)$ python helloWorld.py Hello World! Today is: 17 Jun 2018 c455-012[knl](1020)$ 6/7/18 29
  • 30. Types of Code Serial Code • Albeit a very, very small one • Single tasks, one after the other • Single node/single core Parallel Code • Array or “embarrassingly parallel” jobs • Many node/many core • Uses MPI • Hybrid codes 6/7/18 30
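For the array or "embarrassingly parallel" case, each task can work on its own slice of the data with no communication at all. A minimal sketch of the usual block decomposition (the helper name and layout are illustrative, not part of any TACC tool):

```python
def block_partition(n_items, n_workers, rank):
    """Return the half-open range [start, stop) of items owned by `rank`.

    Remainder items go to the lowest-numbered ranks, so no two workers
    differ by more than one item -- a common way to balance MPI tasks.
    """
    base, extra = divmod(n_items, n_workers)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

# 10 items over 4 workers: ranks own 3, 3, 2, 2 items respectively
ranges = [block_partition(10, 4, r) for r in range(4)]
```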
  • 31. Message Passing Interface ibrun is TACC-specific • “Wrapper” for mpirun • Execute serial and parallel jobs across the entire node MPI Functions • Allows communication between all cores and all nodes • Move data between parts of the job that need it • Point-to-Point or Collective Communication 6/7/18 31
  • 32. Ex: MPI Communication Point-to-Point Collective proc 0 proc 1 proc 2 proc 3 proc 4 6/7/18 32 proc 0 proc 3 proc 7 proc 4 proc 9
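Real MPI point-to-point calls need an MPI runtime (and ibrun) to demonstrate, so as a stand-in, here is the same pattern sketched with Python's standard threading and queue modules: one "process" sends a value directly to a partner over a dedicated channel and waits for the matching reply. This illustrates the message-passing idea only; it is not MPI.

```python
import queue
import threading

def partner(inbox, outbox):
    # the receiving "proc 1": block until a message arrives, then reply
    value = inbox.get()
    outbox.put(value * 2)

def point_to_point(value):
    to_partner = queue.Queue()     # a private channel, like a matched send/recv pair
    from_partner = queue.Queue()
    worker = threading.Thread(target=partner, args=(to_partner, from_partner))
    worker.start()
    to_partner.put(value)          # "proc 0" sends directly to "proc 1"
    reply = from_partner.get()     # ...and blocks until the reply comes back
    worker.join()
    return reply
```

Collective operations generalize this: a broadcast is one sender with many receivers, and a barrier is every task blocking on its channel until all have arrived.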
  • 33. #!/usr/bin/env python
      """ Parallel Hello World """
      from mpi4py import MPI
      import sys

      size = MPI.COMM_WORLD.Get_size()
      rank = MPI.COMM_WORLD.Get_rank()
      name = MPI.Get_processor_name()
      sys.stdout.write(
          "Hello, World! I am process %d of %d on %s.\n"
          % (rank, size, name))

      6/7/18 33
  • 34. Parallel helloWorld.py c455-012[knl](1019)$ ibrun python helloParallel.py TACC: Starting up job 1595632 TACC: Starting parallel tasks... Hello, World! I am process 1 of 68 on c456-042.stampede2.tacc.utexas.edu. Hello, World! I am process 49 of 68 on c456-042.stampede2.tacc.utexas.edu. Hello, World! I am process 66 of 68 on c456-042.stampede2.tacc.utexas.edu. Hello, World! I am process 67 of 68 on c456-042.stampede2.tacc.utexas.edu. Hello, World! I am process 64 of 68 on c456-042.stampede2.tacc.utexas.edu. ... TACC: Shutdown complete. Exiting. 6/7/18 34
  • 35. Submitting a Job Why submit? • Larger jobs, more nodes • You don’t have to watch it in real time • Run multiple jobs simultaneously Queues • Pick the queues that suit your needs • Don’t request more resources than you need • Remember this is a shared resource Never run on a login node! 6/7/18 35
  • 36. Queue Name     | Node Type      | Max Nodes per Job | Max Duration | Max Jobs in Queue | Charge Rate
      development      | KNL cache-quad | 16 nodes          | 2hrs         | 1                 | 1SU
      normal           | KNL cache-quad | 256 nodes         | 48hrs        | 50                | 1SU
      large**          | KNL cache-quad | 2048 nodes        | 48hrs        | 5                 | 1SU
      long             | KNL cache-quad | 32 nodes          | 96hrs        | 2                 | 1SU
      flat-quadrant    | KNL flat-quad  | 24 nodes          | 48hrs        | 2                 | 1SU
      skx-dev          | SKX            | 4 nodes           | 2hrs         | 1                 | 1SU
      skx-normal       | SKX            | 128 nodes         | 48hrs        | 25                | 1SU
      skx-large**      | SKX            | 868 nodes         | 48hrs        | 3                 | 1SU

      6/7/18 36
  • 37. Submitting a Job cont. sbatch • Simple Linux Utility for Resource Management (SLURM) • Linux/Unix workload manager • Allocates resources • Executes and monitors jobs • Evaluates and manages pending jobs Using a Scheduler • Gets you off of the login nodes (shared resource) • Means you can walk away and do other things 6/7/18 37
  • 38. Submission Options

      Option | Argument      | Comments
      -p     | queue_name    | Submits to queue (partition) designated by queue_name
      -J     | job_name      | Job Name
      -N     | total_nodes   | Required. Define the resources you need by specifying either: (1) "-N" and "-n"; or (2) "-N" and "--ntasks-per-node".
      -n     | total_tasks   | This is total MPI tasks in this job. When using this option in a non-MPI job, it is usually best to set it to the same value as "-N".
      -t     | hh:mm:ss      | Required. Wall clock time for job.
      -o     | output_file   | Direct job standard output to output_file (without -e option error goes to this file)
      -e     | error_file    | Direct job error output to error_file
      -d=    | afterok:jobid | Dependency: this run will start only after the specified job successfully finishes
      -A     | projectnumber | Charge job to the specified project/allocation number.

      6/7/18 38
  • 39. Parallel Job

      #!/bin/bash
      #SBATCH -J myJob           # Job name
      #SBATCH -o myJob.o%j       # Name of stdout output file
      #SBATCH -e myJob.e%j       # Name of stderr error file
      #SBATCH -p development     # Queue (partition) name
      #SBATCH -N 1               # Total # of nodes
      #SBATCH -n 68              # Total # of mpi tasks
      #SBATCH -t 00:05:00        # Run time (hh:mm:ss)
      #SBATCH -A myproject       # Allocation name (req'd if you have more than 1)
      #SBATCH --mail-user=hkang@austin.utexas.edu
      #SBATCH --mail-type=all    # Send email at begin and end of job

      # Other commands must follow all #SBATCH directives...
      module list
      pwd
      date

      # Launch code...
      ibrun python helloParallel.py

      6/7/18 39
  • 40. Managing Your Jobs
      qlimits – all queue restrictions
      sinfo – monitor queues in real time
      squeue – monitor jobs in real time
      showq – similar output to squeue
      scancel – manually cancel a job
      scontrol – detailed information about the configuration of a job
      sacct – accounting data about your jobs
      6/7/18 40
  • 41. staff.stampede2(1009)$ squeue -u vtrue JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) 1604426 development idv20717 vtrue R 16:57 1 c455-001 staff.stampede2(1010)$ scontrol show job=1604426 JobId=1604426 JobName=idv20717 UserId=vtrue(829572) GroupId=G-815499(815499) MCS_label=N/A Priority=400 Nice=0 Account=A-ccsc QOS=normal JobState=RUNNING Reason=None Dependency=(null) Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0 RunTime=00:18:08 TimeLimit=00:30:00 TimeMin=N/A SubmitTime=2018-06-09T21:27:33 EligibleTime=2018-06-09T21:27:33 StartTime=2018-06-09T21:27:36 EndTime=2018-06-09T21:57:36 Deadline=N/A PreemptTime=None SuspendTime=None SecsPreSuspend=0 LastSchedEval=2018-06-09T21:27:36 ... 6/7/18 41
  • 42. Accessing a Compute Node No Job Running staff(1003)$ ssh c455-001 Access denied: user vtrue (uid=829572) has no active jobs on this node. Authentication failed. Job Running staff(1002)$ ssh c455-032 Last login: Fri Jun 15 15:46:04 2018 from staff.stampede2.tacc.utexas.edu TACC Stampede2 System Provisioned on 24-May-2017 at 11:49 c455-032[knl](1001)$ 6/7/18 42
  • 43. On Node Monitoring cat /proc/cpuinfo • Follow a readout of the CPU info on the node top • See all of the processes running and which are consuming the most resources free -g • Basic print out of memory consumption Remora • This is an open source tool developed by TACC that can help you track memory, CPU usage, I/O activity, and other options 6/7/18 43
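free -g and cat /proc/meminfo both report the same kernel memory counters; a small parser sketch shows the kind of arithmetic free does for you (the sample text below is illustrative, not real node output):

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style 'Key:  value kB' lines into a dict of kB values."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])  # first field is the value in kB
    return info

# illustrative sample; on a node you would read open("/proc/meminfo").read()
sample = "MemTotal:       98259968 kB\nMemFree:        93928448 kB"
mem = parse_meminfo(sample)
used_gb = (mem["MemTotal"] - mem["MemFree"]) // (1024 * 1024)
```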
  • 44. What else is on the Login Node? Modules • Find software, compilers, dependent packages Environments • Paths, personalizations, licenses Building Software • Install what you need, compile, update 6/7/18 44
  • 45. Modules Modules • TACC uses a tool called Lmod • Add, remove, and swap software packages • Saves you from having to build your own Commands • module spider <package> - search for a package • module list – see currently loaded modules • module avail – list all available packages • module load <package> - load a specific package or version 6/7/18 45
  • 46. 6/7/18 46 staff.stampede2(1073)$ module list Currently Loaded Modules: 1) intel/17.0.4 2) impi/17.0.3 3) git/2.9.0 4) autotools/1.1 5) xalt/1.7.7 6) TACC 7) python2/2.7.14 staff.stampede2(1078)$ module spider python ---------------------------------------------------------------------------------------- python: ---------------------------------------------------------------------------------------- Versions: python/2.7.13 Other possible modules matches: python2 python3 ---------------------------------------------------------------------------------------- To find other possible module matches execute: $ module -r spider '.*python.*'
  • 47. 6/7/18 47 staff.stampede2(1073)$ module spider python3/3.6.4 ------------------------------------------------------------------------------------- python3: python3/3.6.4 ------------------------------------------------------------------------------------- Description: scientific scripting package You will need to load all module(s) on any one of the lines below before the "python3/3.6.4" module is available to load. intel/17.0.4 Help: This is the Python3 package built on March 01, 2018. You can install your own modules (choose one method): 1. python3 setup.py install --user 2. python3 setup.py install --home=<dir> 3. pip3 install --user module-name Version 3.6.4
  • 48. Environment Management
      env – Read out of all the environment variables set

      Look for something specific:

      staff.stampede2(1017)$ env | grep GIT
      TACC_GIT_BIN=/opt/apps/git/2.9.0/bin
      TACC_GIT_DIR=/opt/apps/git/2.9.0
      TACC_GIT_LIB=/opt/apps/git/2.9.0/lib
      GIT_TEMPLATE_DIR=/opt/apps/git/2.9.0/share/git-core/templates
      GIT_EXEC_PATH=/opt/apps/git/2.9.0/libexec/git-core

      6/7/18 48
  • 49. Environment Paths

      $ echo $PATH
      /opt/apps/xalt/1.7.7/bin:/opt/apps/intel17/python/2.7.13/bin:
      /opt/apps/autotools/1.1/bin:/opt/apps/git/2.9.0/bin:
      /tmprpm/intel17/impi/17.0.3/bin:
      /opt/intel/compilers_and_libraries_2017.4.196/linux/mpi/intel64/bin:
      /opt/intel/compilers_and_libraries_2017.4.196/linux/bin/intel64:
      /opt/apps/gcc/5.4.0/bin:/usr/lib64/qt-3.3/bin:
      /usr/local/bin:/bin:/usr/bin:/opt/dell/srvadmin/bin:.

      $ echo $LD_LIBRARY_PATH
      /opt/apps/intel17/python/2.7.13/lib:
      /opt/intel/compilers_and_libraries_2017.4.196/linux/mpi/intel64/lib:
      /opt/intel/debugger_2017/libipt/intel64/lib:
      /opt/intel/debugger_2017/iga/lib:
      /opt/intel/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.4:
      /opt/intel/compilers_and_libraries_2017.4.196/linux/daal/lib/intel64_lin:
      /opt/intel/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64/gcc4.7:
      /opt/intel/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin:
      /opt/intel/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin:
      /opt/intel/compilers_and_libraries_2017.4.196/linux/ipp/lib/intel64:
      /opt/intel/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64:
      /opt/apps/gcc/5.4.0/lib64:/opt/apps/gcc/5.4.0/lib

      6/7/18 49
  • 50. Diagnosing Your Environment $module load sanitytool $sanitycheck Sanity Tool Version: 1.3 1: Check SSH permissions: Passed 2: Check SSH keys: Passed 3: Check environment variables (e.g. HOME, WORK, SCRATCH) and file system access: Passed ... 6/7/18 50
  • 51. .bashrc (or shell appropriate script) • Set default modules • Set custom paths permanently • Change command line prompt • Set aliases • Change Umask settings for sharing files • Enable startup script tracking 6/7/18 51
  • 52. alias priority='squeue -t pending -o "%Q %.7i %.9P %.8j %.2t %.10M %.6D %r %B %S" --sort="-p"'
      alias qalloc='squeue -A UT-2015-05-18 -o "%.18i %.9P %.9G %.16a %.6D %.20S %.8M %.10L %.10Q"'
      alias qalloct='squeue -i 5 -A UT-2015-05-18 -o "%.18i %.9P %.9G %.16a %.8u %.6D %.20S %.8M %.10L %.10Q"'
      alias tacc_jobs='cat /scratch/projects/tacc_stats/accounting/tacc_jobs_completed | grep'
      alias qstat='squeue -o "%.18i %.12P %.9u %.9G %.16a %.6D %.20S %.10M %.10L %.10Q %.25V"'
      alias nstat='echo A/I/O/T: Allocated/Idle/Other/Total; sinfo -o "%20P %5a %.10l %16F"'

      6/7/18 52
  • 53. Building and Installing Software Python • pip install packageName --user GitHub • module load git • git clone https://full/file/path Direct Sources • wget https://path/to/tarball.tar.gz • tar -xvf tarball.tar.gz 6/7/18 53
  • 54. Ex: vcftools

      $ cdw
      $ mkdir vcftools
      $ cd vcftools
      $ git clone https://github.com/vcftools/vcftools.git
      $ cd vcftools
      $ ./autogen.sh
      $ ./configure --prefix=$WORK/Tools
      $ make
      $ make install
      $ export PATH=$PATH:$WORK/Tools/bin
      $ vcftools
      VCFtools (0.1.15)
      © Adam Auton and Anthony Marcketta 2009

      6/7/18 54
  • 55. What Else? Any Software • Build from source; get as complicated as you want Customize login • Modify .ssh/config on your local machine to meet your needs Customize Editors • Bring in outside configuration files (colors, layout, etc) 6/7/18 55
  • 56. .ssh/config

      Host s.s2
          HostName staff.stampede2.tacc.utexas.edu
          User vtrue
          ServerAliveInterval 60
          ForwardX11 yes

      Host s.ls5
          HostName staff.ls5.tacc.utexas.edu
          User vtrue
          ServerAliveInterval 60
          ForwardX11 yes

      6/7/18 56
  • 57. 6/7/18 57 Q&A Questions Answered and Demonstrations Provided
  • 58. 6/7/18 58 Further Information Main Website: www.tacc.utexas.edu User Support: www.portal.tacc.utexas.edu Email: info@tacc.utexas.edu

Editor's Notes

  1. High Performance Computing most generally refers to the practice of aggregating computing power in a way that delivers much higher performance than one could get out of a typical desktop computer or workstation in order to solve large problems in science, engineering, or business.
  2. Petaflop: a unit of computing speed equal to one thousand million million (10^15) floating-point operations per second. Blue Waters is about 13.34 PF
  3. What it does on the node will always be faster than on the filesystem bc of the interconnect of omnipath. High throughput low latency. Do things on the node then write out.
  4. So what does a node look like on a super computer? My Mac here is 1 node with 4 cores and a similar clock rate. The full machine is 4,200 KNL compute nodes AND 1,736 SKX compute nodes.
  5. Technically 72 but due to the way the KNL handles data only 68 are actually “functional” as far as the system is concerned 68 all together or 48 split between two. Different kinds of responses. Figure out what you’re doing and what your code can be made to accommodate
  6. What does that look like when im directly on the node and not just theorizing about it? Knl node ex.
  7. Non-uniform memory access (NUMA) is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to the processor. This is another way to visualize it. Each of these is a “logical cpu” rather than a physical cpu. Because we have hyperthreading turned on, you can run 4 threads on every core, resulting in a total possible 272 tasks if you are very, very careful. Generally we see better results with 64 or fewer physical cores used. Hardware threads = physical cores, software threads = logical cores. Threads, tasks, cores, etc.: we have 8 different names for the same things.
  8. Say you're working on climate data, so your problem sets are huge. How many years? How granular? Do you want to develop visualizations? Predictive scaling? We could track Harvey in real time and, after the fact, predict where the deepest water was to help rescue crews manage their responses. We also kept the legacy data for future study. The application process varies by model: access may be free, but it is in high demand, so you have to show that you'll be doing valuable science on the machine.
  9. Now that you know a bit about what HPC systems involve, let’s try it out. What do I need to get started?
  10. Ok, let’s get started!
  11. We'll come back to more complex options later, in the customizing-your-environment section. Programmers are lazy; we don't want to type more than we have to. On Windows, PuTTY is an SSH client.
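  A hedged sketch of the lazy-programmer approach: a direct ssh login, plus an ~/.ssh/config alias so you type less next time. The hostname is Stampede2's real login address; the username and alias name are placeholders.

```shell
# Direct login (replace "myusername" with your TACC username;
# you will then be prompted for your password and token):
ssh myusername@stampede2.tacc.utexas.edu

# To shorten this, add an alias to ~/.ssh/config:
#
#   Host s2
#       HostName stampede2.tacc.utexas.edu
#       User myusername
#
# after which "ssh s2" does the same thing.
```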
  12. Input at the password and token prompts will likely appear blank, so type carefully.
  13. If this doesn't work in PuTTY, you may need to update settings on your machine: in the Settings app, go to Apps > Apps & features > Manage optional features. Locate the "OpenSSH server" feature, expand it, and select Install.
  14. Lustre is built from object storage targets and a metadata server.
  15. Performance increases with filesystem size, but there are also caveats about long-term storage on these resources.
  16. This is a rehash of Saturday but it will help you move around in what we’re going to do next
  17. Pay attention to your command prompt. Of course you can change it if you want, but many systems have a default that is designed to be helpful.
  18. Single processor
  19. We'll get to multithreading in a minute and how that actually works vs. hyperthreading. A lot of people exploit the fact that there are more cores on a node; some people will run across the node repeatedly for more throughput.
  20. Point-to-point is communication between two processes: task 1 on core 1 talks to task 2 on core 2, and so on. Collective communication implies a synchronization point: either all of your tasks must reach a certain point before moving to the next step (a barrier), or a single point sends the same data out to the other processors (a broadcast). You can get very fancy with this and build tree structures to help your code. Paired with things like vectorization, this becomes very important for increasing the performance of your code at larger scale.
  21. You can do this for as many logical cores as are on the node, or you can do it between nodes, but specify which you're doing in your code or you're just going to trip it up. Point-to-point can be a send/receive pair, or a send or a receive separately, and it can go between any two cores. It can be in order or not, but it is not the same thing as pushing the same data out across multiple cores: that is a "broadcast", and the reverse is "gathering", as in scatter and gather.
  22. Single node/task = one output. Shift+ZZ to save and exit; ls to see whether the file was saved. We'll come back to this later when we start running some examples, but for now make sure it's saved and try to remember where you put it.
  23. Single processor per task (multithreaded) but not yet hyperthreaded. Great! Now you know how to run jobs interactively.
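  A sketch of an interactive session with idev, assuming the queue name and flags from the Stampede2 user guide (adjust the queue, node count, and time to your needs; ./mycode is a hypothetical executable):

```shell
# Ask for one SKX node in the development queue for 30 minutes:
idev -p skx-dev -N 1 -n 48 -m 30

# When the prompt returns you are on a compute node, not the login node,
# so you can run your multithreaded code directly:
./mycode
```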
  24. So many more options are available. What's available? See the queues!
  25. Varies based on the number of nodes the system has and the frequency of use. Large queues are special: they take up so much of the system that a poorly run job could (a) cause system problems or (b) cost you a lot of SUs that you won't get refunded. Cache-quad and flat-quad have to do with the way memory is distributed. We keep most of the KNLs in cache mode because the response time is faster, but sometimes that means there is less room for certain operations. If your code is heavier on memory, switch to flat-quad so you can get the full memory available: instead of 16GB + 96GB, you get 112GB flat. Cache mode: the fast MCDRAM is configured as an L3 cache, and the operating system transparently uses it to move data from main memory. In this mode the user has access to 96GB of RAM, all of it traditional DDR4. Most Stampede2 KNL nodes are configured in cache mode. Flat mode: DDR4 and MCDRAM act as two distinct Non-Uniform Memory Access (NUMA) nodes, so it is possible to specify the type of memory (DDR4 or MCDRAM) when allocating memory. In this mode the user has access to 112GB of RAM: 96GB of traditional DDR4 and 16GB of fast MCDRAM. By default, allocations occur only in DDR4. To use MCDRAM in flat mode, use the numactl utility or the memkind library; see Managing Memory for more information. If you do not modify the default behavior, you will have access only to the slower DDR4.
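  In flat mode you pick the memory type yourself. A minimal numactl sketch, assuming the Stampede2 convention that DDR4 is NUMA node 0 and MCDRAM is NUMA node 1 (./mycode is a hypothetical executable):

```shell
# Bind all allocations to the 16GB of fast MCDRAM;
# the run fails if it needs more than that:
numactl --membind=1 ./mycode

# Or prefer MCDRAM but spill over into DDR4 once the 16GB fills up:
numactl --preferred=1 ./mycode
```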
  26. This is how the system knows what's trying to run. It helps manage resources and tries to keep things "fair"; it is generally automated, though it can be manipulated by admins.
  27. This isn't all the options, but it covers the most commonly used ones and the ones that are required. You have to tell the system what you are trying to do, and this is how you communicate with it via the workload manager.
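  A minimal sketch of a batch script using the commonly needed options. The #SBATCH directives are standard SLURM; the job, queue, allocation, and executable names are placeholders.

```shell
#!/bin/bash
#SBATCH -J myjob              # job name
#SBATCH -o myjob.o%j          # output file (%j expands to the job ID)
#SBATCH -p skx-normal         # queue (partition) to submit to
#SBATCH -N 1                  # number of nodes requested
#SBATCH -n 48                 # total number of MPI tasks
#SBATCH -t 01:00:00           # maximum wall time (hh:mm:ss)
#SBATCH -A myproject          # allocation to charge (placeholder name)

./mycode                      # hypothetical executable
```

  Submit it with `sbatch myjob.slurm` and check on it with `squeue -u $USER`.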
  28. “slurm batch”
  29. Lmod is a Lua-based module system. A modulefile contains the information needed to let a user run a particular application or access a particular library, and all of this can be done dynamically without logging out and back in. Modulefiles for applications modify the user's path to make access easy; modulefiles for library packages provide environment variables that specify where the library and header files can be found. It is also very easy to switch between different versions of a package or to remove a package. See module --help.
  30. (Same notes as the previous slide.)
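  A sketch of the common Lmod commands; the package name hdf5 is just an example, and any module on the system works the same way.

```shell
module list            # what is loaded right now
module avail           # what can be loaded, given what is already loaded
module spider hdf5     # search everything, even modules not currently loadable
module load hdf5       # put hdf5's paths and environment variables in place
module unload hdf5     # take them back out
module help hdf5       # read this! it often tells you what you need to know
```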
  31. Suggestions: READ! It will often tell you what you need to know about the package.
  32. Suggestions: READ! It will often tell you what you need to know about the package. We'll address some more commands later, like squeue, etc.
  33. These are the two most important things to keep track of: where the system looks for executables and where it looks for libraries. The order of these _matters_: the system picks the first match it finds, sticks with it, and ignores subsequent matches. You can change these to accommodate newly installed software, but be careful.
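  A self-contained sketch of that ordering rule: two directories each provide a program named hello, and the shell runs whichever appears first in PATH. The directory names are made up for the demo.

```shell
# Two copies of the same command in different directories:
mkdir -p /tmp/pathdemo/first /tmp/pathdemo/second
printf '#!/bin/sh\necho first\n'  > /tmp/pathdemo/first/hello
printf '#!/bin/sh\necho second\n' > /tmp/pathdemo/second/hello
chmod +x /tmp/pathdemo/first/hello /tmp/pathdemo/second/hello

# The first match wins; the copy in "second" is silently ignored:
export PATH=/tmp/pathdemo/first:/tmp/pathdemo/second:$PATH
hello    # prints "first"
```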
  34. If anything is not correct it will say "FAILED" and then provide you with instructions.
  35. Show live demo for this
  36. Aliases are shortcuts, and they are your best friend.
  37. Downloading from git or directly means you then have to update your paths to make the software universally usable.
  38. This is a terrible example, but it's straightforward: you have to put the PATH update in your .bashrc if you want it to stick after you log out.
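  The same idea sketched end to end. The directory and script names are made up; the pattern (a personal bin directory plus a line in ~/.bashrc) is the standard one.

```shell
# Keep personal executables in one place:
mkdir -p $HOME/bin
cp myscript $HOME/bin/        # myscript is a hypothetical executable

# Works for this session only:
export PATH=$HOME/bin:$PATH

# Make it survive logout by appending the same line to ~/.bashrc:
echo 'export PATH=$HOME/bin:$PATH' >> ~/.bashrc
```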