1/13/15
Next-Generation Sequencing Analysis Series
January 14, 2015
Andrew Oler, PhD
High-throughput Sequencing Bioinformatics Specialist
BCBB/OCICB/NIAID/NIH
BCBB instructors for this NGS series
Andrew Oler Vijay Nagarajan Mariam Quiñones
2
Bioinformatics and Computational
Biosciences Branch
NIH/NIAID/OD/OSMO/OCICB
Contact BCBB at
ScienceApps@niaid.nih.gov
Contact HPC Cluster team at:
Cluster_support@niaid.nih.gov
Bioinformatics and Computational
Biosciences Branch
§  Bioinformatics Software
Developers
§  Computational Biologists
§  Project Managers &
Analysts
http://www.niaid.nih.gov/about/organization/odoffices/omo/ocicb/Pages/bcbb.aspx
3
Objectives
When you leave today, I hope you will be able to
1.  Open a terminal and know how to navigate
2.  Know how to do basic file manipulation and create files and
directories from the command line
3.  Submit a job to the HPC cluster
To accomplish these goals, we will
1.  Learn the most useful Unix terminal commands
2.  Practice a few of these commands
3.  Practice preparing and submitting some scripts to the NIAID
HPC Cluster
Caveat:
1.  You may not be a Unix expert when you leave today (and
that’s okay).
4
Anatomy of the Terminal, “Command Line”,
or “Shell”
Prompt (computer_name:current_directory username)
Cursor
Command Argument
Window
Output
Mac: Applications -> Utilities -> Terminal
Windows: Download open source software
PuTTY http://www.chiark.greenend.org.uk/~sgtatham/putty/
Other SSH Clients (http://en.wikipedia.org/wiki/Comparison_of_SSH_clients)
6
File Manager/Browser by Operating System
7
OS: Windows    Mac OS X   Unix
FM: Explorer   Finder     Shell
Typical UNIX directory structure
/
“root”
/bin
essential binaries
/etc
system config
/home
user directories
/home/USER1
USER1 home
/home/USER2
USER2 home
/mnt
network drives
/sbin
system binaries
/usr
shared, read-only
/usr/bin
other binaries
/usr/local
installed packages
/usr/local/bin
installed binaries
/var
variable data
/var/tmp
program caches
pwd “print working directory”; tells where you are
8
How to execute a command
command [argument] → output
9
Some basic Unix commands
§  pwd
§  ls
§  mkdir
§  cd
§  wget
§  curl
§  cp
§  wc
§  head
§  tail
§  less
§  cat
§  **See Pre-lecture worksheet.**
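A quick run-through of several of these commands, as a sketch in a throwaway directory (the directory and file names here are invented for illustration):

```shell
cd "$(mktemp -d)"                     # start somewhere safe to experiment
pwd                                   # print the current (working) directory
mkdir demo                            # create a directory
cd demo                               # move into it
printf 'line1\nline2\n' > notes.txt   # create a small two-line file
ls                                    # list the directory contents
wc -l notes.txt                       # count lines: 2 notes.txt
head -n 1 notes.txt                   # show only the first line
cat notes.txt                         # print the whole file
```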
10
Tips to make life easier!
Tab completion: hit Tab to make computer guess your filename.
type: ls unix[Tab]
result: ls unix_hpc
If nothing happens on the first tab, press tab again…
Up Arrow: recall the previous command(s)
Ctrl+a go to beginning of line
Ctrl+e go to end of line
Ctrl+c kill current running process in terminal
Aliases (put in ~/.bashrc file … see handout)
alias ls='ls -AFG'
alias ll='ls -lrhT'
history show every command issued during the session
!ls repeat the previous “ls” command
!! repeat the previous command
man [command] read the manual for the command
man ls read the manual for the ls command
11
Accessing the NIAID HPC
§  Login to HPC “submit node,” which is the computer from which you submit jobs.
ssh secure shell, remote login
ssh ngscXXX@hpc.niaid.nih.gov fill in XXX with number
§  Copy files to/from HPC
scp secure copy to remote location
scp -r ~/data/dir username@hpc.niaid.nih.gov:~/data/
§  ssh and scp will prompt you to enter your password
12
mv (“move file”)
mv file1 temp/ move “file1” to the “temp” directory
mv file1 file2 rename “file1” to “file2”
mv -i file2 temp/file3 move “file2” to the “temp” directory and
rename it “file3”; ask to make sure
*without -i, it will overwrite an existing file!*
mv *.fastq ~ move all “.fastq” files to the home directory
Exercise 1:
mv *.fastq temp/ (move all “.fastq” files to the “temp” directory)
ls temp (check that the files are there)
Note: syntax for mv and cp are similar
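As a sketch of the mv patterns above, using made-up file names in a throwaway directory:

```shell
cd "$(mktemp -d)"              # throwaway directory for practice
mkdir temp
touch file1 a.fastq b.fastq    # create empty files to move around
mv file1 renamed1              # rename "file1" to "renamed1"
mv *.fastq temp/               # move all ".fastq" files into "temp"
ls temp                        # a.fastq  b.fastq
```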
13
rm (“remove file”)
rm file1 delete “file1”
rm -i file2 delete “file2”, but ask first
rm *.pdb delete all “.pdb” files
rm -r temp delete the “temp” directory
rm -rf temp delete the “temp” directory, no questions asked!
Be careful!
rm -r *
14
File and system information
wc file1 “word count”; output is “lines”, “words”, “characters”
wc *.fastq “word count” of all fastq files, including summary
du -h temp “disk usage” (size) of each file in the “temp”
directory (outputs a list)
top report for local machine on the processes using the
most system resources (memory, CPU, etc.); “q” to
exit
15
File compression
gzip temp/* compress every file in “temp”;
adds .gz extension
gunzip temp/*.gz expand every “gzipped” file in
“temp”
tar -zcvf myfiles.tar.gz temp/* create a single archive of
every file in “temp”
tar -xvf test_data.tar.gz extract every file from the archive
(a compressed tar archive like this is often called a “tarball”)
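The compression commands above, round-tripped on throwaway files:

```shell
cd "$(mktemp -d)"
mkdir temp && printf 'hello\n' > temp/file1
gzip temp/*                              # temp/file1 becomes temp/file1.gz
gunzip temp/*.gz                         # back to temp/file1
tar -zcvf myfiles.tar.gz temp/           # bundle and compress the directory
mkdir extracted
tar -xvf myfiles.tar.gz -C extracted     # unpack into "extracted"
cat extracted/temp/file1                 # hello
```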
16
File manipulation
cat file1 file2 > file3 write “file3”, containing first “file1”,
then “file2”
cat file1 >> file2 append “file1” onto “file2”
sort file1 alphabetize “file1”
sort -n file1 sort “file1” by number
sort -n -r -k 2 file1 sort “file1” by the second word or
column in reverse numerical order
Careful!
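A small worked example of cat and sort (file names and contents invented):

```shell
cd "$(mktemp -d)"
printf 'b 2\nc 3\n' > file1
printf 'a 1\n' > file2
cat file1 file2 > file3    # file3 = file1 followed by file2
sort file3                 # alphabetical: a 1 / b 2 / c 3
sort -n -r -k 2 file3      # by column 2, numeric, descending: c 3 / b 2 / a 1
```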
17
grep (search within files)
grep key file* report the file name and line where “key” appears in file*
grep -v key file* report the lines in file* that do not contain “key” (use -L to list files with no match)
man grep see other functions of grep. (lots! regular expressions!)
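A minimal grep sketch on two throwaway files, showing matching, inverted line matching, and listing files with no match:

```shell
cd "$(mktemp -d)"
printf 'alpha\nkey here\n' > f1
printf 'beta\n' > f2
grep key f1 f2     # f1:key here   (filename:line for each match)
grep -v key f1     # alpha         (lines in f1 NOT containing "key")
grep -L key f1 f2  # f2            (files with no match at all)
```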
18
Linking files (making “shortcuts”)
ln -s ~/myapp/binary ~/bin make a shortcut (“symbolic link”)
in “~/bin” that points to “~/myapp/
binary”
ln -s /usr/local data make a shortcut in current
directory pointing to /usr/local
cd data takes you to /usr/local
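The same symbolic-link pattern, sketched with a throwaway directory in place of /usr/local:

```shell
cd "$(mktemp -d)"
mkdir myapp && printf 'hi\n' > myapp/data.txt
ln -s "$PWD/myapp" shortcut    # "shortcut" now points to the myapp directory
cat shortcut/data.txt          # follows the link: hi
```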
19
Downloading Files
wget download multiple files from ftp or http address
curl download single files from ftp, http, sftp, etc.
http://curl.haxx.se/docs/comparison-table.html
20
Pipelining
ls | wc count the number of files in a directory
grep key file* | sort > file1 pull out matching lines, sort them, and
write a new file
Exercise 2:
head -n 2000 lymph1k.fastq | gzip > head2K.txt.gz
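The pipelines above can be tried on throwaway files:

```shell
cd "$(mktemp -d)"
touch a b c
ls | wc -l                     # count the files just created: 3
printf 'z2\nx\nz1\n' > input
grep z input | sort > file1    # matching lines, sorted: z1 then z2
cat file1
```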
21
Loops
for assign a variable for each of a space-separated list of values
; Use to separate commands
do done Marks start and end of loop to repeat
Exercise 3:
for i in 1 13 200; do echo $i; done
1
13
200
ls
for i in file*; do echo $i; mv "$i" "${i}.txt"; done
file1
file2
ls
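The renaming loop above, runnable in a throwaway directory. Note that the quotes around "$i" must be straight ASCII quotes; curly quotes pasted from a word processor will break the shell:

```shell
cd "$(mktemp -d)"
touch file1 file2
# rename every file matching file* by appending ".txt"
for i in file*; do mv "$i" "${i}.txt"; done
ls    # file1.txt  file2.txt
```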
22
Recommended Reading
Linux in a Nutshell, Sixth
Edition
Ellen Siever, Stephen Figgins,
Robert Love, Arnold Robbins
Running Linux, 5th
Edition
Matthias Kalle
Dalheimer, Matt Welsh
UNIX® Shells by
Example,
Fourth Edition
Ellie Quigley
23
Take Away
Use mnemonics
Read “man” pages
Work on copies, make backups, and use “rm”, “mv”, and “>” carefully
Pick a text editor and master it (pico/nano, emacs, vi/vim, etc.)
Be clever!
Questions?
24
Using NIAID Grid Engine Cluster
25
High Performance Computing
§  “A computer cluster consists of a set of loosely connected
computers that work together so that in many respects they can
be viewed as a single system.”
http://en.wikipedia.org/wiki/Cluster_%28computing%29
26
HPC Glossary
§  node individual workstation within a network or cluster; a collection of processors all
accessing the same memory (RAM).
§  CPU abbreviation for central processing unit. The processor of a node. Also
referred to as a socket.
•  try cat /proc/cpuinfo
•  Note that “processors” in the output are actually “cores” by the definition below
§  core separate execution core for calculations. e.g., “dual-core” means the
processor has two cores. Sometimes each core is referred to as a separate
processor.
§  slot a single core available for use within a node. e.g., if a node has 16 cores, it
will have 16 slots.
§  hyper-threaded technology (HTT) Where a single execution core is treated as
being two virtual cores (or two logical processors) by the system. Some of the nodes
in the cluster have HTT. E.g., if there are 16 physical cores, there would be 32 logical
processors.
§  thread a single process of a multi-process job. Each thread runs on a separate logical
processor. E.g., if you run tophat with -p 10, 10 threads will be created and run in
parallel.
These definitions are somewhat flexible…
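On a Linux node, several of these glossary terms can be inspected directly from /proc/cpuinfo (Linux-specific; this file does not exist on macOS):

```shell
# Logical processors visible to the OS (with HTT, 2x the physical cores)
grep -c '^processor' /proc/cpuinfo
# Distinct physical core IDs per socket (may print 0 if the field is absent)
grep '^core id' /proc/cpuinfo | sort -u | wc -l
```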
27
Accessing the NIAID HPC
§  Request an HPC account (for NIAID members and collaborators only)
•  https://hpcweb.niaid.nih.gov/#home
•  “Request Account”
§  Login to HPC “submit node,” which is the computer from which you submit jobs.
ssh secure shell, remote login
ssh username@hpc.niaid.nih.gov
§  Copy files to/from HPC
scp secure copy to remote location
scp -r ~/data/dir username@hpc.niaid.nih.gov:~/data/
§  ssh and scp will prompt you to enter your password
28
Mounting HPC Drives
29
Mac Windows
1.  Click "Start" > "Computer”
2.  Click on "Map Network
Drive”.
3.  Choose an available drive
letter.
4.  Enter \\ai-hpcfileserver.niaid.nih.gov\bcbb in the “Folder” field,
replacing “bcbb” with your group name or your user name.
(For more details, see link to
FAQ below.)
https://hpcweb.niaid.nih.gov/#support?type=Links&requestType=HPC%20FAQs&name=41
Cluster Architecture and Access
image modified from http://ainkaboot.co.uk/
regular.q interactive.q memLong.q
qrsh -q interactive.q
qsub -q memLong.q
ssh username@hpc.niaid.nih.gov
Submit Node
30
Cluster Queue System: Sun Grid Engine
§  Computers have Linux Red Hat Operating System
§  Grid Engine is a batch queuing system
§  Other queuing systems (http://en.wikipedia.org/wiki/Job_scheduler):
•  Portable Batch System (PBS) (e.g., Biowulf)
•  TORQUE Resource Manager
•  Maui
•  Moab
•  others…
•  Each will require a slightly different syntax for scripts
§  Comes with a set of commands to communicate with the cluster
§  Monitors available resources and users’ workloads to start jobs at the appropriate time
31
Grid Engine jobs
§  Three types of jobs
•  Batch/Serial (one node,
one processor)
•  Parallel (multiple
processors or nodes)
•  Interactive
32
[diagram: batch/serial job = Input → Process → Output; parallel job = Input → multiple Processes in parallel → Output]
Grid Engine Jobs: Interactive
§  Login to a node like ssh
qrsh -l h_vmem=20G
§  Need to specify parameters
-l requested resources in space-delimited list
•  For interactive job:
h_vmem=
§  For Biowulf (PBS) (http://biowulf.nih.gov/user_guide.html#interactive):
qsub -I -V -l nodes=1
33
Cluster Architecture and Access
image from http://ainkaboot.co.uk/
regular.q interactive.q memLong.q
qrsh -q interactive.q
ssh username@hpc.niaid.nih.gov
Submit Node
34
Test TopHat Job in Interactive Session
§  TopHat is a short read aligner for RNA-seq data
§  Manual:
§  http://ccb.jhu.edu/software/tophat/manual.shtml
1.  Check dependencies (e.g., PATH)
2.  Check command syntax and options
3.  Run command with test dataset
35
Grid Engine Jobs: Batch / Serial
§  Single processor, one job
§  Submit a script to the cluster from the submit node,
“submit-1”
36
Cluster Architecture and Access
image from http://ainkaboot.co.uk/
regular.q interactive.q memLong.q
qsub -q memLong.q script.sh
OR
qsub script.sh
*No queue necessary*
ssh username@hpc.niaid.nih.gov
Submit Node
37
Text Editors for Composing Scripts (batch jobs)
§  Not the same as a word processor (e.g., Microsoft Word)!
§  Try some, choose a favorite
§  Popular for Windows:
•  Notepad++ (nice color-coding)
•  EditPad Lite (can open large files, > 4 GB)
§  Popular for Mac:
•  TextWrangler
§  Popular for Terminal:
•  nano
•  vi
•  emacs
§  http://en.wikipedia.org/wiki/Comparison_of_text_editors
38
Quick Look at a Shell Script
Exercise 4:
cd ~/unix_hpc/test_data
cat test_serial.sh
§  A few things to notice:
•  #!/bin/bash
–  “shebang” or “hashbang,” used to specify the program to run for the script
•  qsub options (next slide)
•  export (used to set environmental variables)
•  PATH=/path/to/folder:/path/to/another/folder:$PATH
–  used to allow you to simply type the name of the executable instead of the
full path to the executable, e.g., type “tophat” instead of
“/usr/local/bio_apps/tophat/bin/tophat”
•  Comments about when you ran the job
•  Command for job
*A PBS script for Biowulf follows the same structure.
39
SGE qsub options
qsub [options] script.sh command to submit a job to the cluster
-S /bin/bash shell to use (default is csh)
-N job_name name for your job
-q queue.q queue(s) to submit to, e.g.,
memLong.q,memRegular.q
-M user@niaid.nih.gov email address to send alert to
-m abe when to send email (e.g., beginning, end, aborted)
-l resources resources to request, e.g.,
h_vmem=20G,h_cpu=1:00:00,mem_free=10G
-cwd run from current working directory. Output to here.
-j y join stderr and stdout into one
-pe threaded 10 parallel environment and slot count: with “round,”
processors may be allocated on separate machines; with
“threaded,” all processors are on the same machine. The
number is how many processors/threads to request.
§  You can put these options on the command-line or in your shell
script
§  Lines with these options should begin with #$
40
Submitting jobs with PBS (Biowulf)
§  PBS options and examples for Biowulf:
•  http://biowulf.nih.gov/user_guide.html#batchsamp
§  Examples
•  qsub -I -V -l nodes=1
•  qsub -l nodes=1 myjob.bat
•  qsub -l nodes=8:o2800 myparalleljob
•  qsub -v np=3 -l nodes=2:g24:c24,mem=0 novompi.sh
§  Option lines start with #PBS instead of #$
§  Biowulf also documents application-specific usage examples.
41
Grid Engine Jobs: Batch / Serial
§  Submit a script to the cluster from the
submit node
Exercise 5:
cd ~/unix_hpc/test_data (remember to try tab completion!)
qsub test_serial.sh
It should say “Your job XXXXXX ("tophat_test") has been
submitted” where XXXXXX is the job number.
ls -al
Do you see a file called tophat_test.oXXXXXX where
XXXXXX is your job number?
cat tophat_test.oXXXXXX (substitute job number for
XXXXXX)
42
Grid Engine Jobs: Parallel
§  pe commands (threaded, single, etc.)
§  Basic use in script:
#$ -pe threaded 8
§  Can also use advanced options, e.g.,
•  "-pe 12threaded 48" means use 12 cores per node, for a total
of 48 cores needed. This will allocate the job to run on 4 nodes
with 12 cores each. Your program must be able to support this.
•  "-pe threaded 5-10" means run the job with 10 if available, but
down to 5 cores is fine too.
§  Do the math for memory!
•  h_vmem is not total, it’s per thread. E.g., if you have a job that
needs 10G total, running on 5 processors, you’ll assign
h_vmem=2G, not h_vmem=10G.
•  Let’s edit our script to make it run parallel…
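The per-thread memory arithmetic above can be sketched as:

```shell
# h_vmem is per slot/thread: divide the job's total memory by the thread count.
total_gb=10
threads=5
echo "h_vmem=$(( total_gb / threads ))G"   # h_vmem=2G
```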
43
Edit Shell Script in the Terminal with nano
Navigation in nano:
§  use arrow keys for up, down, left, right
§  Ctrl+a for beginning of line; Ctrl+e for end of line
§  Other commands at bottom of screen e.g., Ctrl+o, Ctrl+x
Exercise 6:
cd ~/unix_hpc/test_data
Make new script for parallel, open in nano
cp test_serial.sh test_parallel.sh
nano test_parallel.sh
Add line to script with SGE options
#$ -pe threaded 4
Modify tophat command
tophat -p 4 …
Save and close
Ctrl+o, [ENTER]
Ctrl+x
Now submit the jobs
qsub test_serial.sh
qsub test_parallel.sh
44
Monitoring Jobs
Exercise 7:
qsub test_tenminutes.sh
qstat check on submitted jobs
echo $LOGNAME check your username
qstat -u $LOGNAME check status or your jobs
qstat -u $LOGNAME -ext check resource usage, including memory
qstat -u $LOGNAME -ext -g t get extended details, including MASTER, SLAVE
nodes for parallel jobs
qstat -j job-ID get detailed information about your job status
qacct -j 999072 see info about a job after it was run
qalter [new qsub options] [job id] in case you want to change parameters while in
“qw” status
qdel -u username delete all of your submitted jobs
qdel jobnumber delete a single job
§  Websites
•  Cluster status:
http://hpcweb.niaid.nih.gov/#about?type=About%20Links&requestType=Cluster
%20Status
•  Current State: http://hpcwiki.niaid.nih.gov/index.php/Current_State
•  Ganglia toolkit: http://cluster.niaid.nih.gov/ganglia/
45
Contact Us
andrew.oler@nih.gov
ScienceApps@niaid.nih.gov
http://bioinformatics.niaid.nih.gov
46
Example Script For SGE
#!/bin/bash
## SGE options (see man qsub for more options)
#$ -S /bin/bash #type of shell. default is csh
#$ -N tophat_test #name of job
#$ -q regular.q,memRegular.q #which queue to submit job to.
#$ -M andrewsgarbage@gmail.com #email address to send email to
#$ -m abe #when to send email: aborted, beginning, end
#$ -l h_vmem=5G,h_cpu=1:00:00 #resources (virtual memory, cpu time)
#$ -cwd #run the script from current working directory
#$ -j y #join stderr and stdout into one job_id.o file
## Script dependencies
#export the path for bowtie (tophat needs this)
export PATH=$PATH:/usr/local/bio_apps/bowtie
export PATH=$PATH:/usr/local/bio_apps/tophat/bin
export PATH=$PATH:/usr/local/bio_apps/samtools/
## Write comments (to make the future you happy)
# Ran tophat on the test dataset - andrew (111013)
#full path to tophat: /usr/local/bio_apps/tophat/bin/tophat
time tophat -r 20 test_ref reads_1.fq reads_2.fq
47
“hashbang,” to specify program used to run script
qsub options
export command for setting environment variables
command for job
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 

UNIX Basics and Cluster Computing

  • 1. Next-Generation Sequencing Analysis Series
    January 14, 2015
    Andrew Oler, PhD - High-throughput Sequencing Bioinformatics Specialist, BCBB/OCICB/NIAID/NIH
    BCBB instructors for this NGS series: Andrew Oler, Vijay Nagarajan, Mariam Quiñones
    Bioinformatics and Computational Biosciences Branch, NIH/NIAID/OD/OSMO/OCICB
    Contact BCBB at ScienceApps@niaid.nih.gov
    Contact the HPC Cluster team at Cluster_support@niaid.nih.gov
  • 2. Bioinformatics and Computational Biosciences Branch (NIH/NIAID/OD/OSMO/OCICB)
    Bioinformatics Software Developers; Computational Biologists; Project Managers & Analysts
    http://www.niaid.nih.gov/about/organization/odoffices/omo/ocicb/Pages/bcbb.aspx
    Objectives: when you leave today, I hope you will be able to
    1. Open a terminal and know how to navigate
    2. Do basic file manipulation and create files and directories from the command line
    3. Submit a job to the HPC cluster
    To accomplish these goals, we will
    1. Learn the most useful Unix terminal commands
    2. Practice a few of these commands
    3. Practice preparing and submitting some scripts to the NIAID HPC Cluster
    Caveat: you may not be a Unix expert when you leave today (and that's okay).
  • 3. Anatomy of the Terminal, "Command Line", or "Shell"
    Parts of the window: the prompt (computer_name:current_directory username), the cursor, the command, its argument(s), and the output.
    Mac: Applications -> Utilities -> Terminal
    Windows: download open-source software, e.g., PuTTY (http://www.chiark.greenend.org.uk/~sgtatham/putty/)
    Other SSH clients: http://en.wikipedia.org/wiki/Comparison_of_SSH_clients
  • 4. File Manager/Browser by Operating System
    Windows: Explorer; Mac OS X: Finder; Unix: Shell
    Typical UNIX directory structure:
      /            "root"
      /bin         essential binaries
      /etc         system config
      /home        user directories (/home/USER1, /home/USER2, ...)
      /mnt         network drives
      /sbin        system binaries
      /usr         shared, read-only
      /usr/bin     other binaries
      /usr/local   installed packages (/usr/local/bin: installed binaries)
      /var         variable data (/var/tmp: program caches)
    pwd - "print working directory"; tells you where you are
  • 5. How to execute a command: type the command, then its argument(s); the output appears below the prompt.
    Some basic Unix commands: pwd, ls, mkdir, cd, wget, curl, cp, wc, head, tail, less, cat
    **See the pre-lecture worksheet.**
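The basic commands listed above can be tried safely in a scratch directory. A minimal sketch follows; the names demo_dir and notes.txt are invented purely for illustration.

```shell
mkdir -p demo_dir                     # create a directory
cd demo_dir                           # move into it
pwd                                   # print where we are now
printf 'line1\nline2\n' > notes.txt   # create a small two-line file
cat notes.txt                         # show the whole file
head -n 1 notes.txt                   # show just the first line
wc -l notes.txt                       # count its lines
cd ..                                 # go back up
ls demo_dir                           # list the directory's contents
```

Each command here is safe to re-run; mkdir -p and the > redirection simply recreate the same state.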
  • 6. Tips to make life easier!
    Tab completion: hit Tab to make the computer guess your filename.
      type: ls unix[Tab]  ->  result: ls unix_hpc
      If nothing happens on the first Tab, press Tab again.
    Up Arrow: recall the previous command(s)
    Ctrl+a: go to beginning of line; Ctrl+e: go to end of line
    Ctrl+c: kill the current running process in the terminal
    Aliases (put them in your ~/.bashrc file; see handout):
      alias ls='ls -AFG'
      alias ll='ls -lrhT'
    history - show every command issued during the session
    !ls - repeat the previous "ls" command; !! - repeat the previous command
    man [command] - read the manual for a command, e.g., man ls
    Accessing the NIAID HPC
    Log in to the HPC "submit node," the computer from which you submit jobs:
      ssh - secure shell, remote login: ssh ngscXXX@hpc.niaid.nih.gov (fill in XXX with your number)
    Copy files to/from the HPC:
      scp - secure copy to a remote location: scp -r ~/data/dir username@hpc.niaid.nih.gov:~/data/
    ssh and scp will prompt you to enter your password.
  • 7. mv ("move file")
    mv file1 temp/ - move "file1" to the "temp" directory
    mv file1 file2 - rename "file1" to "file2"
    mv -i file2 temp/file3 - move "file2" to "temp" and rename it "file3", asking before overwriting (*without -i, mv silently overwrites an existing file!*)
    mv *.fastq ~ - move all ".fastq" files to the home directory
    Exercise 1: mv *.fastq temp/ (move all ".fastq" files to the "temp" directory), then ls temp (check that the files are there)
    Note: the syntax for mv and cp is similar.
    rm ("remove file")
    rm file1 - delete "file1"
    rm -i file2 - delete "file2", but ask first
    rm *.pdb - delete all ".pdb" files
    rm -r temp - delete the "temp" directory
    rm -rf temp - delete the "temp" directory, no questions asked!
    Be careful with: rm -r *
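A small sketch of the mv patterns above, mirroring Exercise 1; all file names (a.fastq, b.fastq, report.txt, temp) are invented for the demo.

```shell
mkdir -p temp
touch a.fastq b.fastq report.txt    # create empty stand-in files
mv a.fastq b.fastq temp/            # move both .fastq files into temp/
mv -i report.txt temp/report2.txt   # move and rename in one step; -i asks before overwriting
ls temp                             # check that the files arrived
rm temp/report2.txt                 # delete a single file (rm -i would ask first)
```

Note that mv -i only prompts when the destination already exists, so this runs without interaction on a fresh directory.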
  • 8. File and system information
    wc file1 - "word count"; output is lines, words, characters
    wc *.fastq - word count of all fastq files, including a summary line
    du -h temp - "disk usage" (size) of each file in the "temp" directory (outputs a list)
    top - report on the processes using the most system resources (memory, CPU, etc.) on the local machine; press "q" to exit
    File compression
    gzip temp/* - compress every file in "temp"; adds a .gz extension
    gunzip temp/*.gz - expand every gzipped file in "temp"
    tar -zcvf myfiles.tar.gz temp/* - create a single compressed archive ("tarball") of every file in "temp"
    tar -xvf test_data.tar.gz - extract every file from the archive
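The compression commands above round-trip cleanly. A sketch with an invented directory (ctemp) and file, archiving the whole directory rather than temp/* for simplicity:

```shell
mkdir -p ctemp
printf 'ACGT\n' > ctemp/reads.txt
wc ctemp/reads.txt                  # lines, words, characters
gzip ctemp/reads.txt                # compresses to ctemp/reads.txt.gz
gunzip ctemp/reads.txt.gz           # back to ctemp/reads.txt
tar -zcvf myfiles.tar.gz ctemp      # bundle the directory into one gzipped tarball
rm -r ctemp                         # simulate losing the originals
tar -xvf myfiles.tar.gz             # unpack; ctemp/reads.txt is restored
```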
  • 9. File manipulation
    cat file1 file2 > file3 - write "file3", containing first "file1", then "file2"
    cat file1 >> file2 - append "file1" onto the end of "file2" (careful: > and >> can clobber or grow files!)
    sort file1 - alphabetize "file1"
    sort -n file1 - sort "file1" numerically
    sort -n -r -k 2 file1 - sort "file1" by the second word or column, in reverse numerical order
    grep (search within files)
    grep key file* - report the file name and line where "key" appears in file*
    grep -v key file* - report the lines that do not match "key"
    man grep - see the other functions of grep (lots! regular expressions!)
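A sketch of cat, sort, and grep together; the two-column sample data (gene names and counts) is invented for the demo.

```shell
printf 'gene3 30\ngene1 10\n' > a.txt
printf 'gene2 20\n' > b.txt
cat a.txt b.txt > all.txt      # write all.txt: first a.txt, then b.txt
sort all.txt                   # alphabetical order: gene1, gene2, gene3
sort -n -r -k 2 all.txt        # reverse numeric sort on column 2: 30, 20, 10
grep gene2 all.txt             # print the matching line
grep -v gene2 all.txt          # print the lines that do NOT match
```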
  • 10. Linking files (making "shortcuts")
    ln -s ~/myapp/binary ~/bin - make a shortcut ("symbolic link") in "~/bin" that points to "~/myapp/binary"
    ln -s /usr/local data - make a shortcut named "data" in the current directory pointing to /usr/local; cd data then takes you to /usr/local
    Downloading files
    wget - download multiple files from an ftp or http address
    curl - download single files from ftp, http, sftp, etc.
    Feature comparison: http://curl.haxx.se/docs/comparison-table.html
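A local sketch of symbolic links, using an invented myapp/binary file in place of a real installed program so it runs anywhere:

```shell
mkdir -p myapp
printf 'hello\n' > myapp/binary
ln -s myapp/binary shortcut    # "shortcut" in the current directory points at myapp/binary
cat shortcut                   # reads through the link to the target file
ls -l shortcut                 # the long listing shows: shortcut -> myapp/binary
```

Because the link target here is a relative path, the link resolves relative to the directory containing the link.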
  • 11. Pipelining
    ls | wc - count the number of files in a directory
    grep ... | sort > file1 - pull out the matching lines, sort them, and write a new file
    Exercise 2: head -n 2000 lymph1k.fastq | gzip > head2K.txt.gz
    Loops
    for - assign a variable to each of a space-separated list of values
    ; - separates commands
    do ... done - marks the start and end of the block to repeat
    Exercise 3:
      for i in 1 13 200; do echo $i; done
      (prints 1, 13, 200)
      for i in file*; do echo $i; mv "$i" "${i}.txt"; done
      (echoes file1, file2, and renames each; check with ls)
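The renaming loop from Exercise 3 combined with a pipeline, using invented file names so it can run in any scratch directory:

```shell
touch file1 file2            # create two empty stand-in files
for i in file*; do
  echo "$i"                  # print the current name
  mv "$i" "${i}.txt"         # rename, adding a .txt extension
done
ls file*.txt | wc -l         # pipeline: count the renamed files (2)
```

The glob file* is expanded once, before the loop starts, so renaming files inside the loop does not cause them to be processed twice.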
  • 12. Recommended reading
    Linux in a Nutshell, Sixth Edition - Ellen Siever, Stephen Figgins, Robert Love, Arnold Robbins
    Running Linux, 5th Edition - Matthias Kalle Dalheimer, Matt Welsh
    UNIX Shells by Example, Fourth Edition - Ellie Quigley
    Take away
    Use mnemonics. Read "man" pages. Work on copies, make backups, and use "rm", "mv", and ">" carefully. Pick a text editor and master it (pico/nano, emacs, vi/vim, etc.). Be clever!
    Questions?
  • 13. Using the NIAID Grid Engine Cluster
    High-performance computing: "A computer cluster consists of a set of loosely connected computers that work together so that in many respects they can be viewed as a single system." (http://en.wikipedia.org/wiki/Cluster_%28computing%29)
  • 14. HPC glossary
    node - an individual workstation within a network or cluster; a collection of processors all accessing the same memory (RAM)
    CPU - central processing unit; the processor of a node, also referred to as a socket (try cat /proc/cpuinfo, and note that the "processors" in its output are actually "cores" by the definition below)
    core - a separate execution core for calculations; e.g., "dual-core" means the processor has two cores (sometimes each core is referred to as a separate processor)
    slot - a single core available for use within a node; e.g., a node with 16 cores has 16 slots
    hyper-threading technology (HTT) - a single execution core is treated as two virtual cores (two logical processors) by the system; some of the nodes in the cluster have HTT, so 16 physical cores appear as 32 logical processors
    thread - a single process of a multi-process job; each thread runs on a separate logical processor; e.g., running tophat with -p 10 creates 10 threads that run in parallel
    These definitions are somewhat flexible.
    Accessing the NIAID HPC
    Request an HPC account (for NIAID members and collaborators only): https://hpcweb.niaid.nih.gov/#home, then "Request Account"
    Log in to the HPC "submit node," the computer from which you submit jobs: ssh username@hpc.niaid.nih.gov
    Copy files to/from the HPC with scp (secure copy): scp -r ~/data/dir username@hpc.niaid.nih.gov:~/data/
    ssh and scp will prompt you to enter your password.
  • 15. Mounting HPC drives (Mac, Windows)
    Windows:
    1. Click "Start" > "Computer".
    2. Click "Map Network Drive".
    3. Choose an available drive letter.
    4. Enter \\ai-hpcfileserver.niaid.nih.gov\bcbb in the "Folder" field, replacing "bcbb" with your group name or your user name. (For more details, see the FAQ: https://hpcweb.niaid.nih.gov/#support?type=Links&requestType=HPC%20FAQs&name=41)
    Cluster architecture and access (image modified from http://ainkaboot.co.uk/): from the submit node (ssh username@hpc.niaid.nih.gov), jobs reach the queues regular.q, interactive.q, and memLong.q via commands such as qrsh -q interactive.q and qsub -q memLong.q.
  • 16. Cluster queue system: Sun Grid Engine
    The compute nodes run the Red Hat Linux operating system.
    Grid Engine is a batch queuing system. Other queuing systems (http://en.wikipedia.org/wiki/Job_scheduler) include the Portable Batch System (PBS, e.g., Biowulf), TORQUE Resource Manager, Maui, Moab, and others; each requires a slightly different script syntax.
    Grid Engine comes with a set of commands for communicating with the cluster, and it monitors available resources and users' workloads to start jobs at the appropriate time.
    Grid Engine jobs: three types
      Batch/serial (one node, one processor)
      Parallel (multiple processors or nodes)
      Interactive
  • 17. Grid Engine jobs: interactive
    Log in to a node, much like ssh: qrsh -l h_vmem=20G
    You need to specify parameters: -l takes a space-delimited list of requested resources; for an interactive job, set h_vmem=<memory>.
    On Biowulf (PBS; http://biowulf.nih.gov/user_guide.html#interactive), the equivalent is: qsub -I -V -l nodes=1
    Cluster architecture and access (image from http://ainkaboot.co.uk/): qrsh -q interactive.q from the submit node (ssh username@hpc.niaid.nih.gov); queues include regular.q, interactive.q, and memLong.q.
  • 18. Test TopHat job in an interactive session
    TopHat is a short-read aligner for RNA-seq data; manual: http://ccb.jhu.edu/software/tophat/manual.shtml
    1. Check dependencies (e.g., PATH)
    2. Check command syntax and options
    3. Run the command with a test dataset
    Grid Engine jobs: batch/serial
    One job on a single processor; submit a script to the cluster from the submit node, "submit-1".
  • 19. Cluster architecture and access (image from http://ainkaboot.co.uk/): from the submit node, qsub -q memLong.q script.sh, or simply qsub script.sh (*no queue necessary*).
    Text editors for composing scripts (batch jobs)
    A text editor is not the same as a word processor (e.g., Microsoft Word)! Try some and choose a favorite.
    Popular for Windows: Notepad++ (nice color-coding); EditPad Lite (can open large files, > 4 GB)
    Popular for Mac: TextWrangler
    Popular in the terminal: nano, vi, emacs
    Comparison: http://en.wikipedia.org/wiki/Comparison_of_text_editors
  • 20. Quick look at a shell script
    Exercise 4: cd ~/unix_hpc/test_data, then cat test_serial.sh
    A few things to notice:
      #!/bin/bash - the "shebang" or "hashbang," used to specify the program that runs the script
      qsub options (next slide)
      export - used to set environment variables
      PATH=/path/to/folder:/path/to/another/folder:$PATH - lets you type just the name of an executable instead of its full path, e.g., "tophat" instead of "/usr/local/bio_apps/tophat/bin/tophat"
      Comments about when you ran the job
      The command for the job
    (*A PBS script for Biowulf is structured the same way.*)
    SGE qsub options
    qsub [options] script.sh - command to submit a job to the cluster
      -S /bin/bash - shell to use (default is csh)
      -N job_name - name for your job
      -q queue.q - queue(s) to submit to, e.g., memLong.q,memRegular.q
      -M user@niaid.nih.gov - email address to send alerts to
      -m abe - when to send email (e.g., aborted, beginning, end)
      -l resources - resources to request, e.g., h_vmem=20G,h_cpu=1:00:00,mem_free=10G
      -cwd - run from the current working directory; output goes there too
      -j y - join stderr and stdout into one file
      -pe threaded 10 - parallel environment and number of processors/threads; "round" means processors could be on separate machines, "threaded" means all processors on the same machine
    You can put these options on the command line or in your shell script; option lines in the script should begin with #$.
  • 21. Submitting jobs with PBS (Biowulf)
    PBS options and examples for Biowulf: http://biowulf.nih.gov/user_guide.html#batchsamp
    Examples:
      qsub -I -V -l nodes=1
      qsub -l nodes=1 myjob.bat
      qsub -l nodes=8:o2800 myparalleljob
      qsub -v np=3 -l nodes=2:g24:c24,mem=0 novompi.sh
    Option lines start with #PBS instead of #$. Biowulf also documents application-specific usage.
    Grid Engine jobs: batch/serial - submit a script to the cluster from the submit node.
    Exercise 5:
      cd ~/unix_hpc/test_data (remember to try tab completion)
      qsub test_serial.sh
      It should say: Your job XXXXXX ("tophat_test") has been submitted, where XXXXXX is the job number.
      ls -al - do you see a file called tophat_test.oXXXXXX, where XXXXXX is your job number?
      cat tophat_test.oXXXXXX (substitute your job number for XXXXXX)
  • 22. Grid Engine jobs: parallel
    pe commands (threaded, single, etc.); basic use in a script: #$ -pe threaded 8
    Advanced options:
      "-pe 12threaded 48" means use 12 cores per node, for a total of 48 cores; the job will be allocated 4 nodes with 12 cores each (your program must be able to support this).
      "-pe threaded 5-10" means run with 10 cores if available, but as few as 5 is fine too.
    Do the math for memory! h_vmem is per thread, not total. E.g., if a job needs 10G total and runs on 5 processors, request h_vmem=2G, not h_vmem=10G.
    Let's edit our script to make it run parallel.
    Edit a shell script in the terminal with nano
    Navigation in nano: arrow keys for up/down/left/right; Ctrl+a for beginning of line, Ctrl+e for end of line; other commands are listed at the bottom of the screen (e.g., Ctrl+o, Ctrl+x).
    Exercise 6:
      cd ~/unix_hpc/test_data
      Make a new script for the parallel job and open it in nano: cp test_serial.sh test_parallel.sh; nano test_parallel.sh
      Add a line with the SGE option: #$ -pe threaded 4
      Modify the tophat command: tophat -p 4 ...
      Save and close: Ctrl+o, [ENTER], then Ctrl+x
      Now submit the jobs: qsub test_serial.sh; qsub test_parallel.sh
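The per-thread memory arithmetic above is easy to get wrong, so a tiny sketch of the calculation; total_gb and threads are example values, not cluster defaults.

```shell
total_gb=10                           # total memory the job needs (example value)
threads=5                             # cores requested with "-pe threaded 5"
per_thread=$(( total_gb / threads ))  # h_vmem is PER THREAD, so divide
echo "request h_vmem=${per_thread}G"  # prints: request h_vmem=2G
```

With integer division, round up (or request slightly more) when the total does not divide evenly, so the job is not killed for exceeding its per-thread limit.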
  • 23. Monitoring jobs
    Exercise 7: qsub test_tenminutes.sh
    qstat - check on submitted jobs
    echo $LOGNAME - check your username
    qstat -u $LOGNAME - check the status of your jobs
    qstat -u $LOGNAME -ext - check resource usage, including memory
    qstat -u $LOGNAME -ext -g t - get extended details, including MASTER and SLAVE nodes for parallel jobs
    qstat -j job-ID - get detailed information about your job's status
    qacct -j 999072 - see info about a job after it has run
    qalter [new qsub options] [job id] - change parameters while the job is still in "qw" status
    qdel -u username - delete all of your submitted jobs
    qdel jobnumber - delete a single job
    Websites:
      Cluster status: http://hpcweb.niaid.nih.gov/#about?type=About%20Links&requestType=Cluster%20Status
      Current state: http://hpcwiki.niaid.nih.gov/index.php/Current_State
      Ganglia toolkit: http://cluster.niaid.nih.gov/ganglia/
    Contact us: andrew.oler@nih.gov; ScienceApps@niaid.nih.gov; http://bioinformatics.niaid.nih.gov
  • 24. Example script for SGE
    #!/bin/bash
    ## SGE options (see man qsub for more options)
    #$ -S /bin/bash                 # type of shell; default is csh
    #$ -N tophat_test               # name of job
    #$ -q regular.q,memRegular.q    # which queue(s) to submit the job to
    #$ -M andrewsgarbage@gmail.com  # email address to send email to
    #$ -m abe                       # when to send email: aborted, beginning, end
    #$ -l h_vmem=5G,h_cpu=1:00:00   # resources (virtual memory, CPU time)
    #$ -cwd                         # run the script from the current working directory
    #$ -j y                         # join stderr and stdout into one job_id.o file
    ## Script dependencies
    # export the paths for bowtie and samtools (tophat needs these)
    export PATH=$PATH:/usr/local/bio_apps/bowtie
    export PATH=$PATH:/usr/local/bio_apps/tophat/bin
    export PATH=$PATH:/usr/local/bio_apps/samtools/
    ## Write comments (to make the future you happy)
    # Ran tophat on the test dataset - andrew (111013)
    # full path to tophat: /usr/local/bio_apps/tophat/bin/tophat
    time tophat -r 20 test_ref reads_1.fq reads_2.fq
    Annotations from the slide: the "hashbang" line specifies the program used to run the script; the #$ lines carry qsub options; export sets environment variables; the final line is the command for the job.