Lecture1: NGS Analysis on Beocat and an introduction to Perl programming for Bioinformatics 2014


  1. 1. Files, directories, editing and pipes. NGS Analysis on Beocat and an introduction to Perl programming for Bioinformatics 2014. Jennifer Shelton
  2. 2. Before class
      Please read through the following pages and install the software listed on them onto your laptop before coming to class:
      https://github.com/i5K-KINBRE-script-share/FAQ/blob/master/UsingBeocat.md
      https://github.com/i5K-KINBRE-script-share/FAQ/blob/master/BeocatEditingTransferingFiles.md
  3. 3. Logging in
      • Use the program “ssh”, an OpenSSH SSH client (remote login program), to log into Beocat.
      • You will not see text as you type your password.
      $ ssh EID@beocat.cis.ksu.edu
      password:
  9. 9. Terminal
      • We are now connected to Beocat using a command-line interface (CLI). A CLI is an interface based on typing commands, usually at a read-eval-print loop (REPL).
      • A read-eval-print loop (REPL) is a command-line interface that reads a command from the user, executes it, prints the result, and waits for another command.
      • A graphical user interface (GUI), by contrast, presents windows, icons and menus and is usually controlled with a mouse.
      Software Carpentry v.5 http://software-carpentry.org/v5/gloss.html
  10. 10. Shell
      • shell: A command-line interface, such as Bash (the Bourne-Again Shell) or the Microsoft Windows DOS shell, that allows a user to interact with the operating system.
      (Diagram labels: shell, User.)
      Software Carpentry v.5 http://software-carpentry.org/v5/gloss.html
      Software Carpentry v.4 http://software-carpentry.org/v4/shell
  15. 15. Shell
      $ ps -p $$
      PID TTY TIME CMD
      63825 ttys002 0:00.04 -bash
      “ps” is the “process status” program, “-p” is the PID parameter, and “$$” is the current process. The CMD column shows the name of the current shell.
  18. 18. Shell
      $ whoami
      bioinfo
      “whoami” is the program; “bioinfo” is the user ID it prints.
  24. 24. Files and directories
      $ pwd
      /homes/bioinfo
      “pwd” is the print working directory program; /homes/bioinfo is the current working directory.
      Directory tree: the root (/) contains tmp, homes and bin; homes contains user1, bioinfo and user2.
  28. 28. Files and directories
      $ ln -s /homes/bioinfo/pipeline_datasets/ ./
      $ ls
      pipeline_datasets@
      $ ls pipeline_datasets/RNA-SeqAlign2Ref/
      sample_read_list.txt*
      Galaxy5-brain_2.fastq*
      Galaxy4-brain_1.fastq*
      Galaxy3-adrenal_2.fastq*
      Galaxy2-adrenal_1.fastq*
      Galaxy1-iGenomes_UCSC_hg19_chr19_gene_annotation.gtf*
      hg19.fa*
      “ln” is the link program; the -s parameter makes the link symbolic.
      “ls” lists directory contents.
      (Diagram labels: pipeline_datasets, RNA-SeqAlign2Ref, AssembleT, the files listed above, and several notes.txt files.)
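The link step can be tried without access to Beocat. This is a minimal sketch that builds a throwaway directory with `mktemp -d`; the `shared` and `home` directory names are stand-ins invented for the example, not paths from the slides.

```shell
# Work in a throwaway directory so nothing in your real home directory is touched.
work=$(mktemp -d)
mkdir -p "$work/shared/pipeline_datasets" "$work/home"
touch "$work/shared/pipeline_datasets/hg19.fa"

cd "$work/home"
# Create a symbolic link in the current directory that points at the shared data.
ln -s "$work/shared/pipeline_datasets" ./
# -F marks symbolic links with a trailing @, matching the slide's output.
ls -F
# Listing through the link shows the files stored in the original directory.
linked_files=$(ls pipeline_datasets/)
echo "$linked_files"
```

Deleting the link later with `rm pipeline_datasets` removes only the link, not the data it points to.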
  33. 33. Relative paths
      $ ls /homes/bioinfo
      $ ls ../../bin
      ls ln rm mkdir…
      $ ls ../bioinfo/bioinfo_software
      cufflinks@ tophat@ samtools@…
      $ ls ~/pipeline_datasets
      Galaxy5-brain_2.fastq* Galaxy4-brain_1.fastq*…
      Directory tree: the root (/) contains tmp, homes and bin; homes contains user1, bioinfo and user2.
      “ls” lists directory contents.
      .. one directory up from the current working directory
      . current working directory
      ~ home directory
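To see how `..` and `.` resolve, the tree from the slide can be rebuilt in a scratch location. A sketch, assuming only standard `mkdir`, `ls` and `pwd`; the temporary directory stands in for the real root.

```shell
# Build a small stand-in for the slide's tree: root/{tmp,homes,bin} and homes/{user1,bioinfo,user2}.
root=$(mktemp -d)
mkdir -p "$root/tmp" "$root/bin" "$root/homes/user1" "$root/homes/bioinfo" "$root/homes/user2"

cd "$root/homes/bioinfo"
# ".." is the parent directory, so this lists the contents of homes/.
siblings=$(ls ..)
echo "$siblings"
# "../.." climbs two levels, back to the stand-in root.
top=$(ls ../..)
echo "$top"
# "." is the current directory; "../bioinfo" is the same place reached via the parent.
[ "$(cd . && pwd)" = "$(cd ../bioinfo && pwd)" ] && echo "same directory"
```

Relative paths depend on where you are standing: the same `ls ../../bin` typed from a different directory lists a different place.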
  34. 34. Navigate and create directories
      $ cd ~/pipeline_datasets/RNA-SeqAlign2Ref
      $ ls
      sample_read_list.txt*
      Galaxy5-brain_2.fastq*
      Galaxy4-brain_1.fastq*
      Galaxy3-adrenal_2.fastq*
      Galaxy2-adrenal_1.fastq*
      Galaxy1-iGenomes_UCSC_hg19_chr19_gene_annotation.gtf*
      hg19.fa*
      $ pwd
      /homes/bioinfo/pipeline_datasets/RNA-SeqAlign2Ref
      $ mkdir test
      $ ls
      test…
      “cd” changes directories.
      “mkdir” makes directories.
  37. 37. Navigate and create directories
      “touch” creates files.
      “rm” deletes files.
      “nano” is a command-line file editor.
      Or use Cyberduck.
      Software Carpentry v.5 http://software-carpentry.org/v5/gloss.html
      Software Carpentry v.4 http://software-carpentry.org/v4/shell
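A short practice run of `mkdir`, `touch` and `rm` might look like the following. It is a sketch in a temporary directory; `notes.txt` is a hypothetical file name, and `rmdir` (remove an empty directory) is an addition that the slides do not cover.

```shell
scratch=$(mktemp -d)
cd "$scratch"
mkdir test            # make a directory
touch test/notes.txt  # create an empty file (or update its timestamp if it already exists)
ls test
rm test/notes.txt     # delete the file; there is no trash can to restore it from
rmdir test            # remove the now-empty directory
remaining=$(ls -A)    # -A also lists hidden files, so empty output means nothing is left
```

Because `rm` is permanent, it is worth running `ls` on the same path first to confirm what will be deleted.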
  38. 38. Move files or directories
      $ mv ~/pipeline_datasets/test.txt ~/test.txt
      $ ls ~
      test.txt…
      “mv” moves files or directories to a new location.
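The same command also renames, since a rename is just a move to a new name in the same directory. A sketch in a temporary directory; `renamed.txt` is a hypothetical name chosen for the example.

```shell
area=$(mktemp -d)
mkdir "$area/pipeline_datasets"
touch "$area/pipeline_datasets/test.txt"
# Move the file up one level, keeping its name.
mv "$area/pipeline_datasets/test.txt" "$area/test.txt"
# mv also renames: same directory, new name.
mv "$area/test.txt" "$area/renamed.txt"
ls "$area"
```

Note that `mv` silently overwrites an existing file at the destination unless it is run with `-i`, which asks first.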
  39. 39. Unix wildcards and head/tail
      $ ls ~/pipeline_datasets/RNA-SeqAlign2Ref/*.fastq
      pipeline_datasets/RNA-SeqAlign2Ref/Galaxy5-brain_2.fastq*
      pipeline_datasets/RNA-SeqAlign2Ref/Galaxy4-brain_1.fastq*
      pipeline_datasets/RNA-SeqAlign2Ref/Galaxy3-adrenal_2.fastq*
      pipeline_datasets/RNA-SeqAlign2Ref/Galaxy2-adrenal_1.fastq*
      $ head ~/pipeline_datasets/RNA-SeqAlign2Ref/*.fastq
      ==> pipeline_datasets/RNA-SeqAlign2Ref/Galaxy2-adrenal_1.fastq <==
      @ERR030881.107 HWI-BRUNOP16X_0001:2:1:13663:1096#0/1
      ATCTTTTGTGGCTACAGTAAGTTCAATCTGAAGTCAAAACCAACCAATTT
      +
      5.544,444344555CC?CAEF@EEFFFFFFFFFFFFFFFFFEFFFEFFF…
      “*” matches any sequence of zero or more characters (it can be used with most basic Unix commands).
      “head” prints the first lines of a file (10 by default); “tail” prints the last lines (10 by default).
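The wildcard and the line-count option can be checked on small stand-in files. This sketch invents two tiny FASTQ files (`brain_1.fastq`, `adrenal_1.fastq`) rather than using the course data; `-n` is the standard option for choosing how many lines `head` and `tail` print.

```shell
reads=$(mktemp -d)
cd "$reads"
# Two tiny stand-in FASTQ files plus one file the wildcard should skip.
printf '@r1\nACGT\n+\nFFFF\n@r2\nTTGA\n+\nFFF5\n' > brain_1.fastq
printf '@r3\nGGCC\n+\nFFFF\n' > adrenal_1.fastq
touch sample_read_list.txt

# *.fastq expands to every name ending in .fastq; the shell does this before ls runs.
matched=$(ls *.fastq)
echo "$matched"
# -n sets how many lines to print; without it head and tail print 10.
first_record=$(head -n 4 brain_1.fastq)
last_record=$(tail -n 4 brain_1.fastq)
echo "$first_record"
echo "$last_record"
```

Asking for four lines is convenient with FASTQ because one record is exactly four lines long.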
  40. 40. Common bioinformatics file formats
      @ERR030881.107 HWI-BRUNOP16X_0001:2:1:13663:1096#0/1
      ATCTTTTGTGGCTACAGTAAGTTCAATCTGAAGTCAAAACCAACCAATTT
      +
      5.544,444344555CC?CAEF@EEFFFFFFFFFFFFFFFFFEFFFEFFF
      FASTQ: sequence data with quality scores. Four lines per entry: header line, sequence, second header or “+”, base quality scores. http://en.wikipedia.org/wiki/FASTQ_format
      >Locus_1_Transcript_2/3_Confidence_0.333_Length_600
      CCCCCCTTCAGTTCCCTTAAAGCACAGCCCAGGGAAACCTCCTCACAGTTTTCATCCAGC
      CACGGGCCAGCATGTCTGGGGGCAAATACGTAGACTCGGAGGGACATCTCTACACCGTTC
      CCATCCGGGAACAGGGCAACATCTACAAGCCCAACAACAAGGCCATGGCAGACGAGC
      FASTA: sequence data. Header line that begins with “>”, followed by the sequence (generally wrapped). http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml
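The record layouts above translate directly into counting rules. A sketch on two made-up files (`reads.fastq` and `contigs.fa` are hypothetical names and contents), assuming unwrapped four-line FASTQ records:

```shell
seqs=$(mktemp -d)
cd "$seqs"
# The second read's quality line deliberately starts with "@".
printf '@r1\nACGT\n+\nFFFF\n@r2\nTTGA\n+\n@FF5\n' > reads.fastq
printf '>contig1\nACGTACGT\nACGT\n>contig2\nTTGA\n' > contigs.fa

# Each FASTQ record is four lines, so reads = lines / 4.
fastq_lines=$(wc -l < reads.fastq)
fastq_reads=$((fastq_lines / 4))
# Counting "@" lines is unreliable because "@" is also a valid quality character.
at_lines=$(grep -c '^@' reads.fastq)
# Each FASTA record has exactly one header line starting with ">".
fasta_records=$(grep -c '^>' contigs.fa)
echo "$fastq_reads reads, $at_lines lines starting with @, $fasta_records FASTA records"
```

The mismatch between the two FASTQ counts is the reason dividing the line count by four is the safer habit.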
  41. 41. Common bioinformatics file formats
      HWUSI-EAS1794_0001_FC61KOJ:5:110:7624:5467#0 99 Locus_126_Transcript_1 6319 1 50M = 6478 209 GCTTGTGGCAT IIIIIIIIIIII
      HWUSI-EAS1794_0001_FC61KOJ:5:110:7624:5467#0 147 Locus_126_Transcript_1 6478 1 50M = 6319 -209 GACGTTCGTGAT IHIIHHIIIIII
      Labeled fields in the example: read header, target header, MAPQ, read sequence, read quality.
      SAM: sequence alignment. Tab-delimited file with eleven required fields. http://samtools.github.io/hts-specs/SAMv1.pdf
      BAM: binary version of a SAM file.
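Because SAM is tab-delimited text, ordinary Unix tools can pull out individual fields. The sketch below writes a tiny made-up SAM file (shortened read names and sequences, one `@HD` header line) and uses `cut` and `awk`, neither of which appears on the slides; field numbers follow the SAM specification (1 read name, 3 target name, 4 position, 5 MAPQ).

```shell
samdir=$(mktemp -d)
sam="$samdir/example.sam"
# One header line and two alignment lines; \t in printf produces the tab separators.
printf '@HD\tVN:1.0\n' > "$sam"
printf 'read1\t99\tLocus_126_Transcript_1\t6319\t1\t50M\t=\t6478\t209\tGCTT\tIIII\n' >> "$sam"
printf 'read1\t147\tLocus_126_Transcript_1\t6478\t1\t50M\t=\t6319\t-209\tGACG\tIHII\n' >> "$sam"

# Skip header lines (they start with @), then keep read name, target, position and MAPQ.
summary=$(grep -v '^@' "$sam" | cut -f 1,3,4,5)
echo "$summary"
# Count alignment lines that have fewer than the 11 required fields.
short_lines=$(grep -v '^@' "$sam" | awk -F '\t' 'NF < 11 { n++ } END { print n + 0 }')
echo "alignment lines with missing fields: $short_lines"
```

A BAM file is binary, so it has to go through `samtools view` before text tools like these can read it.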
  43. 43. Pipes
      (Diagram labels: standard input, stdin.)
      “|” passes the standard output of one program to the next program as its standard input, so that steps can be chained together.
      “>” tells the shell to write the output to a file rather than display it on the screen.
      Software Carpentry v.4 http://software-carpentry.org/v4/shell
  44. 44. Pipes
      $ cd ~/pipeline_datasets/RNA-SeqAlign2Ref
      $ wc -l *.fastq > lines
      (Diagram: wc -> lines)
      Software Carpentry v.4 http://software-carpentry.org/v4/shell
  45. 45. Pipes
      $ wc -l *.fastq | sort > lines
      (Diagram: wc -> sort -> lines)
      Software Carpentry v.4 http://software-carpentry.org/v4/shell
  46. 46. Pipes
      $ wc -l *.fastq | sort | head -1 > lines
      (Diagram: wc -> sort -> head -1 -> lines)
      Software Carpentry v.4 http://software-carpentry.org/v4/shell
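The three pipeline slides can be replayed on stand-in data. In this sketch the file names `short.fastq` and `long.fastq` are invented, and two details differ from the slides on purpose: `sort -n` compares the counts as numbers rather than as text, and `head -n 1` is the current spelling of `head -1`.

```shell
data=$(mktemp -d)
cd "$data"
printf '@a\nAC\n+\nFF\n' > short.fastq
printf '@b\nAC\n+\nFF\n@c\nGT\n+\nFF\n' > long.fastq

# Step 1: count lines per file and save the report instead of printing it.
wc -l *.fastq > lines
# Step 2: sort the counts before saving; -n compares them as numbers.
wc -l *.fastq | sort -n > lines
# Step 3: keep only the first sorted line, i.e. the file with the fewest lines.
wc -l *.fastq | sort -n | head -n 1 > lines
cat lines
```

Each step overwrites `lines`, because `>` replaces the file's previous contents rather than appending to them.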
  55. 55. Pipes and grep
      This programming model is called pipes and filters.
      A filter transforms a stream of input into a stream of output.
      A pipe connects two filters.
      Any program that reads lines of text from standard input and writes lines of text to standard output can work with every other program that behaves the same way.
      $ wc -l *.fastq | sort | head -1 > lines
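You can write your own filter by following the same contract: read standard input, write standard output. In this sketch `fastq_sequences` is a hypothetical shell function, not a standard command, and it assumes unwrapped four-line FASTQ records.

```shell
# A filter: reads FASTQ on standard input, writes only the sequence lines to standard output.
fastq_sequences() {
    # In a four-line record the sequence is line 2, 6, 10, ... (line number mod 4 equals 2).
    awk 'NR % 4 == 2'
}

# Because it only touches stdin and stdout, it can sit anywhere in a pipeline.
longest=$(printf '@r1\nACGT\n+\nFFFF\n@r2\nTTGACC\n+\nFFFFFF\n' |
    fastq_sequences |
    awk '{ print length($0), $0 }' |
    sort -n |
    tail -n 1)
echo "$longest"
```

None of the stages knows what comes before or after it, which is what lets small tools be recombined freely.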
  60. 60. Pipes and grep
      “|” passes the standard output of one program to the next program as its standard input, so that steps can be chained together.
      “-” tells the samtools program to use the output from the previous step as input.
      “>” tells the shell to write the output to a file rather than display it on the screen.
      “grep” searches for patterns in a file. The “-c” parameter tells grep to count the lines that contain the pattern (in this case we can count contigs in a FASTA file).
      $ cd ~/pipeline_datasets/sam_bam
      $ /homes/bioinfo/bioinfo_software/samtools/samtools cat brain_rep_1_tophat2_out/accepted_hits.bam adrenal_rep_1_tophat2_out_1/accepted_hits.bam | /homes/bioinfo/bioinfo_software/samtools/samtools flagstat - > alignment_stats.txt
      $ grep -c ">" ../RNA-SeqAlign2Ref/hg19.fa
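The same three symbols can be practiced without samtools. In this sketch `cat` stands in for samtools, since it also accepts “-” as a name for standard input; the FASTA files and their contents are made up for the example.

```shell
demo=$(mktemp -d)
cd "$demo"
printf '>chr19\nACGT\n>chr20\nTTGA\n' > genome.fa
printf '>chrM\nGGCC\n' > extra.fa

# grep -c counts matching lines. The quotes matter: an unquoted > would be taken
# by the shell as a redirection and would overwrite the file named after it.
n_sequences=$(grep -c '^>' genome.fa)
# "-" stands for standard input: cat joins genome.fa with whatever arrives through the pipe.
n_combined=$(cat extra.fa | cat genome.fa - | grep -c '^>')
# ">" sends the result to a file instead of the screen.
echo "$n_combined" > sequence_count.txt
cat sequence_count.txt
```

Anchoring the pattern with `^` restricts the count to lines that begin with “>”, which is where FASTA headers always sit.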
  61. 61. Pipes with samtools
      $ /homes/bioinfo/bioinfo_software/samtools/samtools
      https://www.biostars.org/p/43677/
      http://samtools.sourceforge.net/pipe.shtml
  62. 62. Review Unix
      ps -p $$    process status for the process ID of the current shell
      pwd    print working directory
      ln -s    create a link; the -s parameter makes it symbolic
      ls    list directory contents
      ..    one directory up from the current working directory
      .    current working directory
      ~    home directory
      *    wildcard
      cd    change directories
      mkdir    make directories
      mv    move files or directories
      head    print the first lines of a file (10 by default)
      tail    print the last lines of a file (10 by default)
      |    chain programs together
      grep    search for patterns
      wget    non-interactive network downloader
  63. 63. Review NGS
      samtools cat    concatenate BAMs
      samtools flagstat    simple stats
      samtools view    SAM<->BAM conversion
      samtools sort    sort alignments by leftmost coordinates
      samtools rmdup    remove potential PCR duplicates
