Lecture1: NGS Analysis on Beocat and an introduction to Perl programming for Bioinformatics 2014

Files, directories, editing and pipes.

Presentation Transcript

  • Files, directories, editing and pipes
    NGS Analysis on Beocat and an introduction to Perl programming for Bioinformatics 2014
    Jennifer Shelton
  • Before class
    Please read through the following pages and install the software listed on them onto your laptop before coming to class:
    https://github.com/i5K-KINBRE-script-share/FAQ/blob/master/UsingBeocat.md
    https://github.com/i5K-KINBRE-script-share/FAQ/blob/master/BeocatEditingTransferingFiles.md
  • Logging in
    • Use the program “ssh”, an OpenSSH SSH client (remote login program), to log into Beocat.
    • You will not see any text as you type your password.
    $ ssh EID@beocat.cis.ksu.edu
    password:
  • Terminal
    • We are now connected to Beocat using a command-line interface (CLI). A CLI is an interface based on typing commands, usually at a read-eval-print loop (REPL).
    • A read-eval-print loop (REPL) is a command-line interface that reads a command from the user, executes it, prints the result, and waits for another command.
    • A graphical user interface (GUI), by contrast, presents windows, icons and menus and is usually controlled with a mouse.
    Software carpentry v.5 http://software-carpentry.org/v5/gloss.html
  • Shell
    • shell: A command-line interface such as Bash (the Bourne-Again Shell) or the Microsoft Windows DOS shell that allows a user to interact with the operating system.
    [Diagram: the user interacts with the shell]
    Software carpentry v.5 http://software-carpentry.org/v5/gloss.html
    Software carpentry v.4 http://software-carpentry.org/v4/shell
  • Shell
    $ ps -p $$
    PID TTY TIME CMD
    63825 ttys002 0:00.04 -bash
    • “ps” is the “process status” program.
    • “-p” is the PID (process ID) parameter.
    • “$$” is the process ID of the current process.
    • “-bash” in the CMD column is the name of the current shell.
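    The same lookup can be done inside a script. This is a minimal sketch, assuming a `ps` that accepts the POSIX `-o` option; the variable names `shell_pid` and `shell_name` are made up for illustration.

```shell
# $$ expands to the process ID (PID) of the shell running these commands.
shell_pid=$$

# -p selects one process by PID; -o comm= prints only its command name, with no header.
shell_name=$(ps -p "$shell_pid" -o comm=)

echo "PID of this shell:  $shell_pid"
echo "Name of this shell: $shell_name"
```

    Storing `$$` in a variable first makes it explicit that the PID being queried belongs to the shell itself and not to `ps`.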
  • Shell
    $ whoami
    bioinfo
    • “whoami” is a program that prints your user ID (here “bioinfo”).
  • Files and directories
    $ pwd
    /homes/bioinfo
    • “pwd” is the print working directory program.
    • “/homes/bioinfo” is the current working directory.
    Directory tree:
    / (root)
      tmp
      homes
        user1
        bioinfo
        user2
      bin
  • Files and directories
    $ ln -s /homes/bioinfo/pipeline_datasets/ ./
    $ ls
    pipeline_datasets@
    $ ls pipeline_datasets/RNA-SeqAlign2Ref/
    sample_read_list.txt*
    Galaxy5-brain_2.fastq*
    Galaxy4-brain_1.fastq*
    Galaxy3-adrenal_2.fastq*
    Galaxy2-adrenal_1.fastq*
    Galaxy1-iGenomes_UCSC_hg19_chr19_gene_annotation.gtf*
    hg19.fa*
    • “ln” is the link program; the “-s” parameter makes the link symbolic.
    • “ls” lists directory contents.
    [Diagram: pipeline_datasets contains the directories RNA-SeqAlign2Ref and AssembleT; RNA-SeqAlign2Ref holds the files listed above, and notes.txt files appear throughout the tree]
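    To try symbolic links without touching the course data, the steps above can be repeated in a throwaway directory. This is only a sketch: the `shared` and `home` directories are stand-ins created for the example, and `ls -F` is used on the assumption that the “@” and “*” suffixes in the listings above come from that kind of option.

```shell
# Work in a temporary directory so nothing real is changed.
workdir=$(mktemp -d)
mkdir -p "$workdir/shared/pipeline_datasets"
touch "$workdir/shared/pipeline_datasets/notes.txt"
mkdir "$workdir/home"
cd "$workdir/home"

# Create a symbolic link in the current directory (./) that points at the shared data.
ln -s "$workdir/shared/pipeline_datasets" ./

# -F appends "@" to symbolic links, "/" to directories and "*" to executables.
ls -F

# Listing through the link shows the contents of the original directory.
ls pipeline_datasets/
```

    The link takes almost no disk space; deleting it with `rm pipeline_datasets` removes only the link and leaves the original directory untouched.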
  • Relative paths
    $ ls /homes/bioinfo
    $ ls ../../bin
    ls ln rm mkdir…
    $ ls ../bioinfo/bioinfo_software
    cufflinks@ tophat@ samtools@…
    $ ls ~/pipeline_datasets
    Galaxy5-brain_2.fastq* Galaxy4-brain_1.fastq*…
    • “ls” lists directory contents.
    • “..” is one directory up from the current working directory.
    • “.” is the current working directory.
    • “~” is the home directory.
    Directory tree:
    / (root)
      tmp
      homes
        user1
        bioinfo
        user2
      bin
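    The shortcuts can be checked against a toy copy of the directory tree. A minimal sketch, assuming nothing beyond standard shell tools; the tree is rebuilt under a temporary directory, so `root` here is a stand-in for “/”.

```shell
# Rebuild the tree from the diagram under a temporary directory.
root=$(mktemp -d)
mkdir -p "$root/tmp" "$root/bin" \
         "$root/homes/user1" "$root/homes/bioinfo" "$root/homes/user2"

cd "$root/homes/bioinfo"

# "." is the current working directory.
here=$(cd . && pwd)
# ".." is one directory up: homes.
parent=$(cd .. && pwd)
# "../.." is two directories up: the top of the toy tree.
grandparent=$(cd ../.. && pwd)

echo "$here"
echo "$parent"
echo "$grandparent"

# "~" expands to the home directory stored in $HOME, wherever you currently are.
echo ~
```

    Relative paths change meaning when you `cd`; a path that starts with “/” or “~” points to the same place from anywhere.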
  • Navigate and create directories
    $ cd ~/pipeline_datasets/RNA-SeqAlign2Ref
    $ ls
    sample_read_list.txt*
    Galaxy5-brain_2.fastq*
    Galaxy4-brain_1.fastq*
    Galaxy3-adrenal_2.fastq*
    Galaxy2-adrenal_1.fastq*
    Galaxy1-iGenomes_UCSC_hg19_chr19_gene_annotation.gtf*
    hg19.fa*
    $ pwd
    /homes/bioinfo/pipeline_datasets/RNA-SeqAlign2Ref
    $ mkdir test
    $ ls
    test…
    • “cd” changes directories.
    • “mkdir” makes directories.
  • Navigate and create directories
    • “touch” creates files.
    • “rm” deletes files.
    • “nano” is a command-line file editor.
    • Or use Cyberduck.
    Software carpentry v.5 http://software-carpentry.org/v5/gloss.html
    Software carpentry v.4 http://software-carpentry.org/v4/shell
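    A short practice session with these commands might look like the sketch below. It runs in a temporary directory; `rmdir`, which removes an empty directory, is not on the slide and is added here only to tidy up.

```shell
workdir=$(mktemp -d)
cd "$workdir"

mkdir test            # make a new directory
touch test/notes.txt  # create an empty file (or update its timestamp if it already exists)
ls test               # shows notes.txt

rm test/notes.txt     # delete the file; there is no trash can, so this is permanent
rmdir test            # remove the now-empty directory
```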
  • Move files or directories
    $ mv ~/pipeline_datasets/test.txt ~/test.txt
    $ ls ~
    test.txt…
    • “mv” moves files or directories to a new location.
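    The same move can be rehearsed on a scratch file. A minimal sketch in a temporary directory; the file contents are arbitrary.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/pipeline_datasets"
echo "example" > "$workdir/pipeline_datasets/test.txt"

# Move the file up one level. Giving a different name at the destination would also rename it.
mv "$workdir/pipeline_datasets/test.txt" "$workdir/test.txt"

# The file now appears next to the directory it came from.
ls "$workdir"
```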
  • Unix wildcards and head/tail
    $ ls ~/pipeline_datasets/RNA-SeqAlign2Ref/*.fastq
    pipeline_datasets/RNA-SeqAlign2Ref/Galaxy5-brain_2.fastq*
    pipeline_datasets/RNA-SeqAlign2Ref/Galaxy4-brain_1.fastq*
    pipeline_datasets/RNA-SeqAlign2Ref/Galaxy3-adrenal_2.fastq*
    pipeline_datasets/RNA-SeqAlign2Ref/Galaxy2-adrenal_1.fastq*
    $ head ~/pipeline_datasets/RNA-SeqAlign2Ref/*.fastq
    ==> pipeline_datasets/RNA-SeqAlign2Ref/Galaxy2-adrenal_1.fastq <==
    @ERR030881.107 HWI-BRUNOP16X_0001:2:1:13663:1096#0/1
    ATCTTTTGTGGCTACAGTAAGTTCAATCTGAAGTCAAAACCAACCAATTT
    +
    5.544,444344555CC?CAEF@EEFFFFFFFFFFFFFFFFFEFFFEFFF…
    • “*” matches any string of zero or more characters in a file name (it can be used with most basic Unix commands).
    • “head” prints the first lines of a file (10 by default; “head -n 4” prints exactly four).
    • “tail” prints the last lines of a file (also 10 by default).
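    The sketch below checks how the wildcard and the line counts behave on toy files. The file names and the 12 numbered lines are invented for the example; they are not real FASTQ data.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Two toy files with 12 numbered lines each, plus one file the pattern should skip.
for name in brain_1.fastq adrenal_1.fastq; do
    printf '%s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > "$name"
done
touch notes.txt

# The shell expands *.fastq to every matching file name before ls runs.
matches=$(ls *.fastq)
echo "$matches"

# With no option head prints 10 lines; -n 4 asks for one FASTQ record's worth.
default_count=$(head brain_1.fastq | wc -l | tr -d ' ')
first_four=$(head -n 4 brain_1.fastq)
last_two=$(tail -n 2 brain_1.fastq)

echo "head prints $default_count lines by default"
echo "$first_four"
echo "$last_two"
```

    Because a FASTQ record is four lines long, `head -n 4` and `tail -n 4` are handy for peeking at the first or last read in a file.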
  • Common bioinformatics file formats
    @ERR030881.107 HWI-BRUNOP16X_0001:2:1:13663:1096#0/1
    ATCTTTTGTGGCTACAGTAAGTTCAATCTGAAGTCAAAACCAACCAATTT
    +
    5.544,444344555CC?CAEF@EEFFFFFFFFFFFFFFFFFEFFFEFFF
    • Fastq: sequence data with quality scores. Four lines per entry: header line, sequence, second header or “+”, base quality scores. http://en.wikipedia.org/wiki/FASTQ_format
    >Locus_1_Transcript_2/3_Confidence_0.333_Length_600
    CCCCCCTTCAGTTCCCTTAAAGCACAGCCCAGGGAAACCTCCTCACAGTTTTCATCCAGC
    CACGGGCCAGCATGTCTGGGGGCAAATACGTAGACTCGGAGGGACATCTCTACACCGTTC
    CCATCCGGGAACAGGGCAACATCTACAAGCCCAACAACAAGGCCATGGCAGACGAGC
    • Fasta: sequence data. Header line that begins with “>”, followed by the sequence (generally wrapped). http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml
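    The two layouts suggest two different ways to count records. A minimal sketch on invented toy files, assuming the FASTQ file is not line-wrapped (exactly four lines per read):

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A two-record FASTQ file (toy reads, not real data).
printf '%s\n' \
    '@read1' 'ACGT' '+' 'IIII' \
    '@read2' 'TTGA' '+' 'FFFF' > reads.fastq

# A two-record FASTA file; the second sequence is wrapped over two lines.
printf '%s\n' \
    '>seq1' 'ACGTACGT' \
    '>seq2' 'GGCC' 'TTAA' > contigs.fa

# FASTQ: four lines per record, so records = lines / 4.
fastq_lines=$(wc -l < reads.fastq)
fastq_records=$((fastq_lines / 4))

# FASTA: one header per record, so count the lines that start with ">".
fasta_records=$(grep -c '^>' contigs.fa)

echo "FASTQ records: $fastq_records"
echo "FASTA records: $fasta_records"
```

    Counting “@” lines is not a safe shortcut for FASTQ, because “@” is also a valid quality character and a quality line can begin with it.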
  • Common bioinformatics file formats
    HWUSI-EAS1794_0001_FC61KOJ:5:110:7624:5467#0 99 Locus_126_Transcript_1 6319 1 50M = 6478 209 GCTTGTGGCAT IIIIIIIIIIII
    HWUSI-EAS1794_0001_FC61KOJ:5:110:7624:5467#0 147 Locus_126_Transcript_1 6478 1 50M = 6319 -209 GACGTTCGTGAT IHIIHHIIIIII
    • Sam: sequence alignment. Tab-delimited file with eleven required fields. http://samtools.github.io/hts-specs/SAMv1.pdf
    • Bam: binary version of a sam file.
    • Fields labelled on the slide: read header, target header, MAPQ, read sequence, read quality.
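    Because SAM is tab-delimited, ordinary text tools can pull out individual fields. The sketch below uses one invented record; the field names in the comment follow the SAM specification linked above, and `cut` and `awk` are stand-ins for whatever tool you prefer.

```shell
# One toy alignment with the eleven mandatory, tab-separated SAM fields:
# QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
record=$(printf 'read1\t99\tLocus_126_Transcript_1\t6319\t1\t12M\t=\t6478\t171\tGCTTGTGGCATA\tIIIIIIIIIIII')

# Count the tab-separated fields on the line.
n_fields=$(printf '%s\n' "$record" | awk -F '\t' '{ print NF }')

# Keep the read header (1), target header (3), position (4) and MAPQ (5).
summary=$(printf '%s\n' "$record" | cut -f 1,3,4,5)

echo "fields: $n_fields"
echo "$summary"
```

    This only works on the text SAM form; a BAM file is binary and has to be converted first (for example with `samtools view`).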
  • Pipes
    [Diagram: standard input (stdin) feeding a program]
    • “|” passes output from some kinds of programs as input to other programs, chaining steps together.
    • “>” tells the shell to print the output to a file rather than display it on the screen.
    Software carpentry v.4 http://software-carpentry.org/v4/shell
  • Pipes
    $ cd ~/pipeline_datasets/RNA-SeqAlign2Ref
    $ wc -l *.fastq > lines
    [Diagram: wc → lines]
    Software carpentry v.4 http://software-carpentry.org/v4/shell
  • Pipes
    $ wc -l *.fastq | sort > lines
    [Diagram: wc → sort → lines]
    Software carpentry v.4 http://software-carpentry.org/v4/shell
  • Pipes
    $ wc -l *.fastq | sort | head -1 > lines
    [Diagram: wc → sort → head -1 → lines]
    Software carpentry v.4 http://software-carpentry.org/v4/shell
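    The three-step pipeline can be tried on toy files. This sketch makes two small substitutions that are not on the slides: `sort -n`, so the counts are compared as numbers instead of as text, and `head -n 1`, the current spelling of `head -1`.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Three toy files with 8, 4 and 12 lines.
printf '%s\n' 1 2 3 4 5 6 7 8 > brain_1.fastq
printf '%s\n' 1 2 3 4 > adrenal_1.fastq
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > brain_2.fastq

# wc -l counts lines per file, sort -n orders the counts numerically,
# head -n 1 keeps the smallest, and > writes that single line to the file "lines".
wc -l *.fastq | sort -n | head -n 1 > lines

cat lines

# The second column of the saved line is the name of the shortest file.
shortest=$(awk '{ print $2 }' lines)
echo "shortest file: $shortest"
```

    No intermediate files are written between the steps; only the final result lands on disk.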
  • Pipes and grep
    $ wc -l *.fastq | sort | head -1 > lines
    • This programming model is called pipes and filters.
    • A filter transforms a stream of input into a stream of output.
    • A pipe connects two filters.
    • Any program that reads lines of text from standard input and writes lines of text to standard output can work with every other such program.
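    Your own code can join a pipeline as long as it follows the same contract. As a sketch, the hypothetical function `fastq_sequences` below is a filter that keeps only the sequence line of each four-line FASTQ record; the toy reads are invented.

```shell
# A filter reads standard input and writes standard output.
# This one prints line 2 of every group of 4 lines (the sequence line of a FASTQ record).
fastq_sequences() {
    awk 'NR % 4 == 2'
}

# Pipe a toy three-record FASTQ stream through the new filter and two standard ones:
# sort groups identical sequences together and uniq -c counts each group.
result=$(printf '%s\n' \
    '@read1' 'ACGT' '+' 'IIII' \
    '@read2' 'ACGT' '+' 'FFFF' \
    '@read3' 'TTGA' '+' 'FFFF' |
    fastq_sequences | sort | uniq -c)

echo "$result"
```

    Neither `sort` nor `uniq` knows anything about FASTQ; they cooperate with the new filter only because all three read and write plain lines of text.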
  • Pipes and grep
    $ cd ~/pipeline_datasets/sam_bam
    $ /homes/bioinfo/bioinfo_software/samtools/samtools cat brain_rep_1_tophat2_out/accepted_hits.bam adrenal_rep_1_tophat2_out_1/accepted_hits.bam | /homes/bioinfo/bioinfo_software/samtools/samtools flagstat - > alignment_stats.txt
    $ grep -c ">" ../RNA-SeqAlign2Ref/hg19.fa
    • “|” passes output from some kinds of programs as input to other programs, chaining steps together.
    • “-” tells the samtools program to use the output from the previous step as input.
    • “>” tells the shell to print the output to a file rather than display it on the screen.
    • “grep” searches for patterns in a file. The “-c” parameter tells grep to count the lines that contain the pattern (in this case, counting the sequences in a fasta file).
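    The “-” convention is not specific to samtools. As an analogy only, the sketch below uses plain `cat`, which also treats “-” as standard input, on two invented FASTA files, then counts the headers with `grep -c`.

```shell
workdir=$(mktemp -d)
cd "$workdir"

printf '%s\n' '>chr_a' 'ACGT' > first.fa
printf '%s\n' '>chr_b' 'TTGA' '>chr_c' 'GGCC' > second.fa

# The second cat reads first.fa from disk and "-" from standard input, so the
# piped second.fa is appended to it; ">" saves the combined stream to a file.
cat second.fa | cat first.fa - > combined.fa

# grep -c counts matching lines; the ^ anchor restricts the match to header lines.
# The pattern is quoted so the shell does not treat ">" as a redirection.
n_sequences=$(grep -c '^>' combined.fa)
echo "sequences: $n_sequences"
```

    Forgetting the quotes around “>” is a classic mistake: an unquoted `grep -c > hg19.fa` would overwrite the FASTA file instead of searching it.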
  • Pipes with samtools
    $ /homes/bioinfo/bioinfo_software/samtools/samtools
    https://www.biostars.org/p/43677/
    http://samtools.sourceforge.net/pipe.shtml
  • Review Unix
    • ps -p $$: process status for the process ID of the current shell
    • pwd: print working directory
    • ln -s: create a link; the -s parameter makes it symbolic
    • ls: list directory contents
    • ..: one directory up from the current working directory
    • .: current working directory
    • ~: home directory
    • *: wildcard
    • cd: change directories
    • mkdir: make directories
    • mv: move files or directories
    • head: print the first lines of a file (10 by default)
    • tail: print the last lines of a file (10 by default)
    • |: chain programs together
    • grep: search for patterns
    • wget: non-interactive network downloader
  • Review NGS
    • samtools cat: concatenate BAMs
    • samtools flagstat: simple stats
    • samtools view: SAM<->BAM conversion
    • samtools sort: sort alignments by leftmost coordinates
    • samtools rmdup: remove potential PCR duplicates