Next Generation Sequencing


This was presented on Mar 11, 2014 at Boyce Thompson Institute, Ithaca, NY at the 3rd BTI Bioinformatics Course http://btiplantbioinfocourse.wordpress.com/




Next Generation Sequencing: Presentation Transcript

  • Surya Saha, Sol Genomics Network (SGN), Boyce Thompson Institute, Ithaca, NY. ss2489@cornell.edu // Twitter: @SahaSurya. BTI Plant Bioinformatics Course 2014. http://www.acgt.me/blog/2014/3/7/next-generation-sequencing-must-die
  • Sequencing over the Ages (slide credit: Aureliano Bombarely). 3/12/2014 BTI Plant Bioinformatics Course 2014
      – 1953: DNA structure discovery
      – 1977: Sanger DNA sequencing by chain-terminating inhibitors
      – 1984: Epstein-Barr virus (170 Kb)
      – 1987: ABI 370 sequencer
      – 1995: Haemophilus influenzae (1.83 Mb)
      – 2001: Homo sapiens (3.0 Gb)
      – 2005 to 2007: 454, Solexa, SOLiD
      – 2011: Ion Torrent, PacBio
      – Later entries (2012, 2013): Illumina, Illumina HiSeq X, Pinus taeda (24 Gb)
  • First generation sequencing
  • Sanger method: Frederick Sanger (13 Aug 1918 – 19 Nov 2013) won the Nobel Prize for Chemistry in 1958 and 1980, and published the dideoxy chain termination method, or “Sanger method”, in 1977. http://dailym.ai/1f1XeTB
  • Sanger method: http://bit.ly/1g6Cudq http://bit.ly/1lcQO4J
  • Maxam-Gilbert method
  • Maxam-Gilbert method: http://bit.ly/1noY0fu http://bit.ly/1lGvJCA
  • First generation sequencing
      – Very high quality sequences (99.999%)
      – Very low throughput
      Capillary Sequencing (ABI3730xl):
        Run time: 20 min to 3 h
        Read length: 400-900 bp
        Reads / run: 96 or 384
        Total nucleotides sequenced: 1.9-84 Kb
        Cost / Mb: $2400
      http://bit.ly/1clLps3 http://1.usa.gov/1cLqIRd
  • Next generation sequencing
  • http://bit.ly/1keDtZQ
      – Second generation
      – Third generation
      – Fourth generation
      – Next-next-generation
      – Next-next-next generation
      http://www.acgt.me/blog/2014/3/10/next-generation-sequencing-must-diepart-2
  • Refer to the specific technology used to generate the data:
      – Illumina HiSeq/MiSeq/NextSeq
      – Pacific Biosciences RS1/RSII
      – Ion Torrent Proton/PGM
      – SOLiD
      http://www.acgt.me/blog/2014/3/10/next-generation-sequencing-must-diepart-2
  • 454 Pyrosequencing (GS FLX Titanium): one purified DNA fragment, to one bead, to one read. http://bit.ly/1ehwxWN http://bit.ly/1ehAcEh
  • Illumina (source: Illumina)
      Output:           15 Gb      | 120 Gb      | 1000 Gb                          | 1800 Gb
      Number of reads:  25 million | 400 million | 4 billion                        | 6 billion
      Read length:      2x300 bp   | 2x150 bp    | 2x125 bp (2x250 update mid-2014) | 2x150 bp
      Cost:             $99K       | $250K       | $740K                            | $10M
      $1000 human genome??
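
As a rough check on the Illumina table above, the output row follows from the read count and read length if each "read" is counted as a pair. That reading is an assumption, but it reproduces all four output values. The sketch below also estimates fold coverage of a 3.0 Gb human genome, the size quoted on the timeline slide. The function names are illustrative and not part of any Illumina tool.

```python
def paired_end_yield_gb(read_pairs, read_length_bp):
    """Total bases (in Gb) from paired-end sequencing: pairs x 2 reads x length."""
    return read_pairs * 2 * read_length_bp / 1e9


def fold_coverage(yield_gb, genome_size_gb):
    """Average depth: total sequenced bases divided by genome size."""
    return yield_gb / genome_size_gb


# (read pairs, read length in bp) for the four columns of the table
columns = [(25e6, 300), (400e6, 150), (4e9, 125), (6e9, 150)]
yields = [paired_end_yield_gb(n, length) for n, length in columns]
print(yields)  # [15.0, 120.0, 1000.0, 1800.0]

HUMAN_GENOME_GB = 3.0  # Homo sapiens size from the timeline slide
print(fold_coverage(yields[-1], HUMAN_GENOME_GB))  # 600.0
```

If one assumes a depth of about 30x per genome (a figure not taken from the slides), 600x of total coverage is enough for roughly 20 human genomes per run.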
  • Illumina: http://1.usa.gov/1fP9ybl
  • Illumina: Moleculo http://bit.ly/1aEPOBn
  • Pacific Biosciences SMRT sequencing: Single Molecule Real Time sequencing http://bit.ly/1naxgTe
  • Pacific Biosciences SMRT sequencing: error correction methods
      – Hierarchical genome-assembly process (HGAP)
      – PBJelly (English et al., PLOS ONE, 2012)
  • Pacific Biosciences SMRT sequencing: read lengths. Mean read length: 8391 bp. Maximum subread length: 24585 bp. http://www.igs.umaryland.edu/labs/grc/
  • Others
      – Ion Torrent Proton/PGM
      – Oxford Nanopore
      – Nabsys
      – SOLiD
  • Comparison
  • Next generation sequencing
      Platform             | Run time | Read length | Quality                      | Total nucleotides sequenced | Cost / Mb
      454 Pyrosequencing   | 24 h     | 700 bp      | Q20-Q30                      | 0.7 Gb                      | $10
      Illumina MiSeq       | 27 h     | 2x250 bp    | >Q30                         | 15 Gb                       | $0.15
      Illumina HiSeq 2500  | 11 days  | 2x125 bp    | >Q30                         | 1000 Gb                     | $0.05
      Ion Torrent          | 2 h      | 400 bp      | >Q20                         | 50 Mb-1 Gb                  | $1
      Pacific Biosciences  | 2 h      | 5.5-8.5 kb  | >Q30 consensus, >Q10 single  | 400-800 Mb / SMRT cell      | $0.33-$1
      http://bit.ly/1clLps3 http://1.usa.gov/1cLqIRd
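
To make the cost column of the comparison table concrete, here is a minimal sketch that estimates the cost of sequencing a small genome at a chosen depth. It reads the cost column as dollars per megabase, which is an assumption about the slide's units. The 100x target depth is also assumed; the genome size is the Haemophilus influenzae figure from the timeline slide.

```python
def sequencing_cost_usd(genome_size_mb, coverage, cost_per_mb):
    """Per-base cost estimate: megabases needed times price per megabase."""
    return genome_size_mb * coverage * cost_per_mb


# Cost per Mb values taken from the comparison table
COST_PER_MB = {
    "454 Pyrosequencing": 10.0,
    "Illumina MiSeq": 0.15,
    "Illumina HiSeq 2500": 0.05,
    "Ion Torrent": 1.0,
}

GENOME_MB = 1.83  # Haemophilus influenzae, from the timeline slide
COVERAGE = 100    # assumed target depth, not from the slides

for platform, price in COST_PER_MB.items():
    cost = sequencing_cost_usd(GENOME_MB, COVERAGE, price)
    print(f"{platform}: ${cost:,.2f}")
```

This covers only the per-base figure from the table. It leaves out library preparation, labor, and downstream analysis, so it should be read as a lower bound rather than a budget.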
  • Summary
      – Microbial genomes
      – Eukaryotic genomes
      – Resequencing genomes
      – RNAseq and other XXXseq methods
      http://bit.ly/1ko9Kgh
  • Next Generation Genomics: World Map of High-throughput Sequencers http://omicsmaps.com/
  • http://bit.ly/18pfUId
  • Real cost of Sequencing!! Sboner et al., Genome Biology, 2011
  • Library Types (slide credit: Aureliano Bombarely)
      – Single end
      – Paired end (PE, 150-800 bp, Fwd:/1, Rev:/2)
      – Mate pair (MP, 2 Kb to 20 Kb)
      (Figure: forward (F) and reverse (R) read orientations for 454/Roche and Illumina libraries.)
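
The /1 and /2 suffixes mentioned for paired-end reads can be used to match forward and reverse mates. The following is a minimal sketch under the assumption that both mates share the same ID up to the suffix; the read IDs are made up for illustration.

```python
def pair_reads(read_ids):
    """Group read IDs ending in /1 (forward) and /2 (reverse) by shared prefix."""
    pairs = {}
    for read_id in read_ids:
        base, sep, mate = read_id.rpartition("/")
        if sep != "/" or mate not in ("1", "2"):
            continue  # not a paired-end style ID; skip it
        slot = pairs.setdefault(base, [None, None])
        slot[int(mate) - 1] = read_id
    return pairs


# Hypothetical read IDs, made up for illustration
ids = ["read_A/1", "read_B/1", "read_A/2", "read_C/2"]
paired = pair_reads(ids)
complete = {k: v for k, v in paired.items() if None not in v}
orphans = {k: v for k, v in paired.items() if None in v}
print(complete)  # {'read_A': ['read_A/1', 'read_A/2']}
print(orphans)   # {'read_B': ['read_B/1', None], 'read_C': [None, 'read_C/2']}
```

Reads whose mate is missing end up in `orphans`, which is often worth checking after any filtering step that drops reads one at a time.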
  • Implications of Choice of Library (slide credit: Aureliano Bombarely)
      – Reads are assembled into a consensus sequence (contig)
      – Pair read information links contigs into a scaffold (or supercontig), with gaps written as NNNNN
      – Genetic information (markers) places scaffolds into a pseudomolecule (or ultracontig)
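
Because gaps in a scaffold are written as runs of N, the contigs can be recovered by cutting at those runs. Below is a small sketch of that idea; the scaffold string is a toy example and `min_gap` is an illustrative parameter, not a convention from the slides.

```python
import re


def split_scaffold(scaffold, min_gap=1):
    """Return (contigs, gap_lengths) by cutting a scaffold at runs of N."""
    gap_pattern = re.compile(r"[Nn]{%d,}" % min_gap)
    contigs = [c for c in gap_pattern.split(scaffold) if c]
    gaps = [len(m.group()) for m in gap_pattern.finditer(scaffold)]
    return contigs, gaps


# Toy scaffold, made up for illustration: three contigs joined by two gaps
scaffold = "ACGTACGT" + "NNNNN" + "GGCCTTAA" + "NN" + "TTGACA"
contigs, gaps = split_scaffold(scaffold)
print(contigs)  # ['ACGTACGT', 'GGCCTTAA', 'TTGACA']
print(gaps)     # [5, 2]
```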
  • Multiplexing Libraries (slide credit: Aureliano Bombarely): use of different tags (4-6 nucleotides) to identify different samples in the same lane/sector. (Figure: libraries tagged AGTCGT and TGAGCA, pooled for sequencing.)
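
After sequencing, pooled reads are sorted back into samples by their tag. The sketch below assumes the tag sits at the start of each read and must match exactly; real demultiplexers usually tolerate mismatches. The two tags are the ones shown on the slide, while the sample names and reads are hypothetical.

```python
def demultiplex(reads, tag_to_sample):
    """Assign each read to a sample by its leading tag; the tag is trimmed off."""
    tag_len = len(next(iter(tag_to_sample)))  # assumes all tags share one length
    bins = {sample: [] for sample in tag_to_sample.values()}
    bins["unassigned"] = []
    for read in reads:
        sample = tag_to_sample.get(read[:tag_len])
        if sample is None:
            bins["unassigned"].append(read)
        else:
            bins[sample].append(read[tag_len:])
    return bins


# The two tags come from the slide; sample names and reads are hypothetical
tags = {"AGTCGT": "sample_1", "TGAGCA": "sample_2"}
reads = ["AGTCGTACGTTT", "TGAGCAGGGCCC", "AGTCGTTTTAAA", "CCCCCCAAAAAA"]
result = demultiplex(reads, tags)
print(result)
# {'sample_1': ['ACGTTT', 'TTTAAA'], 'sample_2': ['GGGCCC'], 'unassigned': ['CCCCCCAAAAAA']}
```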
  • File Formats: Fasta files (slide credit: Aureliano Bombarely). It is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes. -Wikipedia
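
A FASTA record is a header line starting with ">" followed by one or more sequence lines. The parser below is a minimal sketch of reading that layout in plain Python; the example records are made up, and for real work an established library would be a safer choice.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs; wrapped sequence lines are joined."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Made-up records for illustration: one nucleotide, one peptide
example = """>seq1 toy nucleotide record
ACGTACGT
TTGA
>pep1 toy peptide record
MKVL
"""
records = list(parse_fasta(example.splitlines()))
for name, seq in records:
    print(name, seq, len(seq))
```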
  • File Formats: Fastq files (slide credit: Aureliano Bombarely). FASTQ format is a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores. -Wikipedia
      – Single-line ID with an at symbol (“@”) in the first column.
      – Sequences can span multiple lines after the ID line.
      – Single line with a plus symbol (“+”) in the first column, marking the start of the quality section.
      – The “+” line may repeat the ID.
      – Quality values can span multiple lines after the “+” line, but their total length must be identical to the sequence length.
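
The rules above matter when sequence and quality are wrapped over several lines: "@" is also a valid quality character, so a parser has to use the sequence length, not a leading "@", to find the end of a record. The following sketch follows that approach; the records are toy data and the function is illustrative, not a reference implementation.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, quality), allowing wrapped sequence/quality lines."""
    it = (line.rstrip("\n") for line in lines)
    for line in it:
        if not line:
            continue
        if not line.startswith("@"):
            raise ValueError(f"expected '@' header, got: {line!r}")
        read_id = line[1:]
        seq_chunks = []
        for line in it:
            if line.startswith("+"):
                break  # the '+' line ends the sequence section
            seq_chunks.append(line)
        seq = "".join(seq_chunks)
        # Collect quality lines until their total length matches the sequence.
        qual = ""
        while len(qual) < len(seq):
            chunk = next(it, None)
            if chunk is None:
                raise ValueError(f"truncated record: {read_id}")
            qual += chunk
        if len(qual) != len(seq):
            raise ValueError(f"quality/sequence length mismatch in {read_id}")
        yield read_id, seq, qual


# Toy records; the second quality line of read1 starts with '@' on purpose
example = """@read1 toy record
ACGTAC
GTTT
+
IIIIII
@III
@read2
GG
+read2
#I
"""
records = list(parse_fastq(example.splitlines()))
print(records)
# [('read1 toy record', 'ACGTACGTTT', 'IIIIII@III'), ('read2', 'GG', '#I')]
```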
  • Quality control: Encoding (Fastq files)
      – Offset by 33 (Phred+33): !"#$%&'()*+,-./0123456789
      – Offset by 64 (Phred+64): KLMNOPQRSTUVWXYZ[\]^_`abcdefgh
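
Each quality character stands for a score equal to its ASCII code minus the offset. The sketch below decodes a quality string under either offset and includes a simple heuristic for guessing the offset; the heuristic and function names are assumptions for illustration, not a standard tool.

```python
def decode_quality(quality, offset=33):
    """Convert a FASTQ quality string to Phred scores: ord(char) - offset."""
    return [ord(ch) - offset for ch in quality]


def guess_offset(quality):
    """Heuristic guess: characters below '@' (ASCII 64) cannot occur in Phred+64."""
    return 33 if min(ord(ch) for ch in quality) < 64 else 64


print(decode_quality("!+5?I", offset=33))  # [0, 10, 20, 30, 40]
print(decode_quality("@JT^h", offset=64))  # [0, 10, 20, 30, 40]
print(guess_offset("5?I"), guess_offset("T^h"))  # 33 64
```

The guess can fail on a Phred+33 string that contains only high scores, since all of its characters would then sit at or above "@". Checking many reads at once makes the guess more reliable.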
  • Quality control: Encoding. The Phred score of a base is Qphred = -10 × log10(e), where e is the estimated probability of the base being wrong. http://bit.ly/N28yUd
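
The Phred formula can be turned into a pair of small conversion functions. This sketch shows what the Q10, Q20, and Q30 thresholds from the comparison table mean as error rates, and converts the 99.999% accuracy quoted for first generation sequencing into a Phred score. The function names are illustrative.

```python
import math


def phred_to_error(q):
    """Error probability for a Phred score: e = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(e):
    """Phred score for an error probability: Q = -10 * log10(e)."""
    return -10 * math.log10(e)


for q in (10, 20, 30):
    print(f"Q{q}: error {phred_to_error(q):g}, accuracy {1 - phred_to_error(q):.1%}")

# 99.999% accuracy (the first generation figure) means e = 1e-5
print(round(error_to_phred(1e-5)))  # 50
```

In other words, Q20 is one expected error in 100 bases and Q30 is one in 1000.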
  • Quality control: Error correction
  • Thank you!!