Surya Saha
Sol Genomics Network (SGN)
Boyce Thompson Institute, Ithaca, NY
ss2489@cornell.edu // Twitter:@SahaSurya

Next Generation Sequencing

This was presented on Mar 11, 2014 at Boyce Thompson Institute, Ithaca, NY at the 3rd BTI Bioinformatics Course http://btiplantbioinfocourse.wordpress.com/


  1. Surya Saha, Sol Genomics Network (SGN), Boyce Thompson Institute, Ithaca, NY. ss2489@cornell.edu // Twitter: @SahaSurya. BTI Plant Bioinformatics Course 2014. http://www.acgt.me/blog/2014/3/7/next-generation-sequencing-must-die
  2. Sequencing over the Ages (slide credit: Aureliano Bombarely). 1953: DNA structure discovery. 1977: Sanger DNA sequencing by chain-terminating inhibitors. 1984: Epstein-Barr virus (170 Kb). 1987: ABI 370 sequencer. 1995: Haemophilus influenzae (1.83 Mb). 2001: Homo sapiens (3.0 Gb). 2005 onward (year markers 2005, 2007, 2011, 2012, 2013): 454, Solexa, SOLiD, Ion Torrent, PacBio, Illumina HiSeq X, Pinus taeda (24 Gb). 3/12/2014 BTI Plant Bioinformatics Course 2014
  3. First generation sequencing
  4. Sanger method. Frederick Sanger (13 Aug 1918 – 19 Nov 2013) won the Nobel Prize in Chemistry in 1958 and 1980. He published the dideoxy chain termination method, or “Sanger method”, in 1977. http://dailym.ai/1f1XeTB
  5. Sanger method. http://bit.ly/1g6Cudq http://bit.ly/1lcQO4J
  6. Maxam-Gilbert method
  7. Maxam-Gilbert method. http://bit.ly/1noY0fu http://bit.ly/1lGvJCA
  8. First generation sequencing
      • Very high quality sequences (99.999%)
      • Very low throughput
      Platform | Run time | Read length | Reads per run | Total nucleotides sequenced | Cost per Mb
      Capillary sequencing (ABI3730xl) | 20m-3h | 400-900 bp | 96 or 384 | 1.9-84 Kb | $2400
      http://bit.ly/1clLps3 http://1.usa.gov/1cLqIRd
  9. Next generation sequencing
  10. http://bit.ly/1keDtZQ
      • Second generation
      • Third generation
      • Fourth generation
      • Next-next-generation
      • Next-next-next generation
      http://www.acgt.me/blog/2014/3/10/next-generation-sequencing-must-diepart-2
  11. Name the specific technology used to generate the data:
      – Illumina HiSeq/MiSeq/NextSeq
      – Pacific Biosciences RS1/RSII
      – Ion Torrent Proton/PGM
      – SOLiD
      http://www.acgt.me/blog/2014/3/10/next-generation-sequencing-must-diepart-2
  12. 454 Pyrosequencing (GS FLX Titanium). One purified DNA fragment, to one bead, to one read. http://bit.ly/1ehwxWN http://bit.ly/1ehAcEh
  13. Illumina (source: Illumina). Values for four instruments, left to right:
      Output | 15 Gb | 120 Gb | 1000 Gb | 1800 Gb
      Number of reads | 25 Million | 400 Million | 4 Billion | 6 Billion
      Read length | 2x300 bp | 2x150 bp | 2x125 bp (2x250 update mid-2014) | 2x150 bp
      Cost | $99K | $250K | $740K | $10M
  14. Illumina (same table as slide 13). $1000 human genome??
  15. Illumina. http://1.usa.gov/1fP9ybl
  16. Illumina: Moleculo. http://bit.ly/1aEPOBn
  17. Pacific Biosciences SMRT sequencing. Single Molecule Real Time sequencing. http://bit.ly/1naxgTe
  18. Pacific Biosciences SMRT sequencing: error correction methods. Hierarchical genome-assembly process (HGAP); PBJelly (English et al., PLoS ONE, 2012).
  19. Pacific Biosciences SMRT sequencing: read lengths. Mean read length: 8391 bp. Maximum subread length: 24585 bp. http://www.igs.umaryland.edu/labs/grc/
  20. Others
      • Ion Torrent Proton/PGM
      • Oxford Nanopore
      • Nabsys
      • SOLiD
  21. Comparison
  22. Next generation sequencing
      Platform | Run time | Read length | Quality | Total nucleotides sequenced | Cost per Mb
      454 Pyrosequencing | 24h | 700 bp | Q20-Q30 | 0.7 Gb | $10
      Illumina MiSeq | 27h | 2x250 bp | >Q30 | 15 Gb | $0.15
      Illumina HiSeq 2500 | 11 days | 2x125 bp | >Q30 | 1000 Gb | $0.05
      Ion Torrent | 2h | 400 bp | >Q20 | 50 Mb-1 Gb | $1
      Pacific Biosciences | 2h | 5.5-8.5 kb | >Q30 consensus, >Q10 single | 400-800 Mb per SMRT cell | $0.33-$1
      http://bit.ly/1clLps3 http://1.usa.gov/1cLqIRd
  23. Summary
      • Microbial genomes
      • Eukaryotic genomes
      • Resequencing genomes
      • RNAseq and other XXXseq methods
      http://bit.ly/1ko9Kgh
  24. Next Generation Genomics: World Map of High-throughput Sequencers. http://omicsmaps.com/
  25. http://bit.ly/18pfUId
  26. http://bit.ly/18pfUId
  27. Real cost of Sequencing!! Sboner, Genome Biology, 2011
  28. Library Types (slide credit: Aureliano Bombarely)
      • Single end
      • Paired end (PE, 150-800 bp, Fwd:/1, Rev:/2)
      • Mate pair (MP, 2 Kb to 20 Kb)
      Diagram labels: forward (F) and reverse (R) read orientations; platforms 454/Roche and Illumina.
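The Fwd:/1 and Rev:/2 suffixes in the paired-end entry can be used to match mates by read ID. The following is a minimal sketch, assuming read IDs of the form `name/1` and `name/2`; the function name `pair_reads` and the sample IDs are hypothetical and not part of the slides.

```python
def pair_reads(read_ids):
    """Group read IDs into (forward, reverse) mates using /1 and /2 suffixes.

    Assumption: forward reads end in "/1" and reverse reads in "/2".
    Reads without a mate, or without a suffix, are returned as orphans.
    """
    mates = {}
    orphans = []
    for read_id in read_ids:
        base, sep, suffix = read_id.rpartition("/")
        if sep and suffix in ("1", "2"):
            mates.setdefault(base, {})[suffix] = read_id
        else:
            orphans.append(read_id)  # no /1 or /2 suffix, e.g. single-end

    pairs = []
    for pair in mates.values():
        if "1" in pair and "2" in pair:
            pairs.append((pair["1"], pair["2"]))
        else:
            orphans.extend(pair.values())  # mate is missing
    return pairs, orphans


# Hypothetical read IDs for illustration.
pairs, orphans = pair_reads(["r1/1", "r2/1", "r1/2", "r3"])
print(pairs)
print(sorted(orphans))
```

Orphaned reads are kept rather than dropped, since they may still be usable as single-end data.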
  29. Implications of Choice of Library (slide credit: Aureliano Bombarely)
      • Reads → consensus sequence (contig)
      • Contigs + pair read information → scaffold (or supercontig), with gaps shown as NNNNN
      • Scaffolds + genetic information (markers) → pseudomolecule (or ultracontig)
  30. Multiplexing Libraries (slide credit: Aureliano Bombarely). Use of different tags (4-6 nucleotides) to identify different samples in the same lane/sector. Example tags from the slide: AGTCGT and TGAGCA, attached to the reads of two samples that are then sequenced together.
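Sorting multiplexed reads back into samples can be sketched as below. This is an illustration only, assuming the tag sits at the start of each read and must match exactly; the sample names, read sequences, and the function `demultiplex` are hypothetical, while the two tags are the ones shown on the slide.

```python
def demultiplex(reads, barcodes):
    """Assign each read to a sample by its leading tag, then trim the tag.

    Assumption: tags are inline at the 5' end and matched exactly;
    reads with no known tag go to the "unassigned" bin.
    """
    bins = {sample: [] for sample in barcodes.values()}
    bins["unassigned"] = []
    for read in reads:
        for tag, sample in barcodes.items():
            if read.startswith(tag):
                bins[sample].append(read[len(tag):])
                break
        else:
            bins["unassigned"].append(read)
    return bins


# Tags from the slide; sample names and reads are made up.
barcodes = {"AGTCGT": "sample_A", "TGAGCA": "sample_B"}
reads = ["AGTCGTACGTAC", "TGAGCATTGACC", "CCCCCCGGGGGG"]
bins = demultiplex(reads, barcodes)
print(bins)
```

Exact matching is the simplest choice; tolerating a mismatch in the tag would recover more reads at the risk of assigning some to the wrong sample.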
  31. File Formats: Fasta files (slide credit: Aureliano Bombarely). It is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes. -Wikipedia
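As a small illustration of the FASTA layout, the sketch below reads records from text in which each record starts with a ">" header line followed by one or more sequence lines. The function name `parse_fasta` and the example records are hypothetical.

```python
def parse_fasta(text):
    """Return {record_id: sequence} from FASTA-formatted text.

    The record ID is taken as the first word after ">"; sequence
    lines belonging to a record are joined into one string.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


# One nucleotide record and one peptide record, both invented.
example = ">seq1 nucleotide example\nACGT\nACGT\n>pep1\nMKV\n"
print(parse_fasta(example))
```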
  32. File Formats: Fastq files (slide credit: Aureliano Bombarely). FASTQ format is a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores. -Wikipedia
      • Single-line ID with an at symbol (“@”) in the first column.
      • The sequence can span multiple lines after the ID line.
      • A single line with a plus symbol (“+”) in the first column marks the start of the quality values.
      • The “+” line may repeat the ID.
      • Quality values can span multiple lines after the “+” line, but their total length must be identical to that of the sequence.
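The FASTQ rules listed for this slide can be sketched as a small reader. This is a minimal example, not a reference parser: the function name `parse_fastq` and the records are hypothetical. It reads quality lines until their length matches the sequence, because "@" is itself a valid quality character and so cannot be used to detect the next record.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, quality) tuples from FASTQ lines.

    Sequence and quality may each be wrapped over several lines.
    """
    it = iter(line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # skip blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got {header!r}")
        read_id = header[1:]

        # Sequence lines run until the "+" separator line.
        seq_parts = []
        for line in it:
            if line.startswith("+"):
                break
            seq_parts.append(line)
        else:
            raise ValueError(f"no '+' line for read {read_id!r}")
        sequence = "".join(seq_parts)

        # Quality lines run until they cover the whole sequence.
        quality = ""
        while len(quality) < len(sequence):
            try:
                quality += next(it)
            except StopIteration:
                raise ValueError(f"truncated quality for {read_id!r}") from None
        if len(quality) != len(sequence):
            raise ValueError(f"length mismatch for read {read_id!r}")
        yield read_id, sequence, quality


text = "@read1\nACGT\nACGT\n+\nIIII\nIIII\n@read2\nGG\n+read2\n@@\n"
for record in parse_fastq(text.splitlines()):
    print(record)
```

The second record has a quality string made of "@" characters, which a parser that splits on "@" would misread as a new header.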
  33. Quality control: Encoding. Fastq files:
      !"#$%&'()*+,-./0123456789  Offset by 33 (Phred+33)
      KLMNOPQRSTUVWXYZ[\]^_`abcdefgh  Offset by 64 (Phred+64)
  34. Quality control: Encoding (same encoding table as slide 33)
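Turning a quality string back into numeric scores is a matter of subtracting the offset from each character's ASCII code. The sketch below is illustrative; `decode_quality` and `guess_offset` are hypothetical names, and the guess is only a heuristic, since a string containing only characters at or above ASCII 64 could also be high-quality Phred+33 data.

```python
def decode_quality(quality, offset=33):
    """Convert a FASTQ quality string to a list of Phred scores."""
    scores = [ord(ch) - offset for ch in quality]
    if any(score < 0 for score in scores):
        raise ValueError("character below offset; wrong encoding?")
    return scores


def guess_offset(quality):
    """Heuristic guess: characters below ASCII 64 rule out Phred+64."""
    return 33 if any(ord(ch) < 64 for ch in quality) else 64


print(decode_quality("!+5I"))        # Phred+33 characters
print(decode_quality("KTh", 64))     # Phred+64 characters
print(guess_offset("!+5I"), guess_offset("KTh"))
```

In practice the offset is best taken from the sequencing platform and software version rather than guessed from a single read.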
  35. Quality control: Encoding. The Phred score of a base is $Q_{\text{phred}} = -10 \log_{10}(e)$, where $e$ is the estimated probability of the base being wrong. http://bit.ly/N28yUd
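The Phred formula can be checked numerically in both directions. This is a minimal sketch; the function names are hypothetical.

```python
import math


def phred_from_error(e):
    """Phred score for an estimated error probability e (0 < e <= 1)."""
    return -10 * math.log10(e)


def error_from_phred(q):
    """Estimated error probability for a Phred score q."""
    return 10 ** (-q / 10)


for e in (0.1, 0.01, 0.001):
    print(f"e = {e}: Q = {phred_from_error(e):.0f}")
print(f"Q = 20: e = {error_from_phred(20):.2f}")
```

Under this formula, Q20 means one expected error in 100 bases and Q30 one in 1000; the 99.999% accuracy quoted for first generation sequencing corresponds to an error probability of 0.00001, or Q50.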
  36. Quality control: Error correction
  37. Thank you!!
