Introduction to
Next-Generation Sequencing (NGS)
CGEpi Winter school 2017
13.02.2017
Aarif Mohamed Nazeer Batcha & Guokun Zhang
http://www.westminster.ac.uk/__data/assets/image/0003/157377/varieties/block10.jpg
Biology: Cells, Genes and Molecules
modified from: http://quantumanthropology1.files.wordpress.com/2012/04/picture-cell-nucleus.png?w=538
from: http://www.biologyjunction.com/images/04-05A-AnimalCell-L.jpg
Meselson/Stahl: DNA replication
is semi-conservative
Kornberg: Polymerase I
Crick: “Central Dogma”
of molecular biology
Hershey/Chase: Support for Avery
Astbury: Repetitivity
Koltsov: DNA proposal
Ion Torrent PGM
Benchtop sequencers available (MiSeq, GS Junior, Ion Proton)
Genome Analyzer, SOLiD system
454 FLX
Foundation of
454 Life Sciences
HG Draft
First NGS sequencer: MPSS
Foundation of NCBI
First multicellular and
animal genome: C. elegans
First eukaryotic
genome: S. cerevisiae
Foundation of EBI
FASTA algorithm
Avery/MacLeod/McCarty: DNA = chemical carrier of
heritable information; foundation of „molecular biology“
Watson/Crick/Franklin:
DNA double helix
…
From Mendel to nowadays: in short*
the history of molecular genetics
1869
1919 1937 1943
Levene: chem. building blocks, „sequence“
*hopefully…
1952
1953
1956
1958
1959
1845
Miescher: „Nuklein“
Mendel: Inheritance rules
1961
1962 1967
1968
1969
1974
Kornberg: Nucleosomes
1983
1980
1977
1986
1985 1987
1988
1990
1993
1992
1996
1995
2001
2000 2002
2005
2004 2006
1998
2010
Sanger/Gilbert/Maxam: Sequencing
First genome: Phi X174
Mullis: PCR
CalTech and ABI: semi-
automated sequencers
BLAST algorithm
First bacterial genome:
H. influenzae
Foundation of the Human
Genome (HG) Project
HG „complete“
Foundation of Celera
Genomics, Illumina & Solexa
2012
2009
Bowtie & BWA algorithms
1927
Nirenberg/Khorana/Holley/Brenner/
Crick/…: Cracking the genetic code
Wanted: DNA variations
http://en.wikipedia.org/wiki/File:Chromosomes_mutations-en.svg
http://commons.wikimedia.org/wiki/File:Dna-SNP.svg
A difference compared to a defined ‚wildtype‘ reference
Nomenclature(s) - by effect on:
• structure
– small scale (single nucleotide polymorphisms (SNPs) up to some hundred
base pairs): substitutions, insertions, deletions, …
– large scale (chromosome level): amplifications, deletions,
translocations, gene fusions, inversions, loss of heterozygosity, …
• protein sequence: frameshift, nonsense, missense, neutral, silent, …
• function: loss/gain, dominant negative, lethal
• fitness: harmful/beneficial, (nearly) neutral
• … (http://en.wikipedia.org/wiki/Mutation)
→ We need a reference sequence
(and samples…)
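The "protein sequence" nomenclature above can be illustrated for the simplest case, a single-base substitution within one codon. This is only a sketch: it uses the standard genetic code, assumes the reference codon is not a stop codon, and does not cover frameshifts or larger events. The function name classify_substitution is a hypothetical helper, not part of the original slides.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (first base slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Classify a codon change by its effect on the protein sequence.

    Assumes ref_codon encodes an amino acid (not a stop codon).
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "silent"      # same amino acid
    if alt_aa == "*":
        return "nonsense"    # premature stop codon
    return "missense"        # different amino acid


if __name__ == "__main__":
    for ref, alt in [("GAA", "GAG"), ("TGG", "TGA"), ("GAG", "GTG")]:
        print(ref, "->", alt, ":", classify_substitution(ref, alt))
```

Whether a missense change is harmful, neutral or beneficial cannot be read from the codon table alone; that belongs to the "function" and "fitness" categories.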
„Next-Generation“ Sequencing?
• Sanger technique published in 1977, based on chain termination (ddNTPs)
during DNA synthesis by a polymerase
→ high quality, long reads
→ slow, expensive
→ „First Generation“
• Human Genome Project: 1990 – 2004
• Investments >3,000,000,000 $ (~1$/bp)
• Development of cheaper „NGS“ technologies
• Compete with microarray technology*
http://www.lifesciencesfoundation.org/content/media/2011/08/17/2001_genome_covers.png
http://singularityhub.com/wp-content/uploads/2010/05/Venter_Collins_Genome.jpg
* http://en.wikipedia.org/wiki/DNA_microarray
http://upload.wikimedia.org/wikipedia/commons/thumb/f/f8/Frederick_Sanger2.jpg/195px-Frederick_Sanger2.jpg
NGS depends on massive, molecular parallelization
• Four major platform types (and further ones):
– Solexa/Illumina (Genome Analyzer, MiSeq, HiSeq)
– Applied Biosystems (ABI; SOLiD)
– Roche/454 (GS FLX, GS Junior)
– Ion (Torrent PGM, Proton)
• Advantages: fast (massively parallel) & cheap (low cost per bp)
• Disadvantages: error-prone, extensive needs in
computation time and storage
„Next-Generation“ Sequencing!
Principles of Sequencing (I)
• Central „Dogma“ of molecular biology (Crick 1958/70)
• DNA polymerases (Kornberg 1956 and others)
– replicate DNA (template + nucleotides + energy)
– act semi-conservatively (Meselson & Stahl 1958)
– work also in vitro
Translation
Replication
modified from: http://commons.wikimedia.org/wiki/File:Crick's_1958_central_dogma.svg
from: http://www.ncbi.nlm.nih.gov/books/NBK28394/bin/ch6f48.jpg
record what a polymerase is
doing in order to get the
template sequence
• How to „observe“? E.g. measure light emission during a biochemical
reaction, tracing the underlying base sequence
• Requires a pre-step, due to limited signal strength:
1) DNA amplification
2) Signal measurement
32 further cycles (2^35 copies overall)
Principles of Sequencing (II)
[Figure: PCR amplification; parental strands, forward and reverse primers; melt, anneal, extend; first, second and third cycle]
made from: http://www.summerschool.at/static/andreas.zoller/images/cycle1.jpg
34,359,738,368 copies, together emitting light at
laser excitation, recorded by a camera, … [skip]
→ very much data
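The copy number quoted above follows from idealized doubling in every PCR cycle. A minimal sketch, assuming perfect efficiency (every molecule is copied in every cycle) and a single starting molecule; real reactions fall short of this, and the function name pcr_copies is illustrative only:

```python
def pcr_copies(cycles, start_molecules=1):
    """Idealized number of copies after `cycles` rounds of PCR.

    Assumes perfect doubling per cycle, which real reactions do not reach.
    """
    return start_molecules * 2 ** cycles


if __name__ == "__main__":
    for n in (1, 2, 3, 35):
        print(f"{n:2d} cycles: {pcr_copies(n):,} copies")
```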
a bit of… Theory
• per-base quality (sequencing errors)
• short length of reads:
– more gaps
– more comparisons
– more uncertain positions
– may not bridge repetitive regions
– …
• coverage problem in genomics (Lander-Waterman 1988):
How much read data must be generated in
order to ensure a certain statistical
correctness of the consensus sequence?
– Poisson distribution of remaining gaps
– „Coverage“ (e.g. 10x): number of reads × read length,
divided by the length of the target sequence
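The coverage idea above can be made concrete with a small calculation. This is a sketch under the usual simplifying assumptions of the Lander-Waterman model (reads placed uniformly and independently, so the number of reads covering a base is approximately Poisson distributed); the function names and the example numbers are illustrative, not taken from the slides.

```python
import math


def coverage(num_reads, read_length, target_length):
    """Average coverage c = N * L / G."""
    return num_reads * read_length / target_length


def fraction_uncovered(c):
    """Expected fraction of bases hit by no read: P(X = 0) = exp(-c)."""
    return math.exp(-c)


def reads_needed(c, read_length, target_length):
    """Number of reads required to reach average coverage c."""
    return math.ceil(c * target_length / read_length)


if __name__ == "__main__":
    target = 10_000_000   # hypothetical 10 Mbp target
    read_len = 100        # hypothetical read length in bp
    n = reads_needed(10, read_len, target)
    c = coverage(n, read_len, target)
    print(f"{n:,} reads give {c:.0f}x coverage")
    print(f"expected uncovered fraction: {fraction_uncovered(c):.2e}")
```

Under these assumptions the uncovered fraction shrinks exponentially with coverage, which is why several-fold oversampling is needed even for a short target.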
[Figure: target sequence (short one!) → digestion/fragmentation → fragments → denaturation & sequencing → short reads → reconstruction of sequence order (assembly, alignment) → determining consensus → consensus]
→ NGS technology requires generating many times more data
than the length of the original sequence
→ but: massive cost reduction (per base)
modified from: S. Kurtz, ZBH Hamburg (lecture winter semester 2008)
efforts shift from lab to computing
Bridging the gap
• Computationally intense: often quadratic or even exponential growth of
requirements in time and space
→ e.g. double the input data, quadruple the load
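The "double input, quadruple load" rule of thumb can be seen in an all-against-all comparison of reads, one example of quadratic growth. The sketch below only counts comparisons; it is an illustration of the scaling, not a description of how any particular aligner or assembler works, and the function name is hypothetical.

```python
def pairwise_comparisons(num_reads):
    """Number of distinct read pairs in an all-against-all comparison."""
    return num_reads * (num_reads - 1) // 2


if __name__ == "__main__":
    for n in (1_000, 2_000, 4_000):
        print(f"{n:>5,} reads -> {pairwise_comparisons(n):>9,} comparisons")
    ratio = pairwise_comparisons(2_000) / pairwise_comparisons(1_000)
    print(f"doubling the input multiplies the work by about {ratio:.2f}")
```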
Aim: data reduction and gain of knowledge
[Figure: data flow from biology (laboratory) to computer science („low-level“ and „high-level“ data analysis): biol. sample, experiments, SEQ, raw data, processed data, model. Trends shown along the flow: content of secondary knowledge, relevance to scientific questions, need for manual interactions, level of data abstraction, amount of data, level of automation, fraction of irrelevant & erroneous data points]
Busting storage media
from: Stratton et al. 2009 – The cancer genome
• Current technologies (computing power, storage capacities, …) get overwhelmed
by NGS data
• Sequence data generation beats even Moore‘s Law (which postulates exponential
growth for hardware)
But: why?
http://genomics.xprize.org/sites/default/files/styles/panopoly_image_original/public/nih-cost-genome.jpg?itok=QPzSRoVY
Practice: Using sequence data (I)
• NGS is just a technology generating data
• Scientists need assays in order to get from questions to answers
• A great variety of problems, scientific fields, target molecules, biological mechanisms,
etc. determines the assay and the data analysis
• General scheme:
– QA/QC: high error rates, individual across platforms
– Mapping: reference vs. de novo; sequence alignment
– Variant Detection („calling“): artifacts, heterogeneity
– Annotation: external knowledge (DBs); „gene“, functional units
– Modelling: formalization, interaction networks
• find differences („variants“) via comparison („alignments“)
• Examples for sequencing assays:
– whole genome
– exome („poor man‘s genome“), transcriptome
– RNA expression quantification (analogous to microarrays)
– epigenome („regulatome“)
– amplicons (deep sequencing on few defined targets)
– metagenome (genomic content of a biosphere; water, biofilm, gut flora, …)
– …
• for what?
– basic research
– diagnostics (paternity tests, hereditary defects, tumor characterization, …)
– gain in resolution in existing efforts
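Finding "variants" via comparison can be sketched for the simplest case: a sample sequence that is already aligned to the reference, has the same length, and differs only by substitutions. Real variant calling also has to handle insertions, deletions, sequencing errors and heterogeneity; the function name and the example sequences are hypothetical.

```python
def find_substitutions(reference, sample):
    """Return (position, reference base, sample base) for every mismatch.

    Positions are 1-based. Both sequences must be aligned and equally long.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]


if __name__ == "__main__":
    reference = "ACGTACGT"
    sample = "ACGTTCGT"
    for pos, ref_base, alt_base in find_substitutions(reference, sample):
        print(f"position {pos}: {ref_base} > {alt_base}")
```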
Practice: Using sequence data (II)
Outlook
• faster technologies are under development („third generation sequencing“)
– Nanopore technology
– Electron-microscope based
– Via mass spectroscopy
– …
→ single molecule approaches: skip the amplification step
• fraction of costs for analysis will rise further (Mardis 2010)
• competition: Archon X Prize (Kedes & Campany 2011; canceled in 2013)
• Ethics!
modified from: http://www.nature.com/nbt/journal/v28/n5/images/nbt0510-426-F1.gif
from: http://www.wired.com/wiredenterprise/wp-content/uploads//2012/03/Oxford-Nanopore-MinION.jpeg
“The $10 million grand prize will be awarded
to the team(s) able to sequence 100 human
genomes within 30 days to an accuracy of 1
error per 1,000,000 bases, with 98%
completeness, identification of insertions,
deletions and rearrangements, and a complete
haplotype, at an audited total cost of $1,000
per genome” (“outpaced by innovation”)
„Sequencing on a stick“:
References
• Metzker 2010 - Sequencing technologies - the next generation; Nat Rev Genet. 2010 Jan;11(1):31-46
• Stratton et al. 2009 - The Cancer Genome; Nature. 2009 April 9; 458(7239): 719–724
• Mardis 2010 - The $1,000 genome, the $100,000 analysis?; Genome Medicine 2010, 2:84
• Liu et al. 2012 - Comparison of Next-Generation Sequencing Systems; J Biomed Biotechnol. 2012;2012:251364
• Kedes & Campany 2011 - The new date, new format, new goals and new sponsor of the Archon Genomics X PRIZE competition; Nat Genet. 2011 Oct 27;43(11):1055-8
• Science Poster „The Evolution of Sequencing Technology“
For those who are curious…
Thanks
Questions?
Comparison of next-generation sequencing methods
• Single-molecule real-time sequencing (Pacific Bio)
– Read length: 2900 bp average
– Accuracy: 87% (read length mode), 99% (accuracy mode)
– Reads per run: 35–75 thousand
– Time per run: 30 minutes to 2 hours
– Cost per 1 million bases (in US$): $2
– Advantages: Longest read length. Fast. Detects 4mC, 5mC, 6mA.
– Disadvantages: Low yield at high accuracy. Equipment can be very expensive.
• Ion semiconductor (Ion Torrent sequencing)
– Read length: 200 bp
– Accuracy: 98%
– Reads per run: up to 5 million
– Time per run: 2 hours
– Cost per 1 million bases (in US$): $1
– Advantages: Less expensive equipment. Fast.
– Disadvantages: Homopolymer errors.
• Pyrosequencing (454)
– Read length: 700 bp
– Accuracy: 99.90%
– Reads per run: 1 million
– Time per run: 24 hours
– Cost per 1 million bases (in US$): $10
– Advantages: Long read size. Fast.
– Disadvantages: Runs are expensive. Homopolymer errors.
• Sequencing by synthesis (Illumina)
– Read length: 50 to 250 bp
– Accuracy: 98%
– Reads per run: up to 3 billion
– Time per run: 1 to 10 days, depending upon sequencer and specified read length
– Cost per 1 million bases (in US$): $0.05 to $0.15
– Advantages: Potential for high sequence yield, depending upon sequencer model and desired application.
– Disadvantages: Equipment can be very expensive.
• Sequencing by ligation (SOLiD sequencing)
– Read length: 50+35 or 50+50 bp
– Accuracy: 99.90%
– Reads per run: 1.2 to 1.4 billion
– Time per run: 1 to 2 weeks
– Cost per 1 million bases (in US$): $0.13
– Advantages: Low cost per base.
– Disadvantages: Slower than other methods.
• Chain termination (Sanger sequencing)
– Read length: 400 to 900 bp
– Accuracy: 99.90%
– Reads per run: N/A
– Time per run: 20 minutes to 3 hours
– Cost per 1 million bases (in US$): $2400
– Advantages: Long individual reads. Useful for many applications.
– Disadvantages: More expensive and impractical for larger sequencing projects.
Quail, Michael; Smith, Miriam E; Coupland, Paul; Otto, Thomas D; Harris, Simon R; Connor, Thomas R; Bertoni, Anna; Swerdlow, Harold P; Gu, Yong (1 January 2012). "A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers". BMC Genomics 13 (1): 341
Liu, Lin; Li, Yinhu; Li, Siliang; Hu, Ni; He, Yimin; Pong, Ray; Lin, Danni; Lu, Lihua; Law, Maggie (1 January 2012). "Comparison of Next-Generation Sequencing Systems". Journal of Biomedicine and Biotechnology 2012: 1–11
Sanger Method (I)
• Sequence of interest is targeted via designed primers
• Massive amplification of target (e.g. by ca. 35 rounds of PCR amplification)
[Figure: PCR amplification; parental strands, forward and reverse primers; melt, anneal, extend; first, second and third cycle]
modified from: http://www.eisenlab.org/FunFly/wp-content/uploads/2012/06/science-creative-quarterly-seq.gif
made from: http://www.summerschool.at/static/andreas.zoller/images/cycle1.jpg
Sanger Method (II)
→ Extremely accurate: „gold standard“ (error rate ~1:100,000)
→ Slow: poorly parallelizable (60x max.)
• Elongation of DNA sequences by polymerase
• Enzyme stops at a random position per copy (by ddNTP)
• Terminated copies are separated within a gel (smaller ones run further)
• Sequence can be read directly
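The read-out principle in the bullets above can be sketched as a toy simulation. It assumes that at least one terminated copy exists for every possible length and ignores the four separate ddNTP labels; it illustrates only why sorting fragments by length yields the sequence. The function name sanger_read and the example strand are hypothetical.

```python
import random


def sanger_read(new_strand, seed=0):
    """Reconstruct a strand from its chain-terminated fragments.

    Toy model: one terminated copy per possible length is assumed to exist.
    """
    # Each fragment ends where a ddNTP stopped the polymerase.
    fragments = [new_strand[:end] for end in range(1, len(new_strand) + 1)]
    # In the tube the fragments are mixed; shuffle to mimic that.
    random.Random(seed).shuffle(fragments)
    # The gel separates by length (smaller fragments run further).
    by_length = sorted(fragments, key=len)
    # The terminating (last) base of each fragment, read in order of
    # increasing length, spells out the synthesized strand.
    return "".join(fragment[-1] for fragment in by_length)


if __name__ == "__main__":
    print(sanger_read("ATGCGTAC"))
```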
modified
from:
http://themedicalbiochemistrypage.org/images/sangersequencing.jpg
Illumina Technology (I)
• Ca. 80 % of the NGS market
• Preparation: solid-phase bridge amplification (one cluster = one target sequence)
modified from: Metzker 2010
• Sequencing: Reversible terminators, fluorescent dyes
Illumina Technology (II)
→ Extremely fast: speedup by
massive parallelization on the
molecular level (~100 million x)
→ Inaccurate: poor per-base quality
(error rate ~1:500)
modified from: Metzker 2010
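What the per-base error rates quoted in the slides (~1:500 for Illumina, ~1:100,000 for Sanger) mean for a whole read can be estimated with a short calculation. This sketch assumes independent errors with a constant rate along the read, which is a simplification, and the 100 bp read length is an illustrative choice; the function names are hypothetical.

```python
def expected_errors(read_length, error_rate):
    """Expected number of erroneous bases in one read."""
    return read_length * error_rate


def prob_error_free(read_length, error_rate):
    """Probability that a read contains no error at all."""
    return (1.0 - error_rate) ** read_length


if __name__ == "__main__":
    read_length = 100  # illustrative read length in bp
    for name, rate in (("Illumina", 1 / 500), ("Sanger", 1 / 100_000)):
        print(
            f"{name}: {expected_errors(read_length, rate):.3f} expected errors, "
            f"{prob_error_free(read_length, rate):.1%} of reads error-free"
        )
```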

 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docx
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 

Introduction to Next Generation Sequencing

  • 1. Introduction to Next-Generation Sequencing (NGS) CGEpi Winter school 2017 13.02.2017 Aarif Mohamed Nazeer Batcha & Guokun Zhang http://www.westminster.ac.uk/__data/assets/image/0003/157377/varieties/block10.jpg
  • 2. Biology: Cells, Genes and Molecules modified from: http://quantumanthropology1.files.wordpress.com/2012/04/picture-cell-nucleus.png?w=538 from: http://www.biologyjunction.com/images/04-05A-AnimalCell-L.jpg
  • 3. From Mendel to nowadays: the history of molecular genetics, in short* (*hopefully…) [Timeline figure; year labels run from 1845 to 2012, the assignment of years to events is lost in this transcript] Events shown: Mendel: inheritance rules; Miescher: „Nuklein“; Levene: chem. building blocks, „sequence“; Koltsov: DNA proposal; Astbury: repetitivity; Avery/MacLeod/McCarty: DNA = chemical carrier of heritable information, foundation of „molecular biology“; Hershey/Chase: support for Avery; Watson/Crick/Franklin: DNA double helix; Kornberg: Polymerase I; Crick: “Central Dogma” of molecular biology; Meselson/Stahl: DNA replication is semi-conservative; Nirenberg/Khorana/Holley/Brenner/Crick/…: cracking the genetic code; Kornberg: nucleosomes; Sanger/Gilbert/Maxam: sequencing; first genome: Phi X174; Mullis: PCR; FASTA algorithm; CalTech and ABI: semi-automated sequencers; foundation of NCBI; BLAST algorithm; foundation of the Human Genome (HG) Project; foundation of EBI; first bacterial genome: H. influenzae; first eukaryotic genome: S. cerevisiae; first multicellular and animal genome: C. elegans; foundation of Celera Genomics, Illumina & Solexa; first NGS sequencer: MPSS; foundation of 454 Life Sciences; HG draft; HG „complete“; 454 FLX; Genome Analyzer, SOLiD system; algorithms Bowtie & BWA; Ion Torrent PGM; benchtop sequencers available (MiSeq, GS Junior, Ion Proton)
  • 4. Wanted: DNA variations http://en.wikipedia.org/wiki/File:Chromosomes_mutations-en.svg http://commons.wikimedia.org/wiki/File:Dna-SNP.svg A variation is a difference compared to a defined ‚wildtype‘ reference. Nomenclature(s) - by effect on: • structure – small scale: from single nucleotide polymorphisms (SNPs) up to a few hundred base pairs (substitutions, insertions, deletions, …) – large scale: chromosome level; amplifications, deletions, translocations, gene fusions, inversions, loss of heterozygosity, … • protein sequence: frameshift, nonsense, missense, neutral, silent, … • function: loss/gain, dominant negative, lethal • fitness: harmful/beneficial, (nearly) neutral • … (http://en.wikipedia.org/wiki/Mutation) → We need a reference sequence (and samples…)
  • 5. „Next-Generation“ Sequencing? • Sanger technique published in 1977, based on chain termination during DNA synthesis by a polymerase → high quality, long reads, but slow and expensive → „First Generation“ • Human Genome Project: 1990 – 2004 • Investments >3,000,000,000 $ (~1 $/bp) • Development of cheaper „NGS“ technologies • Compete with microarray technology* (* http://en.wikipedia.org/wiki/DNA_microarray) Image sources: http://www.lifesciencesfoundation.org/content/media/2011/08/17/2001_genome_covers.png http://singularityhub.com/wp-content/uploads/2010/05/Venter_Collins_Genome.jpg http://upload.wikimedia.org/wikipedia/commons/thumb/f/f8/Frederick_Sanger2.jpg/195px-Frederick_Sanger2.jpg
  • 6. „Next-Generation“ Sequencing! • NGS depends on massive, molecular parallelization • Four major platform types (and further ones): – Solexa/Illumina (Genome Analyzer, MiSeq, HiSeq) – Applied Biosystems (ABI; SOLiD) – Roche/454 (GS FLX, GS Junior) – Ion Torrent (PGM, Proton) • Advantages: fast (massively parallel) & cheap (low cost per bp) • Disadvantages: error-prone, high demands on computation time and storage
  • 7. Principles of Sequencing (I) • Central „Dogma“ of molecular biology (Crick 1958/70) • DNA polymerases (Kornberg 1956 and others) – replicate DNA (template + nucleotides + energy) – act semi-conservatively (Meselson & Stahl 1958) – also work in vitro → Idea: record what a polymerase is doing in order to get the template sequence [Figures: central dogma with „Replication“ and „Translation“ labelled, modified from: http://commons.wikimedia.org/wiki/File:Crick's_1958_central_dogma.svg; from: http://www.ncbi.nlm.nih.gov/books/NBK28394/bin/ch6f48.jpg]
  • 8. Principles of Sequencing (II) • How to „observe“? E.g. measure light emission during a biochemical reaction, tracing the underlying base sequence • Requires a pre-step, due to limited signal strength: 1) DNA amplification 2) Signal measurement • PCR cycle: melt, anneal (forward and reverse primers on the parental strands), extend; after the first three cycles, 32 further cycles give 2^35 = 34,359,738,368 copies overall, which together emit light upon laser excitation, recorded by a camera, … [skip] → very much data [Figure made from: http://www.summerschool.at/static/andreas.zoller/images/cycle1.jpg]
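The copy number quoted on this slide follows from doubling the template once per PCR cycle. A minimal sketch of that arithmetic, assuming ideal amplification; the function name and the `efficiency` parameter are illustrative additions, not part of the slides:

```python
def copies_after(cycles: int, start_copies: int = 1, efficiency: float = 1.0) -> float:
    """Expected number of copies after `cycles` PCR cycles.

    efficiency = 1.0 is the idealised case from the slide: every template
    is duplicated in every cycle. Real reactions are less efficient
    (assumption: modelled here as a constant per-cycle efficiency).
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return start_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # 3 cycles shown in the figure + 32 further cycles = 35 cycles
    print(f"{copies_after(35):,.0f}")  # 34,359,738,368
```

With a per-cycle efficiency below 1.0 the yield drops quickly, which is one reason the ideal figure should be read as an upper bound.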
  • 9. A bit of… theory • per-base quality (sequencing errors) • short length of reads: – more gaps – more comparisons – more uncertain positions – may not bridge repetitive regions – … • coverage problem in genomics (Lander-Waterman 1988): How much read data must be generated in order to ensure a certain statistical correctness of the consensus sequence? – Poisson distribution of remaining gaps – „Coverage“ (e.g. 10x) applies to read length • Workflow [figure]: target sequence (a short one!) → digestion/fragmentation → fragments → denaturation & sequencing → short reads → reconstruction of sequence order (alignment/assembly) → determining the consensus → NGS technology requires a multiple of the original sequence to be generated as data, but: massive cost reduction (per base) → efforts shift from lab to computing (modified from: S. Kurtz, ZBH Hamburg, lecture winter semester 2008)
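The coverage question can be made concrete with the usual Poisson approximation: with N reads of length L on a target of length G, the mean coverage is c = N·L/G and the expected fraction of bases hit by no read is about e^(-c). A minimal sketch under that simplified model (reads placed uniformly at random, no minimum overlap); all function names are illustrative:

```python
import math


def coverage(n_reads: int, read_len: int, genome_len: int) -> float:
    """Mean coverage c = N * L / G."""
    return n_reads * read_len / genome_len


def fraction_uncovered(c: float) -> float:
    """Poisson probability that a given base is covered by zero reads."""
    return math.exp(-c)


def reads_needed(target_coverage: float, read_len: int, genome_len: int) -> int:
    """Number of reads required to reach a target mean coverage."""
    return math.ceil(target_coverage * genome_len / read_len)


if __name__ == "__main__":
    c = coverage(n_reads=1_000_000, read_len=100, genome_len=10_000_000)
    print(f"coverage: {c:.0f}x")
    print(f"expected uncovered fraction: {fraction_uncovered(c):.2e}")
    print(f"reads for 30x: {reads_needed(30, 100, 10_000_000):,}")
```

The exponential term shows why moderate coverage already closes most gaps, while per-base errors and repeats still call for considerably more data in practice.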
  • 10. Bridging the gap • Computationally intense: often quadratic or even exponential growth of requirements in time and space → e.g. double the input data, quadruple the load • Aim: data reduction and gain of knowledge [Figure: the path from biology (laboratory: biol. sample, experiments, sequencing) via raw data („low-level“) and data analysis to processed data („high-level“) and a model in computer science. Quantities plotted along this path: amount of data, fraction of irrelevant & erroneous data points, need for manual interactions, level of automation, level of data abstraction, content of secondary knowledge, relevance to scientific questions]
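The „double input data, quadruple load“ rule of thumb can be checked on the simplest quadratic case, an all-against-all comparison of reads. This is only an illustration of quadratic growth; actual aligners and assemblers use indexes to avoid the naive approach, and the function name is hypothetical:

```python
def pairwise_comparisons(n_reads: int) -> int:
    """Number of unordered read pairs in a naive all-vs-all comparison."""
    if n_reads < 0:
        raise ValueError("n_reads must be non-negative")
    return n_reads * (n_reads - 1) // 2


if __name__ == "__main__":
    for n in (1_000, 2_000, 4_000):
        print(n, pairwise_comparisons(n))
    ratio = pairwise_comparisons(2_000) / pairwise_comparisons(1_000)
    print(f"doubling the input multiplies the work by about {ratio:.2f}")
```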
  • 11. Busting storage media • Current technologies (computing power, storage capacities, …) get overwhelmed by NGS data • Sequence data generation beats even Moore's Law (which postulates exponential growth for hardware) But: why? [Figures from: Stratton et al. 2009 – The cancer genome; http://genomics.xprize.org/sites/default/files/styles/panopoly_image_original/public/nih-cost-genome.jpg?itok=QPzSRoVY]
  • 12. Practice: Using sequence data (I) • NGS is just a technology generating data • Scientists need assays in order to get from questions to answers • A great variety of problems, scientific fields, target molecules, biological mechanisms etc. determines the assay and the data analysis • General scheme: – Mapping: reference vs. de novo; sequence alignment – QA/QC: high error rates; individual across platforms – Annotation: external knowledge (DBs); „gene“, functional units – Modelling: formalization; interaction networks – Variant Detection: „calling“; artifacts, heterogeneity
  • 13. Practice: Using sequence data (II) • NGS is just a technology generating data • Scientists need assays in order to get from questions to answers • Great variety of problems, scientific fields, target molecules, biological mechanisms etc. to work on • General scheme: find differences („variants“) via comparison („alignments“) • Examples for sequencing assays: – whole genome – exome („poor man's genome“), transcriptome – RNA expression quantification (analogous to microarrays) – epigenome („regulatome“) – amplicons (deep sequencing on few defined targets) – metagenome (genomic content of a biosphere; water, biofilm, gut flora, …) – … • for what? – basic research – diagnostics (paternity tests, hereditary defects, tumor characterization, …) – gain in resolution in existing efforts
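Finding „variants“ via „alignments“ can be sketched on two sequences that are already aligned to equal length, with `-` marking gaps. This is a toy illustration, not a variant caller: the function name, the tuple layout and the gap symbol are assumptions, and real pipelines work on many reads with quality scores.

```python
from typing import List, Tuple

Variant = Tuple[int, str, str, str]  # (reference position, kind, ref base, sample base)


def find_variants(reference: str, sample: str) -> List[Variant]:
    """List differences between two pre-aligned sequences.

    Positions are 1-based and count only reference bases, so an insertion
    is reported at the position of the last reference base before it.
    """
    if len(reference) != len(sample):
        raise ValueError("aligned sequences must have equal length")
    variants: List[Variant] = []
    ref_pos = 0
    for ref_base, sample_base in zip(reference, sample):
        if ref_base != "-":
            ref_pos += 1
        if ref_base == sample_base:
            continue
        if ref_base == "-":
            kind = "insertion"
        elif sample_base == "-":
            kind = "deletion"
        else:
            kind = "substitution"
        variants.append((ref_pos, kind, ref_base, sample_base))
    return variants


if __name__ == "__main__":
    for variant in find_variants("ACGT-ACGT", "ACCTTAC-T"):
        print(variant)
```

The three variant kinds correspond to the small-scale categories (substitutions, insertions, deletions) named on the „Wanted: DNA variations“ slide.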
  • 14. Outlook • faster technologies are under development („third generation sequencing“) – Nanopore technology („sequencing on a stick“: Oxford Nanopore MinION) – Electron-microscope based – Via mass spectrometry – … → single molecule approaches: skip the amplification step • fraction of costs for analysis will rise further (Mardis 2010) • competition: Archon X Prize (Kedes & Campany 2011; canceled in 2013, “outpaced by innovation”): “The $10 million grand prize will be awarded to the team(s) able to sequence 100 human genomes within 30 days to an accuracy of 1 error per 1,000,000 bases, with 98% completeness, identification of insertions, deletions and rearrangements, and a complete haplotype, at an audited total cost of $1,000 per genome” • Ethics! [Figures modified from: http://www.nature.com/nbt/journal/v28/n5/images/nbt0510-426-F1.gif; from: http://www.wired.com/wiredenterprise/wp-content/uploads//2012/03/Oxford-Nanopore-MinION.jpeg]
  • 15. References • Metzker 2010 - Sequencing technologies - the next generation; Nat Rev Genet. 2010 Jan;11(1):31-46 • Stratton et al. 2009 - The Cancer Genome; Nature. 2009 April 9; 458(7239): 719–724 • Mardis 2010 - The $1,000 genome, the $100,000 analysis?; Genome Medicine 2010, 2:84 • Liu et al. 2012 - Comparison of Next-Generation Sequencing Systems; J Biomed Biotechnol. 2012;2012:251364 • Kedes & Campany 2011 - The new date, new format, new goals and new sponsor of the Archon Genomics X PRIZE competition; Nat Genet. 2011 Oct 27;43(11):1055-8 • Science Poster „The Evolution of Sequencing Technology“
  • 16. For those who are curious…
  • 18. Comparison of next-generation sequencing methods
    – Single-molecule real-time sequencing (Pacific Bio): read length 2900 bp average; accuracy 87% (read length mode), 99% (accuracy mode); reads per run 35–75 thousand; time per run 30 minutes to 2 hours; cost per 1 million bases $2; advantages: longest read length, fast, detects 4mC, 5mC, 6mA; disadvantages: low yield at high accuracy, equipment can be very expensive
    – Ion semiconductor (Ion Torrent sequencing): read length 200 bp; accuracy 98%; reads per run up to 5 million; time per run 2 hours; cost per 1 million bases $1; advantages: less expensive equipment, fast; disadvantages: homopolymer errors
    – Pyrosequencing (454): read length 700 bp; accuracy 99.90%; reads per run 1 million; time per run 24 hours; cost per 1 million bases $10; advantages: long read size, fast; disadvantages: runs are expensive, homopolymer errors
    – Sequencing by synthesis (Illumina): read length 50 to 250 bp; accuracy 98%; reads per run up to 3 billion; time per run 1 to 10 days, depending upon sequencer and specified read length; cost per 1 million bases $0.05 to $0.15; advantages: potential for high sequence yield, depending upon sequencer model and desired application; disadvantages: equipment can be very expensive
    – Sequencing by ligation (SOLiD sequencing): read length 50+35 or 50+50 bp; accuracy 99.90%; reads per run 1.2 to 1.4 billion; time per run 1 to 2 weeks; cost per 1 million bases $0.13; advantages: low cost per base; disadvantages: slower than other methods
    – Chain termination (Sanger sequencing): read length 400 to 900 bp; accuracy 99.90%; reads per run N/A; time per run 20 minutes to 3 hours; cost per 1 million bases $2400; advantages: long individual reads, useful for many applications; disadvantages: more expensive and impractical for larger sequencing projects
    Sources: Quail, Michael; Smith, Miriam E; Coupland, Paul; Otto, Thomas D; Harris, Simon R; Connor, Thomas R; Bertoni, Anna; Swerdlow, Harold P; Gu, Yong (1 January 2012). "A tale of three next generation sequencing platforms: comparison of Ion torrent, pacific biosciences and illumina MiSeq sequencers". BMC Genomics 13 (1): 341. Liu, Lin; Li, Yinhu; Li, Siliang; Hu, Ni; He, Yimin; Pong, Ray; Lin, Danni; Lu, Lihua; Law, Maggie (1 January 2012). "Comparison of Next-Generation Sequencing Systems". Journal of Biomedicine and Biotechnology 2012: 1–11
  • 19. Sanger Method (I) • Sequence of interest is targeted via designed primers • Massive amplification of target (e.g. by ca. 35 rounds of PCR amplification) [Figure: PCR cycles (melt, anneal, extend) with parental strands and forward/reverse primers; first, second and third cycle. Modified from: http://www.eisenlab.org/FunFly/wp-content/uploads/2012/06/science-creative-quarterly-seq.gif; made from: http://www.summerschool.at/static/andreas.zoller/images/cycle1.jpg]
  • 20. Sanger Method (II) • Elongation of DNA sequences by polymerase • Enzyme stops at a random position per copy (by ddNTP) • Terminated copies are separated within a gel (smaller ones run further) • Sequence can be read directly → Extremely accurate: „gold standard“ (error rate ~1:100,000) → Slow: poorly parallelizable (60x max.) modified from: http://themedicalbiochemistrypage.org/images/sangersequencing.jpg
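The read-out principle of chain termination can be mimicked in a few lines: every terminated copy ends in a labelled base, and ordering the copies by length (the gel separation) spells out the sequence. A heavily simplified sketch that assumes exactly one terminated copy per position and ignores the template/complement relationship; all names are illustrative:

```python
import random
from typing import List


def terminated_fragments(synthesized: str) -> List[str]:
    """One fragment per possible stop position of the polymerase.

    Each fragment is the newly synthesized strand up to and including
    the terminating (dideoxy) base.
    """
    return [synthesized[:end] for end in range(1, len(synthesized) + 1)]


def read_from_gel(fragments: List[str]) -> str:
    """Order fragments by length (shortest run furthest) and read the
    labelled last base of each one."""
    ordered = sorted(fragments, key=len)
    return "".join(fragment[-1] for fragment in ordered)


if __name__ == "__main__":
    strand = "GATTACAGGC"
    mixture = terminated_fragments(strand)
    random.shuffle(mixture)  # the reaction tube holds an unordered mixture
    print(read_from_gel(mixture))
```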
  • 21. Illumina Technology (I) • Ca. 80 % of the NGS market • Preparation: solid-phase bridge amplification (one cluster = one target sequence) modified from: Metzker 2010
  • 22. Illumina Technology (II) • Ca. 80 % of the NGS market • Preparation: solid-phase bridge amplification (one cluster = one target sequence) • Sequencing: reversible terminators, fluorescent dyes → Extremely fast: speedup by massive parallelization on the molecular level (~100 million x) → Inaccurate: poor per-base quality (error rate ~1:500) modified from: Metzker 2010
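The two per-base error rates quoted for Sanger (~1:100,000) and Illumina (~1:500) are easier to compare as Phred-scaled qualities, Q = -10·log10(p), and as the chance that a whole read is error-free. The Phred scale is the common convention for base qualities but is not mentioned on the slides; the 100 bp read length and the assumption of independent errors are illustrative.

```python
import math


def phred(error_probability: float) -> float:
    """Phred-scaled quality Q = -10 * log10(p)."""
    if not 0.0 < error_probability <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(error_probability)


def expected_errors(read_len: int, error_probability: float) -> float:
    """Expected number of wrong bases in one read."""
    return read_len * error_probability


def prob_error_free(read_len: int, error_probability: float) -> float:
    """Probability that a read has no error, assuming independent errors."""
    return (1.0 - error_probability) ** read_len


if __name__ == "__main__":
    for name, p in (("Sanger", 1 / 100_000), ("Illumina", 1 / 500)):
        print(
            f"{name}: Q{phred(p):.0f}, "
            f"{expected_errors(100, p):.3f} expected errors per 100 bp, "
            f"P(error-free 100 bp read) = {prob_error_free(100, p):.3f}"
        )
```

Under these assumptions roughly one in five 100 bp reads at the 1:500 rate carries at least one error, which is why the consensus is built from many overlapping reads.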