Introduction to Next Generation Sequencing
Alex Sánchez-Pla
Genetics, Microbiology and Statistics Department (UB)
Statistics and Bioinformatics Unit (VHIR)
2011/04/13 (updated: 2021-05-15)
Outline
1) Introduction: DNA sequencing.
2) Overview of Sequencing Methods and Technologies.
3) NGS Bioinformatics.
4) Applications of NGS.
Introduction and Motivation
From the Central Dogma
Into the Omics Cascade
From genome to infinity, and beyond
Assuming the Central Dogma and the Omics Cascade means assuming that knowing and understanding the genome sequence is key to understanding life and disease.
Put another way: "If you want to study anything, sequence it!"
Genome sequencing
Genome sequencing is figuring out the order of DNA nucleotides, or bases, in a genome: the order of As, Cs, Gs, and Ts that make up an organism's DNA.
It has been developing for almost half a century and has been thoroughly described.
Some hints:
Year      Description                                                           Sequencing efficiency (bp/person/year)
1965      Yeast tRNA sequenced                                                  1
1977-78   Sanger dideoxy termination and Gilbert chemical degradation methods   1,000
1986      Leroy Hood's partial automation                                       25,000
1992-95   Craig Venter's first sequencing 'factory' at TIGR                     1,000,000
Increased automation, without great changes in the fundamental ideas, led to the
sequencing of the first genomes.
The Human Genome Project
Follow this link to learn about the story of the Human Genome Project.
The rise of Next Generation Sequencing
The first human genome was sequenced using Sanger sequencing and the shotgun method.
By the middle of the first decade of the 21st century, a new technology emerged.
It was called Next Generation Sequencing because it served the same purpose, with a series of characteristics that made it revolutionary (again):
Parallelized,
High Throughput,
Cost-effective,
Many competing technologies.
By 2020 it had become a standard, but it has also kept evolving:
"Next-Next" Generation sequencing
Single molecule (no amplification)
Long Reads
NGS: much more for much less
Overview of Sequencing Methods and Technologies
Resources
Sequencing is a dynamic process that is best visualized through animations.
To help people understand it and to promote its use, both companies and academics have prepared many educational videos.
Some interesting resources:
Sequencing Fundamentals (15-min, Illumina)
Introduction to NGS, YouSeq
Links-2-links, some may not work
Sanger Sequencing
The method is based on the selective incorporation of chain-terminating dideoxynucleotides by DNA polymerase during in vitro DNA replication.
This results in multiple chains of different sizes, all ending with the same nucleotide in a given reaction, where the size of each chain indicates the nucleotide's position.
Separating the chains by size using gel electrophoresis allows the order to be established.
Follow this link to see an animation.
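The chain-termination idea can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions: it models the classical four-reaction setup (one terminating base per reaction), treats the input as the newly synthesized strand, and assumes every possible terminated fragment is produced exactly once. The function names `sanger_fragments` and `read_gel` are hypothetical.

```python
from typing import Dict, List


def sanger_fragments(new_strand: str) -> Dict[str, List[int]]:
    """Return, for each terminating base, the lengths of the fragments
    that end with that base (one 'lane' per base)."""
    lanes: Dict[str, List[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        # A fragment of this length ends with a dideoxy version of `base`.
        lanes[base].append(length)
    return lanes


def read_gel(lanes: Dict[str, List[int]]) -> str:
    """Recover the sequence by ordering all fragments by size,
    as electrophoresis does, and reading the terminating base of each."""
    by_length = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in by_length)


lanes = sanger_fragments("GATTACA")
recovered = read_gel(lanes)
```

Here `lanes["A"]` holds the fragment lengths 2, 5 and 7, and sorting all fragments by length gives back the original order of bases.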
Next Generation Sequencing goes through the same steps as Sanger sequencing, only in more efficient ways.
We consider two such steps:
Cloning
Sequencing
Sanger vs Next Generation Sequencing
1) Construction of a sequencing library
NGS: clonal amplification to generate sequencing features
Sanger: in vivo cloning, transformation, colony picking...
2) Sequencing
NGS: array-based sequencing, higher degree of parallelism
Sanger: capillary-based sequencing, low degree of parallelism
Next Generation Sequencing
2nd Generation Sequencing Technologies
Small Sequencing Platforms
Big Sequencing Platforms
Next-Next Generation Sequencing
Next (2nd) generation sequencing improved yield considerably, but some (amplification) steps cannot avoid a certain percentage of errors.
3rd («next next») generation sequencing avoids some of these steps, and with them the errors they introduce.
Very promising, but not fully adopted yet!
Speed Variation by Technology
NGS Bioinformatics
NGS Bioinformatics Challenges
Need for a large amount of CPU power
Informatics groups must manage computer clusters
Challenges in parallelizing existing software or redesigning algorithms to work in a parallel environment
Another level of software complexity and challenges to interoperability
Very large text files (~10 million lines long)
Can't do 'business as usual' with familiar tools such as Perl/Python
Prohibitive memory usage and execution time
Impossible to browse for problems
Need for sequence quality filtering
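One common way around the memory problem is to stream the file record by record instead of loading it whole. The sketch below assumes the reads are in FASTQ format (four lines per read, never wrapped) with Phred+33 quality encoding; neither is stated on the slide. The function names and the mean-quality threshold of 20 are illustrative choices, not a recommended pipeline.

```python
import io
from typing import Iterator, TextIO, Tuple

Read = Tuple[str, str, str]  # (header, sequence, quality string)


def read_fastq(handle: TextIO) -> Iterator[Read]:
    """Yield one read at a time, so memory use does not grow with file size.
    Assumes strict 4-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header.rstrip("\n"), sequence, quality


def mean_quality(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 assumed)."""
    return sum(ord(char) - offset for char in quality) / len(quality)


def filter_reads(handle: TextIO, min_mean_quality: float = 20.0) -> Iterator[Read]:
    """Keep only reads whose mean quality reaches the threshold."""
    for header, sequence, quality in read_fastq(handle):
        if quality and mean_quality(quality) >= min_mean_quality:
            yield header, sequence, quality


# Tiny in-memory example: r1 has high quality ('I'), r2 the lowest ('!').
demo = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n!!!!\n"
kept = list(filter_reads(io.StringIO(demo)))
```

With a real file, the same generator can be fed an open file handle in place of `io.StringIO(demo)`; only one read is held in memory at a time.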
Data Management Issues
Raw data are large. How long should they be kept?
Processed data are manageable for most people
20 million reads (50 bp) ~ 1 Gb
More of an issue for a facility: HiSeq recommends 32 CPU cores, each with 4 GB RAM
Certain studies are much more data-intensive than others
Whole genome sequencing
A 30X coverage genome pair (tumor/normal) ~ 500 GB
50 genome pairs ~ 25 TB
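The figures above follow from simple arithmetic, sketched here as a rough check. The helper names are hypothetical, 1 TB is taken as 1000 GB, and the genome size passed to the coverage helper is an illustrative assumption supplied by the caller.

```python
def total_bases(n_reads: int, read_length_bp: int) -> int:
    """Number of sequenced bases: reads times read length."""
    return n_reads * read_length_bp


def storage_tb(n_pairs: int, gb_per_pair: float) -> float:
    """Storage for a set of genome pairs, in TB (1 TB = 1000 GB assumed)."""
    return n_pairs * gb_per_pair / 1000


def coverage(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average depth of coverage: sequenced bases divided by genome size."""
    return total_bases(n_reads, read_length_bp) / genome_size_bp


bases = total_bases(20_000_000, 50)   # 20 million reads of 50 bp
study_tb = storage_tb(50, 500)        # 50 tumor/normal pairs at ~500 GB each
```

`bases` comes to one billion bases, and `study_tb` to 25 TB, matching the slide's estimates.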
Which Software for NGS?
Applications of Next Generation Sequencing
