HMD_Sequencing_KIBGE_KCHI.pptx

By
Dr. Abdul Hameed
Chief Scientific Officer
IBGE, Islamabad, Pakistan

DNA SEQUENCING
• DNA sequencing is the process of determining the precise order of nucleotides within a DNA molecule.
• It includes any method or technology used to determine the order of the four bases (adenine, guanine, cytosine, and thymine) in a strand of DNA.

DNA SEQUENCING METHODS
Historically, there were two main methods of DNA sequencing:
1. Maxam and Gilbert method
2. Sanger method

•MAXAM & GILBERT METHOD
• A. M. Maxam and W. Gilbert, 1977
• Chemical sequencing
• Treatment of DNA with base-specific chemicals → DNA is cleaved into fragments → the fragments are analysed to read the sequence

•Principle
[Figure: graphical demonstration of the principle, not reproduced]
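Since the figure is not reproduced, the reading logic can be sketched as a toy model. It assumes the textbook setup, which the slide does not spell out: DNA labelled at its 5′ end and four cleavage reactions (G, A+G, C+T, C). The function names and the example sequence are hypothetical; this is a teaching model, not laboratory software.

```python
# Toy model of reading a Maxam-Gilbert gel (illustrative, not lab software).
# Assumption: 5'-end-labelled DNA; a cleavage at 0-based position i leaves a
# labelled fragment of length i, so band length indexes the cleaved base.

REACTIONS = {
    "G": {"G"},          # dimethyl sulfate
    "A+G": {"A", "G"},   # formic acid
    "C+T": {"C", "T"},   # hydrazine
    "C": {"C"},          # hydrazine + NaCl
}


def run_reactions(sequence: str) -> dict[str, set[int]]:
    """For each reaction lane, the set of labelled fragment lengths (bands)."""
    return {
        lane: {i for i, base in enumerate(sequence) if base in targets}
        for lane, targets in REACTIONS.items()
    }


def read_gel(lanes: dict[str, set[int]], length: int) -> str:
    """Read from the shortest band upward, comparing paired lanes."""
    called = []
    for pos in range(length):
        if pos in lanes["G"]:
            called.append("G")
        elif pos in lanes["A+G"]:
            called.append("A")   # band in A+G but not in G
        elif pos in lanes["C"]:
            called.append("C")
        elif pos in lanes["C+T"]:
            called.append("T")   # band in C+T but not in C
        else:
            called.append("N")   # no band: base cannot be called
    return "".join(called)


print(read_gel(run_reactions("GATTACA"), 7))  # GATTACA
```

The paired lanes are what make the method work: A and T never get a lane of their own, so they are inferred from a band that appears in a combined lane but not in its single-base partner.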

SANGER METHOD
• Most common approach used for DNA sequencing
• Invented by Frederick Sanger, 1977
• Nobel Prize, 1980
• Also termed the chain termination or dideoxy method

•SANGER METHOD
• The chain termination reaction
• Dideoxynucleotide triphosphates (ddNTPs) act as chain terminators
• ddNTPs have an H on the 3′ carbon of the sugar instead of the OH normally found in dNTPs
• Template ssDNA → addition of dNTPs → elongation
• Template ssDNA → incorporation of a ddNTP → elongation stops
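The chain termination logic above can be simulated with a simplified model of the classic four-reaction (four-lane) format. The model assumes every position is terminated in some molecule, ignores the primer, and gives the template 3′→5′ so that its complement reads 5′→3′. All names are hypothetical.

```python
# Simplified model of four-lane Sanger sequencing (illustrative only).
# Assumptions: template given 3'->5'; every position is terminated in some
# molecule; a fragment of length n ends in the ddNTP incorporated at position n.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template: str) -> str:
    """The new strand the polymerase builds on the template, read 5'->3'."""
    return "".join(COMPLEMENT[base] for base in template)


def sanger_lanes(template: str) -> dict[str, list[int]]:
    """For each ddNTP reaction, the lengths of the chain-terminated fragments."""
    new_strand = synthesized_strand(template)
    lanes = {dd: [] for dd in ("ddA", "ddC", "ddG", "ddT")}
    for position, base in enumerate(new_strand, start=1):
        # A ddNTP lacks the 3'-OH, so elongation stops once it is added here.
        lanes["dd" + base].append(position)
    return lanes


def read_lanes(lanes: dict[str, list[int]]) -> str:
    """Order all fragments by length (shortest first); report each terminator."""
    bands = sorted(
        (length, dd[-1]) for dd, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = sanger_lanes("CTAATGT")
print(lanes["ddA"])       # [2, 5, 7]
print(read_lanes(lanes))  # GATTACA
```

Reading the bands from shortest to longest across all four lanes recovers the newly synthesized strand, which is the complement of the template.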

Fluorescent Dyes
• Fluorescent dyes are multicyclic
molecules that absorb and emit
fluorescent light at specific wavelengths.
• Examples are fluorescein and rhodamine
derivatives.
• For sequencing applications, these
molecules can be covalently linked to
nucleotides.

•Dye Terminator Sequencing
• A distinct dye or “color” is used for each of the four ddNTPs.
• Since the terminating nucleotides can be distinguished by color, all four reactions can be performed in a single tube.
• The fragments are distinguished by size and “color.”
[Figure: fragments labelled by dye color (A, C, G, T), read as A T G T]
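Once the single-tube fragments are separated by size, each peak has to be assigned a base from its color. The following is a hypothetical sketch of that step. It assumes each peak is reported as an intensity in four channels, one per ddNTP dye, and picks the strongest channel, calling "N" when the signal is weak or ambiguous. The thresholds and data are illustrative, not values from ABI software or any instrument.

```python
# Hypothetical base-calling sketch for dye-terminator data.
# Assumptions: peaks are already ordered by fragment size (shortest first);
# each peak is a mapping from base/dye channel to fluorescence intensity;
# min_signal and min_ratio are illustrative thresholds.

from typing import Mapping, Sequence


def call_bases(peaks: Sequence[Mapping[str, float]],
               min_signal: float = 100.0,
               min_ratio: float = 2.0) -> str:
    """One base per peak; 'N' when the call is weak or ambiguous."""
    calls = []
    for peak in peaks:
        ranked = sorted(peak.items(), key=lambda item: item[1], reverse=True)
        (best_base, best), (_, second) = ranked[0], ranked[1]
        too_weak = best < min_signal
        ambiguous = second > 0 and best / second < min_ratio
        calls.append("N" if too_weak or ambiguous else best_base)
    return "".join(calls)


peaks = [
    {"A": 900, "C": 40, "G": 30, "T": 20},
    {"A": 10, "C": 15, "G": 20, "T": 850},
    {"A": 50, "C": 30, "G": 700, "T": 60},
    {"A": 20, "C": 30, "G": 10, "T": 640},
    {"A": 400, "C": 20, "G": 10, "T": 350},  # two dyes overlap: ambiguous
]
print(call_bases(peaks))  # ATGTN
```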

ABI Sequencing Analysis 5.2 Software

•The Human Genome Project
• First draft of the human genome in 2001; finished version in 2004
• Estimated cost: $3 billion; time: 13 years
• Used Sanger sequencing
• Today:
Illumina: 1 week, $9,500
Exome: 6 weeks*, $1,000
Moving towards the $1,000 genome
Setia Pramana

• The draft sequence of the HGP was imperfect because many regions were incompletely covered, leaving a huge number of gaps
• The International Human Genome Sequencing Consortium (IHGSC) published a ‘finished’ version of the human genome sequence in 2004, and the HGP was then deemed ‘complete’

• This ‘finished’ version of the genome achieved almost complete coverage of all regions and reduced the number of gaps to 341 from the initial hundreds of thousands
• It initiated a new era in the study of genetic variation and the functional characterization of the human genome

•Next (second) Generation Sequencing
• New technologies allowing the massive production of tens of millions of short sequencing fragments; hence the name “massively parallel sequencing”
• These techniques can address problems similar to those tackled with microarrays, and many others as well
• They raised the promise of personalized medicine

NGS
• The advent of high-throughput sequencing technologies has initiated the ‘personal genome sequencing’ era for both normal and cancer genomes
• Large-scale international projects have followed, such as the 1000 Genomes Project and the International Cancer Genome Consortium

NGS
• NGS technologies have been on the market only since 2004
• They have now largely replaced Sanger sequencing technologies, owing to their ultra-high throughput (hundreds of gigabases of output)
• Ability to sequence millions of DNA fragments simultaneously: massively parallel sequencing technologies

•NGS
• Reduced sequencing costs significantly, making large-scale or whole-genome sequencing (WGS) studies much more affordable

• https://www.abmgood.com/marketing/knowledge_base/next_generation_sequencing_introduction.php

•Third Generation Sequencing

•Bioinformatics Challenges of NGS

Sequencing Has Become Cheaper and Faster
Cost of one human genome:
• HGP: $3 billion (13 years)
• 2004: $30,000,000
• 2008: $100,000
• 2010: $30,000
• 2011: $10,000
• 2012-13: $7,000
• 2014: $4,000 (~1 week)
• Future (year unknown): $1,000
The Race for the $1,000 Genome
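The cost figures above can be turned into a rate of decline with a short worked calculation. It assumes a constant yearly rate between 2004 and 2014, which is a simplification used only to summarize the trend. All dollar values come from the list above.

```python
# Worked arithmetic on the per-genome costs listed above.
# Assumption: a constant yearly rate of decline between 2004 and 2014
# (a simplification for illustration only).

HGP_COST = 3_000_000_000  # USD, Human Genome Project
COST_2004 = 30_000_000    # USD
COST_2014 = 4_000         # USD
TARGET = 1_000            # USD, the "$1,000 genome"

fold_2004_2014 = COST_2004 / COST_2014                   # overall fold reduction
yearly_factor = fold_2004_2014 ** (1 / (2014 - 2004))    # average fold per year
fold_hgp_to_target = HGP_COST / TARGET

print(f"2004 -> 2014: {fold_2004_2014:,.0f}-fold cheaper")
print(f"average: ~{yearly_factor:.2f}x cheaper each year")
print(f"HGP -> $1,000 genome: {fold_hgp_to_target:,.0f}-fold")
```

Between 2004 and 2014 the cost fell 7,500-fold, or roughly 2.4-fold per year on average. Reaching the $1,000 genome would mean a 3,000,000-fold reduction relative to the HGP.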

Sequencing Cost Is Getting Cheaper
• Reduced sequencing costs significantly, making large-scale or WGS studies much more affordable

•NGS Challenges

•Huge Data Storage and HPC Demand

•NGS Challenges
• The highest cost is (almost) no longer the sequencing itself but storage and analysis.
• A standard human whole-genome sequencing run at 30-40x coverage creates about 100 Gb of data
• Extreme data size causes problems:
- Just transferring and storing the data
- Standard all-against-all comparisons fail (they scale as N*N)
- Standard tools cannot be used
- Programs must be designed to be fast and parallel
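The N*N problem can be made concrete. The first function counts the comparisons an all-against-all approach needs. The rest sketches a k-mer hash index, one common alternative that only pairs reads sharing a short substring. The value of k, the toy reads and the function names are illustrative assumptions, not a specific tool's method. Highly repetitive k-mers can still produce many candidate pairs, so real tools add further filtering.

```python
# Why all-against-all read comparison does not scale, and a k-mer index
# as one common way around it. k and the toy reads are illustrative.

from collections import defaultdict
from itertools import combinations


def all_vs_all_pairs(n_reads: int) -> int:
    """Pairwise comparisons for n reads: n(n-1)/2, i.e. O(N^2)."""
    return n_reads * (n_reads - 1) // 2


def build_kmer_index(reads: list[str], k: int) -> dict[str, set[int]]:
    """Map each k-mer to the ids of reads containing it, in one pass over the data."""
    index: dict[str, set[int]] = defaultdict(set)
    for read_id, read in enumerate(reads):
        for start in range(len(read) - k + 1):
            index[read[start:start + k]].add(read_id)
    return index


def candidate_pairs(reads: list[str], k: int) -> set[tuple[int, int]]:
    """Only reads sharing at least one k-mer become candidates for comparison."""
    pairs: set[tuple[int, int]] = set()
    for read_ids in build_kmer_index(reads, k).values():
        pairs.update(combinations(sorted(read_ids), 2))
    return pairs


print(f"{all_vs_all_pairs(20_000_000):,}")  # 199,999,990,000,000 comparisons
print(candidate_pairs(["ACGTAC", "GTACGG", "TTTTTT"], k=4))  # {(0, 1)}
```

For 20 million reads, an all-against-all approach needs about 2 × 10^14 comparisons. The index needs only one pass over the reads and compares just the pairs that share a k-mer.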

• Need for a large amount of CPU power
- Informatics groups must manage compute clusters
- Challenges in parallelizing existing software, or redesigning algorithms to work in a parallel environment
- Another level of software complexity and challenges to interoperability
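Much per-read work parallelizes well because reads are independent. The sketch below illustrates this with a split-apply-combine pattern. GC counting stands in for any per-read task, and the chunk size and function names are hypothetical. The `mapper` argument defaults to Python's serial `map`. It can be swapped for a parallel map such as `concurrent.futures.ProcessPoolExecutor().map` without changing the result.

```python
# Split-apply-combine sketch for parallelizable per-read work.
# GC counting is a stand-in task; chunk size and names are hypothetical.
# `mapper` defaults to the serial built-in map; a parallel map can be passed.

from typing import Callable, Iterator


def chunked(reads: list[str], size: int) -> Iterator[list[str]]:
    """Split step: independent chunks of reads."""
    for start in range(0, len(reads), size):
        yield reads[start:start + size]


def gc_counts(chunk: list[str]) -> tuple[int, int]:
    """Apply step: G+C bases and total bases in one chunk."""
    gc = sum(read.count("G") + read.count("C") for read in chunk)
    total = sum(len(read) for read in chunk)
    return gc, total


def gc_fraction(reads: list[str], size: int = 1000,
                mapper: Callable = map) -> float:
    """Combine step: merge partial counts; same answer for any mapper."""
    results = list(mapper(gc_counts, chunked(reads, size)))
    gc = sum(part[0] for part in results)
    total = sum(part[1] for part in results)
    return gc / total if total else 0.0


print(gc_fraction(["GGCC", "ATAT", "GCAT"], size=2))  # 0.5
```

With a process pool, call it inside an `if __name__ == "__main__":` guard, for example `with ProcessPoolExecutor() as ex: gc_fraction(reads, mapper=ex.map)`. The worker function must be defined at module level so it can be pickled.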

• VERY large text files (~10 million lines long)
- Can’t do ‘business as usual’ with familiar tools such as Perl/Python
- Memory usage and execution time become prohibitive
- Impossible to browse by eye for problems
• Sequence quality filtering is needed
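One way to deal with both problems is to stream the file record by record rather than load it. The sketch below is a quality filter for FASTQ, the usual text format for sequencing reads, and it keeps memory use constant regardless of file size. It assumes four-line records, Phred+33 quality encoding and a mean-quality cut-off of 20. These are illustrative choices, not this presentation's pipeline.

```python
# Streaming FASTQ quality filter (illustrative sketch).
# Assumptions: four-line records, Phred+33 encoding, mean-quality cut-off 20.
# Records are read one at a time, so memory use does not grow with file size.

from itertools import islice
from typing import Iterable, Iterator

Record = tuple[str, str, str]  # (header, sequence, quality)


def fastq_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one record at a time from an open file or any iterable of lines."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return
        if len(record) < 4 or not record[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        header, sequence, _, quality = record
        yield header, sequence, quality


def mean_phred(quality: str, offset: int = 33) -> float:
    """Average Phred score encoded in a quality string."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_fastq(lines: Iterable[str], min_mean_q: float = 20.0) -> Iterator[Record]:
    """Yield only the records whose mean quality reaches the cut-off."""
    for header, sequence, quality in fastq_records(lines):
        if quality and mean_phred(quality) >= min_mean_q:
            yield header, sequence, quality


lines = ["@r1\n", "ACGT\n", "+\n", "IIII\n", "@r2\n", "ACGT\n", "+\n", "####\n"]
print(list(filter_fastq(lines)))  # [('@r1', 'ACGT', 'IIII')]
```

Used on a real file, `filter_fastq(open(path))` reads the file lazily in the same way.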

•Data Management Issues
• Raw data are large. How long should they be kept?
• Processed data are manageable for most people
- 20 million reads (50 bp) ~ 1 Gb
• More of an issue for a facility: HiSeq recommends 32 CPU cores, each with 4 GB RAM
• Certain studies are much more data-intensive than others
- Whole-genome sequencing: a 30X coverage genome pair (tumor/normal) ~ 500 GB
- 50 genome pairs ~ 25 TB
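The volumes on this slide follow from simple multiplication. The sketch below reproduces them. It relies on one assumption that does not come from the slide: a haploid human genome of about 3.1 billion bases, used only to convert "30X coverage" into the number of bases sequenced.

```python
# Back-of-the-envelope data-volume arithmetic for the figures above.
# Assumption (not from the slide): haploid human genome ~3.1 billion bases.

GENOME_SIZE = 3.1e9  # bases, approximate


def bases_sequenced(n_reads: float, read_length: int) -> float:
    """Total bases produced by a run: reads x read length."""
    return n_reads * read_length


def bases_for_coverage(coverage: float, genome_size: float = GENOME_SIZE) -> float:
    """Bases needed to cover a genome `coverage` times on average."""
    return coverage * genome_size


small_run = bases_sequenced(20e6, 50)  # 20 million 50 bp reads -> ~1 Gb
wgs_30x = bases_for_coverage(30)       # one 30X genome -> ~93 Gb of sequence
cohort_tb = 50 * 500 / 1000            # 50 pairs x ~500 GB -> TB (decimal units)

print(f"{small_run / 1e9:.0f} Gb, {wgs_30x / 1e9:.0f} Gb, {cohort_tb:.0f} TB")
```

A single 30X genome alone accounts for roughly 93 Gb of sequence before any alignment or intermediate files are stored. This is consistent with the ~100 Gb figure quoted for a standard 30-40x genome.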

•Data Management
• Primary data are usually discarded soon after the run
• Secondary and tertiary data are kept on fast-access disk during analysis, then moved to slower-access disk afterward

•Big Collaboration
• Collaborative expertise (human intelligence and intuition) is required for meaning and interpretation (Bergeron 2002)
• This includes on-demand communication and sharing of protocols, electronic resources, data, and findings among the stakeholders
• Collaboration with other big data sources: national registers, BPJS, hospitals, etc.

•Summary
• Challenges:
- Still expensive
- Lack of infrastructure (in developing countries)
- Lack of skilled personnel in bioinformatics
- Need for (large-scale) collaborations
- Integrating different technologies and systems
- Making it all clinically relevant
