Next generation sequencing technologies for crop improvement
1. CREDIT SEMINAR ON
NEXT GENERATION SEQUENCING AND ITS APPLICATIONS IN CROP IMPROVEMENT
COURSE NO: GP-692
Presented by: C. Anjali (RAD/18-37)
Department of Molecular Biology and Biotechnology
2. CONTENTS
• INTRODUCTION
• HISTORY
• NEXT GENERATION SEQUENCING
1. Second generation sequencing
2. Third generation sequencing
• APPLICATIONS
• CONCLUSION
• FUTURE THRUST
3. SEQUENCING
• DNA sequencing is the process of determining
the nucleic acid sequence – the order
of nucleotides in DNA. It includes any method or
technology that is used to determine the order of the
four bases: adenine, guanine, cytosine, and thymine.
• The advent of rapid DNA sequencing methods has
greatly accelerated biological and medical research
and discovery
6. Experimental Space: The Relevance of “Classic” Techniques
Differential Gene Expression
• Northern blotting (1977): 1 probe – 20 samples
• Dot blots (1987): 100s of probes – 1 sample
• RT-PCR (1992): 100s of probes – 10–100 samples
• Microarrays (1995): 100,000s of probes – 1 sample
• Next-generation sequencing (2005): 10–100 × 10^6 reads – 1 sample
7. Next-generation sequencing
Next-generation sequencing refers to non-Sanger-based
high-throughput DNA sequencing technologies. Millions or
billions of DNA strands can be sequenced in parallel,
yielding substantially more throughput and minimizing the
need for the fragment-cloning methods that are often used in
Sanger sequencing of genomes.
http://www.nature.com/subjects/next-gneration-sequencing
9. • NGS is a general term referring to all post-Sanger
sequencing technologies that enable massively parallel
sequencing at low cost.
• NGS may be further divided into
1. Polony-sequencing-based technologies, which
require amplification of DNA prior to
sequencing, and
2. Single-molecule sequencing technologies, which do not.
10. • Polony sequencing refers to all commercial
technologies except for Helicos and PacBio (SMRT).
• Polony sequencing uses an array of polonies, in which
all amplicons of the same DNA fragment are clustered
together in the same region of the array. These groups
of amplicons were termed polonies, short for polymerase colonies.
• The degree of parallelism that can be achieved through
Sanger sequencing is only a fraction of what can be
achieved in polony sequencing.
17. Emulsion PCR (454, Ion Torrent PGM and SOLiD): bead enrichment
[Figure: bead carrying adapter-complement sequences]
The idea is that each bead should be coated all over with
copies of a SINGLE library fragment.
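Whether a bead ends up carrying copies of a single fragment depends on how many templates share its emulsion droplet. The sketch below assumes that templates are distributed over droplets according to a Poisson distribution with a mean of λ templates per droplet (the parameter `lam`). This is a simplifying assumption that the slide does not state. The function names are hypothetical.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a droplet receives exactly k template fragments."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def droplet_loading(lam: float) -> dict:
    """Fractions of droplets with 0, exactly 1 and more than 1 template at mean loading lam."""
    p0 = poisson_pmf(0, lam)
    p1 = poisson_pmf(1, lam)
    return {
        "empty": p0,
        "clonal": p1,                # one fragment: the desired single-template bead
        "mixed": 1.0 - p0 - p1,      # two or more fragments: a mixed, unusable bead
        # share of template-containing droplets that are clonal
        "clonal_among_occupied": p1 / (1.0 - p0) if p0 < 1.0 else 0.0,
    }

for lam in (0.1, 0.3, 1.0):
    f = droplet_loading(lam)
    print(f"lambda={lam}: empty={f['empty']:.3f} clonal={f['clonal']:.3f} "
          f"mixed={f['mixed']:.3f} clonal/occupied={f['clonal_among_occupied']:.3f}")
```

Lowering λ raises the share of occupied droplets that are clonal, but it also leaves most droplets, and therefore most beads, empty. This trade-off is one reason a step that enriches for template-carrying beads is useful.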
18. Generation of Polony array: DNA Beads (454, Ion Torrent PGM and SOLiD)
DNA beads are generated using emulsion PCR.
19. Generation of Polony array: Bridge-PCR Cluster generation (Solexa/Illumina)
DNA fragments are attached to an array and used as PCR templates:
- Create DNA library
- Place on array
- Perform bridge PCR (primers are attached to the array)
- Result: ~1 million colonies, each with ~1,000 copies of a single template sequence
20. Sequencing methods (FLOW CHART OF NGS)
Second generation methods
- Sequencing by synthesis: Roche 454 Pyrosequencing; Ion Torrent; Illumina/Solexa sequencing (reversible dye terminators)
- Sequencing by ligation: SOLiD sequencing
Third generation methods
- Single molecule sequencing: Nanopore sequencing and Pacific Biosciences SMRT sequencing
34. NGS Applications
• De novo sequencing (genomes, transcriptomes): assembly of genomes de novo
• Transcriptomes: gene expression analysis/splice variant identification
• Resequencing (genomes, exomes, custom sequence capture): resequencing individuals in a population
• RNA-seq (mRNA, miRNA, degradome): non-coding RNA characterization
• ChIP-seq: identify protein binding sites on DNA
• Methyl-seq: identify methylated DNA sites
• RIP-seq: identify RNAs bound by specific RNA-binding proteins
• Others: amplicons/small clones
• SNP identification
• Assembly by alignment
• Metagenomics
• Metabolomics
36. RAPID MARKER DEVELOPMENT
• Molecular markers have a variety of basic and
applied uses in crop biology, including
linkage map development, quantitative trait loci
(QTL) mapping, marker-assisted selection, genotype
fingerprinting, parentage analysis, genetic diversity
studies, and gene flow and evolutionary studies.
• SSRs (simple sequence repeats) and SNPs (single nucleotide
polymorphisms) are the most widely used markers.
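SSRs are short tandem repeats, so candidate SSR markers can be mined directly from NGS reads or assemblies. The sketch below finds perfect repeats with a regular expression. The motif-length range (2 to 6 bp) and the minimum of five repeats are illustrative thresholds, not values taken from this seminar. Dedicated SSR-mining tools also handle compound and imperfect repeats, which this sketch ignores.

```python
import re

def find_ssrs(seq: str, min_unit: int = 2, max_unit: int = 6, min_repeats: int = 5):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for unit in range(min_unit, max_unit + 1):
        # ([ACGT]{unit}) captures one motif copy; \1{n,} demands at least n further copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # skip motifs that are themselves repeats of a shorter unit, e.g. "ATAT" or "AA"
            if any(motif == motif[:k] * (unit // k) for k in range(1, unit) if unit % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit))
    return sorted(hits)

seq = "GGCT" + "AG" * 8 + "TTCA" + "CAG" * 6 + "GT"   # hypothetical read
print(find_ssrs(seq))  # [(4, 'AG', 8), (24, 'CAG', 6)]
```

Flanking sequence around each hit would then be used to design the SSR primers.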
38. Diversity Arrays Technology (DArT) is based on genomic
complexity reduction using restriction enzymes, followed by
hybridization to microarrays that simultaneously assay
thousands of markers across the genome.
This technique has also benefited from advances in
NGS, and array-based DArT markers have now been
replaced by NGS-DArT markers.
[Figure: DArT Technology]
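Complexity reduction can be illustrated with an in-silico digest. The sequence is cut at every recognition site, and only fragments in a size window are kept, so that a small, reproducible subset of the genome is assayed. PstI (CTGCA^G) serves only as an example enzyme. The size window is a hypothetical stand-in for the selection that adapters and PCR perform in practice, and only one strand is modelled.

```python
def digest(seq: str, site: str, cut_offset: int) -> list[str]:
    """Cut seq (top strand only) after cut_offset bases of every recognition site."""
    seq, site = seq.upper(), site.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]

def reduce_complexity(seq: str, site: str, cut_offset: int,
                      min_len: int, max_len: int) -> list[str]:
    """Keep only fragments inside a size window: a reduced, reproducible genome subset."""
    return [f for f in digest(seq, site, cut_offset) if min_len <= len(f) <= max_len]

# PstI recognises CTGCAG and cuts between A and G (CTGCA^G); used here only as an example
genome = "AAAA" + "CTGCAG" + "T" * 20 + "CTGCAG" + "GG"   # hypothetical toy sequence
print([len(f) for f in digest(genome, "CTGCAG", 5)])        # [9, 26, 3]
print(reduce_complexity(genome, "CTGCAG", 5, 10, 30))
```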
39. TRANSCRIPTOME INVESTIGATION
• Transcriptome sequencing is the
sequencing of mRNA isolated
from different tissues of a plant at
different time points; it focuses
the analysis on the transcribed
portion of the genome.
• With careful experimental design,
successful transcriptome sequencing
studies can also be performed in
non-model plant species that have
no reference genome.
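For gene expression analysis, RNA-seq read counts must be normalized for transcript length and sequencing depth before genes or samples are compared. The sketch below computes transcripts per million (TPM), which is one common normalization among several. The gene names, counts and lengths are hypothetical.

```python
def tpm(counts: dict[str, int], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Transcripts per million from raw read counts and transcript lengths (bp)."""
    # reads per kilobase: longer transcripts attract more reads, so divide by length
    rpk = {g: counts[g] / (lengths_bp[g] / 1000) for g in counts}
    # per-million scaling makes values comparable across samples of different depth
    scale = sum(rpk.values()) / 1e6
    return {g: v / scale for g, v in rpk.items()}

counts = {"geneA": 100, "geneB": 100, "geneC": 400}
lengths = {"geneA": 1000, "geneB": 2000, "geneC": 4000}
print(tpm(counts, lengths))  # geneA and geneC about 400000, geneB about 200000
```

geneB has the same raw count as geneA but half its TPM, because its reads are spread over a transcript twice as long.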
41. PHYLOGENETIC AND ECOLOGICAL STUDIES
• Targeted sequence capture coupled
with NGS techniques opens up genomic
resources for non-model organisms,
enabling studies of polyploid
parentage, phylogeny, population
divergence, gene flow and diversity.
• NGS techniques now make it
possible to study phylogenetic
relationships among closely related or
recently diverged species/taxa,
whose genomes differ only slightly.
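For closely related taxa, the phylogenetic signal comes from a small number of differences between aligned sequences. The code below is a minimal sketch, not the method of any particular study. It computes the uncorrected p-distance and the Jukes-Cantor (JC69) distance, d = -(3/4) ln(1 - 4p/3), which corrects for multiple substitutions at the same site. The sequences are hypothetical and are assumed to be aligned already.

```python
import math

def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences (gap columns skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def jukes_cantor(a: str, b: str) -> float:
    """JC69 distance: corrects p for multiple substitutions at the same site."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf  # saturated: the distance cannot be estimated
    return -0.75 * math.log(1 - (4.0 / 3.0) * p)

seq1 = "ACGT" * 25                                                      # 100 sites
seq2 = "".join("G" if i % 20 == 0 else c for i, c in enumerate(seq1))  # 5 differences
print(p_distance(seq1, seq2), round(jukes_cantor(seq1, seq2), 4))
```

When p is small, as for recently diverged taxa, the correction is also small: a p of 0.05 becomes about 0.052. The reliability of such comparisons therefore depends mainly on having enough sequenced sites, which is what NGS supplies.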
42. ALLELE MINING
• Allele mining is a method to discover superior
and/or novel alleles of important genes based on
the sequence information available for those
genes.
• The technique can also help validate the function of
specific genes controlling a trait.
• Advances in NGS technologies have made
sequencing-based allele mining more effective than
searching for new alleles with EcoTILLING.
43. Allele mining through NGS
1. Identify a gene controlling the trait of interest.
2. Design primers for that gene.
3. Run PCR on different genotypes representing the
maximum diversity for the specific trait.
4. Sequence the amplicons and analyze them for different variants.
• These new alleles are then associated with the
genotypes showing greater performance.
• Using allele-specific primers, these superior
alleles can then be used in crop breeding programs.
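The variant-analysis step can be sketched by comparing each genotype's amplicon with a reference and grouping genotypes that share the same set of differences into one allele. The sketch assumes the amplicons are already aligned and of equal length, with no insertions or deletions. The genotype names and sequences are hypothetical.

```python
from collections import defaultdict

def call_variants(reference: str, amplicon: str) -> tuple:
    """(1-based position, reference base, amplicon base) for every mismatch."""
    if len(reference) != len(amplicon):
        raise ValueError("amplicon must be aligned to the reference without indels")
    return tuple(
        (i + 1, r, a) for i, (r, a) in enumerate(zip(reference, amplicon)) if r != a
    )

def group_alleles(reference: str, amplicons: dict[str, str]) -> dict[tuple, list[str]]:
    """Group genotypes whose amplicons carry exactly the same variants (one allele each)."""
    alleles: dict[tuple, list[str]] = defaultdict(list)
    for genotype, seq in amplicons.items():
        alleles[call_variants(reference, seq)].append(genotype)
    return dict(alleles)

ref = "ATGGCTAAGT"
amplicons = {"G1": "ATGGCTAAGT", "G2": "ATGACTAAGT", "G3": "ATGACTAAGT", "G4": "ATGGCTTAGT"}
for variants, genotypes in group_alleles(ref, amplicons).items():
    print(variants or "reference allele", genotypes)
```

Each resulting allele could then be tested against trait performance, as described in the steps above.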
44. EPIGENETIC STUDIES
• Epigenetics is the study of heritable variation in phenotypic traits
that occurs without alteration of the genomic DNA sequence.
• Epigenetic modifications are broadly grouped into two types:
1. DNA methylation
2. Post-translational modifications of histone tails
• Coupled with various NGS methods, many
sequencing-based approaches have been developed
and applied in epigenetic studies, such as
45. 1. Whole genome bisulphite sequencing (WGBS)
2. Methylated DNA immunoprecipitation sequencing
(MeDIP-seq)
3. Chromatin immunoprecipitation sequencing
(ChIP-seq)
4. Tet-assisted bisulphite sequencing (TAB-seq) and
5. Chromosome conformation capture sequencing
(3C-seq).
These techniques are used to identify DNA
methylation patterns, chromatin conformation and a
broad range of protein/nucleic acid interactions.
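46. Whole genome bisulphite sequencing
• Bisulphite treatment converts unmethylated cytosine
to uracil, which is read as thymine after PCR,
whereas methylated cytosine remains unchanged.
• Therefore, sequencing of bisulphite-treated DNA
reveals the positions of methylated cytosines when
compared with the untreated DNA sequence.
• Third-generation sequencing technologies (TGSTs)
can detect methylation sites directly on the
template DNA.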
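The comparison between a bisulphite-converted read and the untreated reference can be sketched as follows. The sketch assumes a read already aligned to the top strand of the reference, with no indels. Real bisulphite pipelines also handle the converted bottom strand, and a comparison like this cannot tell a C-to-T SNP apart from an unmethylated cytosine. The function names are hypothetical.

```python
def call_methylation(reference: str, bs_read: str) -> dict[int, str]:
    """Classify each reference cytosine (0-based position) from an aligned bisulphite read."""
    if len(reference) != len(bs_read):
        raise ValueError("read must be aligned to the reference without indels")
    calls = {}
    for i, (ref, obs) in enumerate(zip(reference.upper(), bs_read.upper())):
        if ref != "C":
            continue                   # only reference cytosines are informative
        if obs == "C":
            calls[i] = "methylated"    # protected from bisulphite conversion
        elif obs == "T":
            calls[i] = "unmethylated"  # converted C -> U, read as T after PCR
        # any other base (sequencing error or SNP) is left uncalled
    return calls

def methylation_level(calls: dict[int, str]) -> float:
    """Fraction of called cytosines that are methylated."""
    if not calls:
        return 0.0
    return sum(v == "methylated" for v in calls.values()) / len(calls)

ref = "ACGTCCGA"
read = "ACGTTTGA"   # hypothetical aligned bisulphite read
calls = call_methylation(ref, read)
print(calls)                                # {1: 'methylated', 4: 'unmethylated', 5: 'unmethylated'}
print(round(methylation_level(calls), 2))   # 0.33
```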
47. REGULATORY PROTEIN BINDING DOMAIN PREDICTION
• Chromatin immunoprecipitation coupled with NGS (ChIP-seq) is an
efficient method to study the genome-wide profile of DNA-protein
interactions [Varshney et al., 2009].
• More efficient NGS techniques have superseded the microarray-based
ChIP-chip, SAGE and STAGE (sequence tag analysis of genomic
enrichment) methods used earlier in such studies.
• ChIP-seq generates a tremendous amount of data which, after rigorous
bioinformatic analysis, reveals insights into gene regulation and
epigenetic modification at a genome-wide scale.
• ChIP-seq involves precipitating protein-bound DNA with specific
antibodies against the target histone protein or a transcription factor
(TF); the precipitated DNA is then isolated for NGS analysis.
• Analysis of the sequence reads provides information about the
target sites of a specific histone protein or TF on a genome-wide
scale.
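Peak calling is one of the first bioinformatic steps in ChIP-seq. It looks for genomic regions where ChIP reads pile up well above a control (input) sample. The sketch below is a deliberately simple window-counting version. It does not reproduce the statistical model of dedicated peak callers such as MACS. The window size, thresholds and read positions are hypothetical.

```python
from collections import Counter

def enriched_windows(chip_positions, input_positions, window=200,
                     min_fold=4.0, min_reads=5, pseudocount=1.0):
    """Windows where ChIP read starts exceed the library-size-scaled input by min_fold."""
    chip = Counter(p // window for p in chip_positions)
    ctrl = Counter(p // window for p in input_positions)
    # scale the control so both libraries represent the same total number of reads
    scale = len(chip_positions) / max(len(input_positions), 1)
    hits = []
    for w, n in sorted(chip.items()):
        expected = ctrl.get(w, 0) * scale + pseudocount  # pseudocount avoids division by 0
        fold = n / expected
        if n >= min_reads and fold >= min_fold:
            hits.append((w * window, (w + 1) * window, n, round(fold, 2)))
    return hits

chip = list(range(1000, 1200, 10)) + [0, 500, 2500, 4000]  # hypothetical read starts
control = [50, 600, 2550, 4050, 4500]
print(enriched_windows(chip, control))  # [(1000, 1200, 20, 20.0)]
```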
48. METAGENOMIC ANALYSIS
• Metagenomics is the genomic analysis of whole microbial
communities using DNA isolated directly from environmental
samples.
• This involves preparation of a shotgun (metagenomic) library
and sequencing, followed by complex data analysis.
• Metagenomics gives new insights into microbial populations,
since many unculturable microbes could not be studied with
conventional methods [Knief, 2014].
• High-throughput NGS techniques enable deep sequencing of
the metagenome, making it possible to identify
less-abundant microorganisms.
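A simple sampling model shows why depth matters for rare taxa. If reads are drawn independently and a taxon contributes a fraction f of them, the chance of seeing it at least once in N reads is 1 - (1 - f)^N. The model ignores genome size, DNA extraction bias and classification errors, so treat this as a rough sketch. The abundance used below is hypothetical.

```python
import math

def detection_probability(rel_abundance: float, total_reads: int, min_reads: int = 1) -> float:
    """P(a taxon contributes at least min_reads of total_reads), assuming binomial sampling."""
    # probability of seeing fewer than min_reads reads from the taxon
    miss = sum(
        math.comb(total_reads, k) * rel_abundance ** k * (1 - rel_abundance) ** (total_reads - k)
        for k in range(min_reads)
    )
    return 1.0 - miss

def reads_needed(rel_abundance: float, confidence: float = 0.95) -> int:
    """Smallest number of reads giving at least `confidence` chance of one read from the taxon."""
    if not 0 < rel_abundance < 1:
        raise ValueError("rel_abundance must be between 0 and 1")
    # solve 1 - (1 - f)^N >= confidence for N
    return math.ceil(math.log(1 - confidence) / math.log(1 - rel_abundance))

f = 1e-5  # hypothetical taxon making up 0.001% of the community's reads
for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9,} reads: P(detected) = {detection_probability(f, n):.3f}")
print(f"reads for 95% chance of detection: {reads_needed(f):,}")
```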
49. SINGLE CELL GENOMICS
• Knowledge of the plant genome, its expression and its regulation has
normally been derived by analyzing thousands or millions of cells in
bulk.
• These analyses are undoubtedly informative, but they are often unable
to detect heterogeneity within the population of cells.
• Cells acquire a small number of mutations with every cell
division, and thus genomic heterogeneity (somatic variation)
arises within the organism.
• These variations are involved in various developmental and
disease-related phenomena (Biesecker et al., 2013).
• The genomic and transcriptomic variation present in single
cells is lost in conventional sequencing studies, since a group
of cells is taken as the starting material.
50. SINGLE CELL GENOMICS
• Improvements in single cell isolation, whole genome amplification
and NGS techniques make it possible to sequence the genome of a
single cell.
• Although whole genome amplification is still a challenge, upcoming
third-generation technologies capable of single-molecule sequencing
will make single cell genome sequencing (SCGS) more accurate and
faster [Macaulay and Voet, 2014].
51. EXOME SEQUENCING
• The exome refers to the whole set of exons in all the
genes (both protein-coding and non-coding) present in
the genome, and represents a small portion (1–2%) of it.
• Exome sequencing can provide information on variants present
in the coding region of the genome for a large number of
individuals, with deep coverage and in a more cost-effective manner
[Warr et al., 2015].
• Exome sequencing is a two-step process:
1. Exome capturing
2. Sequencing
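The cost advantage follows from simple coverage arithmetic. Mean depth is roughly (number of reads × read length) / target size, so the number of reads needed scales with the size of the target. The sketch below compares a whole genome with an exome occupying 1.5% of it, which lies within the 1–2% range above. The genome size, depth, read length and capture on-target rate are hypothetical values chosen only for illustration.

```python
import math

def reads_for_depth(target_bp: float, depth: float, read_len: int,
                    on_target: float = 1.0) -> int:
    """Reads needed for a mean depth over target_bp.

    on_target is the fraction of reads landing in the target
    (1.0 for whole-genome sequencing, lower for capture-based exome sequencing).
    """
    return math.ceil(depth * target_bp / (read_len * on_target))

genome_bp = 1e9                 # hypothetical genome size
exome_bp = 0.015 * genome_bp    # exome assumed to be 1.5% of the genome
wgs = reads_for_depth(genome_bp, depth=30, read_len=150)
wes = reads_for_depth(exome_bp, depth=30, read_len=150, on_target=0.6)  # hypothetical capture rate
print(f"WGS: {wgs:,} reads; exome: {wes:,} reads; ratio ~{wgs / wes:.0f}x")
```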
52. EXOME SEQUENCING
• The single-molecule sequencing
capacity of third generation
sequencing techniques can now
remove the PCR amplification step
for selected exonic sequences,
further reducing the cost of exome
sequencing.
• Exome sequencing relies on
accurate genome annotation, but
for crops with poorly annotated
genomes, transcriptome sequencing
data can also be used to define the
exons to be captured.
Basic steps in exome sequencing (exome capturing):
1. Isolation and fragmentation of genomic DNA
2. Selection of exon-containing fragments by probe hybridization
3. PCR amplification
4. Sequencing
54. MULTIPLE GENOME SEQUENCING AND RESEQUENCING
• NGS techniques have made it feasible to plan and carry out
the sequencing of multiple genomes within a single plant species.
• The structure of a genome is affected by a number of
evolutionary factors, including recombination,
mutation, gene conversion, selection and
polyploidization, along with various introgressions.
• Understanding the effect of these processes on
sequence variation helps explain the origin of
genetic diversity and allows the study of allelic variations
responsible for phenotypic differences.
56. Discovery of SNP markers related to pungency and to
bacterial wilt, blight, CMV and anthracnose resistance
• A pungency-related SNP (G/T) has
been detected from expressed
sequence tags (ESTs).
• Moreover, the identified SNP
marker significantly distinguished
29 cultivars (19 nonpungent and 10
pungent) of Capsicum annuum.
• The nonpungent cultivars carried
the G allele, whereas the pungent
cultivars carried the T allele.
• The study gives a comprehensive
overview of the discovery of single
nucleotide polymorphism (SNP) markers
associated with different traits of
pepper, such as pungency and
disease resistance.
Hence, the availability of robust genomic and bioinformatics methods has enhanced the
understanding of trait improvement in the pepper crop.
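One way to check whether a single SNP separates two phenotype groups is a 2×2 contingency test of allele against phenotype. The sketch below implements a two-sided Fisher's exact test using only the standard library. It is not necessarily the statistic used in the pepper study summarized on this slide. The usage line assumes, as the slide states, that every nonpungent cultivar carries G and every pungent cultivar carries T.

```python
from math import comb

def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same row and
    column totals that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # probability that the top-left cell equals x, given the fixed margins
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # small relative tolerance so that ties with the observed table are counted
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= observed * (1 + 1e-7))

# rows: nonpungent, pungent; columns: G allele, T allele
p = fisher_exact_2x2(19, 0, 0, 10)
print(p)  # equals 1 / comb(29, 10), about 5.0e-08
```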
58. • MON810 maize (a GMO) was first commercialized in 1997.
• Genetic characterization of the cry1Ab coding region allowed several sequence
variants to be identified and quantified.
• Samples from seeds containing a stacked MON810 event had more variants than
MON810 single-event varieties.
• Specifically, position 71 of the analyzed region varied (T instead of C) in 15 of
600 samples tested and thus appears to be a mutational hotspot.
• Epigenetic analysis revealed a low degree of methylation, making it difficult to
associate the coding region variants with methylation status.
• Overall, 20 samples of each variety were investigated to determine MON810
insert zygosity. All samples analyzed produced two bands on the agarose gel, the
wild-type band and the MON810-specific band, which means that they were
hemizygous for the transgene locus.
60. • NGS made the study of wheat, an allopolyploid plant with a large, highly
repetitive genome, easier.
• Sequencing of the large genomes of common wheat and its
progenitors.
• NGS-based genotyping reveals the genetic diversity of wheat through the
identification of SNPs.
• Functional annotation of wheat genes by Targeting Induced Local
Lesions IN Genomes (TILLING): NGS allowed sequencing of
exome-captured DNA from EMS-induced mutants.
• Sequencing RNA pools using NGS technologies (RNA-seq) allows
a thorough survey of the entire transcriptional landscape, revealing
genome-wide gene activity and alternative splicing in a quantitative
manner.
64. IMPLICATIONS
• Nevertheless, there are significant challenges in
NGS technologies, including the difficulty of storing
and analyzing the data they generate.
• This is mainly due to the very high number of reads
produced.
• In the coming years, new sequencing platforms will
appear that produce even larger amounts of data
(terabytes), requiring new approaches and applications
capable of analyzing such volumes.
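A back-of-the-envelope estimate shows why the number of reads drives storage needs. In the FASTQ format each read takes four lines: a header, the bases, a '+' separator and one quality character per base. Uncompressed size therefore grows linearly with reads × read length. The 40-byte average header and the run size below are hypothetical, and compression reduces these sizes considerably.

```python
def fastq_size_gb(n_reads: int, read_len: int, header_bytes: int = 40) -> float:
    """Rough uncompressed FASTQ size in gigabytes (1 GB = 1e9 bytes)."""
    header = header_bytes + 1   # header text (including '@') plus newline
    bases = read_len + 1        # sequence line plus newline
    separator = 2               # '+' and newline
    quals = read_len + 1        # one quality character per base plus newline
    return n_reads * (header + bases + separator + quals) / 1e9

# hypothetical run: one billion 150-bp reads
print(f"{fastq_size_gb(1_000_000_000, 150):.0f} GB")  # 345 GB
```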
65. CONCLUSION
• NGS techniques have paved the way for using
sequencing in numerous creative ways to study
different plant biological processes.
• Currently, sequencing by synthesis (Illumina
platform) is the cheapest sequencing method;
despite its shorter read length, it dominates the
field because of its low cost, lower error rate,
speed and high throughput.
66. FUTURE THRUST
• In the long view of scientific history, DNA
sequencing remains a young technology. Here, we
briefly consider its future in a few existing or
emerging areas:
1. Genome diversity
2. Population-scale resequencing
3. Developmental biology
4. Real-time portable sensors
67. References
• Ben Ali, S. E., Schamann, A., Dobrovolny, S., Indra, A., Agapito-Tenfen, S. Z.,
Hochegger, R. and Brandes, C. (2018). Genetic and epigenetic characterization of the
cry1Ab coding region and its 3′ flanking genomic region in MON810 maize using
next-generation sequencing. European Food Research and Technology. 244(8): 1473–1485.
• Jia, M., Guan, J., Zhai, Z., Geng, S., Zhang, X., Mao, L. and Li, A. (2017). Wheat
functional genomics in the era of next generation sequencing: An update. The Crop
Journal. 6(1): 7–14.
• Manivannan, A., Kim, J. H., Yang, E. Y., Ahn, Y. K., Lee, E. S., Choi, S. and Kim, D. S.
(2018). Next-Generation Sequencing Approaches in Genome-Wide Discovery of Single
Nucleotide Polymorphism Markers Associated with Pungency and Disease Resistance
in Pepper. BioMed Research International. 2018: 1–7.