Bipasha datta allele
1. Speaker
Bipasha Datta
M. Sc. (Agri.)
Genetics and Plant Breeding
“Allele Mining: An Advanced Technique of
Crop Improvement”
WELCOME
2. Contents
What is allele mining ?
Steps in Development of Allele Mining Set
Techniques: (1) Eco-TILLING based allele mining
(2) Sequencing based allele mining
(3) Association mapping based allele mining
Bioinformatics tools required for allele mining
Considerations for allele mining
Applications of allele mining
Challenges in allele mining
Conclusion
Generation Challenge Programme
3. Crop Improvement Strategies based on the generation and
harnessing of genetic variation
Cieslak (2017), Vienna, Austria
4. Comparison between previous and current crop improvement strategies
Wang et al. (2017), USA
Previous strategy: plant genetic resources; limited-characterized core collection; valuable materials; pre-breeding program; elite breeding program.
Current strategy: plant genetic resources; well-characterized and understood collections; global coordinated information platform and network; valuable materials; new variations; genome editing; pre-breeding program; elite breeding program.
5. Why do we need germplasm conservation?
http://www.biologydiscussion.com
1. Germplasm provides the raw material for the breeder to develop various crops. Thus, conservation of germplasm assumes significance in all breeding programmes.
2. Alternatives to storing seeds and whole organisms.
3. Germplasm collections can range from collections of wild species to elite, domesticated breeding lines that have undergone extensive human selection.
4. Germplasm collection is important for the maintenance of biological diversity and food security.
5. As a way of shaping and using genetic information, plant breeding has implications for germplasm conservation and use.
6. www.nbpgr.ernet.in
Table 1: The National Gene Bank conserves germplasm, as per gene bank standards, as base collections at -18 °C.
Crop group                     Species  Accessions
Cereals                             90    1,56,526
Millets and forages                178      56,472
Pseudo cereals                      30       6,825
Grain legumes                       69      58,160
Oilseeds                            58      57,479
Fibre crops                         51      11,943
Vegetables                         151      25,084
Fruits                              35         530
Medicinal and aromatic plants      661       6,771
Spices and condiments               17       3,721
Agroforestry species               244       2,443
Duplicate safety samples             -      10,235
Total                            1,584    3,96,189
7. A huge number of accessions are held collectively by gene banks.
These harbor a wealth of undisclosed allelic variants.
"The challenge is how to unlock this variation?"
Solution: Allele mining
8. What is allele mining?
ALLELE: an alternative form of a gene; alleles are located at the same locus on homologous chromosomes.
MINING: searching.
ALLELE + MINING: searching for different alleles.
An approach used to dissect naturally occurring allelic variation, or to find suitable alleles of a candidate gene controlling key agronomic traits, which has potential in crop improvement (Sharma et al., 2014).
It is a research field aimed at identifying allelic variation of relevant traits within genetic resource collections (Sarika and Amruta, 2014). It is the discovery of superior alleles from the natural population.
9. What is the need for allele mining?
It helps in tracing the evolution of alleles.
It helps in identification of new haplotypes and development of allele-specific markers for use in marker-assisted selection (MAS).
Direct access to alleles conferring:
• Resistance/tolerance to biotic stresses
• Resistance/tolerance to abiotic stresses
• Greater nutrient use efficiency
• Enhanced yield
• Improved quality
Kumari et al. (2018), Pusa, Bihar
10. There are thousands of accessions in the germplasm bank.
How can the favorable genes be found among the huge number of plant germplasm accessions for plant breeding?
11. Steps in Development of Allele Mining Set
1. Gene bank collection (entire collection)
2. Core collection (about 10% of the entire collection): an efficient minimum collection of genotypes with the least reduction in diversity estimates
3. Mini core set (about 10% of the core, or 1% of the entire collection)
4. Allele mining set
Basis for selection: morphological characterization, molecular characterization, diversity estimates.
Gokidi et al. (2017), BHU, Varanasi
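The nested sampling above (entire collection, then about 10% as the core, then about 10% of the core as the mini core) can be sketched in a few lines. This is a minimal illustration only: the function names are hypothetical, and the greedy "capture the most new alleles" rule is an assumption made for the example, not a selection method described in the slides.

```python
def collection_sizes(entire, fraction=0.10):
    """Return (core, mini_core) sizes using the ~10% rule of thumb."""
    core = round(entire * fraction)
    mini_core = round(core * fraction)
    return core, mini_core


def greedy_subset(accessions, size):
    """Pick `size` accessions that together capture the most distinct alleles.

    `accessions` maps an accession name to the set of alleles it carries.
    The greedy rule is an illustrative assumption, not the slides' method.
    """
    chosen, covered = [], set()
    remaining = dict(accessions)
    while remaining and len(chosen) < size:
        # Take the accession that adds the most alleles not yet covered;
        # ties are broken alphabetically because names are sorted first.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


# Toy data: four hypothetical accessions and the alleles they carry.
toy = {
    "acc1": {"a1", "a2"},
    "acc2": {"a2"},
    "acc3": {"a3", "a4", "a5"},
    "acc4": {"a1", "a5"},
}
subset, alleles = greedy_subset(toy, 2)
print(collection_sizes(1000), subset, sorted(alleles))
```

With the toy data, two accessions are enough to capture all five alleles, which is the idea behind keeping diversity estimates high while shrinking the set.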
13. How do these variations occur through mutation?
Normal sequence of nucleotides (Plant A):
ACACACACACAC
TGTGTGTGTGTG
Change in sequence of nucleotides (Plant B):
ACACAC
TGTGTG
Mutations occur in the genic regions of the genome either as single nucleotide polymorphisms (SNPs) or as insertions and deletions (InDels).
Evolution of new alleles
Reddy et al. (2014), KAU, Kerala
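The two kinds of variation named above can be told apart with a very small check. The sketch below is an assumption-laden simplification: it compares two allele sequences directly by length and position, whereas real variant calling first aligns the sequences. The function name is hypothetical.

```python
def classify_variant(ref, alt):
    """Classify the difference between two allele sequences in a simple way."""
    if ref == alt:
        return "identical"
    if len(ref) == len(alt):
        # Same length: count substituted positions (SNPs).
        n = sum(1 for a, b in zip(ref, alt) if a != b)
        return f"{n} SNP(s)"
    # Different length: report the size of the insertion or deletion.
    kind = "deletion" if len(alt) < len(ref) else "insertion"
    return f"{abs(len(ref) - len(alt))} bp {kind}"


# The two sequences shown on the slide differ in length.
print(classify_variant("ACACACACACAC", "ACACAC"))
# A hypothetical single-base substitution.
print(classify_variant("ACGT", "ACCT"))
```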
14. What do you mean by true allele mining?
Analysis of non-coding and regulatory regions of the candidate genes in addition to analyzing sequence variations in the coding regions of genes.
True allele mining includes coding, non-coding and regulatory regions of a gene.
Sharma et al. (2014), Meerut, UP
15. Difference between allele mining and true allele mining
Allele mining: focuses only on the identification of nucleotide sequence variation in the coding (exon) region of a gene.
True allele mining: includes analysis of non-coding and regulatory regions of the candidate genes in addition to analysing sequence variations in the coding regions (Rangan et al., 1999).
Gokidi et al. (2017), BHU, Varanasi
16. Promoter mining
Aim:
• To identify nucleotide variation in transcription factor binding motifs (TFBM).
• To determine the frequency and location of regulatory element binding sites (REBS) in promoter regions.
Information generated:
• Knowledge on conservation of regulatory motifs specific to a particular differentiated cell/tissue.
• Insight into the mechanism of gene expression.
• Isolation of novel and efficient promoters for use in genetic engineering.
Gokidi et al. (2017), BHU, Varanasi
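The "frequency and location" aim above can be illustrated with a simple exact-match motif scan. This is only a sketch: the motif, the promoter sequences and the function names are hypothetical, and dedicated tools use position weight matrices rather than exact string matching.

```python
import re


def scan_motif(promoter, motif):
    """Return 0-based start positions of every (possibly overlapping) motif hit."""
    # A lookahead lets overlapping occurrences be reported as separate hits.
    pattern = re.compile(f"(?={re.escape(motif.upper())})")
    return [m.start() for m in pattern.finditer(promoter.upper())]


def motif_report(promoters, motif):
    """Map each promoter allele name to the list of motif positions."""
    return {name: scan_motif(seq, motif) for name, seq in promoters.items()}


# Two hypothetical promoter alleles; allele_B carries a SNP inside the second motif.
promoters = {
    "allele_A": "GGTATAAATCCTATAAAG",
    "allele_B": "GGTATAAATCCTACAAAG",
}
print(motif_report(promoters, "TATAAA"))
```

Comparing the position lists between alleles shows where a nucleotide change removes (or creates) a candidate binding site.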
17. Approaches for allele mining:
I. Modified TILLING technique, also known as Eco-TILLING based allele mining
II. Sequencing based allele mining
III. Association mapping based allele mining
Kumari et al. (2018), Pusa, Bihar
18. TILLING
TILLING was introduced in the late 1990s by Claire McCallum and colleagues, working on Arabidopsis thaliana.
TILLING (Targeting Induced Local Lesions IN Genomes):
It is a valuable, non-transgenic reverse genetics strategy for studying gene function. It allows rapid screening of a plant mutant population for induced lesions in a gene of interest whose sequence is known (McCallum et al., 2000).
It identifies polymorphisms (more specifically, point mutations) resulting from induced mutations in a target gene by heteroduplex analysis (Till et al., 2003).
Gokidi et al. (2017), BHU, Varanasi
19. Principle of TILLING
Gene segments are amplified using gene-specific primers, and the products are denatured and reannealed to form heteroduplexes between the mutated sequences and their wild-type counterparts.
These heteroduplexes are substrates for cleavage by the endonuclease CEL-1.
Comai et al. (2004), USA
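The heteroduplex principle above can be mimicked in code: wherever the wild-type and mutant strands differ, a mismatch forms, and cleavage at that point yields two fragments whose sizes reveal the position of the mutation. The sketch below assumes equal-length amplicons and a cut exactly at the mismatch position, which is a simplification; the function names are hypothetical.

```python
def mismatch_positions(wild_type, mutant):
    """Return 0-based positions where two equal-length amplicons differ."""
    if len(wild_type) != len(mutant):
        raise ValueError("amplicons must have the same length")
    return [i for i, (a, b) in enumerate(zip(wild_type, mutant)) if a != b]


def cleavage_fragments(wild_type, mutant):
    """Predict (left, right) fragment lengths for a cut at each mismatch.

    Simplifying assumption: a cut at 0-based position i leaves fragments
    of i and n - i bases, where n is the amplicon length.
    """
    n = len(wild_type)
    return [(i, n - i) for i in mismatch_positions(wild_type, mutant)]


# Hypothetical 10-base amplicons differing by one point mutation (G to A).
print(cleavage_fragments("ACGTACGTAC", "ACGTACATAC"))
```

The two fragment sizes always add up to the amplicon length, which is how complementary bands on a gel are matched to one mutation.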
20. Requirements of TILLING
1. Chemical mutagenic agent (EMS)
2. Targeted gene sequence information
3. Two different 5'-end-labelled dyes (dye 700 and dye 800)
4. Specific nuclease (CEL-1, S1 or mung bean nuclease)
5. Li-Cor genotyper (Li-Cor, USA)
Gokidi et al. (2017), BHU, Varanasi
21. Procedure of TILLING-based allele mining
1. Development of a mutagenized (mutated) population,
2. DNA extraction from individuals and creation of DNA pools, and
3. Mutation discovery (Till et al., 2007), i.e. screening the population for induced mutations using different procedures, for example:
• Cleavage by a specific endonuclease,
• Denaturing high-performance liquid chromatography (DHPLC), or
• High-throughput sequencing.
Gokidi et al. (2017), BHU, Varanasi
22. Advantages of TILLING approaches
1. It is suitable for most plants and enables the identification of single-base-pair (bp) allelic variation in a target gene in a high-throughput manner.
2. It has several benefits over other techniques used to detect single-base-pair polymorphisms, such as single-strand conformation polymorphism (SSCP) and denaturing gradient gel electrophoresis (DGGE) (De Francesco and Perkel, 2001).
3. It is also superior to array-based hybridization techniques (ABHT), which are only effective in identifying approximately 50% of SNPs (Borevitz et al., 2003).
4. It also compares favourably with RNAi-based gene silencing, which needs the generation of transgenic plants (Sabetta et al., 2011), whereas TILLING identifies SNPs in specific genes or genomic regions using only a mutant plant population.
Gokidi et al. (2017), BHU, Varanasi
24. I. Eco-TILLING Based Allele Mining
Eco-TILLING is essentially the same as TILLING, except that the mutations are not induced artificially but are detected from naturally occurring alleles in the primary and secondary crop gene pools (Comai et al., 2004; Comai and Henikoff, 2006).
It allows natural alleles at a locus to be characterized across many germplasm accessions, enabling both SNP discovery and haplotyping at these loci (Comai et al., 2004).
Kumar et al. (2010), DRR, Hyderabad
27. Table 3: Eco-TILLING approach in various crops
Crop                Genes                   Function/Remarks
Brassica species    FAE1                    Control of erucic acid synthesis in Brassica spp.
Oryza sativa        Alk, Pi-ta and Pid3     Encodes soluble starch synthase II (Alk); blast resistance factors (Pi-ta, Pid3)
Arachis duranensis  Ara d 2.01              Seed storage protein
Triticum aestivum   Pina-D1 and Pinb-D1     Kernel hardness
Capsicum annuum     eIF4E and eIF(iso)4E    Resistance to RNA plant viruses
Cucumis species     eIF4E                   A new allele identified for resistance to MNSV
Hordeum vulgare     mlo and mla             Powdery mildew resistance genes
Gokidi et al. (2017), BHU, Varanasi
28. Advantages of Eco-TILLING Based Allele Mining
It is ideal for identifying natural variation within populations, or even natural mutations within germplasm, without using mutagenesis (Comai et al., 2004).
It helps to identify DNA polymorphisms in the form of SNPs, small insertions and deletions (InDels), haplotypes at a gene of interest, and variation in microsatellite (SSR) repeat number (Comai et al., 2004; Till et al., 2006).
It is cost-effective because only one individual for each haplotype needs to be sequenced (Simsek and Kacar, 2010).
It is applicable to any organism, including heterozygous or polyploid ones, and has practical application in searching for resistance to new viruses or in creating genetic diversity (Kurowska et al., 2011).
Gokidi et al. (2017), BHU, Varanasi
29. II. Sequencing Based Allele Mining
This technique involves amplification of alleles in diverse genotypes through PCR, followed by identification of nucleotide variation by DNA sequencing.
Sequencing-based allele mining helps to analyze individuals for haplotype structure and diversity, in support of genetic association studies in plants.
Kumar et al. (2010), DRR, Hyderabad
Frederick Sanger
1958: Nobel Prize for being the first to sequence a protein (insulin)
1980: Nobel Prize for sequencing of nucleic acids (shared with Paul Berg and Walter Gilbert)
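Once the same gene has been amplified and sequenced in several accessions, the sequencing-based approach described above comes down to finding the variable sites and grouping accessions by the alleles they carry at those sites. The sketch below assumes the sequences are already aligned to equal length; the accession names, sequences and function names are hypothetical.

```python
from collections import defaultdict


def variable_sites(sequences):
    """Return 0-based columns where the aligned sequences differ."""
    seqs = list(sequences.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    return [i for i in range(length) if len({s[i] for s in seqs}) > 1]


def haplotypes(sequences):
    """Group accessions by their bases at the variable sites."""
    sites = variable_sites(sequences)
    groups = defaultdict(list)
    for name, seq in sequences.items():
        groups["".join(seq[i] for i in sites)].append(name)
    return sites, dict(groups)


# Hypothetical aligned amplicons from four accessions.
amplicons = {
    "acc1": "ATGCCGTA",
    "acc2": "ATGCCGTA",
    "acc3": "ATGTCGTA",
    "acc4": "ATGTCGAA",
}
sites, groups = haplotypes(amplicons)
print(sites, groups)
```

Here two variable sites define three haplotypes, so only one representative per haplotype would need further characterization.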
31. Next generation sequencing (NGS) technologies in allele mining
NGS techniques are used for resequencing, alignment of the sequence data and comparison with a reference genome.
The first platform of this type was commercialized by 454 Life Sciences; it relied on pyrosequencing and eliminated the need for cloning.
With the 454 sequencing platform, it is possible to produce 100 Mb of sequence with 99.5% accuracy and read lengths averaging over 250 bases (Margulies et al., 2005).
Reddy et al. (2014), KAU, Kerala
33. Comparison of allele mining techniques
Steps:
Selection of target trait
Identification of accessions with desired phenotypic trait
Target gene
Primer designing for whole length of gene
PCR amplification of the gene
Sequencing and finding variation
Sequencing based AM: PCR, then sequencing and identification of variation
Modified TILLING-based AM: PCR, heteroduplexing, nuclease cleavage, then Li-Cor gels and SNP identification
Identification of superior alleles
Kumar et al. (2010), DRR, Hyderabad
34. Advantages of Sequencing Based Allele Mining
1. Various alleles among the cultivars can be identified.
2. It would help to analyze individuals for haplotype structure
and diversity to infer genetic association studies in plants.
3. It would help to recognize the effect of mutations on gene
structure and the sequences are analyzed for the location of
point mutations or SNPs and insertions or deletions (InDel) to
construct haplotypes.
4. Unlike TILLING and Eco-Tilling, it does not require much
sophisticated equipment or involve tedious steps (Ramkumar
et al., 2010).
Gokidi et al. (2017), BHU, Varanasi
35. Table 7: Mining of SNPs using NGS platforms
Sl. No.  Crop species   SNPs mined  NGS platform used       Reference
1        Arabidopsis      8,23,325  Solexa Genome Analyzer  Ossowski et al. (2008)
2        Finger millet       1,415  Helicos HeliScope       Gimode et al. (2016)
3        Peanut             53,257  Solexa Genome Analyzer  Zhou et al. (2014)
4        Rice               67,051  Solexa Genome Analyzer  Yamamoto et al. (2010)
5        Tomato              8,978  Solexa Genome Analyzer  Bhardwaz et al. (2016)
Gokidi et al. (2017), BHU, Varanasi
36. Table 8: Comparison between Eco-TILLING and sequencing based allele mining
Sr. No.  Parameter            Eco-TILLING                                                                    Sequencing based allele mining
1.       Technical expertise  Requires high technical expertise                                              Requires less expertise
2.       Complexity           More complex                                                                   Less complex
3.       Efficiency           Less efficient                                                                 Highly efficient
4.       Utility              Proposed as effective in detection of SNPs rather than InDels                  Effective in detection of any type of nucleotide variation
5.       Cost per data point  Comparatively high                                                             Comparatively less cost is involved
6.       Time                 Requires more time                                                             Comparatively less time
7.       Throughput           Associated complexity reduces throughput; only a few samples can be processed  Throughput and sample size increase with massively parallel sequencing platforms
Sarika et al. (2014), BHU, Varanasi
37. III. Association Mapping Based Allele Mining
• Association mapping is a high-resolution method for mapping quantitative trait loci based on the principle of linkage disequilibrium, and it holds great promise for the dissection of complex genetic traits (Buckler and Thornsberry, 2002).
• It is a powerful tool for the dissection of complex agronomic traits and for the identification of alleles. It is an efficient and effective method for confirming candidate genes or for identifying new genes.
• It is also known as linkage disequilibrium (LD) mapping.
• Both linkage analysis and association studies rely on co-inheritance of functional polymorphisms and neighbouring DNA variants.
Kushwaha et al. (2017), Pantnagar
38. Linkage equilibrium and linkage disequilibrium
Linkage equilibrium:
• There are only a few opportunities for recombination to occur within families and pedigrees with known ancestry.
• Relatively low mapping resolution.
• Two genes are inherited completely independently in each generation.
Linkage disequilibrium:
• Historical recombination and natural genetic diversity are exploited.
• High resolution mapping.
• Certain alleles of the two genes are inherited together more often than would be expected by chance.
Fig. 6: (a) Using F2 design (b) Showing only in haplotypes
[Figure label: Available variation]
Zhu et al. (2008), USA
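"Inherited together more often than would be expected by chance" can be made quantitative. The sketch below uses the standard population-genetics measures D and r² for two biallelic loci; these formulas are not given in the slides, and the haplotype coding ("A"/"a" at the first locus, "B"/"b" at the second) and the function name are assumptions for illustration.

```python
def linkage_disequilibrium(haplotypes):
    """Compute D and r^2 between two biallelic loci from observed haplotypes.

    Each haplotype is a 2-character string such as "AB", "Ab", "aB" or "ab".
    D   = P(AB) - P(A) * P(B)
    r^2 = D^2 / (P(A) * P(a) * P(B) * P(b))
    """
    n = len(haplotypes)
    p_a = sum(h[0] == "A" for h in haplotypes) / n
    p_b = sum(h[1] == "B" for h in haplotypes) / n
    p_ab = sum(h == "AB" for h in haplotypes) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # r^2 is undefined when either locus is monomorphic.
    r2 = d * d / denom if denom else float("nan")
    return d, r2


# Alleles combine at random: linkage equilibrium.
print(linkage_disequilibrium(["AB", "Ab", "aB", "ab"]))
# A always travels with B: complete linkage disequilibrium.
print(linkage_disequilibrium(["AB", "AB", "ab", "ab"]))
```

An r² near 0 corresponds to independent inheritance, while an r² near 1 means a marker allele predicts the allele at the neighbouring locus, which is what association mapping exploits.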
39. Principle of Association Mapping
The strategy is to establish regions of the genome associated with critical phenotypes by association or linkage disequilibrium mapping.
The approach relies on the assumption that an allele responsible for a phenotype, along with the markers which flank the locus, is inherited as a block.
DNA markers have been suggested as a means to identify useful alleles in the vast reservoirs of genetic diversity.
Fig. 7: Allele mining: all the alleles at a time; pick the best alleles along with markers for breeding
Kumari et al. (2018), Pusa, Bihar
40. Types of Association Mapping
Candidate-gene association mapping: It relates polymorphisms in selected candidate genes that have purported roles in controlling phenotypic variation for specific traits (Tanksley and McCouch, 1995).
Genome-wide association mapping: It surveys genetic variation in the whole genome to find signals of association for various complex traits (Risch and Merikangas, 1996).
Zhu et al. (2008), USA
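The candidate-gene idea above, relating a polymorphism to phenotypic variation, can be shown in its simplest form by comparing the mean trait value of accessions carrying each allele. This is only a toy sketch with hypothetical data and a hypothetical function name: a real association analysis would add a statistical test and a correction for population structure.

```python
from statistics import mean


def allele_effects(genotypes, phenotypes):
    """Return the mean phenotype of the accessions carrying each allele."""
    groups = {}
    for accession, allele in genotypes.items():
        groups.setdefault(allele, []).append(phenotypes[accession])
    return {allele: mean(values) for allele, values in groups.items()}


# Hypothetical SNP alleles at a candidate gene and a measured trait value.
genotypes = {"acc1": "A", "acc2": "A", "acc3": "G", "acc4": "G"}
phenotypes = {"acc1": 4.0, "acc2": 6.0, "acc3": 9.0, "acc4": 11.0}
print(allele_effects(genotypes, phenotypes))
```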
41. Why Association Mapping?
New tool; genomic technology; natural diversity
• Resolve trait variation down to sequence level
• Sequencing, gene expression profiling, comparative genomics
• Harnesses genetic diversity of natural populations to individual nucleotides
• Identifying novel and superior alleles
• Sequencing technologies markedly reduced the cost
• Annotated genome sequence from model species
Kraakman et al. (2006), Netherlands
43. Advantages of Association Mapping
1. Increased mapping resolution:
It can map quantitative traits with high resolution in a way that is statistically very powerful. The resolution of the mapping depends on the extent of LD, or non-random association of markers, across the genome.
2. Reduced research time and cost:
It does not require the development of expensive and tedious biparental populations, which makes the approach time-saving and cost-effective.
3. Greater allele number (Yu and Buckler, 2006):
It offers the opportunity to investigate diverse genetic material and potentially identify multiple alleles and mechanisms underlying traits.
Zhu et al. (2008), USA
44. Bioinformatics Tools Required For Allele Mining
1. Allele mining requires various sophisticated bioinformatics tools for:
• Handling the complex nucleotide data.
• Prediction of functional or structural components of complex macromolecules.
• Prediction of transcription factor binding sites.
• Prediction of the amino acid changes that are responsible for changes in encoded protein structure and function.
2. These tools are useful for sequence alignment, in order to compare a new genome sequence to a reference genome, i.e., sequenced genome data (Reddy et al., 2014).
Kumari et al. (2018), Pusa, Bihar
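As an illustration of the amino acid change prediction mentioned above, the sketch below translates a reference and a variant coding sequence with the standard genetic code and reports which residues differ. It assumes both sequences are in frame and of equal length (substitutions only); the sequences and function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate an in-frame coding sequence; trailing partial codons are ignored."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def amino_acid_changes(ref_cds, alt_cds):
    """List changes such as 'A2V' (alanine at residue 2 replaced by valine)."""
    ref_protein, alt_protein = translate(ref_cds), translate(alt_cds)
    return [
        f"{r}{i + 1}{a}"
        for i, (r, a) in enumerate(zip(ref_protein, alt_protein))
        if r != a
    ]


# A hypothetical SNP that changes the protein (GCT to GTT) ...
print(amino_acid_changes("ATGGCTGAA", "ATGGTTGAA"))
# ... and one that does not (GCT to GCC, both alanine).
print(amino_acid_changes("ATGGCTGAA", "ATGGCCGAA"))
```

Separating amino-acid-changing SNPs from silent ones in this way helps narrow down which alleles are likely to alter protein structure or function.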
45. Table 12: List of bioinformatics tools/databases useful for allele/promoter mining
Name          Utility                                           Reference
PlantCARE     Plant cis-acting regulatory elements database     Lescot et al. (2002)
PlantProm DB  Plant promoter database                           Shahmuradov et al. (2003)
EPD           Eukaryotic Promoter Database                      Schmid et al. (2004)
MatInspector  TFBS prediction and promoter analysis             Cartharius et al. (2005)
MEME          Motif discovery tool                              Bailey et al. (2006)
TRED          Collection of mammalian regulatory elements       Jiang et al. (2007)
JASPAR        Transcription factor binding site database        Bryne et al. (2008)
FastPCR       Nucleotide sequence analysis and primer design    Kalendar (2009)
Gokidi et al. (2017), BHU, Varanasi
46. Generation Challenge Programme (GCP) of CGIAR:
The GCP is a global crop research consortium directed toward crop improvement through the application of comparative biology and genetic resources characterization to plant breeding (Bruskiewich et al., 2008).
The Generation Challenge Programme has five sub-programmes:
• Genetic Diversity of Global Genetic Resources
• Comparative Genomics for Gene Discovery
• Trait Capture for Crop Improvement
• Genetic Resources, Genomic, and Crop Information Systems
• Capacity Building and Enabling Delivery
Vroom (2009), Reflexive Biotechnology Development
47. Considerations For Allele Mining
1. Germplasm identification: reference collection representing maximum genetic diversity in a minimum possible number.
2. Crop species: availability of genetic map, segregating populations, gene sequences and mini core collections.
3. Phenotypic characterization: precise phenotyping using efficient protocols in suitable environmental conditions.
4. Candidate gene targets: a priority gene list considering the traits targeted.
5. Primer design: primer pairs with a minimum overlap of 100-200 bp to maximize their utility for SNP detection and sequencing.
6. Sequencing analysis for detection of SNPs and InDels: sequencing of single and clear amplicons.
Sarika et al. (2014), BHU, Varanasi
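The primer design consideration above (amplicons overlapping by 100-200 bp) amounts to tiling the gene with overlapping windows. The sketch below computes such windows; the 700 bp amplicon length and the 2,000 bp gene length are assumptions chosen for illustration, and the function name is hypothetical. It does not design the primers themselves.

```python
def tile_amplicons(gene_length, amplicon_length=700, overlap=150):
    """Return (start, end) windows, 0-based and half-open, covering the gene.

    Consecutive windows share `overlap` bases; the last window is trimmed
    to the end of the gene.
    """
    if gene_length <= 0:
        raise ValueError("gene_length must be positive")
    if not 0 <= overlap < amplicon_length:
        raise ValueError("overlap must be smaller than amplicon_length")
    step = amplicon_length - overlap
    windows, start = [], 0
    while True:
        end = min(start + amplicon_length, gene_length)
        windows.append((start, end))
        if end == gene_length:
            break
        start += step
    return windows


# A hypothetical 2,000 bp gene covered by 700 bp amplicons with 150 bp overlap.
print(tile_amplicons(2000))
```

The overlap ensures that variants near the ends of one amplicon, where read quality tends to drop, fall well inside the neighbouring amplicon.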
48. Applications of Allele Mining
1. Functional molecular marker development for MAS
2. Identification of new haplotypes
3. Discovery of superior alleles
4. Similarity analysis (inter- and intra-species)
5. Evolution study
6. Promoter mining
6.1 Expression study
6.2 Gene prediction
Sharma et al. (2014), Meerut, UP