Association mapping has been widely used to study the genetic basis of complex traits in humans and animals, and it is an efficient and effective method for confirming candidate genes or identifying new ones (Altshuler et al., 2008). It is now increasingly used in a wide range of plants (Rafalski, 2010), where it appears to be more powerful than in humans or animals (Zhu et al., 2008). Unlike linkage mapping, association mapping can exploit all the recombination events and mutations in a given population, and at a higher resolution (Yu and Buckler, 2006). However, it has less power than linkage mapping to detect rare alleles in a population, even those with large effects (Hill et al., 2008). Using a candidate-gene association mapping strategy, Yan et al. (2010) demonstrated that the gene encoding β-carotene hydroxylase 1 (crtRB1) underlies a principal quantitative trait locus for β-carotene concentration and conversion in maize kernels.
2. What is association mapping?
Also called "association genetics", "association studies", or
"linkage disequilibrium mapping"
(Oraguzie et al., 2007).
A tool to resolve complex trait variation down to the
sequence level by exploiting historical and
evolutionary recombination events at the population
level
(Nordborg & Tavaré, 2002; Risch & Merikangas, 1996).
3. LD mapping detects and locates quantitative
trait loci (QTL) by the strength of the correlation
between a trait and a marker.
Offers greater precision in QTL location than
family-based linkage analysis.
Enables more efficient marker-assisted selection and
facilitates gene discovery.
Does not require family or pedigree information, so it
can be applied to a range of experimental and
non-experimental populations.
Care must be taken during analysis to control for
the increased rate of false-positive results arising
from population structure and relatedness.
(Mackay and Powell, 2007)
4. Why association mapping?
New tool: resolves trait variation down to the sequence level and identifies novel and superior alleles.
Genomic technology: sequencing technologies have markedly reduced the cost of sequencing, gene expression profiling and comparative genomics; annotated genome sequences are available from model species.
Natural diversity: harnesses the genetic diversity of natural populations, down to individual nucleotides.
(Hansen et al., 2001; Kraakman et al., 2006)
5. Diversity Panel
Genomic technologies
for high-throughput
genome sequencing
Zhu et al., 2008
6. Comparison of Association Genetics and Conventional QTL Mapping
Attribute | QTL mapping | Association genetics
Detection goal | Quantitative trait locus, i.e., a wide region within specific pedigrees within which a QTL is located | Quantitative trait nucleotide, i.e., physically as close as possible to the causative sequence(s)
Resolution of causative polymorphism | Low to moderate; only low-density linkage maps required | High; disequilibrium within small physical regions, requiring many markers
Marker discovery costs | Moderate | Moderate for few traits, high for many traits
7. Comparison of Association Genetics and Conventional QTL Mapping (continued)
Attribute | QTL mapping | Association genetics
Experimental populations for detection | Defined pedigrees, e.g., backcross, F2, RI, three- and two-generation pedigrees/families, half-sib families, etc. | Unrelated individuals ("unstructured" populations), large numbers of small unrelated families
Number of markers required for genome coverage | 10²–low 10³ | 10⁵ for small genomes, ~10⁹ for large genomes
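The marker numbers in this table follow roughly from a ratio: genome length divided by the distance over which LD persists. A minimal Python sketch of that back-of-the-envelope calculation; the genome size (~2.3 Gb for maize) and the LD extents are illustrative assumptions, and one marker per LD block is only a lower bound that ignores uneven LD and marker spacing.

```python
import math

def markers_needed(genome_bp: float, ld_extent_bp: float) -> int:
    """Lower-bound marker count: one marker per LD block of the given extent."""
    if ld_extent_bp <= 0:
        raise ValueError("LD extent must be positive")
    # Round up so that a partial block at the end still gets a marker.
    return math.ceil(genome_bp / ld_extent_bp)

# Illustrative assumptions only (not values from the slides):
MAIZE_GENOME_BP = 2.3e9  # approximate maize genome size in bp
for ld_extent in (1_000, 10_000, 100_000):
    n = markers_needed(MAIZE_GENOME_BP, ld_extent)
    print(f"LD extent {ld_extent:>7,} bp -> at least ~{n:,} markers")
```

The inverse relationship is the point: the faster LD decays, the more markers a genome-wide scan needs, which is why association genetics requires far more markers than QTL mapping.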
9. Advantages of Using Natural Populations
Broader genetic variation and a wider genetic background
for marker–trait correlations.
Higher-resolution mapping, because the population carries
many historical recombination events.
Historically measured trait data can be exploited for
association.
No need to develop expensive and time-consuming
bi-parental populations.
(Kraakman et al., 2006; Hansen et al., 2001)
11. Scheme of association mapping or tagging a gene
of interest using germplasm accessions.
(Nordborg et al., 2005)
12. Types of Association Mapping
Genome-wide Association Mapping (GWAS)
Comprehensive approach to systematically
search the genome for causal genetic
variation.
A large number of markers is tested for
association with complex traits.
Prior information on candidate
genes is not required.
Works best for a research consortium with
complementary expertise and adequate
funding.
13. Candidate-gene association mapping
Candidate genes are selected based on knowledge
from mutational analysis, biochemical
pathways, or linkage analysis.
An independent set of random markers must
be scored to infer genetic relationships.
A low-cost, hypothesis-driven, and trait-specific
approach, but it will miss other unknown loci.
(Zhu et al., 2010).
14. Principle of association mapping: linkage disequilibrium (LD)
Linkage refers to the co-inheritance of
loci that lie close together on the same chromosome.
Linkage equilibrium (LE) is the random association of alleles at
different loci: each haplotype frequency equals the product of the
corresponding allele frequencies.
LD is the non-random association of
alleles at different loci, in which haplotype frequencies
in a population deviate from those expected under LE.
(Oraguzie et al., 2007)
15. Concept of LD
Linkage disequilibrium, also referred to as "gametic phase
disequilibrium" (GPD) or "gametic disequilibrium" (GLD),
was first described by Jennings in 1917; its
quantification (D, the coefficient of LD) was developed
by Lewontin in 1964.
D is the difference between the observed frequency of a
haplotype and its expected frequency under linkage equilibrium:
$D = p_{AB} - p_A p_B = p_{AB}\,p_{ab} - p_{Ab}\,p_{aB}$
Besides D, other LD measures include D′, r², D², and D*.
(Oraguzie et al., 2007)
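The definition of D turns directly into a small calculation. A minimal Python sketch, assuming two biallelic loci with known haplotype frequencies; it also computes two of the other measures listed above using their standard definitions, D′ = |D|/D_max and r² = D²/(p_A p_a p_B p_b). The frequencies in the example are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class LDStats:
    D: float        # coefficient of linkage disequilibrium
    D_prime: float  # |D| / D_max (Lewontin's normalized D)
    r2: float       # squared correlation between the two loci

def ld_from_haplotypes(p_AB: float, p_Ab: float, p_aB: float, p_ab: float) -> LDStats:
    """LD measures for two biallelic loci from the four haplotype frequencies."""
    if abs(p_AB + p_Ab + p_aB + p_ab - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab          # frequency of allele A at locus 1
    p_B = p_AB + p_aB          # frequency of allele B at locus 2
    p_a, p_b = 1.0 - p_A, 1.0 - p_B
    D = p_AB - p_A * p_B       # equivalently p_AB * p_ab - p_Ab * p_aB
    # The maximum attainable |D| depends on the sign of D.
    if D >= 0:
        D_max = min(p_A * p_b, p_a * p_B)
    else:
        D_max = min(p_A * p_B, p_a * p_b)
    D_prime = abs(D) / D_max if D_max > 0 else 0.0
    denom = p_A * p_a * p_B * p_b
    r2 = D * D / denom if denom > 0 else 0.0  # undefined for monomorphic loci
    return LDStats(D, D_prime, r2)

# Hypothetical haplotype frequencies (for illustration only)
print(ld_from_haplotypes(0.4, 0.1, 0.1, 0.4))
```

Under linkage equilibrium (all four haplotypes at 0.25) every measure is zero; when only AB and ab haplotypes occur, D′ and r² both reach 1.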
16. Choosing appropriate LD measures depends on the
objective of the study.
r² is the square of the correlation coefficient
between the two loci.
r² reflects both mutational and recombinational history,
whereas D′ reflects only recombinational history.
r² ranges from 0 to 1.
r² values of 0.1 (10%) or above are commonly
considered significant.
17. Calculation and visualization of LD:
LD triangle and decay plots
LD can be calculated using haplotyping
algorithms, e.g., maximum likelihood estimation (MLE)
of haplotype frequencies with the
expectation–maximization (EM) algorithm.
Graphical display of pairwise LD between
loci helps reveal LD patterns
across a large number of molecular markers.
(Abdurakhmonov and Abdukarimov, 2008)
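The EM approach can be sketched for the simplest case of two biallelic loci, where phase is unknown only in double heterozygotes (AB/ab versus Ab/aB). A minimal Python sketch under stated assumptions: genotypes are coded as allele counts (0, 1, 2), and the E-step assumes random union of gametes (Hardy–Weinberg proportions). In homozygous inbred lines phase is known directly, so EM matters mainly for heterozygous material. This is not the algorithm of any particular LD software.

```python
def em_haplotype_freqs(genotypes, max_iter=1000, tol=1e-10):
    """Two-locus haplotype frequencies from unphased genotypes by EM.

    genotypes: list of (g1, g2), the number of A alleles at locus 1 and of
    B alleles at locus 2 in each individual (0, 1 or 2).
    """
    if not genotypes:
        raise ValueError("need at least one genotype")
    known = {"AB": 0.0, "Ab": 0.0, "aB": 0.0, "ab": 0.0}
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 not in (0, 1, 2) or g2 not in (0, 1, 2):
            raise ValueError(f"invalid genotype {(g1, g2)}")
        if g1 == 1 and g2 == 1:
            n_double_het += 1  # phase ambiguous: AB/ab or Ab/aB
            continue
        # At least one locus is homozygous, so pairing alleles in order
        # reconstructs the two haplotypes unambiguously.
        locus1 = ["A"] * g1 + ["a"] * (2 - g1)
        locus2 = ["B"] * g2 + ["b"] * (2 - g2)
        for x, y in zip(locus1, locus2):
            known[x + y] += 1

    n_gametes = 2 * len(genotypes)
    p = {h: 0.25 for h in known}  # uniform starting values
    for _ in range(max_iter):
        # E-step: probability that a double heterozygote is AB/ab (cis phase)
        cis = p["AB"] * p["ab"]
        trans = p["Ab"] * p["aB"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        expected = dict(known)
        expected["AB"] += n_double_het * w
        expected["ab"] += n_double_het * w
        expected["Ab"] += n_double_het * (1 - w)
        expected["aB"] += n_double_het * (1 - w)
        # M-step: new frequencies are expected haplotype counts / gametes
        new_p = {h: c / n_gametes for h, c in expected.items()}
        converged = max(abs(new_p[h] - p[h]) for h in p) < tol
        p = new_p
        if converged:
            break
    return p

# Hypothetical sample: 3 AABB, 3 aabb and 2 double heterozygotes
sample = [(2, 2)] * 3 + [(0, 0)] * 3 + [(1, 1)] * 2
print({h: round(f, 3) for h, f in em_haplotype_freqs(sample).items()})
```

The estimated frequencies can then be passed to the D, D′ and r² formulas of slide 15.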
18. Software used for calculation of LD
"Graphical Overview of Linkage Disequilibrium"
(GOLD), used to depict the structure and pattern of
LD.
"Trait Analysis by aSSociation, Evolution and
Linkage" (TASSEL) and PowerMarker.
19. TASSEL-generated triangle plot of pairwise LD
Each cell represents the comparison of a pair of marker sites, with
colour codes indicating the presence of significant LD.
(Abdurakhmonov and Abdukarimov, 2008)
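A text-mode version of the triangle plot helps show what each cell holds. A minimal Python sketch, assuming homozygous inbred lines coded 0/1 at each biallelic marker, so each line contributes one haplotype and r² is simply the squared correlation of the allele indicators; cells at or above the 0.1 cut-off from slide 16 are flagged with `*`. Marker names and data are hypothetical, and this does not reproduce TASSEL's output or its significance testing.

```python
def r2(x, y):
    """Squared Pearson correlation between two 0/1 allele vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length vectors with >= 2 lines")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # monomorphic marker: r2 undefined
    return sxy * sxy / (sxx * syy)

def print_ld_triangle(markers, threshold=0.1):
    """Print the lower triangle of pairwise r2; '*' marks r2 >= threshold."""
    names = list(markers)
    print("      " + " ".join(f"{n:>6}" for n in names[:-1]))
    for i, row_name in enumerate(names[1:], start=1):
        cells = []
        for col_name in names[:i]:
            value = r2(markers[row_name], markers[col_name])
            flag = "*" if value >= threshold else " "
            cells.append(f"{value:5.2f}{flag}")
        print(f"{row_name:>5} " + " ".join(cells))

# Hypothetical allele calls for four markers across six inbred lines
markers = {
    "m1": [0, 0, 1, 1, 0, 1],
    "m2": [0, 0, 1, 1, 0, 1],
    "m3": [0, 1, 0, 1, 0, 1],
    "m4": [1, 0, 0, 1, 1, 0],
}
print_ld_triangle(markers)
```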
20. LD decay plot
To estimate the size of LD blocks, r²
values (alternatively, D′) are
usually plotted against genetic (cM)
or physical (bp) distance; this is referred to as
an "LD decay plot".
(Abdurakhmonov and Abdukarimov, 2008)
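The data behind an LD decay plot can be summarized by averaging r² within distance bins. A minimal Python sketch, assuming pairwise (distance, r²) values have already been computed; the bin size and data are hypothetical, and the "LD extent" is taken as the first bin whose mean r² falls below the 0.1 cut-off from slide 16. Many studies instead fit a nonlinear decay curve; simple binning is used here for clarity.

```python
from collections import defaultdict

def ld_decay(pairs, bin_size):
    """Average r2 per distance bin.

    pairs: iterable of (distance, r2) for marker pairs; distance in bp or cM.
    Returns a list of (bin_start, mean_r2, n_pairs) sorted by distance.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for distance, r2 in pairs:
        if distance < 0:
            raise ValueError("distances must be non-negative")
        start = (distance // bin_size) * bin_size  # left edge of the bin
        sums[start] += r2
        counts[start] += 1
    return [(s, sums[s] / counts[s], counts[s]) for s in sorted(sums)]

def ld_extent(decay, threshold=0.1):
    """First bin whose mean r2 drops below the threshold (None if never)."""
    for start, mean_r2, _ in decay:
        if mean_r2 < threshold:
            return start
    return None

BIN = 50  # hypothetical bin width in bp
pairs = [(10, 0.9), (40, 0.7), (60, 0.4), (90, 0.2), (120, 0.05), (140, 0.03)]
decay = ld_decay(pairs, bin_size=BIN)
for start, mean_r2, n in decay:
    print(f"{start:>4}-{start + BIN:<4} bp  mean r2 = {mean_r2:.2f}  (n = {n})")
print("LD extent ~", ld_extent(decay), "bp")
```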
21. Factors affecting LD & association mapping
Mutation and recombination are among the
strongest factors influencing LD.
Factors Increasing LD:
New mutation, mating system (self-pollination),
genetic isolation, population structure,
relatedness (kinship), genetic drift, admixture,
selection (natural, artificial).
Factors Decreasing LD:
High recombination and mutation rate, recurrent
mutations, outcrossing
(Huttley et al., 2005).
22. Why association mapping in maize?
Source of cooking oil, biofuel and animal feed.
Model organism for cytogenetics, genetics,
genomics, and functional genomics studies.
(Strable and Scanlon, 2009).
Primary staple food in many African countries.
Map-based cloning of QTLs is a time-consuming and
expensive process in maize.
Association mapping can exploit all recombination
events and mutations in a given population, and at
a higher resolution.
(Yu and Buckler, 2006)
23. Examples of the range of phenotypic variation in maize
germplasm held in the CIMMYT genebank (Dr. Suketoshi
Taba).
24. Nested Association Mapping (NAM)
Joint linkage and linkage disequilibrium mapping
has been proposed as a "fine-mapping" approach.
(Mott and Flint, 2002; Wu et al., 2002)
NAM is currently implemented in maize.
A powerful strategy for dissecting the genetic
basis of quantitative traits in species with low LD.
For other crop species, different genetic designs
(e.g., diallel, eight-way cross) could be used to
accommodate the level of LD.
NAM allows high-power, cost-effective genome
scans and facilitates linking molecular variation
with complex trait variation.
(Yu et al., 2008)
29. crtRB1 is the target gene
Gene structure of Zea mays crtRB1, the target gene in the present study;
translated exons are depicted as black boxes.
30. Methods
Germplasm evaluation
Panel 1 (P1): 281 maize inbred lines grown in Urbana,
Illinois (USA) in 2002–2005.
Panel 2 (P2): 245 diverse maize inbred lines derived
from tropical and subtropical adapted maize
germplasm.
Panel 3 (P3): 55 diverse maize inbred lines derived
from temperate-adapted maize germplasm.
31. Carotenoid Quantification
HPLC analysis:
Carotenoids were extracted and quantified by HPLC
for all segregating mapping populations
(Kurilich and Juvik, 1999).
Population structure and kinship analysis
Population structure and kinship for P1 were estimated
using 89 simple sequence repeat (SSR) markers and 553
SNP markers, respectively
(Yu et al., 2006).
STRUCTURE 2.1 was used to estimate the population
structure of P2 and P3 using 46 and 86 SSRs,
respectively.
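Kinship (the K matrix) summarizes how related each pair of lines is. A minimal Python sketch of the simplest marker-based measure, the proportion of SNP calls shared identical by state between two inbred lines; this is an illustrative assumption, not the estimator used in the study, since kinship estimators used in practice usually adjust for allele frequencies so that unrelated lines score near zero. Line names and calls are hypothetical.

```python
def ibs_similarity(genotypes):
    """Pairwise identity-by-state similarity between inbred lines.

    genotypes: dict mapping line name -> list of SNP calls
    (0/1 for homozygous inbreds, None for missing data).
    Returns dict (line_a, line_b) -> proportion of matching calls.
    """
    lines = list(genotypes)
    sim = {}
    for a in lines:
        for b in lines:
            shared = compared = 0
            for x, y in zip(genotypes[a], genotypes[b]):
                if x is None or y is None:
                    continue  # skip SNPs missing in either line
                compared += 1
                shared += x == y
            sim[(a, b)] = shared / compared if compared else float("nan")
    return sim

# Hypothetical lines and SNP calls
data = {
    "line1": [0, 0, 1, 1],
    "line2": [0, 0, 1, 0],
    "line3": [1, 1, 0, None],
}
for (a, b), s in ibs_similarity(data).items():
    if a < b:
        print(a, b, round(s, 2))
```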
32. Linkage mapping and QTL mapping
crtRB1 was mapped via genetic linkage mapping
in a RIL population derived from B73 and
BY80415, using the crtRB1 3′TE polymorphism.
QTL analysis in this population was done using
QTL Cartographer 2.5
(Wang et al., 2005).
33. Statistical analysis
Association analysis was carried out using a
mixed model incorporating kinship and population
structure, as implemented in TASSEL 2.1
(Bradbury et al., 2007).
LD analysis was carried out in TASSEL 2.1
on the entire crtRB1 sequence; average r² values
within 50-bp windows were plotted against
distance.
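The mixed model incorporating population structure and kinship is commonly written in the form of the unified Q + K model of Yu et al. (2006). The equation below is a sketch of that general model; the exact terms fitted in this study are an assumption.

$$
\begin{aligned}
\mathbf{y} &= X\boldsymbol{\beta} + S\boldsymbol{\alpha} + Q\mathbf{v} + Z\mathbf{u} + \mathbf{e},\\
\operatorname{Var}(\mathbf{u}) &= 2K\sigma_g^2, \qquad \operatorname{Var}(\mathbf{e}) = I\sigma_e^2
\end{aligned}
$$

Here y is the vector of phenotypes, β the fixed effects other than the marker, α the effect of the marker being tested, v the population-structure effects (Q holds the population-membership estimates, e.g., from STRUCTURE), u the random polygenic background effects with kinship matrix K, and e the residuals, assumed independent here. X, S and Z are design (incidence) matrices; testing α = 0 for each marker gives its association P value.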
34. 5′TE allelic series: 1, 397-bp insertion; 2, 206-bp insertion; 3, 0-bp insertion.
InDel4 allelic series: 12-bp or 0-bp insertion.
3′TE allelic series: 1, no insertion; 2, 325-bp insertion; 3, 1,250-bp insertion.
P value: from association analysis carried out using the mixed model incorporating population structure and kinship, with data from 4 different years.
R²: from analysis of variance (ANOVA), showing the percentage of phenotypic variation explained.
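The R² column expresses how much phenotypic variation the allele classes explain. A minimal Python sketch of that quantity from a one-way ANOVA, R² = SS_between / SS_total; the allele labels and phenotype values are hypothetical, and unlike the mixed-model P values, this simple calculation does not account for population structure or kinship.

```python
def anova_r2(groups):
    """Proportion of phenotypic variance explained by allele class.

    groups: dict mapping allele label -> list of phenotype values.
    Returns R^2 = SS_between / SS_total from a one-way ANOVA.
    """
    values = [v for vs in groups.values() for v in vs]
    if len(values) < 2:
        raise ValueError("need at least two observations")
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2
        for vs in groups.values() if vs
    )
    return ss_between / ss_total if ss_total > 0 else 0.0

# Hypothetical phenotype values grouped by allele (illustration only)
example = {
    "allele 1": [1.0, 2.0],
    "allele 2": [3.0, 4.0],
}
print(f"R2 = {anova_r2(example):.0%} of phenotypic variation")
```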
35. Haplotypes are shown as a linear combination of:
5′TE allele (1, 397-bp insertion; 2, 206-bp insertion; 3, 0-bp insertion),
InDel4 allele (12-bp or 0-bp insertion),
3′TE allele (1, no insertion; 2, 325-bp insertion; 3, 1,250-bp insertion).
36. Allele-specific crtRB1 effects on biochemical activity and
transcriptional expression.
CrtRB1 quantitative RT-PCR from whole kernel at 15 d after
pollination (DAP) and seedling leaf messenger RNA for the six
indicated lines of Zea mays.
37. β-carotene hydroxylase product profiles for the four CRTRB1
allozymes expressed in a recombinant E. coli assay system
producing β-carotene. Genetic variation for each allozyme is
listed according to InDel4 and C-terminal (3′TE) differences.
39. Whole genome scan association mapping for
oleic acid content
Objective: to identify loci with a major effect on oleic acid
content in maize kernels.
8,590 loci were tested for association with oleic
acid content in 553 maize inbreds.
A single locus with a major effect on oleic acid
was mapped between 380 and 384 cM in the
IBM2 Neighbors genetic map on chromosome 4
and confirmed in a biparental population.
40. Fatty acid desaturase (fad2), identified >2 kb
from the associated genetic marker, is the
most likely candidate gene responsible for the
difference in phenotype.
A non-conservative amino acid polymorphism near
the active site of fad2 contributes to the effect
on oleic acid content.
First report of the use of high-resolution
whole-genome scan association mapping.
41. Materials and Methods
Whole genome scan association
mapping
Single nucleotide polymorphism (SNP)
haplotypes at 8,590 genetic loci were genotyped
in 553 maize inbred lines.
Statistical tests of association between
haplotypes and embryo oleic acid content
were performed with the STRUCTURE program
(Pritchard et al., 2000).
LD was computed between the locus of interest
and all other loci using r² (Devlin and Risch,
1995).
43. Comparison of Low-oleic Acid Content (Lo) Against
High-oleic Acid Content (Ho) Alleles of fad2.
Boxes: domain regions of the protein sequence.
Horizontal grey arrows (both sequences): coding region.
Vertical bars: nucleotide polymorphisms between the two alleles;
half-length vertical bars: synonymous substitutions.
Triangles: amino acid substitutions.
Lines across both sequences: deletions and insertions.
Black triangle: non-conservative amino acid substitution of a leucine by
44. Association mapping of the markers MZA10924, MZA4015, and
MZA5102 (top) and linkage disequilibrium (LD) of all markers against the
MZA10924 (bottom).
Vertical scale: negative logarithm of the association mapping P value.
Horizontal scale: genetic position (cM) on Pioneer's genetic map.
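Because the vertical axis is on a −log10(P) scale, a genome-wide significance line can be drawn once a multiple-testing correction is chosen. A minimal Python sketch assuming a Bonferroni correction at α = 0.05 over the 8,590 loci tested; the slides do not state which threshold the study used, and Bonferroni is conservative here because markers in LD are not independent tests. The P values in the example are hypothetical.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-test P cut-off and the matching -log10(P) line for the plot."""
    if not 0 < alpha < 1 or n_tests < 1:
        raise ValueError("alpha must be in (0, 1) and n_tests >= 1")
    p_cut = alpha / n_tests
    return p_cut, -math.log10(p_cut)

def neg_log10(p_values):
    """Convert P values to the -log10 scale used on the vertical axis."""
    return [-math.log10(p) for p in p_values]

p_cut, line = bonferroni_threshold(0.05, 8_590)  # 8,590 loci tested
print(f"P < {p_cut:.2e}, i.e. -log10(P) > {line:.2f}")
hypothetical_p = [0.2, 1e-3, 1e-8]  # illustrative P values only
print([score > line for score in neg_log10(hypothetical_p)])
```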
Editor's Notes
Association mapping through linkage disequilibrium (LD) analysis is a powerful tool for dissecting complex agronomic traits and for identifying alleles that can contribute to the improvement of a target trait in maize. With the development of high-throughput genotyping techniques and advanced statistical approaches, and the assembly and characterization of multiple association mapping panels, maize has become the model crop for association analysis (Yan et al., 2011).
Genomic technologies for high-throughput genome sequencing and genotyping have made it more affordable to obtain large amounts of marker data across a large diversity panel for complex trait dissection and superior allele mining.
Comparison of linkage analysis with designed mapping populations and association mapping with diverse collections.
An association mapping panel consists of genotypes sampled to capture the maximum amount of genetic variation.
Once LD block sizes are estimated, they indicate the minimum number of markers needed to cover the genome-wide haplotype blocks efficiently in association mapping.