2. Contents
• Introduction to association mapping
• Terminologies
• Comparison of AM v/s BM
GWAS
• Introduction
• Methodology
• Challenges-Conducting GWAS
• Opportunities
• Success Stories
• Conclusion
[Figure: Complex traits. Phenotypic variation arises from multiple QTLs, the environment, and QTL × environment interaction]
3. Association mapping
• It is a high-resolution method for mapping quantitative trait loci (QTLs) based on
principle of linkage disequilibrium that holds a great promise for the dissection of
complex genetic traits (Buckler, 2002)
• The mapping population consists of a diverse set of lines
• Also called linkage disequilibrium mapping
• A natural population survey to study the marker-trait associations
• Exploits evolutionary and historical recombination events at the population level.
• Can serve as an alternative to family-based mapping for dissecting complex traits
7. Steps in association mapping
• Diverse germplasm selection
• Phenotyping (different environments, multiple replications)
• Genotyping
• LD measurement
• Population structure and relatedness measurement
• Marker-trait association
• Marker identification and association with traits
9. From candidate genes to genome-wide studies
GWAS
• The markers used for genotyping are distributed, preferably evenly and densely, over the whole genome.
• All the loci involved in the control of all the traits showing variation in the sample can be evaluated in one go.
Candidate gene association mapping
• Analysis is restricted to the genomic regions having the candidate genes/QTLs for the trait(s) of interest.
• This greatly reduces the target genomic region, which can be analyzed with a high density of molecular markers.
Zu et al, 2009
11. Linkage Mapping V/S Association Mapping
Feature | Linkage mapping | Association mapping
QTL effect size | Effective for moderate- to large-effect QTLs | Effective for QTLs with much smaller effects
Number of alleles detected per locus | Only two alleles can be detected | All the alleles present in the sample can be detected
Populations used for mapping | Produced by crossing selected parents | Natural populations, breeding materials, germplasm lines, lines from multiple crosses
Recombination events exploited | Those occurring after the crosses are made | All the recombination events that occurred since the LD was created
Mapping is based on | Recombination frequency | Linkage disequilibrium (LD) between the loci
Mapping resolution | Low | High
Identified markers linked to QTL/gene | Few to several centimorgans away from the gene/QTL | Much closer than those identified by linkage mapping
Yu et al, 2006
12. Important Terminologies
• False negative: the declaration of an outcome as statistically non-
significant, when the effect is actually genuine
• False positive: the declaration of an outcome as statistically significant,
when there is no true effect
• Linkage: the co-inheritance of loci that lie within a short genetic distance of each other on the same chromosome
13. GENOME WIDE ASSOCIATION STUDY
• In this study the markers used for genotyping are distributed,
preferably densely and evenly over the whole genome
• All the loci involved in the control of traits showing variation
in the sample can be evaluated in one go
• Identifies markers much closer to the trait of interest
• Discovers genotype-phenotype associations
17. What is linkage disequilibrium and why does it matter?
• Jennings described the LD concept in 1917, and Lewontin developed the quantification of LD in 1964
• Non-random association of alleles at different loci is known as linkage disequilibrium
• The power of an association study depends on the strength of this association
The strength of the correlation between a marker and a trait locus is a function of the distance between them: the closer they are, the stronger the LD
18. LD decay
• The higher the recombination rate, the more rapidly LD decays (LD decay is the return to random association between two given alleles)
[Figure: Decay of linkage disequilibrium with time for four different recombination fractions (θ); Mackay and Powell, 2007]
[Figure: LD decay plot for a hypothetical locus]
The resolution with which a QTL can be mapped is a function of how quickly LD decays over distance.
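The decay curves described above follow the standard population-genetic expectation that, under random mating, LD between two loci shrinks by a factor of (1 − θ) each generation. The sketch below is only an illustration of that relationship: the function name, the starting value of D, and the four θ values are assumptions chosen for the example, not the values plotted in the original figure.

```python
def ld_after_generations(d0: float, theta: float, generations: int) -> float:
    """Expected LD after t generations of random mating: D_t = D_0 * (1 - theta)**t."""
    return d0 * (1.0 - theta) ** generations


if __name__ == "__main__":
    d0 = 0.25  # assumed starting LD (the maximum D when both allele frequencies are 0.5)
    # Illustrative recombination fractions; 0.5 corresponds to unlinked loci.
    for theta in (0.5, 0.1, 0.01, 0.001):
        trajectory = [ld_after_generations(d0, theta, t) for t in (0, 10, 100)]
        formatted = ", ".join(f"{d:.4f}" for d in trajectory)
        print(f"theta={theta}: D at t=0, 10, 100 -> {formatted}")
```

Tightly linked loci (small θ) keep most of their LD for many generations, which is why slow decay gives long LD blocks and lower mapping resolution.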
19. Useful LD
• Level of LD that is useful for association mapping
$D = P_{AB} - P_A P_B$, where $P_{AB}$ is the frequency of the AB haplotype and $P_A$, $P_B$ are the frequencies of alleles A and B
• D’ and r² are the most widely used estimates of LD
D’ (taken as an absolute value) ranges from 0 to 1
1. D’ = 0: no LD
2. D’ = 1: complete LD
r² ranges from 0 to 1
1. r² = 0: complete linkage equilibrium
2. r² = 1: complete linkage disequilibrium
3. r² ≥ 0.33 is considered useful for LD mapping
[Figure: Biparental population V/S natural population]
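As a worked illustration of the measures on this slide, the sketch below computes D, D’ and r² for two biallelic loci from one haplotype frequency and two allele frequencies. It uses the standard definitions (D’ is |D| scaled by its maximum attainable value, r² is D² divided by the product of the four allele frequencies); the function name and the example frequencies are assumptions made for illustration.

```python
def ld_measures(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Return (D, D', r2) for two biallelic loci.

    p_ab: frequency of the A-B haplotype; p_a, p_b: frequencies of alleles A and B.
    """
    d = p_ab - p_a * p_b
    # The largest |D| possible given the allele frequencies depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


if __name__ == "__main__":
    for label, args in [
        ("complete LD", (0.5, 0.5, 0.5)),
        ("no LD", (0.25, 0.5, 0.5)),
        ("unequal allele frequencies", (0.3, 0.3, 0.6)),
    ]:
        d, d_prime, r2 = ld_measures(*args)
        print(f"{label}: D={d:.3f}, D'={d_prime:.3f}, r2={r2:.3f}")
```

The third case shows why the two statistics are reported together: D’ can reach 1 while r² stays well below 1 when the allele frequencies at the two loci differ.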
20. Factors affecting LD and Association Mapping
Factors increasing LD
• Mating system (self-pollination)
• Population structure and relatedness (kinship)
• Small population size
• Admixture
• Selection
Factors decreasing LD
• Out-crossing
• High recombination rate
• High mutation rate
• Gene conversion
Huttley et al, 2005
21. Analysis for population structure and Kinship
Population structure signifies that individuals in a population
do not form a single homogeneous group, but they are
distributed in few to several distinct subgroups that show
different gene frequencies.
Population structure arises from geographical isolation and from natural and artificial selection.
Thus, population structure generates LD between unlinked
loci and tends to increase the likelihood of discovery of false
positive associations.
Population structure of the sample can be estimated by using
the STRUCTURE program.
Models for AM such as GLM and MLM minimize the effects of
population structure.
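To make the point about false positives concrete, the sketch below pools subpopulations that are each in linkage equilibrium internally and shows that the pooled sample still has non-zero D whenever allele frequencies differ between subgroups. The function name, the mixing weights, and the allele frequencies are hypothetical values chosen for illustration.

```python
def pooled_ld(subpops):
    """LD (D) between two unlinked loci in a pooled sample.

    subpops: list of (weight, freq_A, freq_B) tuples, weights summing to 1.
    Each subpopulation is assumed to be in linkage equilibrium internally,
    so its A-B haplotype frequency is simply freq_A * freq_B.
    """
    p_a = sum(w * a for w, a, _ in subpops)
    p_b = sum(w * b for w, _, b in subpops)
    p_ab = sum(w * a * b for w, a, b in subpops)
    return p_ab - p_a * p_b


if __name__ == "__main__":
    structured = [(0.5, 0.9, 0.9), (0.5, 0.1, 0.1)]    # subgroups differ at both loci
    homogeneous = [(0.5, 0.3, 0.7), (0.5, 0.3, 0.7)]   # identical subgroups
    print(f"structured sample:  D = {pooled_ld(structured):.3f}")
    print(f"homogeneous sample: D = {pooled_ld(homogeneous):.3f}")
```

Because the loci are unlinked, any association in the structured sample comes from the subgrouping alone, which is the signal that structure-aware models try to remove.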
Kinship
Kinship refers to relatedness between different pairs of individuals/lines of the sample.
Kinship among the individuals of the sample can be estimated using the TASSEL program.
TASSEL estimates the kinship coefficient as the proportion of alleles that are identical between each pair of lines/individuals in the sample.
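The "proportion of identical alleles" idea can be sketched in a few lines. This is a simplified illustration, not TASSEL's actual implementation: it assumes homozygous inbred lines with a single allele call per marker, skips markers with a missing call (`None`), and uses hypothetical line names and function names.

```python
from itertools import combinations


def shared_allele_proportion(line_a, line_b):
    """Proportion of markers with identical allele calls, ignoring missing calls."""
    pairs = [(x, y) for x, y in zip(line_a, line_b) if x is not None and y is not None]
    if not pairs:
        return float("nan")
    return sum(x == y for x, y in pairs) / len(pairs)


def kinship_matrix(genotypes):
    """Pairwise shared-allele proportions as a nested dict keyed by line name."""
    names = list(genotypes)
    k = {name: {name: 1.0} for name in names}
    for a, b in combinations(names, 2):
        k[a][b] = k[b][a] = shared_allele_proportion(genotypes[a], genotypes[b])
    return k


if __name__ == "__main__":
    # Hypothetical allele calls at four markers for three inbred lines.
    genotypes = {
        "L1": ["A", "T", "G", "C"],
        "L2": ["A", "T", "C", "C"],
        "L3": ["G", "A", "C", None],
    }
    for name, row in kinship_matrix(genotypes).items():
        print(name, {other: round(value, 2) for other, value in row.items()})
```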
22. Several methods have been used to control population structure and kinship in AM
1. Genomic control
2. Structured association
3. Mixed models
4. Principal component analysis
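Of the four approaches listed, genomic control is the simplest to sketch. The usual form estimates an inflation factor λ as the median of the observed 1-df chi-square test statistics divided by the median expected under the null (about 0.455), then divides every statistic by λ. The code below is a minimal sketch under those assumptions; the function names are hypothetical, and the choice to never deflate statistics (λ is floored at 1) is a common convention rather than something stated in the slides.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation_factor(chi2_stats):
    """Lambda: observed median test statistic over the expected null median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control_adjust(chi2_stats):
    """Divide each statistic by lambda; lambda below 1 is treated as 1 (no deflation)."""
    lam = max(genomic_inflation_factor(chi2_stats), 1.0)
    return [s / lam for s in chi2_stats]


if __name__ == "__main__":
    # Hypothetical association statistics whose median is twice the null expectation.
    stats = [0.1, 2 * CHI2_1DF_MEDIAN, 5.0]
    print("lambda =", genomic_inflation_factor(stats))
    print("adjusted =", genomic_control_adjust(stats))
```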
23. Experimental designs and Models for Association Mapping
Design | Features | Remark
Structured association | Designed to minimize the effects of population structure; one version is the general linear model (GLM) | GLM implemented in TASSEL
Mixed linear model (MLM) | Designed to minimize the effects of population structure and kinship; markers and Q treated as fixed effects, while background QTLs are treated as random effects | Uses K or both Q and K matrices; EMMA is an improved version of the mixed model
Multilocus mixed model (MLMM) | Multiple loci used as cofactors in the model; uses stepwise mixed-model regression for the selection of loci and an approximate version of the mixed-model correction for population structure | More QTL detection power and lower FDR than single-locus tests
Multitrait mixed model (MTMM) | Simultaneous analysis of two or more correlated traits using the mixed model; separates genetic and environmental correlations and corrects for population structure | More power than single-trait models when the traits are correlated; otherwise, lower power
Joint linkage association mapping | Analysis of a sample drawn from a natural population and the open-pollinated progeny from this sample | Uses both LD and linkage analysis
Nested association mapping (NAM) | LD and linkage mapping in NAM populations | Higher power than AM alone
Source: Marker Assisted Plant Breeding: Principles and Practices B.D.Singh and A.K.Singh
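For reference, the mixed linear model row in the table is commonly written in the Q + K form below. This is a sketch of the usual notation and not a formula taken from the slides: $\mathbf{y}$ is the vector of phenotypes, $\boldsymbol{\beta}$ holds other fixed effects, $\boldsymbol{\alpha}$ is the effect of the tested marker, $\mathbf{v}$ holds subpopulation effects weighted by the structure matrix $\mathbf{Q}$, $\mathbf{u}$ is the random polygenic background whose covariance is proportional to the kinship matrix $\mathbf{K}$, and $\mathbf{e}$ is the residual. The assumption of independent residuals is a simplification.

```latex
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{S}\boldsymbol{\alpha}
           + \mathbf{Q}\mathbf{v} + \mathbf{Z}\mathbf{u} + \mathbf{e},
\qquad
\mathbf{u} \sim N\!\left(\mathbf{0},\; 2\mathbf{K}\,\sigma_g^{2}\right),
\quad
\mathbf{e} \sim N\!\left(\mathbf{0},\; \mathbf{I}\,\sigma_e^{2}\right)
```

Dropping the $\mathbf{Z}\mathbf{u}$ term leaves a fixed-effects model that corrects for structure only, which corresponds to the GLM row of the table.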
29. Challenges – GWAS
• Markers with a minor allele frequency below 5% are usually excluded from the analysis, which eliminates the chance of discovering rare alleles
• Synthetic associations are misleading associations that occur when GWAS identifies noncausal SNPs as more significant than the truly causal variants
• Different GWAS methods often yield similar but nonidentical results, which remains a subject of ongoing investigation
• Population structure must be addressed very carefully; otherwise the number of false positives increases
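The first challenge can be illustrated with a small filter. The sketch below assumes diploid genotypes coded as counts of the alternate allele (0, 1 or 2) with `None` for missing calls, and applies the 5% threshold mentioned above; the function names and marker names are hypothetical.

```python
def minor_allele_frequency(calls):
    """MAF from diploid genotype calls coded as alternate-allele counts (0, 1, 2)."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1 - alt_freq)


def filter_by_maf(markers, threshold=0.05):
    """Split markers into (kept, dropped) dicts according to the MAF threshold."""
    kept, dropped = {}, {}
    for name, calls in markers.items():
        target = kept if minor_allele_frequency(calls) >= threshold else dropped
        target[name] = calls
    return kept, dropped


if __name__ == "__main__":
    # Hypothetical data: one common SNP and one rare SNP scored on 10 and 20 individuals.
    markers = {
        "snp_common": [0, 1, 2, 1, 0, 2, 1, 1, 0, 2],
        "snp_rare": [0] * 19 + [1],
    }
    kept, dropped = filter_by_maf(markers)
    print("kept:", list(kept))
    print("dropped:", list(dropped))
```

Any causal variant among the dropped markers is never tested, which is how the filter removes rare alleles from consideration.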
30. OPPORTUNITIES-GWAS
• Populations for the studies are sampled from existing materials
• QTL linked markers can be directly used for MAS
• Provides high resolution
• Candidate gene prioritization methods help in moving from GWAS results
to biological understanding.
• Continued methodology development in GWAS is needed; funding support for methodology development and software implementation benefits a wide range of research disciplines
31. Challenges and opportunities in genome-wide association studies occur at each step. The challenges arise from the complex interplay of biology and statistics. Surmounting these challenges will provide new opportunities for understanding and application.
32. SUCCESS STORIES
• In Humans, this approach has identified SNPs associated with several
complex conditions including diabetes, heart disease, Parkinson disease,
and Crohn disease. SNPs have also been associated with a person’s
response to certain drugs and susceptibility to certain environmental
factors such as toxins. Researchers hope that future genome-wide
association studies will identify additional SNPs associated with chronic
diseases and drug effects.
• In plants, AM has been successfully used in Arabidopsis, maize, rice and various other crops for traits such as flowering time, plant height, yield, resistance against pathogens, and growth response.
35. Current status of association mapping in plants
Plant species | Populations | Sample size | Background markers | Traits | Reference
Maize | Diverse inbred lines | 92 | 141 | Flowering time | (Thornsberry et al., 2001)
Maize | Elite inbred lines | 71 | 55 | Flowering time | (Andersen et al., 2005)
Maize | Diverse inbred lines and landraces | 375 + 275 | 55 | Flowering time | (Camus-Kulandaivelu et al., 2006)
Maize | Diverse inbred lines | 95 | 192 | Flowering time | (Salvi, 2007)
Maize | Diverse inbred lines | 102 | 47 | Kernel composition, starch pasting properties | (Wilson et al., 2004)
Maize | Diverse inbred lines | 86 | 141 | Maysin synthesis | (Szalma et al., 2005)
Maize | Elite inbred lines | 75 | 151 | Kernel color | (Palaisa et al., 2004)
Maize | Diverse inbred lines | 57 | 120 | Sweet taste | (Tracy et al., 2006)
Maize | Elite inbred lines | 553 | 8950 | Oleic acid content | (Belo et al., 2008)
Maize | Diverse inbred lines | 282 | 553 | Carotenoid content | (Harjes et al., 2008)
Sorghum | Diverse inbred lines | 377 | 47 | Community resource report | (Casa et al., 2018)
Wheat | Diverse cultivars | 95 | 93 | Kernel size, milling quality | (Breseghello and Sorrells, 2016)
36. Current status of association mapping in plants (continued)
Plant species | Populations | Sample size | Background markers | Traits | Reference
Arabidopsis | Diverse ecotypes | 95 | 104 | Flowering time | (Olsen et al., 2004)
Arabidopsis | Diverse ecotypes | 95 | 2553 | Disease resistance, flowering time | (Aranzana et al., 2005), (Zhao et al., 2007)
Arabidopsis | Diverse accessions | 96 | 90 | Shoot branching | (Ehrenreich et al., 2007)
Barley | Diverse cultivars | 148 | 139 | Days to heading, leaf rust, yellow dwarf virus | (Kraakman et al., 2017)
Potato | Diverse cultivars | 123 | 49 | Late blight resistance | (Malosetti et al., 2007)
Rice | Diverse land races | 105 | 124 | Glutinous phenotype | (Olsen and Purugganan, 2002)
Rice | Diverse land races | 577 | 577 | Starch quality | (Bao et al., 2006)
Rice | Diverse accessions | 103 | 123 | Yield and its components | (Agrama et al., 2018)
Sugarcane | Diverse clones | 154 | 2209 | Disease resistance | (Wei et al., 2006)
Chickpea | Diverse accessions | 300 | 1872 | Drought tolerance | (Thudi et al., 2014)
Soybean | Diverse accessions | 305 | 37573 | Salt tolerance | (Tuyen et al., 2019)
37. CONCLUSION
• Association mapping platforms are being developed for multiple plant species.
• Studies on the established association mapping panels will generate valuable information for future work and a better understanding of the various genetic and statistical aspects of association mapping.
• Theoretical studies that closely track empirical results will provide valuable general
guidelines for association mapping.
• Genetic diversity and phenotyping are expected to gain further attention, as researchers
become more aware of their importance.
• Eventually, research will extend beyond flowering time and plant height to other traits of economic and evolutionary value.
• Superior allele mining for trait improvement will be greatly facilitated by synergy
among various research groups involved in different aspects of association mapping.