SlideShare a Scribd company logo
1 of 7
Download to read offline
Am. J. Hum. Genet. 76:887–883, 2005
887
Report
A Note on Exact Tests of Hardy-Weinberg Equilibrium
Janis E. Wigginton,1
David J. Cutler,2
and Gonçalo R. Abecasis1
1
Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor; and 2
Institute of Genetic Medicine, Johns
Hopkins University School of Medicine, Baltimore
Deviations from Hardy-Weinberg equilibrium (HWE) can indicate inbreeding, population stratification, and even
problems in genotyping. In samples of affected individuals, these deviations can also provide evidence for association.
Tests of HWE are commonly performed using a simple x2
goodness-of-fit test. We show that this x2
test can have
inflated type I error rates, even in relatively large samples (e.g., samples of 1,000 individuals that include ∼100
copies of the minor allele). On the basis of previous work, we describe exact tests of HWE together with efficient
computational methods for their implementation. Our methods adequately control type I error in large and small
samples and are computationally efficient. They have been implemented in freely available code that will be useful
for quality assessment of genotype data and for the detection of genetic association or population stratification in
very large data sets.
In the absence of migration, mutation, natural selection,
and assortative mating, genotype frequencies at any lo-
cus are a simple function of allele frequencies. This phe-
nomenon, now termed “Hardy-Weinberg equilibrium”
(HWE), was first described in the early part of the twen-
tieth century (Hardy 1908; Weinberg 1908). The orig-
inal descriptions of HWE are an important landmark in
the history of population genetics (Crow 1988), and it
is now common practice to check whether observed
genotypes conform to Hardy-Weinberg expectations.
These expectations appear to hold for most human pop-
ulations, and deviations from HWE at particular mark-
ers may suggest problems with genotyping or population
structure or, in samples of affected individuals, an as-
sociation between the marker and disease susceptibility.
Here, we describe efficient implementations of exact
tests for HWE, which are suitable for use in large-scale
studies of SNP data, even when hundreds of thousands
of markers are examined. The availability of data on
patterns of linkage disequilibrium across the genome
(International HapMap Consortium 2003), interest in
identifying susceptibility alleles for complex diseases
Received November 18, 2004; accepted for publication February
22, 2005; electronically published March 23, 2005.
Address for correspondence and reprints: Dr. Janis E. Wigginton,
Department of Biostatistics, School of Public Health, University of
Michigan, Ann Arbor, MI 48109. E-mail: goncalo@umich.edu
䉷 2005 by The American Society of Human Genetics. All rights reserved.
0002-9297/2005/7605-0017$15.00
(Cardon and Abecasis 2003), and advances in genotyp-
ing technology (Kwok 2001; Weber and Broman 2001)
suggest that such large studies will be increasingly com-
mon. The principles and procedures used for testing
HWE are well established (Levene 1949; Haldane 1954;
Hernandez and Weir 1989; Wellek 2004), but the lack
of a publicly available, efficient, and reliable implemen-
tation for exact tests has led many scientists to rely on
asymptotic tests that can perform poorly with realistic
sample sizes.
Consider a sample of SNP genotypes for N unrelated
diploid individuals measured at an autosomal locus. The
sample includes 2N alleles, including copies of the
nA
rarer allele and copies of the common allele. Let the
nB
number of heterozygous AB genotypes be , and note
nAB
that the numbers of AA and BB homozygous genotypes
are and . Note
n p (n ⫺ n ) / 2 n p (n ⫺ n ) / 2
AA A AB BB B AB
that there are possible arrangements for
(2N)! / n !n !
A B
the alleles in the sample and that nAB
2 N!/(n !n !n !)
AA AB BB
of these arrangements correspond to exactly het-
nAB
erozygotes. Thus, under the assumption of HWE, the
probability of observing exactly heterozygotes in a
nAB
sample of N individuals with minor alleles is
nA
nAB
2 N! n !n !
A B
P(N p n FN, n ) p # . (1)
AB AB A
( )
n !n !n ! 2N !
AA AB BB
This equation holds for each possible number of het-
erozygotes, . When is odd, possible numbers of
n n
AB A
888 Am. J. Hum. Genet. 76:887–883, 2005
heterozygotes are 1, 3, 5,…, . When is even, pos-
n n
A A
sible numbers of heterozygotes are 0, 2, 4,…, . The
nA
expression for given in equation (1) leads
P(n FN,n )
AB A
to natural tests for HWE. For example, one could
define one-sided tests that focus on detection of a de-
ficit of heterozygotes, by calculating the statistic P p
low
, or detection of an excess of heter-
P(N ⭐ n FN,n )
AB AB A
ozygotes, by calculating the statistic P p P(N ⭓
high AB
. In each case, the statistic can be calculated
n FN,n )
AB A
by simply summing over equation (1), to include all pos-
sible values of that are lower (for ) or higher (for
N P
AB low
) than those observed in the actual data. A test for
Phigh
a deficit of heterozygotes in relation to Hardy-Weinberg
expectations is appropriate when deviations from HWE
due to inbreeding or population stratification are sus-
pected, since both of these increase the proportion of
homozygotes in the population. A test for an excess
of heterozygotes is appropriate when one suspects prob-
lems in genotyping due to the existence of highly ho-
mologous regions in the genome, since these low-copy
repeats often lead to an increase in the proportion of
apparent heterozygotes in the sample. In other settings,
it might be appropriate to use both tests. For example,
many technologies score genotypes by clustering signals,
and misspecified clusters can result in either vast excesses
or vast deficits of heterozygotes.
When neither an increase nor a decrease in the pro-
portion of heterozygotes is specifically expected, one
could perform two separate one-sided tests or, instead,
use a two-sided test statistic (Weir 1996). A natural
two-sided test statistic could be defined as P p
2a
. This two-sided statistic is appeal-
min (1.0, 2P , 2P )
high low
ing because it leads to rejection of HWE at significance
level 2a in instances in which the one-sided tests lead to
the rejection of HWE at significance level a. However,
because of the asymmetric nature of the distribution of
heterozygote counts in a sample, the statistic is quite
conservative in practice, and we do not recommend
its use. Instead, an appealing approach, analogous to
Fisher’s exact test for contingency tables (Fisher 1934),
is to calculate the probability of observing a sample con-
figuration that is even less likely than the one being eval-
uated, conditional on the observed allele counts. This
can be achieved using a statistic similar to the Monte
Carlo statistic proposed by Guo and Thompson (1992)
for multiallelic markers:
P p I P(N p n FN,n )
[
冘
HWE AB AB A
∗
nAB
∗
⭓ P(N p n FN,n ) ]
AB AB A
∗
#P(N p n FN,n ) .
AB AB A
In this definition, I[x] is an indicator function that is
equal to 1 when the comparison is true and equal to 0
otherwise. The sum should be performed over all het-
erozygote counts that are compatible with the ob-
∗
nAB
served number of minor alleles, .
nA
Most of the computational effort required for per-
forming exact tests of linkage disequilibrium is spent
evaluating the factorials in equation (1) for each possible
value of . By use of a naive approach, evaluating
nAB
equation (1) requires 5N–6N multiplications and one
division for each possible value of . We simplify cal-
nAB
culations by using the recurrence relationships previ-
ously recognized by Guo and Thompson (1992) in the
implementation of their Markov chain–Monte Carlo
sampler:
P(N p n ⫹ 2FN, n )
AB AB A
4n n
AA BB
p P(N p n FN, n ) , and
AB AB A
(n ⫹ 2)(n ⫹ 1)
AB AB
P(N p n ⫺ 2FN, n )
AB AB A
n (n ⫺ 1)
AB AB
p P(N p n FN, n ) . (2)
AB AB A
4(n ⫹ 1)(n ⫹ 1)
AA BB
In this way, evaluating the probability for each possible
number of heterozygotes takes only four multiplications
and one division, whatever the sample size N. To avoid
underflow, it is best to first calculate the probability of
observing the expected number of heterozygotes (in this
case, the most likely outcome) and then use the recur-
rence relationships to calculate probabilities for all other
outcomes. A further reduction of computational effort
is possible by noting that one need only calculate relative
probabilities for each outcome and then scale these to
ensure that their sum is 1.0. This means that the prob-
ability of observing the expected number of heterozy-
gotes can be replaced with an arbitrary constant when
using the recurrence relations in equation (2), provided
that the final result is scaled.
Table 1 illustrates the performance of the statistics for
a sample of 100 individuals in which 21 copies of the
minor allele are present. The observed number of het-
erozygotes will vary from 1 to 21 and must be odd. Note
that only a small number of distinct sample configura-
tions are possible, and each of these is associated with
a specific probability for the exact tests. If the desired
significance level a does not correspond exactly to one
of these discrete outcomes, then the exact test statistics
will be conservative (Hernandez and Weir 1989). For
example, at the significance level , the PHWE and
a p 0.05
Plow statistics both reject the hypothesis of HWE if ⭐13
heterozygotes are observed in this setting. Since the prob-
ability of observing ⭐13 heterozygotes is 0.010, the tests
are conservative. In contrast, the asymptotic x2
test sta-
tistic results in rejection of HWE when ⭐15 heterozy-
Figure 1 Type I error rates as a function of minor-allele counts for rare alleles, for samples of either 100 or 1,000 chromosomes and corresponding to a significance threshold of
, 0.01, or 0.001. Results are plotted as a function of the number of minor alleles in the sample for the exact statistic (red) and for the asymptotic x2
test statistic (blue). A gray
a p 0.05 PHWE
line denotes the nominal error rate. Note that the Y-axes in figures 1 and 2 differ.
890 Am. J. Hum. Genet. 76:887–883, 2005
Table 1
Possible Sample Configurations and Their Probabilities for a Sample of 100
Individuals and 21 Minor-Allele Copies Are Tabulated
NO. OF
HETEROZYGOTES
(nAB) PROBABILITY
a
x2
TEST P
EXACT TEST P VALUES
PHWE Phigh Plow
5 !.000001 !.000001b
!.000001b
1.000000 !.000001b
7 .000001 !.000001b
.000001b
1.000000 .000001b
9 .000047 !.000001b
.000048b
.999999 .000048b
11 .000870 .000039b
.000919b
.999952 .000919b
13 .009375 .002228b
.010293b
.999081 .010293b
15 .059283 .045180b
.069576 .989707 .069576
17 .214465 .342972 .284042 .930424 .284042
19 .406355 .906529 1.000000 .715958 .690396
21 .309604 .244336 .593645 .309604 1.000000
NOTE.—The probability of observing each possible outcome is given, together with
the corresponding P values for tests of HWE based on the x2
statistic and on the exact
test statistics PHWE, Plow, and Phigh (described in the main text).
a
.
P(n FN p 100,n p 21)
AB A
b
Configurations that would be rejected at the significance level a p 0.05.
gotes are observed (for ⭐15 heterozygotes, the x2
test
statistic corresponds to an asymptotic ). This
P ⭐ .045
results in an inflated type I error rate of 0.070 and there-
fore is inappropriate. In this sample, it is not possible
to reject HWE because of an excess of heterozygous
individuals—the probability of observing the maximum
of 21 heterozygotes is 0.31, and none of the test statistics
gives a P value !.05 for this extreme configuration. Ad-
ditional examples of the performance of exact test sta-
tistics for HWE can be found in the work by Vithayasai
(1973).
In general, the exact test statistics are conservative
when a small number of minor-allele copies are present
in the sample, but they approximate nominal signifi-
cance levels as the sample size (and number of minor-
allele copies) increases. In contrast, the commonly used
x2
statistic can produce excessively small or large P val-
ues for specific outcomes (Hernandez and Weir 1989).
To comprehensively evaluate the performance of the x2
and exact test statistics, we calculated their type I error
rates for specified significance levels of , 0.01,
a p 0.05
or 0.001, for sample sizes of or
N p 100 N p 1,000
individuals and varying minor-allele counts. The results
are summarized in figure 1 (for samples in which !25%
of chromosomes carry the minor allele) and figure 2 (for
samples in which 110% of chromosomes carry the mi-
nor allele), and it is clear that the statistics exhibit some
periodicity in their type I error rates. As expected, both
the exact PHWE statistic and the x2
statistic perform better
as the sample size and minor-allele counts increase. Nev-
ertheless, one important difference is that the x2
statistic
can sometimes be extremely anticonservative (e.g., in a
sample of 1,000 individuals, when nominal ,
a p 0.001
the true type I error rate can exceed 0.06 and is often
10.01 for minor-allele counts !100), whereas the exact
statistic never exceeds the nominal significance level. In
practical settings, the x2
statistic could lead to many false
rejections of HWE that depend on only the particular
count of minor alleles in the sample.
To understand the periodicity of the statistics, it is
important to consider the discrete nature of the data.
For example, for a sample of individuals in-
N p 100
cluding 2–5 copies of the minor allele, we reject HWE
at the significance level (fig. 1A) when there
a p 0.05
is at least one homozygote for the minor allele. The
probability of observing more than one homozygote for
the minor allele increases gradually from 0.0050 when
there are two copies of the allele in the sample up to
0.0499 when there are five copies of the minor allele in
the sample. When there are 6–14 copies of the minor
allele in the sample, we reject HWE at the a p 0.05
significance level (fig. 1A) when at least two homo-
zygotes for the rare allele are observed. Again, the prob-
ability of a more extreme event is quite low for small
numbers of the rare allele ( with six copies of
P p .0011
the minor allele in the sample) but gradually increases
if there are additional copies of the minor allele in the
sample ( with 13 copies of the minor allele).
P p .0482
In table 2, the overall type I error rates for each sta-
tistic are summarized for sample sizes of 100 or 1,000
individuals and various ranges of minor-allele counts. It
is clear that, on average, the x2
test approximates nom-
inal significance levels as the number of minor alleles in
the sample increases. Nevertheless, as illustrated in figure
1, this is achieved at the cost of inflated error rates for
samples with specific numbers of minor alleles. Even in
a sample of 1,000 individuals, the type I error rate at a
p 0.001 for the x2
test is inflated when there are !200
copies of the minor allele (corresponding to an allele
frequency of ∼10%). The exact tests approximate nom-
Figure 2 Type I error rates as a function of minor-allele counts for common alleles, for samples of either 100 or 1,000 chromosomes and corresponding to a significance threshold of
, 0.01, or 0.001. Results are plotted as a function of the number of minor alleles in the sample for the exact statistic (red) and for the asymptotic x2
test statistic (blue). A gray
a p 0.05 PHWE
line denotes the nominal error rate. Note that the Y-axes in figures 1 and 2 differ.
892 Am. J. Hum. Genet. 76:887–883, 2005
Table 2
Actual Error Rates for the x2
Test Statistic and the PHWE Test Statistic for Nominal
Significance Level a p 0.01 or 0.001
SAMPLE AND
MINOR-ALLELE COUNT
a p 0.01a
a p 0.001a
x2
PHWE x2
PHWE
N p 1,000
1–100 .0208b
(.0208)b
.0039 (.0039) .0088b
(.0088)b
.0004 (.0004)
101–200 .0100 (.0154)b
.0065 (.0052) .0017b
(.0053)b
.0006 (.0005)
201–400 .0097 (.0126)b
.0083 (.0067) .0010 (.0032)b
.0008 (.0006)
401–1,000 .0100 (.0110)b
.0090 (.0081) .0010 (.0018)b
.0009 (.0008)
N p 100
1–10 .0292b
(.0292)b
.0024 (.0024) .0114b
(.0114)b
.0001 (.0001)
11–20 .0191b
(.0242)b
.0035 (.0030) .0035b
(.0074)b
.0003 (.0002)
21–40 .0083 (.0162)b
.0037 (.0033) .0016b
(.0045)b
.0004 (.0003)
41–100 .0099 (.0124)b
.0072 (.0057) .0009 (.0023)b
.0006 (.0005)
NOTE.—Results are tabulated for samples of 100 and 1,000 individuals and represent simple
averages for each range of minor-allele counts.
a
The error rate for each bin is tabulated, followed by the cumulative error rate in parenthesis.
The cumulative error rate is calculated by including each bin and all previous bins. For example,
for a sample of size 1,000, when a p 0.001, the type I error rate for the standard x2
test in a
sample with 101–200 copies of the minor allele is 0.0017 and the cumulative error rate, cor-
responding to samples with 1–200 copies of the minor allele, is 0.0053.
b
Exceeds nominal significance level.
inal significance levels with increasing sample size but
remain conservative because of the discrete nature of the
data.
As a final evaluation of our approach, we applied our
method to a subset of the genotypes collected by the
International HapMap Consortium (2003). We focused
on a set of 18,460 SNP markers genotyped indepen-
dently by two different centers with no discrepancies
between the two sets of experimental results. For each
of these markers, we evaluated evidence against HWE
by using both the exact PHWE statistic and the asymptotic
x2
statistic. Results were broadly similar for 14,889
markers with minor-allele frequencies ⭓20%. However,
we observed noticeable differences for 3,571 markers
with minor-allele frequencies !20%. For example, the
x2
test rejected HWE for 71 of these markers at a p
(twice as many as the 35 markers expected to fail
0.01
this test by chance), whereas the exact test rejected HWE
for only 33 markers. At the more stringent a p 0.001
significance level, the x2
test rejected HWE for 28 mark-
ers (rejection for 3 markers is expected by chance),
whereas the exact PHWE statistic rejected HWE for only
5 markers.
Although we focus on testing the agreement of ob-
served genotypes with HWE proportions, computation-
ally efficient exact tests can be constructed for any de-
sired genotype proportions. In brief, let the expected
proportion of heterozygotes be pAB and the two ho-
mozygote proportions be pAA and pBB. For exam-
ple, in a population with inbreeding coefficient f, we
might expect the proportion of heterozygotes to be
. Define the quantity so that
2
2(1 ⫺ f)p p v p p / p p
A B AB AA BB
when HWE holds. Then, the probability of ob-
v p 4
serving nAB heterozygotes is
n /2
AB
v N! 1
P(N p n FN,n ) p # ,
AB AB A
n !n !n ! C
AA AB BB
where
∗
n /2
AB
v N!
C p 冘 ∗ ∗ ∗
∗ n !n !n !
n AA AB BB
AB
(Wellek 2004). It is simple to verify that the recurrence
relationships given in equation (2) can be extended to
this setting by replacing the number 4 with the quantity
v in each expression.
The exact test statistics for HWE described here are
accurate for a variety of allele frequencies and can be
computed in an inexpensive manner. We recommend
that they be used instead of the standard x2
test statistic
in all situations. For large data sets, rather than fixing
an arbitrary threshold for rejecting HWE, we suggest
that methods based on the false-discovery rate (Benja-
mini and Hochberg 1995) be used to identify a subset
of markers whose genotypes do not conform to the ex-
pected equilibrium distribution.
The PHWE test statistic described here is implemented
in the Pedstats software package (see Pedstats Web site),
which generates summaries and checks the integrity of
genetic data. In addition, code for calculating Plow, Phigh,
and PHWE in C/C⫹⫹, R, and Fortran is available from
the authors’ Web site. With appropriate citation, our
code is freely available for use and can be incorporated
Reports 893
into other programs. The HapMap Project genotype
data are freely available at the HapMap Web site.
Acknowledgments
We gratefully acknowledge grant support from the National
Human Genome Research Institute and the National Eye In-
stitute. The manuscript was improved by helpful comments
from reviewers.
Electronic-Database Information
The URLs for data presented herein are as follows:
Authors’ Web site, http://www.sph.umich.edu/csg/abecasis/
HapMap, http://www.hapmap.org/
Pedstats, http://www.sph.umich.edu/csg/abecasis/Pedstats/
References
Benjamini Y, Hochberg Y (1995) Controlling the false discov-
ery rate: a practical and powerful approach to multiple test-
ing. J R Stat Soc Ser B 57:289–300
Cardon LR, Abecasis GR (2003) Using haplotype blocks to
map human complex trait loci. Trends Genet 19:135–140
Crow JF (1988) Eighty years ago: the beginnings of population
genetics. Genetics 119:473–476
Fisher RA (1934) Statistical methods for research workers.
Oliver and Boyd, Edinburgh
Guo SW, Thompson EA (1992) Performing the exact test of
Hardy-Weinberg proportion for multiple alleles. Biometrics
48:361–372
Haldane JBS (1954) An exact test for randomness of mating.
J Genet 52:631–635
Hardy HG (1908) Mendelian proportions in a mixed popu-
lation. Science 28:49–50
Hernandez JL, Weir BS (1989) A disequilibrium coefficient
approach to Hardy-Weinberg equilibrium testing. Biomet-
rics 45:53–70
International HapMap Consortium (2003) The International
HapMap Project. Nature 426:789–796
Kwok PY (2001) Methods for genotyping single nucleotide
polymorphisms. Annu Rev Genomics Hum Genet 2:235–
258
Levene H (1949) On a matching problem arising in genetics.
Ann Math Stat 21:91–94
Vithayasai C (1973) Exact critical values of the Hardy-Wein-
berg test statistic for two alleles. Communic Stat 1:229–242
Weber JL, Broman KW (2001) Genotyping for human whole-
genome scans: past, present, and future. Adv Genet 42:77–
96
Weinberg W (1908) On the demonstration of heredity in man.
In: Boyer SH, trans (1963) Papers on human genetics. Pren-
tice Hall, Englewood Cliffs, NJ
Weir BS (1996) Genetic data analysis II. Sinauer Associates,
Sunderland, MA
Wellek S (2004) Tests for establishing compatibility of an
observed genotype distribution with Hardy-Weinberg equi-
librium in the case of a biallelic locus. Biometrics 60:694–
703

More Related Content

Similar to A Note On Exact Tests Of Hardy-Weinberg Equilibrium

Population fitness and genetic load of
Population fitness and genetic load ofPopulation fitness and genetic load of
Population fitness and genetic load ofThanka Elango
 
The epidemiology of schistosomiasis in the later stages of a control program ...
The epidemiology of schistosomiasis in the later stages of a control program ...The epidemiology of schistosomiasis in the later stages of a control program ...
The epidemiology of schistosomiasis in the later stages of a control program ...Alim A-H Yacoub Lovers
 
Case control study - Part 2
Case control study - Part 2Case control study - Part 2
Case control study - Part 2Rizwan S A
 
Association mapping
Association mapping Association mapping
Association mapping Preeti Kapoor
 
Common Statistical Methods Used In Transgenic Fish Research
Common Statistical Methods Used In Transgenic Fish ResearchCommon Statistical Methods Used In Transgenic Fish Research
Common Statistical Methods Used In Transgenic Fish ResearchMohamed Afifi
 
Chapter 5 Anova2009
Chapter 5 Anova2009Chapter 5 Anova2009
Chapter 5 Anova2009ghalan
 
Analysis of Variance (ANOVA), MANOVA: Expected variance components, Random an...
Analysis of Variance (ANOVA), MANOVA: Expected variance components, Random an...Analysis of Variance (ANOVA), MANOVA: Expected variance components, Random an...
Analysis of Variance (ANOVA), MANOVA: Expected variance components, Random an...Satish Khadia
 
Gene mapping
Gene  mappingGene  mapping
Gene mappingrashzz
 
From reads to pathways for efficient disease gene finding
From reads to pathways for efficient disease gene findingFrom reads to pathways for efficient disease gene finding
From reads to pathways for efficient disease gene findingJoaquin Dopazo
 
Simulating Genes in Genome-wide Association Studies
Simulating Genes in Genome-wide Association StudiesSimulating Genes in Genome-wide Association Studies
Simulating Genes in Genome-wide Association StudiesKevin Thornton
 
20100515 bioinformatics kapushesky_lecture06
20100515 bioinformatics kapushesky_lecture0620100515 bioinformatics kapushesky_lecture06
20100515 bioinformatics kapushesky_lecture06Computer Science Club
 
Day2 145pm Crawford
Day2 145pm CrawfordDay2 145pm Crawford
Day2 145pm CrawfordSean Paul
 
Population Genetic Models of Genomic Imprinting
Population Genetic Models of Genomic ImprintingPopulation Genetic Models of Genomic Imprinting
Population Genetic Models of Genomic ImprintingGavin Pearce
 
Adaptive Molecular Evolution (dN/dS). 2011
Adaptive Molecular Evolution (dN/dS). 2011Adaptive Molecular Evolution (dN/dS). 2011
Adaptive Molecular Evolution (dN/dS). 2011Hernán Dopazo
 
Sept2016 plenary mercer_sequins
Sept2016 plenary mercer_sequinsSept2016 plenary mercer_sequins
Sept2016 plenary mercer_sequinsGenomeInABottle
 

Similar to A Note On Exact Tests Of Hardy-Weinberg Equilibrium (20)

Population fitness and genetic load of
Population fitness and genetic load ofPopulation fitness and genetic load of
Population fitness and genetic load of
 
The epidemiology of schistosomiasis in the later stages of a control program ...
The epidemiology of schistosomiasis in the later stages of a control program ...The epidemiology of schistosomiasis in the later stages of a control program ...
The epidemiology of schistosomiasis in the later stages of a control program ...
 
Case control study - Part 2
Case control study - Part 2Case control study - Part 2
Case control study - Part 2
 
Association mapping
Association mapping Association mapping
Association mapping
 
Common Statistical Methods Used In Transgenic Fish Research
Common Statistical Methods Used In Transgenic Fish ResearchCommon Statistical Methods Used In Transgenic Fish Research
Common Statistical Methods Used In Transgenic Fish Research
 
Chapter 5 Anova2009
Chapter 5 Anova2009Chapter 5 Anova2009
Chapter 5 Anova2009
 
Analysis of Variance (ANOVA), MANOVA: Expected variance components, Random an...
Analysis of Variance (ANOVA), MANOVA: Expected variance components, Random an...Analysis of Variance (ANOVA), MANOVA: Expected variance components, Random an...
Analysis of Variance (ANOVA), MANOVA: Expected variance components, Random an...
 
Gene mapping
Gene  mappingGene  mapping
Gene mapping
 
From reads to pathways for efficient disease gene finding
From reads to pathways for efficient disease gene findingFrom reads to pathways for efficient disease gene finding
From reads to pathways for efficient disease gene finding
 
Chapter 5 Anova2009
Chapter 5 Anova2009Chapter 5 Anova2009
Chapter 5 Anova2009
 
Simulating Genes in Genome-wide Association Studies
Simulating Genes in Genome-wide Association StudiesSimulating Genes in Genome-wide Association Studies
Simulating Genes in Genome-wide Association Studies
 
20100515 bioinformatics kapushesky_lecture06
20100515 bioinformatics kapushesky_lecture0620100515 bioinformatics kapushesky_lecture06
20100515 bioinformatics kapushesky_lecture06
 
Day2 145pm Crawford
Day2 145pm CrawfordDay2 145pm Crawford
Day2 145pm Crawford
 
Population Genetic Models of Genomic Imprinting
Population Genetic Models of Genomic ImprintingPopulation Genetic Models of Genomic Imprinting
Population Genetic Models of Genomic Imprinting
 
Adaptive Molecular Evolution (dN/dS). 2011
Adaptive Molecular Evolution (dN/dS). 2011Adaptive Molecular Evolution (dN/dS). 2011
Adaptive Molecular Evolution (dN/dS). 2011
 
GWAS Study.pdf
GWAS Study.pdfGWAS Study.pdf
GWAS Study.pdf
 
14KoVar
14KoVar14KoVar
14KoVar
 
Hypothesis
HypothesisHypothesis
Hypothesis
 
Sept2016 plenary mercer_sequins
Sept2016 plenary mercer_sequinsSept2016 plenary mercer_sequins
Sept2016 plenary mercer_sequins
 
GWAS
GWASGWAS
GWAS
 

More from Stephen Faucher

Unseen Poetry Past Papers
Unseen Poetry Past PapersUnseen Poetry Past Papers
Unseen Poetry Past PapersStephen Faucher
 
Top 7 Rules For Writing A Good Analysis Essay
Top 7 Rules For Writing A Good Analysis EssayTop 7 Rules For Writing A Good Analysis Essay
Top 7 Rules For Writing A Good Analysis EssayStephen Faucher
 
Is It Okay To Include Quotes In College Essays - GradesHQ
Is It Okay To Include Quotes In College Essays - GradesHQIs It Okay To Include Quotes In College Essays - GradesHQ
Is It Okay To Include Quotes In College Essays - GradesHQStephen Faucher
 
A Manual For Writers Of Term Papers Theses And Dissert
A Manual For Writers Of Term Papers Theses And DissertA Manual For Writers Of Term Papers Theses And Dissert
A Manual For Writers Of Term Papers Theses And DissertStephen Faucher
 
Example Of An Abstract For A Research Report - English La
Example Of An Abstract For A Research Report - English LaExample Of An Abstract For A Research Report - English La
Example Of An Abstract For A Research Report - English LaStephen Faucher
 
Extended Essay Guide Art
Extended Essay Guide ArtExtended Essay Guide Art
Extended Essay Guide ArtStephen Faucher
 
Essay Essaywriting How To Do A Research Assignment,
Essay Essaywriting How To Do A Research Assignment,Essay Essaywriting How To Do A Research Assignment,
Essay Essaywriting How To Do A Research Assignment,Stephen Faucher
 
My New YearS Resolution For 20
My New YearS Resolution For 20My New YearS Resolution For 20
My New YearS Resolution For 20Stephen Faucher
 
Stunning 600 Word Essay Thatsnotus
Stunning 600 Word Essay ThatsnotusStunning 600 Word Essay Thatsnotus
Stunning 600 Word Essay ThatsnotusStephen Faucher
 
Transition Words And Phrases, Detailed List - Le
Transition Words And Phrases, Detailed List - LeTransition Words And Phrases, Detailed List - Le
Transition Words And Phrases, Detailed List - LeStephen Faucher
 
Essay Writing Process — W
Essay Writing Process — WEssay Writing Process — W
Essay Writing Process — WStephen Faucher
 
College Essay Sample Pdf
College Essay Sample PdfCollege Essay Sample Pdf
College Essay Sample PdfStephen Faucher
 
012 How To Write An Introduction Paragraph For Essay Example That
012 How To Write An Introduction Paragraph For Essay Example That012 How To Write An Introduction Paragraph For Essay Example That
012 How To Write An Introduction Paragraph For Essay Example ThatStephen Faucher
 
Example Of A Research Paper Rationale
Example Of A Research Paper RationaleExample Of A Research Paper Rationale
Example Of A Research Paper RationaleStephen Faucher
 
2024 New Year Resolutions G
2024 New Year Resolutions G2024 New Year Resolutions G
2024 New Year Resolutions GStephen Faucher
 
Example Of Reflection Paper About Movie Reflection P
Example Of Reflection Paper About Movie Reflection PExample Of Reflection Paper About Movie Reflection P
Example Of Reflection Paper About Movie Reflection PStephen Faucher
 
Concluding Sentence Generator
Concluding Sentence GeneratorConcluding Sentence Generator
Concluding Sentence GeneratorStephen Faucher
 
Personalized Letter Writing Sheets Floral Personalized Stationery Set
Personalized Letter Writing Sheets Floral Personalized Stationery SetPersonalized Letter Writing Sheets Floral Personalized Stationery Set
Personalized Letter Writing Sheets Floral Personalized Stationery SetStephen Faucher
 
Websites To Help Write Essays
Websites To Help Write EssaysWebsites To Help Write Essays
Websites To Help Write EssaysStephen Faucher
 

More from Stephen Faucher (20)

Unseen Poetry Past Papers
Unseen Poetry Past PapersUnseen Poetry Past Papers
Unseen Poetry Past Papers
 
Top 7 Rules For Writing A Good Analysis Essay
Top 7 Rules For Writing A Good Analysis EssayTop 7 Rules For Writing A Good Analysis Essay
Top 7 Rules For Writing A Good Analysis Essay
 
Is It Okay To Include Quotes In College Essays - GradesHQ
Is It Okay To Include Quotes In College Essays - GradesHQIs It Okay To Include Quotes In College Essays - GradesHQ
Is It Okay To Include Quotes In College Essays - GradesHQ
 
A Manual For Writers Of Term Papers Theses And Dissert
A Manual For Writers Of Term Papers Theses And DissertA Manual For Writers Of Term Papers Theses And Dissert
A Manual For Writers Of Term Papers Theses And Dissert
 
Example Of An Abstract For A Research Report - English La
Example Of An Abstract For A Research Report - English LaExample Of An Abstract For A Research Report - English La
Example Of An Abstract For A Research Report - English La
 
Extended Essay Guide Art
Extended Essay Guide ArtExtended Essay Guide Art
Extended Essay Guide Art
 
Essay Essaywriting How To Do A Research Assignment,
Essay Essaywriting How To Do A Research Assignment,Essay Essaywriting How To Do A Research Assignment,
Essay Essaywriting How To Do A Research Assignment,
 
My New YearS Resolution For 20
My New YearS Resolution For 20My New YearS Resolution For 20
My New YearS Resolution For 20
 
Stunning 600 Word Essay Thatsnotus
Stunning 600 Word Essay ThatsnotusStunning 600 Word Essay Thatsnotus
Stunning 600 Word Essay Thatsnotus
 
Transition Words And Phrases, Detailed List - Le
Transition Words And Phrases, Detailed List - LeTransition Words And Phrases, Detailed List - Le
Transition Words And Phrases, Detailed List - Le
 
Essay Writing Process — W
Essay Writing Process — WEssay Writing Process — W
Essay Writing Process — W
 
College Essay Sample Pdf
College Essay Sample PdfCollege Essay Sample Pdf
College Essay Sample Pdf
 
012 How To Write An Introduction Paragraph For Essay Example That
012 How To Write An Introduction Paragraph For Essay Example That012 How To Write An Introduction Paragraph For Essay Example That
012 How To Write An Introduction Paragraph For Essay Example That
 
Example Of A Research Paper Rationale
Example Of A Research Paper RationaleExample Of A Research Paper Rationale
Example Of A Research Paper Rationale
 
2024 New Year Resolutions G
2024 New Year Resolutions G2024 New Year Resolutions G
2024 New Year Resolutions G
 
Example Of Reflection Paper About Movie Reflection P
Example Of Reflection Paper About Movie Reflection PExample Of Reflection Paper About Movie Reflection P
Example Of Reflection Paper About Movie Reflection P
 
Horse Writing Paper
Horse Writing PaperHorse Writing Paper
Horse Writing Paper
 
Concluding Sentence Generator
Concluding Sentence GeneratorConcluding Sentence Generator
Concluding Sentence Generator
 
Personalized Letter Writing Sheets Floral Personalized Stationery Set
Personalized Letter Writing Sheets Floral Personalized Stationery SetPersonalized Letter Writing Sheets Floral Personalized Stationery Set
Personalized Letter Writing Sheets Floral Personalized Stationery Set
 
Websites To Help Write Essays
Websites To Help Write EssaysWebsites To Help Write Essays
Websites To Help Write Essays
 

Recently uploaded

Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
Planning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptxPlanning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptxLigayaBacuel1
 
Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........LeaCamillePacle
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxDr.Ibrahim Hassaan
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.arsicmarija21
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxEyham Joco
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxChelloAnnAsuncion2
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 

Recently uploaded (20)

Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
Planning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptxPlanning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptx
 
Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 

A Note On Exact Tests Of Hardy-Weinberg Equilibrium

  • 1. Am. J. Hum. Genet. 76:887–883, 2005 887 Report A Note on Exact Tests of Hardy-Weinberg Equilibrium Janis E. Wigginton,1 David J. Cutler,2 and Gonçalo R. Abecasis1 1 Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor; and 2 Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore Deviations from Hardy-Weinberg equilibrium (HWE) can indicate inbreeding, population stratification, and even problems in genotyping. In samples of affected individuals, these deviations can also provide evidence for association. Tests of HWE are commonly performed using a simple x2 goodness-of-fit test. We show that this x2 test can have inflated type I error rates, even in relatively large samples (e.g., samples of 1,000 individuals that include ∼100 copies of the minor allele). On the basis of previous work, we describe exact tests of HWE together with efficient computational methods for their implementation. Our methods adequately control type I error in large and small samples and are computationally efficient. They have been implemented in freely available code that will be useful for quality assessment of genotype data and for the detection of genetic association or population stratification in very large data sets. In the absence of migration, mutation, natural selection, and assortative mating, genotype frequencies at any lo- cus are a simple function of allele frequencies. This phe- nomenon, now termed “Hardy-Weinberg equilibrium” (HWE), was first described in the early part of the twen- tieth century (Hardy 1908; Weinberg 1908). The orig- inal descriptions of HWE are an important landmark in the history of population genetics (Crow 1988), and it is now common practice to check whether observed genotypes conform to Hardy-Weinberg expectations. These expectations appear to hold for most human pop- ulations, and deviations from HWE at particular mark- ers may suggest problems with genotyping or population structure or, in samples of affected individuals, an as- sociation between the marker and disease susceptibility. Here, we describe efficient implementations of exact tests for HWE, which are suitable for use in large-scale studies of SNP data, even when hundreds of thousands of markers are examined. The availability of data on patterns of linkage disequilibrium across the genome (International HapMap Consortium 2003), interest in identifying susceptibility alleles for complex diseases Received November 18, 2004; accepted for publication February 22, 2005; electronically published March 23, 2005. Address for correspondence and reprints: Dr. Janis E. Wigginton, Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI 48109. E-mail: goncalo@umich.edu 䉷 2005 by The American Society of Human Genetics. All rights reserved. 0002-9297/2005/7605-0017$15.00 (Cardon and Abecasis 2003), and advances in genotyp- ing technology (Kwok 2001; Weber and Broman 2001) suggest that such large studies will be increasingly com- mon. The principles and procedures used for testing HWE are well established (Levene 1949; Haldane 1954; Hernandez and Weir 1989; Wellek 2004), but the lack of a publicly available, efficient, and reliable implemen- tation for exact tests has led many scientists to rely on asymptotic tests that can perform poorly with realistic sample sizes. Consider a sample of SNP genotypes for N unrelated diploid individuals measured at an autosomal locus. The sample includes 2N alleles, including copies of the nA rarer allele and copies of the common allele. Let the nB number of heterozygous AB genotypes be , and note nAB that the numbers of AA and BB homozygous genotypes are and . Note n p (n ⫺ n ) / 2 n p (n ⫺ n ) / 2 AA A AB BB B AB that there are possible arrangements for (2N)! / n !n ! A B the alleles in the sample and that nAB 2 N!/(n !n !n !) AA AB BB of these arrangements correspond to exactly het- nAB erozygotes. Thus, under the assumption of HWE, the probability of observing exactly heterozygotes in a nAB sample of N individuals with minor alleles is nA nAB 2 N! n !n ! A B P(N p n FN, n ) p # . (1) AB AB A ( ) n !n !n ! 2N ! AA AB BB This equation holds for each possible number of het- erozygotes, . When is odd, possible numbers of n n AB A
  • 2. 888 Am. J. Hum. Genet. 76:887–883, 2005 heterozygotes are 1, 3, 5,…, . When is even, pos- n n A A sible numbers of heterozygotes are 0, 2, 4,…, . The nA expression for given in equation (1) leads P(n FN,n ) AB A to natural tests for HWE. For example, one could define one-sided tests that focus on detection of a de- ficit of heterozygotes, by calculating the statistic P p low , or detection of an excess of heter- P(N ⭐ n FN,n ) AB AB A ozygotes, by calculating the statistic P p P(N ⭓ high AB . In each case, the statistic can be calculated n FN,n ) AB A by simply summing over equation (1), to include all pos- sible values of that are lower (for ) or higher (for N P AB low ) than those observed in the actual data. A test for Phigh a deficit of heterozygotes in relation to Hardy-Weinberg expectations is appropriate when deviations from HWE due to inbreeding or population stratification are sus- pected, since both of these increase the proportion of homozygotes in the population. A test for an excess of heterozygotes is appropriate when one suspects prob- lems in genotyping due to the existence of highly ho- mologous regions in the genome, since these low-copy repeats often lead to an increase in the proportion of apparent heterozygotes in the sample. In other settings, it might be appropriate to use both tests. For example, many technologies score genotypes by clustering signals, and misspecified clusters can result in either vast excesses or vast deficits of heterozygotes. When neither an increase nor a decrease in the pro- portion of heterozygotes is specifically expected, one could perform two separate one-sided tests or, instead, use a two-sided test statistic (Weir 1996). A natural two-sided test statistic could be defined as P p 2a . This two-sided statistic is appeal- min (1.0, 2P , 2P ) high low ing because it leads to rejection of HWE at significance level 2a in instances in which the one-sided tests lead to the rejection of HWE at significance level a. However, because of the asymmetric nature of the distribution of heterozygote counts in a sample, the statistic is quite conservative in practice, and we do not recommend its use. Instead, an appealing approach, analogous to Fisher’s exact test for contingency tables (Fisher 1934), is to calculate the probability of observing a sample con- figuration that is even less likely than the one being eval- uated, conditional on the observed allele counts. This can be achieved using a statistic similar to the Monte Carlo statistic proposed by Guo and Thompson (1992) for multiallelic markers: P p I P(N p n FN,n ) [ 冘 HWE AB AB A ∗ nAB ∗ ⭓ P(N p n FN,n ) ] AB AB A ∗ #P(N p n FN,n ) . AB AB A In this definition, I[x] is an indicator function that is equal to 1 when the comparison is true and equal to 0 otherwise. The sum should be performed over all het- erozygote counts that are compatible with the ob- ∗ nAB served number of minor alleles, . nA Most of the computational effort required for per- forming exact tests of linkage disequilibrium is spent evaluating the factorials in equation (1) for each possible value of . By use of a naive approach, evaluating nAB equation (1) requires 5N–6N multiplications and one division for each possible value of . We simplify cal- nAB culations by using the recurrence relationships previ- ously recognized by Guo and Thompson (1992) in the implementation of their Markov chain–Monte Carlo sampler: P(N p n ⫹ 2FN, n ) AB AB A 4n n AA BB p P(N p n FN, n ) , and AB AB A (n ⫹ 2)(n ⫹ 1) AB AB P(N p n ⫺ 2FN, n ) AB AB A n (n ⫺ 1) AB AB p P(N p n FN, n ) . (2) AB AB A 4(n ⫹ 1)(n ⫹ 1) AA BB In this way, evaluating the probability for each possible number of heterozygotes takes only four multiplications and one division, whatever the sample size N. To avoid underflow, it is best to first calculate the probability of observing the expected number of heterozygotes (in this case, the most likely outcome) and then use the recur- rence relationships to calculate probabilities for all other outcomes. A further reduction of computational effort is possible by noting that one need only calculate relative probabilities for each outcome and then scale these to ensure that their sum is 1.0. This means that the prob- ability of observing the expected number of heterozy- gotes can be replaced with an arbitrary constant when using the recurrence relations in equation (2), provided that the final result is scaled. Table 1 illustrates the performance of the statistics for a sample of 100 individuals in which 21 copies of the minor allele are present. The observed number of het- erozygotes will vary from 1 to 21 and must be odd. Note that only a small number of distinct sample configura- tions are possible, and each of these is associated with a specific probability for the exact tests. If the desired significance level a does not correspond exactly to one of these discrete outcomes, then the exact test statistics will be conservative (Hernandez and Weir 1989). For example, at the significance level , the PHWE and a p 0.05 Plow statistics both reject the hypothesis of HWE if ⭐13 heterozygotes are observed in this setting. Since the prob- ability of observing ⭐13 heterozygotes is 0.010, the tests are conservative. In contrast, the asymptotic x2 test sta- tistic results in rejection of HWE when ⭐15 heterozy-
  • 3. Figure 1 Type I error rates as a function of minor-allele counts for rare alleles, for samples of either 100 or 1,000 chromosomes and corresponding to a significance threshold of , 0.01, or 0.001. Results are plotted as a function of the number of minor alleles in the sample for the exact statistic (red) and for the asymptotic x2 test statistic (blue). A gray a p 0.05 PHWE line denotes the nominal error rate. Note that the Y-axes in figures 1 and 2 differ.
  • 4. 890 Am. J. Hum. Genet. 76:887–883, 2005 Table 1 Possible Sample Configurations and Their Probabilities for a Sample of 100 Individuals and 21 Minor-Allele Copies Are Tabulated NO. OF HETEROZYGOTES (nAB) PROBABILITY a x2 TEST P EXACT TEST P VALUES PHWE Phigh Plow 5 !.000001 !.000001b !.000001b 1.000000 !.000001b 7 .000001 !.000001b .000001b 1.000000 .000001b 9 .000047 !.000001b .000048b .999999 .000048b 11 .000870 .000039b .000919b .999952 .000919b 13 .009375 .002228b .010293b .999081 .010293b 15 .059283 .045180b .069576 .989707 .069576 17 .214465 .342972 .284042 .930424 .284042 19 .406355 .906529 1.000000 .715958 .690396 21 .309604 .244336 .593645 .309604 1.000000 NOTE.—The probability of observing each possible outcome is given, together with the corresponding P values for tests of HWE based on the x2 statistic and on the exact test statistics PHWE, Plow, and Phigh (described in the main text). a . P(n FN p 100,n p 21) AB A b Configurations that would be rejected at the significance level a p 0.05. gotes are observed (for ⭐15 heterozygotes, the x2 test statistic corresponds to an asymptotic ). This P ⭐ .045 results in an inflated type I error rate of 0.070 and there- fore is inappropriate. In this sample, it is not possible to reject HWE because of an excess of heterozygous individuals—the probability of observing the maximum of 21 heterozygotes is 0.31, and none of the test statistics gives a P value !.05 for this extreme configuration. Ad- ditional examples of the performance of exact test sta- tistics for HWE can be found in the work by Vithayasai (1973). In general, the exact test statistics are conservative when a small number of minor-allele copies are present in the sample, but they approximate nominal signifi- cance levels as the sample size (and number of minor- allele copies) increases. In contrast, the commonly used x2 statistic can produce excessively small or large P val- ues for specific outcomes (Hernandez and Weir 1989). To comprehensively evaluate the performance of the x2 and exact test statistics, we calculated their type I error rates for specified significance levels of , 0.01, a p 0.05 or 0.001, for sample sizes of or N p 100 N p 1,000 individuals and varying minor-allele counts. The results are summarized in figure 1 (for samples in which !25% of chromosomes carry the minor allele) and figure 2 (for samples in which 110% of chromosomes carry the mi- nor allele), and it is clear that the statistics exhibit some periodicity in their type I error rates. As expected, both the exact PHWE statistic and the x2 statistic perform better as the sample size and minor-allele counts increase. Nev- ertheless, one important difference is that the x2 statistic can sometimes be extremely anticonservative (e.g., in a sample of 1,000 individuals, when nominal , a p 0.001 the true type I error rate can exceed 0.06 and is often 10.01 for minor-allele counts !100), whereas the exact statistic never exceeds the nominal significance level. In practical settings, the x2 statistic could lead to many false rejections of HWE that depend on only the particular count of minor alleles in the sample. To understand the periodicity of the statistics, it is important to consider the discrete nature of the data. For example, for a sample of individuals in- N p 100 cluding 2–5 copies of the minor allele, we reject HWE at the significance level (fig. 1A) when there a p 0.05 is at least one homozygote for the minor allele. The probability of observing more than one homozygote for the minor allele increases gradually from 0.0050 when there are two copies of the allele in the sample up to 0.0499 when there are five copies of the minor allele in the sample. When there are 6–14 copies of the minor allele in the sample, we reject HWE at the a p 0.05 significance level (fig. 1A) when at least two homo- zygotes for the rare allele are observed. Again, the prob- ability of a more extreme event is quite low for small numbers of the rare allele ( with six copies of P p .0011 the minor allele in the sample) but gradually increases if there are additional copies of the minor allele in the sample ( with 13 copies of the minor allele). P p .0482 In table 2, the overall type I error rates for each sta- tistic are summarized for sample sizes of 100 or 1,000 individuals and various ranges of minor-allele counts. It is clear that, on average, the x2 test approximates nom- inal significance levels as the number of minor alleles in the sample increases. Nevertheless, as illustrated in figure 1, this is achieved at the cost of inflated error rates for samples with specific numbers of minor alleles. Even in a sample of 1,000 individuals, the type I error rate at a p 0.001 for the x2 test is inflated when there are !200 copies of the minor allele (corresponding to an allele frequency of ∼10%). The exact tests approximate nom-
  • 5. Figure 2 Type I error rates as a function of minor-allele counts for common alleles, for samples of either 100 or 1,000 chromosomes and corresponding to a significance threshold of , 0.01, or 0.001. Results are plotted as a function of the number of minor alleles in the sample for the exact statistic (red) and for the asymptotic x2 test statistic (blue). A gray a p 0.05 PHWE line denotes the nominal error rate. Note that the Y-axes in figures 1 and 2 differ.
  • 6. 892 Am. J. Hum. Genet. 76:887–883, 2005 Table 2 Actual Error Rates for the x2 Test Statistic and the PHWE Test Statistic for Nominal Significance Level a p 0.01 or 0.001 SAMPLE AND MINOR-ALLELE COUNT a p 0.01a a p 0.001a x2 PHWE x2 PHWE N p 1,000 1–100 .0208b (.0208)b .0039 (.0039) .0088b (.0088)b .0004 (.0004) 101–200 .0100 (.0154)b .0065 (.0052) .0017b (.0053)b .0006 (.0005) 201–400 .0097 (.0126)b .0083 (.0067) .0010 (.0032)b .0008 (.0006) 401–1,000 .0100 (.0110)b .0090 (.0081) .0010 (.0018)b .0009 (.0008) N p 100 1–10 .0292b (.0292)b .0024 (.0024) .0114b (.0114)b .0001 (.0001) 11–20 .0191b (.0242)b .0035 (.0030) .0035b (.0074)b .0003 (.0002) 21–40 .0083 (.0162)b .0037 (.0033) .0016b (.0045)b .0004 (.0003) 41–100 .0099 (.0124)b .0072 (.0057) .0009 (.0023)b .0006 (.0005) NOTE.—Results are tabulated for samples of 100 and 1,000 individuals and represent simple averages for each range of minor-allele counts. a The error rate for each bin is tabulated, followed by the cumulative error rate in parenthesis. The cumulative error rate is calculated by including each bin and all previous bins. For example, for a sample of size 1,000, when a p 0.001, the type I error rate for the standard x2 test in a sample with 101–200 copies of the minor allele is 0.0017 and the cumulative error rate, cor- responding to samples with 1–200 copies of the minor allele, is 0.0053. b Exceeds nominal significance level. inal significance levels with increasing sample size but remain conservative because of the discrete nature of the data. As a final evaluation of our approach, we applied our method to a subset of the genotypes collected by the International HapMap Consortium (2003). We focused on a set of 18,460 SNP markers genotyped indepen- dently by two different centers with no discrepancies between the two sets of experimental results. For each of these markers, we evaluated evidence against HWE by using both the exact PHWE statistic and the asymptotic x2 statistic. Results were broadly similar for 14,889 markers with minor-allele frequencies ⭓20%. However, we observed noticeable differences for 3,571 markers with minor-allele frequencies !20%. For example, the x2 test rejected HWE for 71 of these markers at a p (twice as many as the 35 markers expected to fail 0.01 this test by chance), whereas the exact test rejected HWE for only 33 markers. At the more stringent a p 0.001 significance level, the x2 test rejected HWE for 28 mark- ers (rejection for 3 markers is expected by chance), whereas the exact PHWE statistic rejected HWE for only 5 markers. Although we focus on testing the agreement of ob- served genotypes with HWE proportions, computation- ally efficient exact tests can be constructed for any de- sired genotype proportions. In brief, let the expected proportion of heterozygotes be pAB and the two ho- mozygote proportions be pAA and pBB. For exam- ple, in a population with inbreeding coefficient f, we might expect the proportion of heterozygotes to be . Define the quantity so that 2 2(1 ⫺ f)p p v p p / p p A B AB AA BB when HWE holds. Then, the probability of ob- v p 4 serving nAB heterozygotes is n /2 AB v N! 1 P(N p n FN,n ) p # , AB AB A n !n !n ! C AA AB BB where ∗ n /2 AB v N! C p 冘 ∗ ∗ ∗ ∗ n !n !n ! n AA AB BB AB (Wellek 2004). It is simple to verify that the recurrence relationships given in equation (2) can be extended to this setting by replacing the number 4 with the quantity v in each expression. The exact test statistics for HWE described here are accurate for a variety of allele frequencies and can be computed in an inexpensive manner. We recommend that they be used instead of the standard x2 test statistic in all situations. For large data sets, rather than fixing an arbitrary threshold for rejecting HWE, we suggest that methods based on the false-discovery rate (Benja- mini and Hochberg 1995) be used to identify a subset of markers whose genotypes do not conform to the ex- pected equilibrium distribution. The PHWE test statistic described here is implemented in the Pedstats software package (see Pedstats Web site), which generates summaries and checks the integrity of genetic data. In addition, code for calculating Plow, Phigh, and PHWE in C/C⫹⫹, R, and Fortran is available from the authors’ Web site. With appropriate citation, our code is freely available for use and can be incorporated
  • 7. Reports 893 into other programs. The HapMap Project genotype data are freely available at the HapMap Web site. Acknowledgments We gratefully acknowledge grant support from the National Human Genome Research Institute and the National Eye In- stitute. The manuscript was improved by helpful comments from reviewers. Electronic-Database Information The URLs for data presented herein are as follows: Authors’ Web site, http://www.sph.umich.edu/csg/abecasis/ HapMap, http://www.hapmap.org/ Pedstats, http://www.sph.umich.edu/csg/abecasis/Pedstats/ References Benjamini Y, Hochberg Y (1995) Controlling the false discov- ery rate: a practical and powerful approach to multiple test- ing. J R Stat Soc Ser B 57:289–300 Cardon LR, Abecasis GR (2003) Using haplotype blocks to map human complex trait loci. Trends Genet 19:135–140 Crow JF (1988) Eighty years ago: the beginnings of population genetics. Genetics 119:473–476 Fisher RA (1934) Statistical methods for research workers. Oliver and Boyd, Edinburgh Guo SW, Thompson EA (1992) Performing the exact test of Hardy-Weinberg proportion for multiple alleles. Biometrics 48:361–372 Haldane JBS (1954) An exact test for randomness of mating. J Genet 52:631–635 Hardy HG (1908) Mendelian proportions in a mixed popu- lation. Science 28:49–50 Hernandez JL, Weir BS (1989) A disequilibrium coefficient approach to Hardy-Weinberg equilibrium testing. Biomet- rics 45:53–70 International HapMap Consortium (2003) The International HapMap Project. Nature 426:789–796 Kwok PY (2001) Methods for genotyping single nucleotide polymorphisms. Annu Rev Genomics Hum Genet 2:235– 258 Levene H (1949) On a matching problem arising in genetics. Ann Math Stat 21:91–94 Vithayasai C (1973) Exact critical values of the Hardy-Wein- berg test statistic for two alleles. Communic Stat 1:229–242 Weber JL, Broman KW (2001) Genotyping for human whole- genome scans: past, present, and future. Adv Genet 42:77– 96 Weinberg W (1908) On the demonstration of heredity in man. In: Boyer SH, trans (1963) Papers on human genetics. Pren- tice Hall, Englewood Cliffs, NJ Weir BS (1996) Genetic data analysis II. Sinauer Associates, Sunderland, MA Wellek S (2004) Tests for establishing compatibility of an observed genotype distribution with Hardy-Weinberg equi- librium in the case of a biallelic locus. Biometrics 60:694– 703