1. BSHI 2002
Glasgow, Scotland
STATISTICAL ANALYSIS OF
HLA AND DISEASE
ASSOCIATIONS
M. Tevfik DORAK
Department of Epidemiology
University of Alabama at Birmingham
U.S.A.
Present address (2007):
Newcastle University
School of Clinical Medical Sciences
U.K.
http://www.dorak.info
2. BSHI 2002
This workshop will cover categorical data analysis
for case-control design and some concepts in
population genetics
AIMS
Familiarization with common statistical tests
useful in HLA and disease association studies
Clarification of several statistical concepts
Discussion of common mistakes
Interpretation of results
3. BSHI 2002
Why would you do an association study?
Disease gene mapping and positional cloning
Molecular profiling
(to predict susceptibility, outcome, response, prognosis)
Basic science
(to learn about disease development and
subsequently to design diagnostic tests or new treatment)
4. BSHI 2002
Meaning of an association
Population stratification (confounding by ethnicity) or
other spurious associations
Linkage disequilibrium (confounding by locus)
Direct involvement in the disease process
5. BSHI 2002
Cross-validation of results
Replication
(population level and/or family-based)
Functional studies
Split the sample into two random groups
(if nothing else can be done!)
6. BSHI 2002
Failure to replicate
False positive in the original study
False negative in the second one
Population specificity
Population stratification
7. BSHI 2002
Considerations at the beginning
Will you have enough power?
Who are the controls? Unrelated or family-based?
A subgroup vs another one (males vs females)?
Prospective sequential sampling or retrospective
convenience samples for cases?
Remember you will be testing whether the cases
and controls are from the same population. The
answer shouldn’t be obvious at the beginning.
8. BSHI 2002
An example of power calculation
Proportion Difference Power / Sample Size Calculation
Significance Level (alpha): .05 (Usually 0.05)
Power (% chance of detecting): .80 (Usually 80)
First Group Population Proportion: .40 (Between 0.0 and 1.0)
Second Group Population Proportion: .60 (Between 0.0 and 1.0)
Relative Sample Sizes Required: 2.0 (For equal samples, use 1.0)
Sample Size Required: Group 1: 80 Group 2: 160
(Sample sizes become 115 : 231 for a significance level of 0.01)
9. BSHI 2002
An example of power calculation
Proportion Difference Power / Sample Size Calculation
Significance Level (alpha): .01 (Usually 0.05)
Power (% chance of detecting): .80 (Usually 80)
First Group Population Proportion: .05 (Between 0.0 and 1.0)
Second Group Population Proportion: .10 (Between 0.0 and 1.0)
Relative Sample Sizes Required: 2.0 (For equal samples, use 1.0)
Sample Size Required: Group 1: 538 Group 2: 1077
http://statpages.org/proppowr.html
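The sample sizes in the two examples above can be reproduced with Fleiss's formula for comparing two proportions, including the continuity correction. This is a minimal sketch: that the online calculator uses exactly this formula, and that it rounds to the nearest integer, are assumptions (although all three results quoted on the slides are reproduced). The function name `sample_sizes` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def sample_sizes(p1, p2, alpha=0.05, power=0.80, ratio=1.0):
    """Return (n1, n2) for a two-sided test of two proportions.

    Fleiss's formula with continuity correction; n2 = ratio * n1.
    Rounding to the nearest integer is an assumption made to match
    the figures quoted on the slides.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + ratio * p2) / (1 + ratio)        # weighted mean proportion
    q_bar = 1 - p_bar
    delta = abs(p1 - p2)

    # Sample size for group 1 before the continuity correction
    m0 = (
        z_alpha * sqrt((ratio + 1) * p_bar * q_bar)
        + z_beta * sqrt(ratio * p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2 / (ratio * delta ** 2)

    # Continuity-corrected sample size for group 1
    m = m0 / 4 * (1 + sqrt(1 + 2 * (ratio + 1) / (m0 * ratio * delta))) ** 2
    return round(m), round(ratio * m)


print(sample_sizes(0.40, 0.60, alpha=0.05, power=0.80, ratio=2.0))
print(sample_sizes(0.40, 0.60, alpha=0.01, power=0.80, ratio=2.0))
print(sample_sizes(0.05, 0.10, alpha=0.01, power=0.80, ratio=2.0))
```

Note how halving the frequencies involved (0.05 vs 0.10 instead of 0.40 vs 0.60) raises the required sample size several-fold even though the relative difference is larger.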
10. BSHI 2002
Beware of the following flaws and fallacies of
epidemiologic studies
confounders (known or unknown)
selection bias
response bias
misclassification bias
variable observer
Hawthorne effect (changes caused by the observer in the observed values)
diagnostic accuracy bias
regression to the mean
significance Turkey
nerd of nonsignificance
cohort effect
ecologic fallacy
Berkson bias (selection bias in hospital-based studies)
SEE: http://www.dorak.info/epi/bc.html
11. BSHI 2002
Categorical Data Analysis
* 2x2 Table Analysis for Association
Chi-squared (Pearson, Yates)
Fisher
G-test
McNemar's test: TDT, HRR
(Logistic Regression)
* Odds Ratio - Relative Risk
Difference between OR and RR
Woolf-Haldane Modification
Comparison of two ORs
Adjusted OR
* Linkage Disequilibrium
Comparison of two LDs
* RxC (multicontingency) Table Analysis
Chi-squared
G-test
Exact Tests (needed for HWE)
Trend Test (frequently overlooked)
See http://www.dorak.info/hla/stat.html
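As an illustration of the Woolf-Haldane modification listed above: when a cell of the 2x2 table is zero, the ordinary odds ratio is zero or undefined, and the modification adds 0.5 to every cell before computing it. A minimal sketch follows; the counts are hypothetical and the function name `haldane_odds_ratio` is not from the workshop.

```python
def haldane_odds_ratio(a, b, c, d):
    """Odds ratio with 0.5 added to every cell of the 2x2 table.

    Table layout (an assumption for this sketch):
        a = marker-positive cases,    b = marker-negative cases
        c = marker-positive controls, d = marker-negative controls
    """
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


# Hypothetical counts: no marker-negative cases, so a*d/(b*c) is undefined
print(haldane_odds_ratio(10, 0, 5, 5))
```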
12. BSHI 2002
The SAS System
FREQ Procedure Output – I
Statistic                      DF    Value     Prob
Chi-Square                      1    7.9047    0.0049
Likelihood Ratio Chi-Square     1    8.0067    0.0047
Continuity Adj. Chi-Square      1    7.3064    0.0069
Mantel-Haenszel Chi-Square      1    7.8840    0.0050
Phi Coefficient                     -0.1439
Contingency Coefficient              0.1424
Cramer's V                          -0.1439

Fisher's Exact Test
Cell (1,1) Frequency (F)       45
Left-sided Pr <= F             0.0033
Right-sided Pr >= F            0.9983
Table Probability (P)          0.0016
Two-sided Pr <= P              0.0066
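The slides do not show the 2x2 table behind this output. The counts used below (45, 124 / 86, 127) are back-calculated from the reported statistics and are therefore an assumption, not the author's data. With that caveat, the following sketch shows how the Pearson, continuity-adjusted (Yates) and likelihood ratio (G) chi-squared statistics and Fisher's exact test can be computed by hand; all function names are hypothetical.

```python
from math import comb, erfc, log, sqrt

# ASSUMPTION: table reconstructed from the reported statistics, not from the slides
a, b, c, d = 45, 124, 86, 127


def chi2_p_value(x):
    """Upper-tail P value of a chi-squared statistic with 1 df."""
    return erfc(sqrt(x / 2))


def pearson_chi2(a, b, c, d, yates=False):
    """Pearson chi-squared for a 2x2 table, optionally with Yates's correction."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2, 0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def g_statistic(a, b, c, d):
    """Likelihood ratio chi-squared (G-test) for a 2x2 table."""
    n = a + b + c + d
    observed = [a, b, c, d]
    expected = [(a + b) * (a + c) / n, (a + b) * (b + d) / n,
                (c + d) * (a + c) / n, (c + d) * (b + d) / n]
    return 2 * sum(o * log(o / e) for o, e in zip(observed, expected) if o > 0)


def fisher_exact(a, b, c, d):
    """Return (left-sided P, two-sided P) from the hypergeometric distribution."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):  # probability of a table with x in cell (1,1), margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    low, high = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    left = sum(prob(x) for x in range(low, a + 1))
    # Two-sided: sum of all tables no more probable than the observed one
    two_sided = sum(p for p in (prob(x) for x in range(low, high + 1))
                    if p <= p_observed * (1 + 1e-9))
    return left, two_sided


print(pearson_chi2(a, b, c, d), pearson_chi2(a, b, c, d, yates=True))
print(g_statistic(a, b, c, d), fisher_exact(a, b, c, d))
```

The four tests give similar but not identical P values for the same table, which is why a report should always state which test was used.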
13. BSHI 2002
The SAS System
FREQ Procedure Output – II
Estimates of the Common Relative Risk (Row1/Row2)
Type of Study    Method            Value     95% Confidence Limits
Case-Control     Mantel-Haenszel   0.5359    0.3461    0.8299
(Odds Ratio)     Logit             0.5359    0.3461    0.8299
Cohort           Mantel-Haenszel   0.6595    0.4892    0.8891
(Col1 Risk)      Logit             0.6595    0.4892    0.8891
Cohort           Mantel-Haenszel   1.2306    1.0666    1.4198
(Col2 Risk)      Logit             1.2306    1.0666    1.4198
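The difference between the odds ratio and the relative risk is easy to see when both are computed from the same table. The sketch below uses counts (45, 124 / 86, 127) back-calculated from the reported estimates; they are an assumption, since the slides do not show the table. The confidence limits use the logit (Woolf) method on the log scale, and the helper name `log_scale_ci` is hypothetical.

```python
from math import exp, log, sqrt

# ASSUMPTION: table reconstructed from the reported estimates, not from the slides
a, b, c, d = 45, 124, 86, 127
Z_95 = 1.959964  # standard normal quantile for a 95% interval


def log_scale_ci(estimate, se_log, z=Z_95):
    """Confidence limits for a ratio, built symmetrically on the log scale."""
    return exp(log(estimate) - z * se_log), exp(log(estimate) + z * se_log)


# Odds ratio (case-control interpretation)
odds_ratio = (a * d) / (b * c)
or_ci = log_scale_ci(odds_ratio, sqrt(1 / a + 1 / b + 1 / c + 1 / d))

# Relative risk of column 1 (cohort interpretation): row 1 risk / row 2 risk
relative_risk = (a / (a + b)) / (c / (c + d))
rr_ci = log_scale_ci(
    relative_risk, sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
)

print(f"OR = {odds_ratio:.4f}, 95% CI {or_ci[0]:.4f} to {or_ci[1]:.4f}")
print(f"RR = {relative_risk:.4f}, 95% CI {rr_ci[0]:.4f} to {rr_ci[1]:.4f}")
```

The odds ratio lies further from 1 than the relative risk here because the outcome in column 1 is common in both rows; the two measures converge only when the outcome is rare.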
14. BSHI 2002
Parent-Case Trios in TDT/HRR
[Figure: pedigrees of parent-case trios with the genotypes of each parent and the affected child (e.g., parents AB and CD, children AC and BD). In each trio the “transmitted allele” serves as the “case” and the “non-transmitted allele” as the “control”.]
15. BSHI 2002
- AN EXAMPLE OF TDT -
TRANSMISSION DISEQUILIBRIUM OF HLA-B62 TO
THE PATIENTS WITH CHILDHOOD AML
(Dorak et al, BSHI 2002)
Out of 13 parents heterozygous for B62,
12 transmitted B62 to the affected child and 1 did not
McNemar’s test results:
P = 0.0055 (with continuity correction)
odds ratio = 12.0, 95% CI = 1.8 to 513

                        Nontransmitted Allele
                        B62       Other
Transmitted   B62       x         12
Allele        Other     1         y
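The McNemar result above depends only on the two discordant cells of the table (12 and 1); the concordant cells x and y do not enter the calculation. A minimal sketch of the continuity-corrected test follows. The function name `mcnemar` is hypothetical, and the exact confidence interval quoted on the slide is not recomputed here.

```python
from math import erfc, sqrt


def mcnemar(b, c, continuity=True):
    """McNemar's chi-squared (1 df) from the two discordant counts.

    b = heterozygous parents who transmitted the allele,
    c = heterozygous parents who did not.
    Returns (chi-squared statistic, two-sided P value).
    """
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)
    chi2 = diff ** 2 / (b + c)
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-squared with 1 df
    return chi2, p_value


chi2, p_value = mcnemar(12, 1)
odds_ratio = 12 / 1  # ratio of transmitted to non-transmitted counts
print(f"chi2 = {chi2:.2f}, P = {p_value:.4f}, OR = {odds_ratio:.1f}")
```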
16. BSHI 2002
Multiple comparisons
Not needed if the study is not hypothesis driven
(i.e., a fishing experiment)
Not needed if the study is hypothesis driven
('Possible relevance of the HLA system' is not a valid
hypothesis in this context. Those studies belong to the
fishing experiments group)
Therefore, it is not clear when it is needed in HLA
association studies. Most frequently, it is an
excuse for a busy reviewer to avoid a
comprehensive review
Best solution is to avoid facing this problem
-ideally by replication and/or functional data to
support the statistical association before it is
dismissed as a spurious result of multiple
comparisons
17. BSHI 2002
Common Mistakes in Statistical Evaluation
of Association Study Results - I
Confusion between corrections
(Yates/Williams for continuity VS Bonferroni)
Confusion between RR and OR
(they are not the same)
Confusion between expected and observed values in cells
of a contingency table
Small sample size issue
Don’t confuse a negative result with lack of power
(‘No significant difference between the two groups and they were
pooled’ VS ‘the difference did not reach significance due to small
sample size’ are different interpretations of the same
phenomenon, i.e., lack of power)
Using Chi-squared test for small sample size
(why not use Fisher all the time?)
Using Chi-squared test for HWE
(use exact test or G-test)
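For the last point, a G-test for Hardy-Weinberg equilibrium at a biallelic marker can be sketched as follows. The genotype counts are hypothetical, chosen only to make the arithmetic easy to follow, and the function name `hwe_g_test` is not from the workshop. For small samples or rare alleles an exact test remains the safer choice.

```python
from math import erfc, log, sqrt


def hwe_g_test(n_aa, n_ab, n_bb):
    """G-test of Hardy-Weinberg proportions for one biallelic marker (1 df).

    Returns (G statistic, P value). Expected counts come from the allele
    frequencies estimated in the same sample.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    observed = [n_aa, n_ab, n_bb]
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    g = 2 * sum(o * log(o / e) for o, e in zip(observed, expected) if o > 0)
    return g, erfc(sqrt(g / 2))  # upper tail of chi-squared with 1 df


# Hypothetical control genotypes: fewer heterozygotes than the 50 expected
g, p_value = hwe_g_test(30, 40, 30)
print(f"G = {g:.4f}, P = {p_value:.4f}")
```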
18. BSHI 2002
Common Mistakes in Statistical Evaluation
of Association Study Results - II
One-tailed and two-tailed P values
(always use two-tailed)
Trend test for a multicontingency table?
(if appropriate, more powerful)
Multiple comparison issue
Failure to give the strength of the association (OR, RR, RH)
Use of the word ‘proof’. Does statistics prove anything?
(A ‘P value’ provides a sense of the strength of the evidence for or
against the null hypothesis of no association)
Reliance on large sample effect to achieve significance
Showing P values as 0.000 (this means P < 0.001)
Confusion between association and linkage
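The trend test mentioned above can be illustrated with the Cochran-Armitage chi-squared for trend across ordered categories, for example 0, 1 or 2 copies of an allele. This is a sketch with hypothetical counts; the function name `trend_chi2` and the choice of linear scores are assumptions.

```python
from math import erfc, sqrt


def trend_chi2(cases, totals, scores):
    """Cochran-Armitage chi-squared for trend (1 df).

    cases[i]  = number of cases in ordered category i
    totals[i] = number of subjects (cases + controls) in category i
    scores[i] = score assigned to category i (e.g. copies of the allele)
    Returns (chi-squared statistic, P value).
    """
    n = sum(totals)
    r = sum(cases)
    sum_rt = sum(c * t for c, t in zip(cases, scores))
    sum_nt = sum(m * t for m, t in zip(totals, scores))
    sum_nt2 = sum(m * t * t for m, t in zip(totals, scores))
    chi2 = (n * (n * sum_rt - r * sum_nt) ** 2
            / (r * (n - r) * (n * sum_nt2 - sum_nt ** 2)))
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical data: the proportion of cases rises with allele copy number
chi2, p_value = trend_chi2(cases=[10, 20, 30], totals=[50, 50, 50],
                           scores=[0, 1, 2])
print(f"chi2 for trend = {chi2:.4f}, P = {p_value:.6f}")
```

Because the trend test spends a single degree of freedom on the ordered alternative, it is more powerful than the general RxC chi-squared when a dose-response pattern is present. With only two categories it reduces to the ordinary Pearson chi-squared.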
19. BSHI 2002
Association and Causality?
An association, however strong, does not necessarily mean
causation. Several criteria have been proposed to assess the
role of an associated marker in causation. Some of those are
as follows:
1. Biological plausibility
2. Strength of association (this is not measured by the P
value)
3. Dose response (are heterozygotes intermediate between
the two homozygotes, or is homozygosity showing a stronger
association than just having the marker?)
4. Time sequence (this is inherent in the germ-line nature of
HLA genes)
5. Consistency (next slide lists reasons for inconsistency in
HLA association studies)
6. Specificity of the association to the disease studied
20. BSHI 2002
Why Are There Inconsistencies? (I)
1. Mistakes in genotyping (lack of HWE in controls is
usually an indication of problems with typing rather than
selection, admixture, nonrandom mating or other reasons for
departure from HWE)
2. Poor control selection (would your controls be in the
case group if they had the disease, and would the cases be in
your control group if they were free of the disease?)
3. Design problems including the statistical power issue
(negative results due to lack of statistical power should be
distinguished from truly negative results observed despite
having sufficient power)
4. Publication bias (are there many more studies with
negative results that we have never heard about?)
5. Disease misclassification or misclassification bias
21. BSHI 2002
Why Are There Inconsistencies? (II)
6. Excessive type I errors (are the positive results due to
using P < 0.05 as the threshold for statistical significance?)
7. Posthoc and subgroup analysis (are positive results due
to fishing (data dredging)?)
8. Unjustified multiple comparisons and subsequent type II
error
9. Failure to consider the mode of inheritance in a genetic
disease
10. Failure to account for the LD structure of the gene (only
haplotype-tagging markers will show the association, other
markers within the same gene may fail to show an
association and generate background noise)
11. Likelihood that the gene studied accounts for a small
proportion of the variability in risk
22. BSHI 2002
Further information
Select ‘Biostatistics’ or ‘Epidemiology’ at
http://www.dorak.info
or write to me at
dorakmt :at: lycos.com
[please do not add to your address book as it will change periodically]
23. BSHI 2002
I am grateful to the BSHI Organizing Committee
for giving me the opportunity to run this
workshop at BSHI 2002 in Glasgow.
I also particularly thank Nancy Henderson and Ian
Galbraith for their hospitality.
BSHI AGM
5:15 pm
All members should attend