This document discusses SNPs, GWAS, and association studies. It defines SNPs, linkage disequilibrium, haplotype blocks, and Hardy-Weinberg equilibrium. It describes two main types of association studies: family-based studies using the TDT, and case-control studies using allele frequencies and odds ratios. Issues such as population stratification, principal components, and identity by descent are addressed. GWAS require large sample sizes to detect genetic associations reliably and to increase reproducibility.
2. Polymorphism
• Polymorphism: sites/genes with “common”
variation
• Locus (location) vs alleles (variations)
• Minor allele frequency >= 1%, otherwise called
rare variant and not polymorphic
• Single Nucleotide Polymorphism
– Come from a DNA-replication mistake in an individual
germ line cell, then transmitted
– ~90% of human genetic variation
• Copy number variations
– May or may not be genetic
STAT115
3. SNP Characteristics: Linkage Disequilibrium
• Hardy-Weinberg equilibrium
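– In a population with genotypes AA, aa, and Aa, if p =
freq(A) and q = freq(a), the frequencies of AA, aa, and Aa
will be $p^2$, $q^2$, and $2pq$ respectively at equilibrium.
– Similarly with two loci, each with two alleles (A/a and B/b):
at equilibrium, haplotype frequencies are products of allele
frequencies, e.g. freq(AB) = freq(A) × freq(B)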
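The Hardy-Weinberg expectation above can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names and the genotype counts (360, 480, 160) are hypothetical, chosen so that the sample sits exactly at equilibrium.

```python
def hwe_expected(p):
    """Expected genotype frequencies (AA, Aa, aa) given p = freq(A)."""
    q = 1.0 - p
    return p * p, 2 * p * q, q * q


def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Pearson chi-square comparing observed genotype counts with HWE expectations."""
    n = n_AA + n_Aa + n_aa
    # Estimate freq(A) by allele counting: each AA carries two A, each Aa one.
    p = (2 * n_AA + n_Aa) / (2 * n)
    expected = [freq * n for freq in hwe_expected(p)]
    observed = [n_AA, n_Aa, n_aa]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts: p = 0.6, so the expected counts are 360, 480, 160.
stat = hwe_chi_square(360, 480, 160)
print(f"HWE chi-square: {stat:.4f}")
```

A large statistic would suggest departure from equilibrium, which is one reason HWE checks are often used as a genotyping quality filter.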
4. SNP Characteristics: Linkage Disequilibrium
• LD: if alleles at two loci occur together more often than
can be accounted for by chance, this usually indicates that
the two loci are physically close on the DNA
• Haplotype block: a cluster of linked SNPs
• Haplotype boundaries: strong LD within blocks and
little or no LD between blocks; boundaries reflect
recombination hotspots
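"More often than chance" can be quantified with the standard LD measures D, D' and r². The sketch below is an illustration rather than anything from the slides: the function name and the haplotype frequencies are hypothetical, and it assumes both loci are polymorphic so that no denominator is zero.

```python
def ld_measures(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, D_prime, r2) from the four two-locus haplotype frequencies."""
    p_A = p_AB + p_Ab  # frequency of allele A at locus 1
    p_B = p_AB + p_aB  # frequency of allele B at locus 2
    # D is the excess of AB haplotypes over the chance expectation p_A * p_B.
    D = p_AB - p_A * p_B
    if D > 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    D_prime = abs(D) / d_max if d_max > 0 else 0.0
    r2 = D * D / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return D, D_prime, r2


# Hypothetical example: only AB and ab haplotypes exist (complete LD).
print(ld_measures(0.5, 0.0, 0.0, 0.5))
# Hypothetical example: all four haplotypes equally common (no LD).
print(ld_measures(0.25, 0.25, 0.25, 0.25))
```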
6. SNP Profiling
• [C/T] [A/G] T X C [A/C] [T/A]
– $2^4 = 16$ possible haplotypes, although often a few common
ones explain ~90% of the variation
• Tagging (non-redundant) SNPs that capture most
variations in haplotypes
– reference SNP ID number: rs12345678
• SNP arrays covering the
whole genome
• Now whole-exome (WES) or whole-genome (WGS) sequencing
• Genotype: 2 alleles per individual at each SNP
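The haplotype count for the example sequence follows from its four biallelic sites, each contributing a factor of 2. As a small sketch (the variable names are hypothetical, and the invariant bases between the SNPs are left out for brevity), the combinations can be enumerated directly:

```python
from itertools import product

# The four variable sites from the example: [C/T] [A/G] ... [A/C] [T/A]
snp_sites = [("C", "T"), ("A", "G"), ("A", "C"), ("T", "A")]

# Every haplotype picks one allele at each site.
haplotypes = ["".join(alleles) for alleles in product(*snp_sites)]

print(len(haplotypes))  # 2 ** 4 combinations
print(haplotypes[:4])
```

In real populations LD makes most of these combinations rare or absent, which is why a few tagging SNPs can stand in for the whole block.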
7. Association Studies
• Association between genetic markers and
phenotype
– E.g. cystic fibrosis: ~70% of patients have a
deletion of 3 base pairs in the CFTR gene, resulting
in the loss of a phenylalanine at position 508 of
the CFTR protein
• In particular, find disease genes and SNP / haplotype
markers for susceptibility prediction and
diagnosis
9. Warfarin and CYP2C9: SNPs in Pharmacogenomics
• Warfarin is an anticoagulant drug; the CYP2C9 gene
product metabolizes warfarin.
• Patients requiring a low warfarin dose,
compared with the normal population, have an odds
ratio of 6.21 for having 1 variant allele
• The subgroup of patients who are poor
metabolisers of warfarin is potentially at
higher risk of bleeding
Aithal et al., 1999, Lancet.
Break
10. Genome-Wide Association Studies
• Quality Control
– Unusual similarity between individuals
– Wrong sex
– Trio has non-Mendelian inheritance
– Genotyping quality
• Two strategies:
– Family-based association studies
– Population-based case-control association
studies
11. Family-based Association Studies
Look at allele transmission in unrelated families with
one affected child in each
Like a coin toss: likelihood of a fair coin
[Table residue: 2 × 2 grid with rows and columns labeled A and a; two cells contain 0]
12. TDT: Transmission Disequilibrium Test
• Only heterozygous parents matter; compare observed
with expected transmissions
• Could also compare allele frequency between
affected vs unaffected children in the same family
$$Z^2_{TDT} = \frac{(A - a)^2}{A + a} = \frac{(9 - 2)^2}{9 + 2}, \qquad Z^2_{TDT} \sim \chi^2,\ 1\ \text{df}$$
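The TDT statistic for the slide's counts (9 transmissions of A versus 2 of a from heterozygous parents) can be evaluated directly. This is a minimal sketch with a hypothetical function name; it uses the identity that the upper tail of a chi-square with 1 df at x equals erfc(sqrt(x / 2)), so only the standard library is needed.

```python
import math


def tdt(transmitted_A, transmitted_a):
    """TDT statistic and 1-df chi-square p-value from transmission counts.

    Counts come from heterozygous (Aa) parents only: how often each allele
    was passed to the affected child.
    """
    stat = (transmitted_A - transmitted_a) ** 2 / (transmitted_A + transmitted_a)
    # Upper tail of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p_value = tdt(9, 2)
print(f"Z^2_TDT = {stat:.2f}, p = {p_value:.3f}")
```

With only 11 informative transmissions the chi-square approximation is rough; an exact binomial test against a fair coin would be a reasonable alternative.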
13. Case Control Studies
• SNP/haplotype marker frequency in sample
of affected cases compared to that in age
/sex /population-matched sample of
unaffected controls
17. Association of Alleles and Genotypes of
rs1333049 with Myocardial Infarction
• OR = 1, no disease association
• OR > 1, allele C increases risk of disease
• OR < 1, allele C decreases risk of disease
• Adjusting for multiple hypotheses testing?
            C, N (%)        G, N (%)        χ² (1 df)   P-value
Cases       2,132 (55.4)    1,716 (44.6)    55.1        1.2 x 10^-13
Controls    2,783 (47.4)    3,089 (52.6)
Allelic Odds Ratio = 1.38
Samani N et al, N Engl J Med 2007; 357:443-453.
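The allelic odds ratio can be recomputed from the four allele counts in the table. The sketch below uses a hypothetical function name and also computes a plain Pearson chi-square on the 2 × 2 allele table. That statistic is an assumption about one common way to test allele counts, and it need not equal the reported 55.1, which may come from a different test (for example a genotype-based one).

```python
import math


def allelic_association(case_c, case_g, ctrl_c, ctrl_g):
    """Allelic odds ratio, Pearson chi-square (1 df) and p-value for a 2x2 table."""
    odds_ratio = (case_c * ctrl_g) / (case_g * ctrl_c)
    n = case_c + case_g + ctrl_c + ctrl_g
    n_cases, n_ctrls = case_c + case_g, ctrl_c + ctrl_g
    n_c, n_g = case_c + ctrl_c, case_g + ctrl_g
    # Shortcut formula for the Pearson chi-square of a 2x2 table.
    chi2 = n * (case_c * ctrl_g - case_g * ctrl_c) ** 2 / (n_cases * n_ctrls * n_c * n_g)
    # Upper tail of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


# Allele counts for rs1333049 from the table (Samani et al., 2007).
odds_ratio, chi2, p_value = allelic_association(2132, 1716, 2783, 3089)
print(f"Allelic OR = {odds_ratio:.2f}")
print(f"Pearson chi-square = {chi2:.1f}, p = {p_value:.1e}")
```

In a genome-wide scan this test is repeated for every SNP, so the p-value has to be judged against a multiple-testing threshold rather than 0.05.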
Break
18. Reproducibility of Association Studies
• Most reported associations have not been
consistently reproduced
• Hirschhorn et al, Genetics in Medicine, 2002,
review of association studies
– 603 associations of polymorphisms and disease
– 166 studied in at least three populations
– Only 6 seen in > 75% of studies
21. Population Stratification
• Population stratification
– e.g. some SNPs are unique to an ethnic group
– Need to make sure sample groups match
– Hidden environmental structure
● Two populations have different disease frequencies
and different allele frequencies.
● The association picks up the fact that they are different
populations!
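The stratification effect can be shown with expected counts alone. In this sketch every number is hypothetical: two populations differ in disease rate and allele frequency, and within each the allele is independent of disease (odds ratio 1). Pooling them still produces an odds ratio far from 1.

```python
def expected_counts(n_cases, n_controls, allele_freq):
    """Expected (case_A, case_a, ctrl_A, ctrl_a) when allele A is unrelated to disease."""
    return (
        n_cases * allele_freq,
        n_cases * (1 - allele_freq),
        n_controls * allele_freq,
        n_controls * (1 - allele_freq),
    )


def odds_ratio(case_A, case_a, ctrl_A, ctrl_a):
    return (case_A * ctrl_a) / (case_a * ctrl_A)


# Hypothetical population 1: 10% disease rate, freq(A) = 0.8
pop1 = expected_counts(100, 900, 0.8)
# Hypothetical population 2: 30% disease rate, freq(A) = 0.2
pop2 = expected_counts(300, 700, 0.2)
# Pooling ignores population membership.
pooled = tuple(x + y for x, y in zip(pop1, pop2))

print(f"OR within population 1: {odds_ratio(*pop1):.2f}")
print(f"OR within population 2: {odds_ratio(*pop2):.2f}")
print(f"OR after pooling:       {odds_ratio(*pooled):.2f}")
```

Matching cases and controls by population, or adjusting for principal components of the genotype matrix, are common ways to guard against this artifact.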
23. IBD: Identity By Descent Test
• If two individuals share a common ancestor, they
will share many SNPs / haplotype blocks across their
genomes (identical by state: IBS)
• IBD segments are IBS by definition; IBS segments are
not necessarily IBD
24. IBD: Identity By Descent Test
• Pairwise IBD probability between samples
• Probability that two individuals share 0 (Z0), 1 (Z1),
or 2 (Z2) haplotypes IBD across the genome.
• Remove closely related samples (high IBD)
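The three sharing probabilities are often summarized as a single proportion of the genome shared IBD, Z1/2 + Z2 (the quantity PLINK reports as PI_HAT). The sketch below flags related pairs from such estimates; the sample IDs, the Z values, and the 0.2 cutoff are all hypothetical and only illustrate the filtering step.

```python
def proportion_ibd(z1, z2):
    """Expected proportion of the genome shared IBD: Z1 / 2 + Z2."""
    return 0.5 * z1 + z2


def related_pairs(pairs, threshold=0.2):
    """Return sample pairs whose IBD proportion exceeds the (hypothetical) threshold."""
    flagged = []
    for sample_1, sample_2, z0, z1, z2 in pairs:
        if proportion_ibd(z1, z2) > threshold:
            flagged.append((sample_1, sample_2))
    return flagged


# Hypothetical pairwise estimates: (id1, id2, Z0, Z1, Z2)
estimates = [
    ("s1", "s2", 1.00, 0.00, 0.00),  # unrelated
    ("s3", "s4", 0.00, 1.00, 0.00),  # parent-offspring pattern
    ("s5", "s6", 0.25, 0.50, 0.25),  # full-sibling pattern
]

print(related_pairs(estimates))
```

One sample from each flagged pair would then typically be dropped before association testing.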
27. Summary
• SNP, LD, haplotypes and tagging SNPs
• GWAS:
– Family-based association studies: TDT, allele
transmitted to affected child
– Case-control studies: χ² (allele frequency difference
between cases and controls) and OR
• Increase reproducibility by increasing sample size and
reducing population stratification and IBD (relatedness)
28. Acknowledgement
• Francisco Ubeda
• Jun Liu
• Tim Niu
• Bo Li
• Cheng Li
• Teri Manolio
• David Evans
• Guodong Wu
• Stefano Mont
• Wei Wang
• Soumya Raychaudhuri
• Kenneth Kidd
• Judith Kidd
• Glenys Thomson
• Joel Hirschhorn
• Greg Gibson
• Spencer Muse
• Jim Stankovich
• Benjamin Neale
• Enrico Petretto