Statistical Genetics


  1. 1. Statistical Genetics Matt McQueen Assistant Professor Institute for Behavioral Genetics University of Colorado at Boulder
  2. 2. Why am I here? Statistical Genetics - Biodemography
  3. 3. Perspectives…
  4. 4. Perspectives… Epidemiology
  5. 5. Perspectives… Biostatistics
  6. 6. Perspectives… Health Policy
  7. 7. Perspectives… Environmental Health
  8. 8. Perspectives… Society, Human Development and Health
  9. 9. The View from Here… GENES Outcome Environment
  10. 10. Overview Background and Introduction Linkage and Linkage Disequilibrium Population Genetics Linkage Analysis Association Analysis
  11. 11. Overview Background and Introduction Linkage and Linkage Disequilibrium Population Genetics Linkage Analysis Association Analysis
  12. 12. Statistical Genetics Otherwise known as: - Genetic Epidemiology - Genetic Statistics By definition, “integrative” - Combines epidemiological, statistical, clinical, genetic and molecular approaches
  13. 13. Genetic Discovery Evidence for genetic effects? Familial aggregation Mode of inheritance? Segregation Analysis What chromosome / region? Linkage Analysis Where in the region? Fine Mapping What gene? Association Analysis What is the effect of the gene? Characterization
  14. 14. Why Hunt for Genes?
  15. 15. Why Hunt for Genes? Disease etiology
  16. 16. Why Hunt for Genes? Disease etiology Refined diagnosis and/or prognosis
  17. 17. Why Hunt for Genes? Disease etiology Refined diagnosis and/or prognosis Drug development
  18. 18. Why Hunt for Genes? Disease etiology Refined diagnosis and/or prognosis Drug development Disease prediction
  19. 19. Challenges
  20. 20. Challenges Field is young and changes rapidly - Technology drives the science - We test because we can
  21. 21. Challenges Literature can be difficult - Statisticians writing genetic papers - Geneticists writing statistical papers
  22. 22. Challenges Software typically not well-tested or supported - The cost of being “free” - Use at your own risk!
  23. 23. Challenges Methods are often oversold - Consequence of high-pressure field - Rapid development creates sense of urgency
  24. 24. Some Terminology Locus - A location in the genome Gene - A DNA segment characterized by sequence, transcription or homology Allele - Different forms of a gene: A, a; B, b Polymorphism - Allele present in the population with > 5% freq Mutation - Allele present in the population with < 5% freq
  25. 25. Some Terminology Phenotype - Any measurable outcome Quantitative Trait Locus (QTL) - A region (gene) that contributes to a phenotype Penetrance (binary, disease phenotypes) - Prob(Phenotype | Genotype) Heritability (quantitative traits) - Variance explained by genetic factors Mendelian Disorder - Diseases influenced by a single gene Complex Trait - Disease influenced by multiple genes and environment
  26. 26. Pedigree Notation Founders Male Female
  27. 27. Pedigree Notation Male Female Affected
  28. 28. Pedigree Notation Male Female Affected Deceased
  29. 29. Mendel’s Laws
  30. 30. Mendel’s Laws Mendel’s First Law - Independent Segregation Punnett square for a Gg × Gg cross (Mom’s gametes: G, g; Dad’s gametes: G, g): G×G = GG, G×g = Gg, g×G = Gg, g×g = gg
  31. 31. Mendel’s Laws Mendel’s Second Law - Independent Assortment
  32. 32. Mendel’s Laws What do peas have to do with people? - Underlying principles of statistical genetics!
  33. 33. Overview A Brief Introduction Linkage and Linkage Disequilibrium Population Genetics Linkage Analysis Association Analysis
  34. 34. Linkage
  35. 35. Linkage General Idea: - Describes the relationship between two loci - If two loci are close in proximity - “linked” - If two loci are far apart (different chromosomes): - “not linked”
  36. 36. Recombinant Events
  37. 37. Recombination Parental haplotypes: A1B1 / A2B2
  38. 38. Recombination Parental haplotypes: A1B1 / A2B2 Gametes: A1B1, A1B2, A2B1, A2B2
  39. 39. Recombination Parental haplotypes: A1B1 / A2B2 Gametes: A1B1, A1B2, A2B1, A2B2 Probabilities: (1−θ)/2, θ/2, θ/2, (1−θ)/2 θ = Recombination Rate
  40. 40. No Linkage Parental haplotypes: A1B1 / A2B2 Gametes: A1B1, A1B2, A2B1, A2B2 Probabilities: 1/4, 1/4, 1/4, 1/4 θ ~ 0.5
  41. 41. Where have we seen this before? Mendel’s Second Law - Independent Assortment
  42. 42. Linkage Parental haplotypes: A1B1 / A2B2 Gametes: A1B1, A2B2 Probabilities: 1/2, 1/2 θ ~ 0
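
The gamete probabilities on the recombination slides can be written as a small function of θ. This is a minimal sketch; the function name `gamete_probabilities` is illustrative, and it assumes a double heterozygote whose parental haplotypes are A1B1 and A2B2.

```python
def gamete_probabilities(theta):
    """Gamete probabilities for a parent carrying haplotypes A1B1 / A2B2.

    theta is the recombination rate between the two loci (0 <= theta <= 0.5).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    non_recombinant = (1.0 - theta) / 2.0  # A1B1 and A2B2
    recombinant = theta / 2.0              # A1B2 and A2B1
    return {
        "A1B1": non_recombinant,
        "A1B2": recombinant,
        "A2B1": recombinant,
        "A2B2": non_recombinant,
    }


# Complete linkage, partial linkage, and no linkage.
for theta in (0.0, 0.1, 0.5):
    print(theta, gamete_probabilities(theta))
```

At θ = 0.5 all four gametes are equally likely (independent assortment); at θ = 0 only the two parental haplotypes are transmitted.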
  43. 43. Recombination Events What predicts a recombination event? What drives the recombination fraction?
  44. 44. Genetic Distance Definition: - The expected number of crossover events between two loci Units: - Morgans - 1 Morgan = 1 crossover event expected Genetic Map - A linearly arranged set of loci with genetic distances between them - Human Autosomes ~ 3900 cM
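
One way to connect genetic distance to the recombination fraction is a map function. The slides do not name one, so the sketch below assumes Haldane's map function, which presumes crossovers occur independently (no interference); other map functions exist and give different values.

```python
import math


def haldane_theta(distance_morgans):
    """Recombination fraction implied by a genetic distance under Haldane's map function.

    Assumption (not from the slides): crossovers follow a Poisson process,
    so theta = 0.5 * (1 - exp(-2 * d)) with d measured in Morgans.
    """
    if distance_morgans < 0:
        raise ValueError("distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


# Distances given in centimorgans (1 cM = 0.01 Morgans).
for cm in (1, 10, 50, 200):
    print(cm, "cM ->", round(haldane_theta(cm / 100.0), 4))
```

For short distances θ is close to the distance in Morgans; for long distances it approaches 0.5, the unlinked case.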
  45. 45. Linkage Disequilibrium
  46. 46. Linkage Disequilibrium General Idea: - Describes the relationship between alleles at two loci - If the alleles at the two loci are close in proximity: - “in linkage disequilibrium”
  47. 47. Linkage Disequilibrium Gametes: A1B1, A1B2, A2B1, A2B2 Frequencies: x1, x2, x3, x4
  48. 48. Linkage Disequilibrium Gametes: A1B1, A1B2, A2B1, A2B2 Frequencies: x1, x2, x3, x4 Allele frequencies: pA1 = x1 + x2, pA2 = x3 + x4, pB1 = x1 + x3, pB2 = x2 + x4
  49. 49. Linkage Disequilibrium Gametes: A1B1, A1B2, A2B1, A2B2 Frequencies: x1, x2, x3, x4 Allele frequencies: pA1 = x1 + x2, pA2 = x3 + x4, pB1 = x1 + x3, pB2 = x2 + x4 D = Observed − Expected: $D = x_1 - p_{A1} p_{B1} = x_1 - (x_1 + x_2)(x_1 + x_3) = x_1 x_4 - x_2 x_3$
  50. 50. Another Common LD Metric $r^2 = \dfrac{D^2}{p_{A1}\, p_{A2}\, p_{B1}\, p_{B2}}$
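
Both LD measures, D and r², follow directly from the four gamete frequencies. The sketch below is illustrative only: the function name `ld_stats` and the example frequencies are made up for the demonstration.

```python
def ld_stats(x1, x2, x3, x4):
    """Return (D, r2) from gamete frequencies of A1B1, A1B2, A2B1, A2B2."""
    if abs((x1 + x2 + x3 + x4) - 1.0) > 1e-9:
        raise ValueError("gamete frequencies must sum to 1")
    p_a1, p_a2 = x1 + x2, x3 + x4  # allele frequencies at locus A
    p_b1, p_b2 = x1 + x3, x2 + x4  # allele frequencies at locus B
    d = x1 * x4 - x2 * x3          # equals x1 - p_a1 * p_b1
    denominator = p_a1 * p_a2 * p_b1 * p_b2
    if denominator == 0.0:
        raise ValueError("r2 is undefined when a locus is monomorphic")
    return d, d * d / denominator


# Hypothetical gamete frequencies, chosen only for illustration.
d, r2 = ld_stats(0.4, 0.1, 0.2, 0.3)
print(f"D = {d:.3f}, r2 = {r2:.3f}")
```

When the gamete frequencies equal the products of the allele frequencies, D and r² are both zero (linkage equilibrium).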
  51. 51. Reasons for LD Mutation Population Subdivision Genetic Drift Lack of Recombination Selection Non-Random Mating
  52. 52. How does linkage relate to linkage disequilibrium?
  53. 53. Linkage and LD After t generations of random mating… $D_t = (1-\theta)^t D_0$ LD is a function of recombination and time (generations)
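
The decay of LD under random mating can be tabulated for a few recombination rates. A minimal sketch, with a hypothetical starting value D0 = 0.25 and an illustrative function name:

```python
def ld_after(d0, theta, generations):
    """LD remaining after a number of generations of random mating: (1 - theta)^t * D0."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    return (1.0 - theta) ** generations * d0


# D0 = 0.25 is a made-up starting value used only for illustration.
for theta in (0.001, 0.01, 0.5):
    decay = [ld_after(0.25, theta, t) for t in (0, 10, 100)]
    print(theta, [round(value, 4) for value in decay])
```

Tightly linked loci (small θ) keep their LD for many generations, while unlinked loci (θ = 0.5) lose half of it every generation.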
  54. 54. Linkage and LD Key Concepts… - Linkage : Location - LD : Alleles - There can be Linkage without LD - There can be LD without Linkage
  55. 55. Overview Background and Introduction Linkage and Linkage Disequilibrium Population Genetics Linkage Analysis Association Analysis
  56. 56. DNA Variation DNA - Adenine (A) - Guanine (G) - Cytosine (C) - Thymine (T) DNA double helix - A pairs with T and G pairs with C Codons - Triplets of bases - 64 possible codons - 20 amino acids
  57. 57. Mutations Point - Substitute one base for another Deletions - Base removed entirely Insertions - Base inserted Duplications - Base and/or sequence duplicated
  58. 58. Mutations Point - Substitute one base for another Deletions - Base removed entirely Insertions - Base inserted Duplications - Base and/or sequence duplicated
  59. 59. More on Point Mutations Point Mutations - Synonymous - No change in amino acid - Nonsynonymous - Amino acid change - Creates a new polymorphic site - “Single Nucleotide Polymorphism” (SNP)
  60. 60. Mutation Becomes Polymorphism Infinite Sites Model - Each mutation creates a unique polymorphic site - Mutation rate ~ $10^{-6}$
  61. 61. Life After Mutation Mutation is neutral - Random Genetic Drift - Eventually, the allele will “drift” out Mutation is harmful - Selective Pressure - Allele may quickly disappear Mutation is beneficial - Selective Pressure - Allele frequency may increase rapidly
  62. 62. Human Genetic History
  63. 63. Human Genetic History National Geographic: The Genographic Project
  64. 64. Human Genetic History National Geographic: The Genographic Project
  65. 65. Human Genetic History National Geographic: The Genographic Project
  66. 66. Human Genetic History National Geographic: The Genographic Project
  67. 67. Who Are We? Sequences Time
  68. 68. Who Are We? Sequences Time
  69. 69. Who Are We? Time MRCA
  70. 70. Who Are We? All DNA sequences are derived from others - Every sample has a genealogy Eventually, all lineages coalesce - Most Recent Common Ancestor (MRCA) The “older” the genetic history… - The less observed LD (Africans vs European) The more isolated genetic history… - The more observed LD (Mayan)
  71. 71. Who Are We? What does this have to do with gene-mapping? Balding (2006) Nat Rev Genet.
  72. 72. Overview Background and Introduction Linkage and Linkage Disequilibrium Population Genetics Linkage Analysis Association Analysis
  73. 73. Linkage Analysis Gene-Mapping - Manipulate the Properties of Linkage - Using an observed locus (marker) to draw inferences about an unobserved locus (disease gene) Family-Based Design - Extended (grandparents, parents and kids) - Nuclear (parents and kids) - Sibling Pair (kids only, no parents) Goal: Find genomic region “linked” to disease
  74. 74. Linkage Analysis Disease Gene (unobserved) Genetic Markers M1 M2 M3 M4 M5 M6 M7 M8 0 10 20 30 40 50 60 70 cM Genetic Distance
  75. 75. Linkage Analysis Disease Gene M1 M2 M3 M4 M5 M6 M7 M8 0 10 20 30 40 50 60 70 cM
  76. 76. Linkage Analysis Disease Gene M1 M2 M3 M4 M5 M6 M7 M8 0 10 20 30 40 50 60 70 cM Linkage Region
  77. 77. Linkage Analysis Parametric - Affected / Unaffected - Observed recombination events Non-Parametric - Affected / Unaffected - Identity-by-Descent (IBD) “Semi-Parametric” - Quantitative - IBD MCMC - Any phenotype - IBD
  78. 78. Linkage Analysis Parametric - Affected / Unaffected - Observed recombination events Non-Parametric - Affected / Unaffected - Identity-by-Descent (IBD) “Semi-Parametric” - Quantitative - IBD MCMC - Any phenotype - IBD
  79. 79. Linkage Analysis Key Concepts - Allele Sharing (IBS and IBD) - Linkage Statistics (LOD Score, etc.)
  80. 80. Allele Sharing
  81. 81. Identity by State (IBS) Genotypes: ac and bd How many alleles are in common? IBS = 0
  82. 82. Identity by State (IBS) Genotypes: ac and ad How many alleles are in common? IBS = 1
  83. 83. Identity by State (IBS) Genotypes: ac and ac How many alleles are in common? IBS = 2
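
Counting alleles shared identical by state needs only the two genotypes. The sketch below treats each genotype as a two-character string, which is a convention adopted here for illustration.

```python
from collections import Counter


def ibs(genotype_1, genotype_2):
    """Number of alleles (0, 1 or 2) that two genotypes share identical by state."""
    # Multiset intersection, so that "aa" versus "ab" counts one shared allele.
    shared = Counter(genotype_1) & Counter(genotype_2)
    return sum(shared.values())


for pair in (("ac", "bd"), ("ac", "ad"), ("ac", "ac")):
    print(pair, "IBS =", ibs(*pair))
```

IBS ignores where the alleles came from; identity by descent additionally requires tracing each allele back to the same parental copy.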
  84. 84. Identity by Descent (IBD) Parents: ab × cd; sibs: ac and bd How many alleles are common by descent? IBD = 0
  85. 85. Identity by Descent (IBD) Parents: ab × cd; sibs: ac and ad How many alleles are common by descent? IBD = 1
  86. 86. Identity by Descent (IBD) Parents: ab × cd; sibs: ac and ac How many alleles are common by descent? IBD = 2
  87. 87. IBS and IBD Parents: ab × cd; sibs: ac and bd IBS = 0 IBD = 0
  88. 88. IBS and IBD Parents: ab × cd; sibs: ac and ad IBS = 1 IBD = 1
  89. 89. IBS and IBD Parents: ab × cd; sibs: ac and ac IBS = 2 IBD = 2
  90. 90. Ambiguous IBD Parents: ab × cb; sibs: bc and ab IBS = 1 IBD = 0
  91. 91. IBD Probabilities Probability of Sharing IBD Alleles Relative Pair (π0, π1, π2): MZ Twins (0, 0, 1); Full Sibs (0.25, 0.50, 0.25); Parent-Offspring (0, 1, 0); First Cousin (0.75, 0.25, 0); Grandparent-Grandchild (0.50, 0.50, 0); Half-Sibs (0.50, 0.50, 0); Avuncular (0.50, 0.50, 0)
  92. 92. IBD and Sibling Pairs Probability of Sharing IBD Alleles Relative Pair (π0, π1, π2): MZ Twins (0, 0, 1); Full Sibs (0.25, 0.50, 0.25); Parent-Offspring (0, 1, 0); First Cousin (0.75, 0.25, 0); Grandparent-Grandchild (0.50, 0.50, 0); Half-Sibs (0.50, 0.50, 0); Avuncular (0.50, 0.50, 0)
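
The IBD probability table can be summarized as an expected proportion of alleles shared IBD for each relative pair. This is a small sketch using the tabulated values; the dictionary and function names are illustrative.

```python
# (pi0, pi1, pi2): probabilities of sharing 0, 1 or 2 alleles IBD, as tabulated above.
IBD_PROBABILITIES = {
    "MZ Twins": (0.0, 0.0, 1.0),
    "Full Sibs": (0.25, 0.50, 0.25),
    "Parent-Offspring": (0.0, 1.0, 0.0),
    "First Cousin": (0.75, 0.25, 0.0),
    "Grandparent-Grandchild": (0.50, 0.50, 0.0),
    "Half-Sibs": (0.50, 0.50, 0.0),
    "Avuncular": (0.50, 0.50, 0.0),
}


def expected_ibd_proportion(pi0, pi1, pi2):
    """Expected proportion of alleles shared IBD: 0 * pi0 + 0.5 * pi1 + 1 * pi2."""
    return 0.5 * pi1 + pi2


for pair, probabilities in IBD_PROBABILITIES.items():
    print(f"{pair}: {expected_ibd_proportion(*probabilities):.3f}")
```

Full sibs and parent-offspring pairs have the same expected sharing, but only full sibs vary around it, which is what sibling-pair linkage methods exploit.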
  93. 93. IBD and Sibling Pairs Use of Sibling Pairs in linkage analysis - Affected Sibling Pair (ASP) Design - Binary Trait - Unascertained Sibling Pair Design - Quantitative Traits - Ascertained Sibling Pair Design - Quantitative Traits We look for regions that show deviation of IBD from what is expected under the null
  94. 94. Linkage Analysis of Sibling Pairs Basic Idea - Sibling pairs sharing more alleles IBD than expected at a trait-influencing locus should have more similar phenotypes
  95. 95. Affected Sibling Pairs ASP (affected), DSP (discordant), USP (unaffected) If there is a shared genetic component… P(IBD=0, IBD=1, IBD=2) ≠ (0.25, 0.50, 0.25)
  96. 96. Affected Sibling Pairs Number of alleles shared IBD (0, 1, 2; total): Observed 20, 45, 35 (100); Expected 25, 50, 25 (100) H0: No Linkage H1: Linkage
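
One simple way to compare the observed and expected IBD counts is a chi-square goodness-of-fit test with 2 degrees of freedom. This is a sketch of that generic test, not necessarily the statistic used in the lecture; dedicated ASP statistics are typically one-sided.

```python
import math


def ibd_goodness_of_fit(observed, null_probs=(0.25, 0.50, 0.25)):
    """Chi-square goodness-of-fit of IBD = 0, 1, 2 counts against the null proportions."""
    n = sum(observed)
    expected = [n * p for p in null_probs]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With three categories there are 2 degrees of freedom, and the chi-square
    # survival function with 2 df has the closed form exp(-x / 2).
    p_value = math.exp(-chi2 / 2.0)
    return chi2, p_value


chi2, p_value = ibd_goodness_of_fit([20, 45, 35])
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")  # chi2 = 5.50, p = 0.064
```

The excess of pairs sharing two alleles IBD (35 observed versus 25 expected) contributes most of the statistic.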
  97. 97. Sibling Pairs (Quantitative Traits) If there is a shared genetic component… P(IBD=0, IBD=1, IBD=2) ≠ (0.25, 0.50, 0.25)
  98. 98. Quantitative Traits Haseman-Elston Algorithm - Calculate number of alleles shared IBD and the squared phenotype difference for each sibpair - Regress squared differences against IBD sharing $E(\Delta^2) = \alpha + \beta\pi$ Δ = trait difference between sibs α = regression intercept β = slope π = IBD sharing
  99. 99. Quantitative Traits [Plot against IBD sharing (x-axis); fitted slope β = 0]
  100. 100. Quantitative Traits [Plot against IBD sharing (x-axis); fitted slope β < 0]
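
The Haseman-Elston regression can be sketched with ordinary least squares. The sib-pair data below are invented for illustration, and the function name is hypothetical; real analyses use estimated IBD sharing and a formal test of β < 0.

```python
def haseman_elston(pairs):
    """Least-squares fit of squared sib-pair trait differences on IBD sharing.

    pairs: iterable of (trait_sib1, trait_sib2, ibd), where ibd is the number
    of alleles shared IBD (0, 1 or 2). Returns (alpha, beta).
    """
    xs = [ibd for _, _, ibd in pairs]
    ys = [(y1 - y2) ** 2 for y1, y2, _ in pairs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


# Made-up sib pairs: (trait of sib 1, trait of sib 2, alleles shared IBD).
toy_pairs = [(10, 14, 0), (12, 9, 1), (11, 10, 2), (8, 12, 0), (10, 12, 1), (9, 9, 2)]
alpha, beta = haseman_elston(toy_pairs)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
```

A negative slope means that pairs sharing more alleles IBD have more similar trait values, which is the pattern expected near a trait-influencing locus.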
  101. 101. Linkage Analysis Statistics
  102. 102. The LOD Score Morton (1955) Log10 of the ODds for linkage Essentially a Likelihood Ratio - Likelihood of observed - Likelihood of expected (no linkage, theta=0.5) Developed in the context of parametric linkage
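
For the simplest parametric setting the LOD score has a closed form. The sketch below assumes phase-known, fully informative meioses in which recombinants can be counted directly; that is a simplifying assumption, and real pedigree likelihoods are more involved.

```python
import math


def lod_score(recombinants, meioses, theta):
    """LOD = log10[ L(theta) / L(0.5) ] for counted recombinants among informative meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    if theta == 0.0 and recombinants > 0:
        return float("-inf")  # theta = 0 is impossible once a recombinant is seen
    log_likelihood = (meioses - recombinants) * math.log10(1.0 - theta)
    if recombinants > 0:
        log_likelihood += recombinants * math.log10(theta)
    return log_likelihood - meioses * math.log10(0.5)


# Hypothetical data: 1 recombinant among 10 informative meioses.
for theta in (0.05, 0.1, 0.2, 0.5):
    print(theta, round(lod_score(1, 10, theta), 3))
```

The score is zero at θ = 0.5 by construction and is largest near the observed recombinant fraction.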
  103. 103. Common Nonparametric Statistics Maximum LOD Score - “MLS” (or MLOD) - ASP design only - GENEHUNTER, ASPEX Nonparametric Linkage Score - “NPL Score” - Any family design - GENEHUNTER Kong and Cox LOD Score - “K&C LOD Score” - Derived from the NPL - MERLIN, ALLEGRO
  104. 104. Interpreting Linkage Statistics Traditional View… - LOD > 3.0 for genome-wide significance More Contemporary View… - Simulate for empirically derived significance
  105. 105. Examples from the Literature Linkage Analysis
  106. 106. Alcoholism Reich et al (1998) Am J Med Genet (Neuropsychiatric Genetics)
  107. 107. Antisocial Drug Dependence Stallings et al (2005) Archives of Gen Psychiatry
  108. 108. Bipolar Disorder McQueen et al (2005) Am J Hum Genet
  109. 109. Overview Background and Introduction Linkage and Linkage Disequilibrium Population Genetics Linkage Analysis Association Analysis
  110. 110. Association Analysis
  111. 111. Association Analysis Gene-Mapping - Manipulate the Properties of Linkage Disequilibrium - Using an observed locus (marker) to draw inferences about an unobserved locus (disease gene) Fine-Mapping - Refine a linkage region Candidate-Gene - Evaluate the genetic variation as it relates to an outcome Goal: Find genomic region and/or genes “associated” with disease
  112. 112. Association Analysis Family-Based - Parent/Offspring Trios - Sibling Pairs - Nuclear Families - Extended Pedigrees Population-Based - Case-Control - Cohort
  113. 113. Association Analysis Key Concepts - Genotype Coding - Population Stratification - Transmission Disequilibrium Test (TDT) - Whole Genome Association
  114. 114. Coding Genotypes Assume a biallelic marker (SNP) There are three possible genotypes - AA - Aa - aa
  115. 115. Coding Genotypes Genotype: aa, aA, AA Genotype (indicator) coding: (0,0,1), (0,1,0), (1,0,0) Additive (A): 0, 1, 2 Dominant (A): 0, 1, 1 Recessive (A): 0, 0, 1
  116. 116. Genotype Coding Marker Score = X Additive : X = (0, 1 or 2) Dominant : X = (0 or 1) Recessive : X = (0 or 1)
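
The three coding schemes map a genotype to a marker score X. A minimal sketch, assuming genotypes are two-character strings and that "A" is the allele being counted:

```python
def code_genotype(genotype, model, coded_allele="A"):
    """Marker score X for a biallelic genotype under a given genetic model."""
    count = genotype.count(coded_allele)  # copies of the coded allele: 0, 1 or 2
    if model == "additive":
        return count
    if model == "dominant":
        return 1 if count >= 1 else 0
    if model == "recessive":
        return 1 if count == 2 else 0
    raise ValueError(f"unknown model: {model}")


for model in ("additive", "dominant", "recessive"):
    print(model, [code_genotype(g, model) for g in ("aa", "aA", "AA")])
```

The score X can then be used as the predictor in whatever regression or test suits the phenotype.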
  117. 117. Additive Model [Plot: Y against X = 0, 1, 2]
  118. 118. Dominant Model [Plot: Y against X = 0, 1, 2]
  119. 119. Recessive Model [Plot: Y against X = 0, 1, 2]
  120. 120. Population Stratification
  121. 121. Genetic Associations Truth - Causal locus (direct) - In LD with causal locus (indirect) Chance - If you run 100 tests, you’ll see ~ 5 with p < 0.05 - No causal underpinning Bias - Association is not causal - e.g. Population stratification
  122. 122. Stratification Essentially a confounder! How does it happen?
  123. 123. Common Cause [Diagram: G ← A → P] Ancestry (A) predicts Genotype (G) Ancestry (A) predicts Phenotype (P) a.k.a.… Population Stratification
  124. 124. Poor Epidemiologic Design Source Population? Two Necessary Components: - Different prevalence (mean) of disease - Different allele frequency
  125. 125. Famous Example Knowler et al (1988) Am J Hum Genet.
  126. 126. DRD2
  127. 127. Stratification Happens Strategies to deal with it - Self-Reported Ancestry - Match (design) or Adjust (analysis) - Use other genetic markers (ancestry informative) - Genomic Control (Devlin – U of Pittsburgh) - STRUCTURE (Pritchard – U of Chicago) - Eigenstrat (Reich – Broad Institute/Harvard) - Use a family-based design
  128. 128. The TDT
  129. 129. Transmission Disequilibrium Test (TDT)
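  130. 130. Transmission Disequilibrium Test (TDT) Parents: AB × AB; possible offspring: AB, BB, AA, BA
  131. 131. Transmission Disequilibrium Test (TDT) Parents: AB × AB; possible offspring: AB, BB, AA, BA Under the null: Equally probable!
  132. 132. Transmission Disequilibrium Test (TDT) Parents: AB × AB; offspring: AB Father - “A” was transmitted and “B” wasn’t Mother - “B” was transmitted and “A” wasn’t
  133. 133. Transmission Disequilibrium Test (TDT) Parents: AB × AB; offspring: AB Offspring genotype (AA, AB, BB) by parental mating type (AAxAA, AAxAB, AAxBB, ABxAB, ABxBB, BBxBB): for ABxAB, counts are 0, 1, 0
  134. 134. Transmission Disequilibrium Test (TDT) Parents: AB × AB; offspring: AB Offspring genotype (AA, AB, BB) by parental mating type (AAxAA, AAxAB, AAxBB, ABxAB, ABxBB, BBxBB): for ABxAB, counts are 0, 1, 0 Transmission table (rows: transmitted allele; columns: non-transmitted allele): A/A = nAA, A/B = nAB, B/A = nBA, B/B = nBB
  135. 135. Transmission Disequilibrium Test (TDT) Parents: AB × AB; offspring: AB Offspring genotype (AA, AB, BB) by parental mating type (AAxAA, AAxAB, AAxBB, ABxAB, ABxBB, BBxBB): for ABxAB, counts are 0, 1, 0 Transmission table (rows: transmitted allele; columns: non-transmitted allele): A/A = 0, A/B = 1, B/A = 1, B/B = 0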
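  136. 136. TDT Transmission table (rows: transmitted allele; columns: non-transmitted allele): A/A = nAA, A/B = nAB, B/A = nBA, B/B = nBB $\mathrm{TDT} = \dfrac{(n_{BA} - n_{AB})^2}{n_{BA} + n_{AB}} \sim \chi^2_1$ McNemar Test for Matched-Pair Data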
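
The TDT statistic uses only the two discordant cells of the transmission table. A minimal sketch, with made-up counts and an illustrative function name; the p-value uses the exact identity between a 1-df chi-square tail and the complementary error function.

```python
import math


def tdt(n_ab, n_ba):
    """TDT statistic and p-value from heterozygous-parent transmissions.

    n_ab: parents who transmitted A and did not transmit B.
    n_ba: parents who transmitted B and did not transmit A.
    """
    total = n_ab + n_ba
    if total == 0:
        raise ValueError("no informative (heterozygous) transmissions")
    chi2 = (n_ba - n_ab) ** 2 / total
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts, for illustration only.
chi2, p_value = tdt(n_ab=30, n_ba=50)
print(f"TDT = {chi2:.2f}, p = {p_value:.4f}")
```

Homozygous parents contribute nothing to the statistic, because they cannot show preferential transmission of one allele over the other.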
  137. 137. Generalized Extensions Multiple Offspring Missing Parents Non-Binary Phenotypes - Quantitative, time-to-onset, ordinal…
  138. 138. Generalized Extensions FBAT/PBAT (Laird/Lange - Harvard) QTDT (Abecasis/Cardon - Michigan) PDT (Monks/Kaplan - Duke)
  139. 139. Population Stratification Why are Family-Based Designs etc. robust to population stratification?
  140. 140. Family-Based Data G P GP1 GP2 A
  141. 141. Family-Based Data G P GP1 GP2 A Condition on parental genotypes
  142. 142. Family-Based Data P(G|GP1,GP2,A) = P(G| GP1,GP2) G P GP1 GP2 A Condition on parental genotypes
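
Conditioning on parental genotypes works because, given the parents, the offspring genotype distribution is fixed by Mendelian transmission alone. The sketch below enumerates that distribution; genotypes are written as two-character strings, a convention chosen here for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_distribution(parent_1, parent_2):
    """P(offspring genotype | parental genotypes) under Mendelian transmission.

    Each parent passes on one of its two alleles with probability 1/2.
    """
    counts = Counter(
        "".join(sorted(allele_1 + allele_2))
        for allele_1, allele_2 in product(parent_1, parent_2)
    )
    return {genotype: Fraction(n, 4) for genotype, n in counts.items()}


print(offspring_distribution("AB", "AB"))
print(offspring_distribution("AA", "AB"))
```

Ancestry never enters the calculation, which is why tests built on these conditional probabilities are not confounded by population stratification.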
  143. 143. Paradigm Shift From Linkage to Association
  144. 144. Gene-Mapping Monogenic ‘Mendelian’ Diseases - Rare disease - Rare variants - Highly penetrant Complex Disease - Rare/Common disease - Rare/Common variants - Variable penetrance
  145. 145. Gene-Mapping Monogenic ‘Mendelian’ Diseases - Rare disease - Rare variants - Highly penetrant Linkage! Complex Disease - Rare/Common disease - Rare/Common variants - Variable penetrance
  146. 146. Gene-Mapping Monogenic ‘Mendelian’ Diseases - Rare disease - Rare variants - Highly penetrant Complex Disease - Rare/Common disease - Rare/Common variants - Variable penetrance Association
  147. 147. Genetic Discovery Evidence for genetic effects? Familial aggregation Mode of inheritance? Segregation Analysis What chromosome / region? Linkage Analysis Where in the region? Fine Mapping What gene? Association Analysis What is the effect of the gene? Characterization
  148. 148. Genetic Discovery Evidence for genetic effects? Familial aggregation Mode of inheritance? Segregation Analysis What chromosome / region? Linkage Analysis Where in the region? Fine Mapping What gene? Association Analysis What is the effect of the gene? Characterization
  149. 149. Gene-Mapping Where in the genome (1980s - 2005)? - Linkage Where in the genome (2006 - )? - Association
  150. 150. Foreshadowing the Paradigm Shift c. 1996
  151. 151. Linkage and Complex Disease
  152. 152. Linkage of Complex Traits Dismal and controversial picture
  153. 153. The Power of Linkage vs Association
  154. 154. Relative Power* Required sample size, Linkage (NL) vs Association (NA): MAF 0.05, Prevalence 0.01: NL = 67,219, NA = 2,278; MAF 0.05, Prevalence 0.20: NL = 207,635, NA = 2,448; MAF 0.20, Prevalence 0.01: NL = 8,067, NA = 659; MAF 0.20, Prevalence 0.20: NL = 22,385, NA = 700 MAF = Minor allele frequency NL = Number of affected sibling pairs NA = Number of case-control pairs Odds Ratio = 1.5 *Adapted from Roeder et al, Am J Hum Genet (2006)
  155. 155. Rare Disease - Rare Variant (table from slide 154) MAF 0.05, Prevalence 0.01: NL = 67,219 affected sibling pairs, NA = 2,278 case-control pairs
  156. 156. Common Disease - Rare Variant (table from slide 154) MAF 0.05, Prevalence 0.20: NL = 207,635 affected sibling pairs, NA = 2,448 case-control pairs
  157. 157. Common Variant - Rare Disease (table from slide 154) MAF 0.20, Prevalence 0.01: NL = 8,067 affected sibling pairs, NA = 659 case-control pairs
  158. 158. Common Disease - Common Variant (table from slide 154) MAF 0.20, Prevalence 0.20: NL = 22,385 affected sibling pairs, NA = 700 case-control pairs
  159. 159. Why Now?
  160. 160. The “-omics” Age c. 1996: - Pre-genomic era - 100’s of markers - STRs c. 2007: - Post-genomic era - 100,000’s of markers - SNPs
  161. 161. c. 2007
  162. 162. Available Technology Platforms available (or coming soon) - 1 SNP - Hundreds of SNPs - Thousands of SNPs - Hundreds of thousands of SNPs - Millions of SNPs Flexibility for Association - Single Marker - Candidate Gene - Whole-Genome
  163. 163. Examples from the Literature Whole Genome Association
  164. 164. What if we discover that genes have nothing to do with complex phenotypes?
  165. 165. What if we discover that genes have nothing to do with complex phenotypes? Good News: We may not have to cross that bridge
  166. 166. Replicated Associations Type II Diabetes BMI / Obesity Crohn’s Disease Age-Related Macular Degeneration (AMD) Prostate Cancer Breast Cancer Heart Disease
  167. 167. Framingham Heart Study and BMI
  168. 168. Framingham Heart Study and BMI The SNP is close to (in LD with) INSIG2 - A plausible candidate for obesity - Responds to insulin - Involved in triglyceride synthesis
  169. 169. Framingham Heart Study and BMI
  170. 170. Framingham Heart Study and BMI Replicated in 4 out of 5 studies - Childhood sample - African American Sample - Europe and North America
  171. 171. Hot off the press…
  172. 172. Hot off the press…
  173. 173. In Summary… WGA is starting off successful - More replicated associations in one year…
  174. 174. Statistical Genetics The Challenges We Face
  175. 175. Analytic Challenges
  176. 176. Wealth of Information Whole Genome Association using SNPs - Potentially use all of the data - Covariates, interactions, effect size, etc. - Statistical issues abound…
  177. 177. Multiple Comparisons
  178. 178. Multiple Comparisons The 500K People Chip
  179. 179. Multiple Comparisons Which SNPs are “real”? - 500K Chip - 25,000 SNPs with p < 0.05 Multiple Phenotypes - 10 Phenotypes, 500K chip - 5,000,000 comparisons!!!!
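
The arithmetic behind these counts is short enough to check directly. The Bonferroni threshold shown here is one conventional correction and is an addition for illustration; the slides do not prescribe a particular method.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of tests with p < alpha when every null hypothesis is true."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


snps, phenotypes = 500_000, 10
print(expected_false_positives(snps))           # about 25,000 SNPs with p < 0.05 by chance
print(snps * phenotypes)                        # 5,000,000 comparisons
print(bonferroni_threshold(snps))               # about 1e-07 for one phenotype
print(bonferroni_threshold(snps * phenotypes))  # about 1e-08 for ten phenotypes
```

Bonferroni is conservative when markers are correlated through LD, which is one reason simulation-based thresholds are often preferred.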
  180. 180. The P-Value Epidemic
  181. 181. “My name is Matt McQueen and I have a P-value problem” The smallest p-values - Most addictive - We’ve been trained to focus on them - What do they mean? - Truth - Chance - Bias
  182. 182. Replicated Associations… Scott et al (2007) Science
  183. 183. The Phenotype Question
  184. 184. What is a phenotype? Depends on who you ask…
  185. 185. What is a phenotype? If we asked a gene… GENE → Trait 1 (5%), Trait 2 (55%), Trait 3 (4%), Trait 4 (1%), Trait 5 (20%), Trait 6 (15%)
  186. 186. What is a phenotype? If we asked an environmental factor… ENV → Trait 1 (10%), Trait 2 (10%), Trait 3 (30%), Trait 4 (5%), Trait 5 (5%), Trait 6 (40%)
  187. 187. What is a phenotype? GENE / ENV contributions: Trait 1 (5% / 10%), Trait 2 (55% / 10%), Trait 3 (4% / 30%), Trait 4 (1% / 5%), Trait 5 (20% / 5%), Trait 6 (15% / 40%)
  188. 188. The Genotype Question
  189. 189. What is a Genotype? We test SNPs for association because we can What about epigenetic factors? - Methylation - Copy Number Variation
  190. 190. The Not-So Distant Future
  191. 191. The $1000 Genome NHGRI RFA Number - RFA-HG-06-020 Title - “The $1000 Genome” Goal - Develop technology to enable investigators to sequence an entire human genome for $1000 within 10 years
  192. 192. June 2017 Biodemography Short Course Complete Genome Sequence Analysis