Epidemiologisk Fredagsmøde (Epidemiological Friday Meeting), 15 February 2008


  1. 1. Association Mapping Through Local Genealogies. Thomas Mailund, Bioinformatics Research Center, http://www.birc.au.dk/
  2. 2. “Genetic” Diseases: Gunshot wounds; Car accidents; Smoking-induced lung cancer; Cardiovascular disease; Obesity; Diabetes 2; Alzheimer; Schizophrenia; BRCA1 breast cancer; Cystic fibrosis; Haemophilia
  3. 3. Disease Mapping... Locate disease-affecting polymorphism Cases (affected) --A--------C--------A----G---X----T---C---A---- --T--------G--------A----G---X----C---C---A---- --A--------G--------G----G---X----C---C---A---- --A--------C--------A----G---X----T---C---A---- --T--------C--------A----G---X----T---C---A---- --T--------C--------A----T---X----T---A---A---- --A--------C--------A----G---X----T---C---A---- --A--------C--------A----G---X----T---C---A---- --A--------C--------A----G---X----T---C---G---- --T--------C--------A----T---X----T---C---A---- --A--------C--------A----G---X----T---C---A---- --A--------C--------G----T---X----C---A---A---- --A--------C--------A----G---X----C---C---G---- Controls (unaffected)
  4. 4. Unrealistic Assumptions We only measure -A-- -C- -A-- “unphased” data --T-- --G- -G-- -C- -A-- -A-- --A-- --C- -G-- -T- -C-
  5. 5. Unrealistic Assumptions We only measure -A-- -C- -A-- “unphased” data --T-- --G- -G-- -C- -A-- -A-- --A-- --C- -G-- -T- -C- We first need to infer the phase --T--------G--------A----G--------C---C---A---- --A--------C--------A----G--------T---C---A----
  6. 6. Unrealistic Assumptions We only measure -A-- -C- -A-- “unphased” data --T-- --G- -G-- -C- -A-- -A-- --A-- --C- -G-- -T- -C- We first need to infer the phase --T--------G--------A----G--------C---C---A---- --A--------C--------A----G--------T---C---A---- --T--------G--------A----G--------T---C---A---- --A--------C--------A----G--------C---C---A----
  7. 7. Unrealistic Assumptions We only measure -A-- -C- -A-- “unphased” data --T-- --G- -G-- -C- -A-- -A-- --A-- --C- -G-- -T- -C- We first need to infer the phase --T--------G--------A----G--------C---C---A---- --A--------C--------A----G--------T---C---A---- --T--------G--------A----G--------T---C---A---- --A--------C--------A----G--------C---C---A---- --T--------C--------A----G--------T---C---A---- --A--------G--------A----G--------C---C---A----
  8. 8. Unrealistic Assumptions We only measure -A-- -C- -A-- “unphased” data --T-- --G- -G-- -C- -A-- -A-- --A-- --C- -G-- -T- -C- We first need to ? infer the phase --T--------G--------A----G--------C---C---A---- --A--------C--------A----G--------T---C---A---- --A--------G--------A----G--------C---C---A---- --T--------C--------A----G--------T---C---A---- --T--------C--------A----G--------T---C---A---- --A--------G--------A----G--------C---C---A----
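The phasing problem on the slides above can be made concrete with a small sketch. It assumes biallelic markers and encodes an unphased genotype as a list of allele pairs, one pair per marker; this encoding and the function name are hypothetical, not taken from the talk. The function enumerates every haplotype pair consistent with the genotype, which shows why phase has to be inferred: with k heterozygous markers there are 2^(k-1) candidate pairs.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Enumerate the haplotype pairs consistent with one unphased genotype.

    genotype: list of 2-tuples holding the two alleles seen at each marker.
    Returns a sorted list of (haplotype1, haplotype2) string pairs.
    """
    het = [i for i, (a, b) in enumerate(genotype) if a != b]
    pairs = set()
    # The first heterozygous marker keeps a fixed orientation so that
    # (h1, h2) and (h2, h1) are not counted as two different solutions.
    for flips in product([False, True], repeat=max(len(het) - 1, 0)):
        flip_at = dict(zip(het[1:], flips))
        h1, h2 = [], []
        for i, (a, b) in enumerate(genotype):
            if flip_at.get(i, False):
                a, b = b, a
            h1.append(a)
            h2.append(b)
        pairs.add(("".join(h1), "".join(h2)))
    return sorted(pairs)


# Hypothetical genotype: three heterozygous markers, two homozygous ones.
genotype = [("A", "T"), ("C", "G"), ("A", "A"), ("G", "G"), ("C", "T")]
for h1, h2 in compatible_haplotype_pairs(genotype):
    print(h1, h2)
```

A real phasing method would rank these candidates, for example by how common each haplotype is in the rest of the sample; the sketch only lists them.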
  9. 9. Disease Mapping... Markers are locally correlated Cases (affected) --A--------C--------A----G---X----T---C---A---- --T--------G--------A----G---X----C---C---A---- --A--------G--------G----G---X----C---C---A---- --A--------C--------A----G---X----T---C---A---- --T--------C--------A----G---X----T---C---A---- --T--------C--------A----T---X----T---A---A---- --A--------C--------A----G---X----T---C---A---- --A--------C--------A----G---X----T---C---A---- --A--------C--------A----G---X----T---C---G---- --T--------C--------A----T---X----T---C---A---- --A--------C--------A----G---X----T---C---A---- --A--------C--------G----T---X----C---A---A---- --A--------C--------A----G---X----C---C---G---- Controls (unaffected)
  10. 10. Disease Mapping... Search for indirect signals Cases (affected) --A--------C--------A----G---X----T---C---A---- --T--------G--------A----G---X----C---C---A---- --A--------G--------G----G---X----C---C---A---- --A--------C--------A----G---X----T---C---A---- --T--------C--------A----G---X----T---C---A---- --T--------C--------A----T---X----T---A---A---- --A--------C--------A----G---X----T---C---A---- --A--------C--------A----G---X----T---C---A---- --A--------C--------A----G---X----T---C---G---- --T--------C--------A----T---X----T---C---A---- --A--------C--------A----G---X----T---C---A---- --A--------C--------G----T---X----C---A---A---- --A--------C--------A----G---X----C---C---G---- Controls (unaffected)
  11. 11. Marker Relatedness: Linkage disequilibrium (LD). Panels: Empirical Results; Theoretical Results. Axis labels: LD (r²), Recombination rate. Clark et al. 2003, AJHG 73:285-300. Hein et al. 2005
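As an illustration of the r² measure named on this slide, here is a minimal sketch that computes it for two biallelic markers from phased haplotypes. The input format (one string per haplotype, one character per marker) and the toy data are assumptions made for the example.

```python
def r_squared(haplotypes, i, j):
    """LD r^2 between the biallelic markers in columns i and j."""
    n = len(haplotypes)
    a = haplotypes[0][i]  # allele used as reference at marker i
    b = haplotypes[0][j]  # allele used as reference at marker j
    p_a = sum(h[i] == a for h in haplotypes) / n
    p_b = sum(h[j] == b for h in haplotypes) / n
    p_ab = sum(h[i] == a and h[j] == b for h in haplotypes) / n
    d = p_ab - p_a * p_b  # the classical disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both markers must be polymorphic")
    return d * d / denom


# Hypothetical haplotypes: perfectly correlated markers, then independent ones.
print(r_squared(["AC", "AC", "TG", "TG"], 0, 1))
print(r_squared(["AC", "AG", "TC", "TG"], 0, 1))
```

The choice of reference allele does not matter, because swapping it only changes the sign of D and r² uses its square.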
  12. 12. Indirect Association “Tag” markers Unobserved marker Cases (affected) --A--------C--------A----G---X----T---C---A---- --T--------G--------A----G---X----C---C---A---- --A--------G--------G----G---X----C---C---A---- --A--------C--------A----G---X----T---C---A---- --T--------C--------A----G---X----T---C---A---- --T--------C--------A----T---X----T---A---A---- --A--------C--------A----G---X----T---C---A---- --A--------C--------A----G---X----T---C---A---- --A--------C--------A----G---X----T---C---G---- --T--------C--------A----T---X----T---C---A---- --A--------C--------A----G---X----T---C---A---- --A--------C--------G----T---X----C---A---A---- --A--------C--------A----G---X----C---C---G---- Controls (unaffected)
  13. 13. Indirect Association Cases (affected) --A--------C--------A----G---X----T---C---A---- --T--------G--------A----G---X----C---C---A---- --A--------G--------G----G---X----C---C---A---- --A--------C--------A----G---X----T---C---A---- --T--------C--------A----G---X----T---C---A---- --T--------C--------A----T---X----T---A---A---- --A--------C--------A----G---X----T---C---A---- --A--------C--------A----G---X----T---C---A---- --A--------C--------A----G---X----T---C---G---- --T--------C--------A----T---X----T---C---A---- --A--------C--------A----G---X----T---C---A---- --A--------C--------G----T---X----C---A---A---- --A--------C--------A----G---X----C---C---G---- Controls (unaffected)
  17. 17. Indirect Multi-Marker Association Cases (affected) --A--------C--------A----G---X----T---C---A---- --T--------G--------A----G---X----C---C---A---- --A--------G--------G----G---X----C---C---A---- --A--------C--------A----G---X----T---C---A---- --T--------C--------A----G---X----T---C---A---- --T--------C--------A----T---X----T---A---A---- --A--------C--------A----G---X----T---C---A---- --A--------C--------A----G---X----T---C---A---- --A--------C--------A----G---X----T---C---G---- --T--------C--------A----T---X----T---C---A---- --A--------C--------A----G---X----T---C---A---- --A--------C--------G----T---X----C---A---A---- --A--------C--------A----G---X----C---C---G---- Controls (unaffected)
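The indirect-association idea can be sketched as a marker-by-marker scan: each observed marker is tested for a difference in allele counts between cases and controls, and markers correlated with the unobserved variant are expected to pick up part of its signal. The sketch below uses a plain Pearson chi-square on a 2x2 allele table. The statistic, the function names and the toy haplotypes are assumptions for illustration; the slides do not say which test was used.

```python
def allelic_chi_square(case_alleles, control_alleles):
    """Pearson chi-square (1 d.f.) on the 2x2 table of allele counts."""
    alleles = sorted(set(case_alleles) | set(control_alleles))
    if len(alleles) != 2 or not case_alleles or not control_alleles:
        raise ValueError("need a biallelic marker and two non-empty groups")
    ref = alleles[0]
    a = sum(x == ref for x in case_alleles)
    b = len(case_alleles) - a
    c = sum(x == ref for x in control_alleles)
    d = len(control_alleles) - c
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def scan_markers(case_haplotypes, control_haplotypes):
    """Chi-square statistic for every marker column, left to right."""
    n_markers = len(case_haplotypes[0])
    return [
        allelic_chi_square(
            [h[m] for h in case_haplotypes],
            [h[m] for h in control_haplotypes],
        )
        for m in range(n_markers)
    ]


# Hypothetical phased data, two markers.
cases = ["AC", "AC", "AG", "TC"]
controls = ["TG", "TG", "TC", "AG"]
print(scan_markers(cases, controls))
```

In a real scan the statistics would be converted to p-values and corrected for the number of markers tested.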
  18. 18. The Ancestral Recombination Graph. Hudson 1990; Griffiths and Marjoram 1996
  19. 19. The Coalescent Process
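For readers without the slide figures, the basic coalescent process without recombination can be sketched in a few lines. This is a standard textbook (Kingman) simulation, not code from the talk: while k lineages remain, the waiting time to the next event is exponential with rate k(k-1)/2 in coalescent time units, and a uniformly chosen pair of lineages merges. The function name and the event representation are hypothetical.

```python
import random


def simulate_coalescent(n, rng=None):
    """Simulate a Kingman coalescent genealogy for n sampled sequences.

    Returns a list of (time, merged_leaves) events, oldest last.
    Time is in coalescent units and is measured backwards from the present.
    """
    rng = rng or random.Random()
    # Each lineage is represented by the tuple of sample ids below it.
    lineages = [(i,) for i in range(n)]
    t = 0.0
    events = []
    while len(lineages) > 1:
        k = len(lineages)
        t += rng.expovariate(k * (k - 1) / 2)  # waiting time with k lineages
        i, j = rng.sample(range(k), 2)         # uniformly chosen pair
        merged = tuple(sorted(lineages[i] + lineages[j]))
        lineages = [l for idx, l in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
        events.append((t, merged))
    return events


for time, leaves in simulate_coalescent(5, random.Random(1)):
    print(f"{time:.3f}", leaves)
```

The last event is the most recent common ancestor of the whole sample; its expected time is 2(1 - 1/n) coalescent units. Adding recombination, as in the ancestral recombination graph, would also let a lineage split in two going back in time.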
  53. 53. A Reasonable Local Model
      Copyright © 2007 by the Genetics Society of America. DOI: 10.1534/genetics.107.071126
      On Recombination-Induced Multiple and Simultaneous Coalescent Events
      Joanna L. Davies, Frantisek Šimančík, Rune Lyngsø, Thomas Mailund and Jotun Hein
      Department of Statistics, University of Oxford, Oxford, OX1 3TG, United Kingdom
      Manuscript received January 18, 2007. Accepted for publication October 2, 2007.
      ABSTRACT: Coalescent theory deals with the dynamics of how sampled genetic material has spread through a population from a single ancestor over many generations and is ubiquitous in contemporary molecular population genetics. Inherent in most applications is a continuous-time approximation that is derived under the assumption that sample size is small relative to the actual population size. In effect, this precludes multiple and simultaneous coalescent events that take place in the history of large samples. If sequences do not recombine, the number of sequences ancestral to a large sample is reduced sufficiently after relatively few generations such that use of the continuous-time approximation is justified. However, in tracing the history of large chromosomal segments, a large recombination rate per generation will consistently maintain a large number of ancestors. This can create a major disparity between discrete-time and continuous-time models and we analyze its importance, illustrated with model parameters typical of the human genome. The presence of gene conversion exacerbates the disparity and could seriously undermine applications of coalescent theory to complete genomes. However, we show that multiple and simultaneous coalescent events influence global quantities, such as total number of ancestors, but have negligible effect on local quantities, such as linkage disequilibrium. Reassuringly, most applications of the coalescent model with recombination (including association mapping) focus on local quantities.
      Kingman (1982) models the ancestry of a sample of sequences with a continuous-time Markov process referred to as the Kingman coalescent. Lineages collide or coalesce after random exponential waiting ...
      ... ulation size, the probability of such events occurring becomes nonnegligible and consequently in these instances the rate of coalescence is underestimated by Hudson’s continuous-time model. Hudson’s model ...
  54. 54. A Reasonable Local Model • The “back in time” approach (in general) means we ignore selection • Implicit assumption that the disease is selectively neutral • Which may or may not be reasonable... • Might be okay for late onset diseases...
  55. 55. The ARG as a Statistical Model P( )
  56. 56. The ARG as a Statistical Model P( | )
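  59. 59. The ARG as a Statistical Model: P(D | G, θ) P(G | θ) (the symbols were images on the slides; D for the data, G for the ARG and θ for the parameters are assumed names)
  60. 60. The ARG as a Statistical Model: lhd(θ) = P(D | θ) = ∫ P(D | G, θ) P(G | θ) dG
  61. 61. The ARG as a Statistical Model: lhd(θ) = ∫ P(D | G, θ) P(G | θ) dG. Integration by magic
  62. 62. The ARG as a Statistical Model: lhd(θ) = ∫ P(D | G, θ) P(G | θ) dG. Integration by magic statistical sampling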
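"Integration by statistical sampling" can be written out as a Monte Carlo estimate. The symbols in the slide formula were images and are lost in this transcript, so the notation below is an assumption: D stands for the observed data, G for an ARG, θ for the model parameters, and Q for a proposal distribution over ARGs. The first line samples ARGs from the model itself; the second is the importance-sampling variant, which draws ARGs from a proposal Q and reweights them.

```latex
\begin{align*}
\mathrm{lhd}(\theta)
  &= \int P(D \mid G, \theta)\, P(G \mid \theta)\, dG
   \;\approx\; \frac{1}{M} \sum_{m=1}^{M} P(D \mid G_m, \theta),
   && G_m \sim P(G \mid \theta), \\
\mathrm{lhd}(\theta)
  &\approx \frac{1}{M} \sum_{m=1}^{M}
     \frac{P(D \mid G_m, \theta)\, P(G_m \mid \theta)}{Q(G_m)},
   && G_m \sim Q(G).
\end{align*}
```

The second form is useful when almost every ARG drawn from the model has P(D | G, θ) close to zero, because Q can be chosen to favour ARGs that are compatible with the data.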
  63. 63. ARG Methods • Sampling ARGs from the coalescence process • Sampling ARGs conditional on the data (importance sampling) • Sampling parsimonious ARGs conditional on the data
  64. 64. ARG Methods • Sampling ARGs from the coalescence process • This is a no go -- you would never sample an ARG that can explain the data • Sampling ARGs conditional on the data (importance sampling) • Sampling parsimonious ARGs conditional on the data
  65. 65. ARG Methods • Sampling ARGs from the coalescence process • Sampling ARGs conditional on the data (importance sampling) • Larribe, Lessard and Schork 2002 -- scales to tens of individuals and tens of markers • Sampling parsimonious ARGs conditional on the data
  66. 66. ARG Methods • Sampling parsimonious ARGs conditional on the data • Lyngsø, Song & Hein 2005 (calculates parsimonious ARGs; a 2008 paper on sampling is in press) • Minichiello & Durbin 2006 (samples parsimonious ARGs and scores local genealogies) • Both preferentially select mutation and coalescence events over recombinations • Scales to thousands of individuals and hundreds of markers
  67. 67. Local Phylogenies For each “point” on the chromosome, the ARG determines a (local) tree:
  71. 71. Changing Phylogenies Type 1: No change Type 2: Change in branch lengths Type 3: Change in topology From Hein et al. 2005
  72. 72. Trees and LD. Plot labels: Tree similarity; LD (r²); Recombination rate (on both plots)
  73. 73. Can we use just the trees?
  74. 74. Clustering on a Tree Disease affecting mutation
  75. 75. Clustering on a Tree: Complete penetrance; Incomplete penetrance; Spurious disease
  76. 76. Clustering on a Tree: Case/control clustering is not random on the tree... (figure labels: 25%, 75%, 40%, 60%)
  77. 77. Sampling Trees (with recombination) Zöllner & Pritchard 2005
  94. 94. Sampling Trees (with recombination): We only sample the process on the left, which has far fewer events. Zöllner & Pritchard 2005
  95. 95. Using “Perfect Phylogenies” Use the four-gamete test to find regions that can be explained by a tree with no recurrent mutations Mailund, Besenbacher & Schierup 2006
  96. 96. Using “Perfect Phylogenies” Build trees for each such region Mailund, Besenbacher & Schierup 2006
  97. 97. Using “Perfect Phylogenies” Each marker splits a sub-tree in two Mailund, Besenbacher & Schierup 2006
  100. 100. Using “Perfect Phylogenies”: Much faster (and much cruder). Catches the essential tree structure. Mailund, Besenbacher & Schierup 2006
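The four-gamete test mentioned on the perfect-phylogeny slides can be sketched as follows. Two biallelic markers are compatible with a single tree without recurrent mutation if at most three of the four possible allele combinations occur among the haplotypes. The greedy left-to-right split into regions is an assumption made to keep the example short; it is not necessarily how Mailund, Besenbacher & Schierup choose their regions, and the function names are hypothetical.

```python
def four_gamete_compatible(haplotypes, i, j):
    """True if markers i and j show at most three of the four gametes."""
    return len({(h[i], h[j]) for h in haplotypes}) <= 3


def compatible_regions(haplotypes):
    """Greedy split into runs of pairwise-compatible markers.

    Returns (first_marker, last_marker) index pairs, inclusive.
    """
    n_markers = len(haplotypes[0])
    regions, start = [], 0
    for j in range(1, n_markers):
        compatible = all(
            four_gamete_compatible(haplotypes, i, j) for i in range(start, j)
        )
        if not compatible:
            # Marker j conflicts with the current region: close it here.
            regions.append((start, j - 1))
            start = j
    regions.append((start, n_markers - 1))
    return regions


# Hypothetical haplotypes: markers 0 and 2 show all four gametes.
haplotypes = ["ACA", "ACG", "TGA", "TCG"]
print(compatible_regions(haplotypes))
```

Within each region returned, every pair of markers passes the test, so the markers can be placed on one tree in which each marker splits a sub-tree in two.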
  101. 101. Scoring the Clustering Red=cases Green=controls Are the case chromosomes significantly over-represented in some clusters?
  102. 102. We can place “mutations” on the tree edges, partition chromosomes into “mutants” and “wild-types”, and test for different distributions of cases and controls. (Figure labels: Wild-types, Mutation, Mutants)
  103. 103. Use the average or the maximum to score the tree. The average is kosher Bayesian statistics; the maximum needs to be corrected for over-fitting. (Figure labels: Wild-types, Mutation, Mutants)
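One way to turn the scoring idea above into numbers is a Bayes factor per tree edge. The sketch below compares a model in which mutants and wild-types have separate case frequencies against a model with one shared frequency, using uniform Beta(1, 1) priors. This particular Bayes factor, the priors, the function names and the input format (one set of chromosome ids per edge) are all assumptions for illustration; the slides do not give Blossoc's actual scoring formula.

```python
from math import exp, lgamma


def log_beta(x, y):
    """Logarithm of the Beta function B(x, y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def log_bayes_factor(mut_cases, mut_controls, wt_cases, wt_controls):
    """log BF of 'separate case frequencies' against 'one shared frequency'.

    Both models use uniform Beta(1, 1) priors (an assumption of this sketch).
    """
    separate = (log_beta(mut_cases + 1, mut_controls + 1)
                + log_beta(wt_cases + 1, wt_controls + 1))
    shared = log_beta(mut_cases + wt_cases + 1, mut_controls + wt_controls + 1)
    return separate - shared


def score_tree(edge_splits, is_case):
    """Return (average BF, maximum BF) over all candidate edges.

    edge_splits: one set of chromosome ids per edge (the "mutants" below it).
    is_case: dict mapping chromosome id to True (case) or False (control).
    """
    n_cases = sum(is_case.values())
    n_controls = len(is_case) - n_cases
    bfs = []
    for mutants in edge_splits:
        mc = sum(is_case[c] for c in mutants)   # mutant case chromosomes
        mk = len(mutants) - mc                  # mutant control chromosomes
        bfs.append(exp(log_bayes_factor(mc, mk, n_cases - mc, n_controls - mk)))
    return sum(bfs) / len(bfs), max(bfs)


# Hypothetical tree with two candidate edges over four chromosomes.
is_case = {0: True, 1: True, 2: False, 3: False}
average, maximum = score_tree([{0, 1}, {0, 2}], is_case)
print(average, maximum)
```

Averaging over edges corresponds to a uniform prior on where the mutation sits. The maximum picks the best edge after looking at the data, so it needs a correction, for example by permuting case/control labels.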
  104. 104. Blossoc (BLOck aSSOCiation) Homepage: www.birc.au.dk/~mailund/Blossoc Command line and graphical user interface (with limited functionality)
  105. 105. Blossoc (BLOck aSSOCiation) Homepage: www.birc.au.dk/~mailund/Blossoc Fast enough to analyse tens of thousands of individuals at hundreds of thousands of markers in a day or two on a desktop computer...
  106. 106. Localisation Accuracy A single causal mutation Max BF / min p-value used as point estimate
  107. 107. Localisation Accuracy Two causal mutations Max BF / min p-value used as point estimate
  108. 108. Thank you! More information at http://www.birc.au.dk/~mailund/association-mapping/