guenomu software -- model and algorithm in 2013

This is a progress report, presented to the Phylogenomics Group at UVigo in May 2013, on the status of the guenomu software and the Bayesian model it implements.

At that time I was experimenting with a mixture model, which has since been abandoned, and with the Hdist (assignment cost) distance, which is still experimental. The presentation also describes the exchange algorithm for sampling from doubly-intractable distributions, the generalized Multiple-Try Metropolis, and the parallel PRNG used to minimize communication between jobs.

  1. guenomu Software and Model. Leonardo de O. Martins, University of Vigo, May 16th, 2013. Leo Martins (U Vigo) guenomu software 2013/5/16 1 / 15
  2. Outline: (1) The Model, (2) The Sampling, (3) The Code
  3. Hierarchical Bayesian model:
     $$P(S,\Theta \mid D) \propto P(\theta_0)\,P(\lambda_0)\,P(\alpha_0)\,P(S) \times \prod_{i=1}^{N} P(D_i \mid G_i,\theta_i)\,P(\theta_i \mid \theta_0)\,P(G_i \mid \lambda_i, w_i, S)\,P(\lambda_i \mid \lambda_0)\,P(w_i \mid \alpha_i)\,P(\alpha_i \mid \alpha_0)$$
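  8. The mixture of distance distributions:
     $$P(G \mid \lambda, w, S) = \frac{w_1 e^{-\left(d_{\mathrm{DUPS}}(G,S)/\lambda_{\mathrm{DUPS}} + d_{\mathrm{LOSS}}(G,S)/\lambda_{\mathrm{LOSS}}\right)} + w_2 e^{-d_{\mathrm{ILS}}(G,S)/\lambda_{\mathrm{ILS}}} + w_3 e^{-d_{\mathrm{RF}}(G,S)/\lambda_{\mathrm{RF}}}}{Z(\lambda, w, S)}$$
     where $w_i \sim \mathrm{Gamma}(\alpha_{\mathrm{gene}}, 1)$ and $\lambda_x \sim \mathrm{Exp}(\Lambda_x)$. Each gene has its own set of $w_i$ and $\lambda_i$, and the distances $d_x(G,S)$ are scaled to account for different gene family sizes.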
  9. Outline: (1) The Model, (2) The Sampling, (3) The Code
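  15. Doubly-intractable distributions:
     $$\pi(y \mid \theta) = \frac{q_\theta(y)}{Z(\theta)} = \frac{e^{\theta^t s(y)}}{Z(\theta)}, \qquad Z(\theta) = \sum_y e^{\theta^t s(y)} \qquad (1)$$
     Augmented distribution: $\pi(\theta', y', \theta \mid y) \propto \pi(y \mid \theta)\,\pi(\theta)\,h(\theta' \mid \theta)\,\pi(y' \mid \theta')$.
     Gibbs update of the auxiliary variables $\theta', y'$: (I) draw $\theta' \sim h(\cdot \mid \theta)$; (II) draw $y' \sim \pi(\cdot \mid \theta')$.
     Exchange ratio from $\theta$ to $\theta'$:
     $$\min\left\{1,\ \frac{q_\theta(y')\,\pi(\theta')\,h(\theta \mid \theta')\,q_{\theta'}(y)}{q_\theta(y)\,\pi(\theta)\,h(\theta' \mid \theta)\,q_{\theta'}(y')}\right\} \qquad (2)$$
     We draw $y'$ (the gene tree) through a secondary MCMC that starts at the current gene tree $y$.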
  16. Species tree proposal with the exchange algorithm
  22. Generalized Multiple-Try Metropolis.
     MH: sample $y$ and accept it with probability $\min(1, r)$, where
     $$r = \frac{\pi(y)\,q(y,x)}{\pi(x)\,q(x,y)} = \frac{\pi(y)\,p(x \mid y)}{\pi(x)\,p(y \mid x)}$$
     MTM: choose $y$ among several samples according to their relative weights, with
     $$r = \frac{w(y_1,x) + \cdots + w(y_k,x)}{w(x^*_1,y) + \cdots + w(x^*_k,y)}, \qquad w(x,y) = \pi(x)\,q(x,y)\,\lambda(x,y) = \pi(x)\,p(y \mid x)\,\lambda(x,y)$$
     GMTM: the weights $w(\cdot)$ do not need to represent probability distributions, and
     $$r = \frac{\pi(y)\,p_k(x \mid y)}{\pi(x)\,p_k(y \mid x)}\,\frac{W_x}{W_y}, \qquad W_y = \frac{w_i(y_i,x)}{\sum_{j=1}^{k} w_j(y_j,x)}$$
     for the chosen element $i$.
  23. Gene tree proposal with GMTM or MTM
  26. Outline: (1) The Model, (2) The Sampling, (3) The Code
  27. RF distance, assignment cost (Hdist)
  30. A parallel pseudo-random number generator (PRNG). Given a seed and an algorithm, we have a stream of PRNs.
     [Figure: a seed feeds PRNG1, whose outputs $x_1, x_2, x_3, x_4$ in turn seed several PRNG2 streams, with values such as $x_{11}$ and $x_{12}$.]
     Using a second algorithm, the first stream gives us a sequence of seeds. We use the 150 parameter sets for the Tausworthe (LFSR) generators (L'Ecuyer, Math. Comp. 1999, p. 261). Therefore, given the seed, we can predict all states of all streams.
  34. A parallel pseudo-random number generator (PRNG) in our gene/species model.
     [Figure: the same seed, PRNG1 and PRNG2 diagram as in slide 30.]
     We split gene families among jobs. All jobs receive the seed (broadcast) and can therefore reproduce the same $x_1$, which is cheaper than communicating the states. Each job uses its own $x_{i+1}$ for sampling new gene trees etc., so the jobs can work in parallel. They use the common $x_1$ for sampling, e.g., a new species tree, which needs synchronization. The only thing that must be shared is thus the proposal values (AllReduce) when updating "global" parameters, so that all jobs can make the same acceptance/rejection decision.
  35. Each job looks like an independent analysis
  36. https://bitbucket.org/leomrtns/guenomu
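
On the log scale the hierarchical Bayesian model separates into global terms plus one additive term per gene family. The expansion below is only a rewriting of the proportionality on that slide, with $C$ standing for the unknown log normalizing constant. Each $P(G_i \mid \lambda_i, w_i, S)$ still contains $Z(\lambda_i, w_i, S)$, which depends on $S$; that is what makes the species-tree update doubly intractable.

```latex
\log P(S,\Theta \mid D) = C + \log P(\theta_0) + \log P(\lambda_0) + \log P(\alpha_0) + \log P(S)
  + \sum_{i=1}^{N} \Big[ \log P(D_i \mid G_i,\theta_i) + \log P(\theta_i \mid \theta_0)
  + \log P(G_i \mid \lambda_i, w_i, S) + \log P(\lambda_i \mid \lambda_0)
  + \log P(w_i \mid \alpha_i) + \log P(\alpha_i \mid \alpha_0) \Big]
```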
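
The numerator of the mixture of distance distributions can be evaluated directly once the distances are known. This is a minimal Python sketch, assuming the scaled distances $d_x(G,S)$ have already been computed elsewhere. The dictionary keys, the function names `mixture_numerator` and `sample_gene_parameters`, and reading $\mathrm{Exp}(\Lambda_x)$ as having mean $\Lambda_x$ are hypothetical choices, not guenomu's code. $Z(\lambda, w, S)$ is deliberately left out, since it is the intractable part.

```python
import math
import random

def mixture_numerator(d, lam, w):
    """Numerator of P(G | lambda, w, S); Z(lambda, w, S) is not computed.

    d   -- scaled distances d_x(G, S), keys "DUPS", "LOSS", "ILS", "RF"
    lam -- scale parameters lambda_x, same keys
    w   -- mixture weights (w1, w2, w3)
    """
    # Component 1: duplications and losses share one exponential term.
    c1 = math.exp(-(d["DUPS"] / lam["DUPS"] + d["LOSS"] / lam["LOSS"]))
    # Component 2: incomplete lineage sorting (ILS) cost.
    c2 = math.exp(-d["ILS"] / lam["ILS"])
    # Component 3: Robinson-Foulds distance.
    c3 = math.exp(-d["RF"] / lam["RF"])
    return w[0] * c1 + w[1] * c2 + w[2] * c3

def sample_gene_parameters(alpha_gene, Lambda, rng):
    """Draw one gene's weights and scales from the priors on the slide."""
    # w_i ~ Gamma(alpha_gene, 1), one weight per mixture component.
    w = [rng.gammavariate(alpha_gene, 1.0) for _ in range(3)]
    # lambda_x ~ Exp(Lambda_x), reading Lambda_x as the mean (an assumption).
    lam = {key: rng.expovariate(1.0 / mean) for key, mean in Lambda.items()}
    return w, lam
```

Since every gene has its own $w_i$ and $\lambda_i$, one would call `sample_gene_parameters` once per gene family.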
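
The exchange algorithm from the doubly-intractable distributions slide can be illustrated on a toy exponential family in which the auxiliary draw is exact. In this sketch $y$ is a single integer in $\{0,\dots,10\}$ with $s(y) = y$, the prior on $\theta$ is $N(0,1)$, and $h$ is a symmetric Gaussian random walk, so the two $h$ terms of ratio (2) cancel. All of these, and the function names, are assumptions made for illustration. The toy samples $y'$ by enumerating its tiny support, whereas guenomu uses a secondary MCMC on gene trees.

```python
import math
import random

M = 10  # hypothetical toy support: y in {0, ..., M}

def log_q(theta, y):
    # Unnormalized log-likelihood theta^t s(y), with s(y) = y.
    return theta * y

def log_prior(theta):
    # Hypothetical N(0, 1) prior on theta, up to a constant.
    return -0.5 * theta * theta

def sample_y(theta, rng):
    # Exact draw from pi(. | theta) by enumerating the toy support.
    logs = [log_q(theta, k) for k in range(M + 1)]
    top = max(logs)  # subtract the maximum to avoid overflow in exp
    weights = [math.exp(v - top) for v in logs]
    return rng.choices(range(M + 1), weights=weights)[0]

def exchange_step(theta, y_obs, rng, step=0.5):
    """One exchange-algorithm update of theta given observed data y_obs."""
    # I. theta' ~ h(. | theta): symmetric Gaussian walk, so h cancels in (2).
    theta_new = rng.gauss(theta, step)
    # II. auxiliary data y' ~ pi(. | theta').
    y_aux = sample_y(theta_new, rng)
    # Log of ratio (2); Z(theta) and Z(theta') cancel and are never computed.
    log_r = (log_q(theta, y_aux) + log_prior(theta_new) + log_q(theta_new, y_obs)
             - log_q(theta, y_obs) - log_prior(theta) - log_q(theta_new, y_aux))
    return theta_new if rng.random() < math.exp(min(0.0, log_r)) else theta
```

The key point is visible in `log_r`: only the unnormalized $q$ terms appear, because the auxiliary draw contributes a $Z(\theta')/Z(\theta)$ factor that cancels the one from the likelihood.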
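
A GMTM move following the last ratio on the Generalized Multiple-Try Metropolis slide can be sketched in Python for a one-dimensional target. The assumptions are a standard normal target, a symmetric Gaussian proposal (so $p_k(x \mid y) = p_k(y \mid x)$ cancels), and tempered selection weights $\pi(z)^{\beta}$, chosen only to show that the weights need not be the target density; `log_target`, `gmtm_step` and `beta` are hypothetical names. The reference set is $k-1$ draws from the proposal around $y$ plus the current state $x$, as in standard MTM.

```python
import math
import random

def log_target(z):
    # Hypothetical target: standard normal density, up to a constant.
    return -0.5 * z * z

def gmtm_step(x, rng, k=5, step=1.0, beta=0.5):
    """One Generalized Multiple-Try Metropolis move from state x."""
    # k trial points y_1..y_k from a symmetric Gaussian proposal around x.
    ys = [rng.gauss(x, step) for _ in range(k)]
    # Selection weights: any positive function works in GMTM; here pi(z)^beta.
    wy = [math.exp(beta * log_target(z)) for z in ys]
    i = rng.choices(range(k), weights=wy)[0]
    y = ys[i]
    # Reference set for the reverse move: k-1 draws around y, plus x itself.
    xs = [rng.gauss(y, step) for _ in range(k - 1)] + [x]
    wx = [math.exp(beta * log_target(z)) for z in xs]
    W_y = wy[i] / sum(wy)   # probability of selecting y in the forward move
    W_x = wx[-1] / sum(wx)  # probability of selecting x in the reverse move
    # log r = log pi(y) - log pi(x) + log W_x - log W_y (proposal terms cancel).
    log_r = log_target(y) - log_target(x) + math.log(W_x) - math.log(W_y)
    return y if rng.random() < math.exp(min(0.0, log_r)) else x
```

The factor $W_x/W_y$ is what corrects for selecting among the trials with arbitrary weights; with $\beta = 1$ the selection simply favours trials of high target density.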
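
The RF distance slide survives only as a title. As a reminder of the standard definition, the Robinson-Foulds distance counts the bipartitions (splits) present in one tree but not in the other. The sketch below uses a hypothetical nested-tuple tree representation, treats trees as unrooted, and assumes both trees share the same leaf set; some authors divide the count by 2. The assignment cost (Hdist) is not sketched, since its exact definition is not given in these slides.

```python
def _leaves(tree):
    # A leaf is a string; an internal node is a tuple of subtrees.
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(_leaves(child) for child in tree))

def splits(tree):
    """Non-trivial bipartitions of the (unrooted) tree, each stored as the
    side that does not contain a fixed reference taxon."""
    taxa = _leaves(tree)
    ref = min(taxa)
    found = set()

    def walk(node):
        clade = _leaves(node)
        if not isinstance(node, str):
            for child in node:
                walk(child)
        side = taxa - clade if ref in clade else clade
        # Skip trivial splits (single leaves) and the empty root split.
        if 2 <= len(side) <= len(taxa) - 2:
            found.add(side)

    walk(tree)
    return found

def rf_distance(t1, t2):
    # Number of splits found in exactly one of the two trees.
    return len(splits(t1) ^ splits(t2))
```

For example, `((("A", "B"), "C"), ("D", "E"))` and `((("A", "C"), "B"), ("D", "E"))` share the split DE but differ in AB|CDE versus AC|BDE, giving an RF distance of 2.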
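
The two-level seeding scheme on the parallel PRNG slide can be sketched with the Python standard library. This swaps in the Mersenne Twister (`random.Random`) for both levels instead of the Tausworthe (LFSR) generators described on the slide, so it only illustrates the reproducibility argument: the same master seed always yields the same sub-streams. `make_streams` is a hypothetical name.

```python
import random

def make_streams(seed, n_streams):
    """Derive n_streams reproducible PRNG2 streams from a single seed."""
    # PRNG1: its output is used only as a sequence of seeds x1, x2, ...
    master = random.Random(seed)
    sub_seeds = [master.getrandbits(64) for _ in range(n_streams)]
    # PRNG2: one generator per derived seed.
    return [random.Random(s) for s in sub_seeds]
```

Calling `make_streams(2013, 4)` twice returns two lists of generators that produce identical numbers position by position, which is what lets every job rebuild any stream from the broadcast seed alone.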
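
The synchronization argument of the gene/species PRNG slide can be simulated in a single process. In this hypothetical sketch each `Job` derives a common stream ($x_1$) and a private stream ($x_{i+1}$) from the broadcast seed, draws the global proposal from the common stream, and contributes a partial log acceptance ratio over its own gene families; a plain `sum` stands in for the MPI AllReduce, and the Gaussian per-gene term is a toy. Reducing partial log ratios is an assumption about what the "proposal values" on the slide refer to.

```python
import math
import random

class Job:
    """A hypothetical worker that owns a subset of the gene families."""

    def __init__(self, rank, seed, genes):
        master = random.Random(seed)  # PRNG1, identical in every job
        seeds = [master.getrandbits(64) for _ in range(rank + 2)]
        self.common = random.Random(seeds[0])    # x_1: shared stream
        # x_{i+1} for job i = rank + 1; would drive local gene-tree updates.
        self.private = random.Random(seeds[-1])
        self.genes = genes  # toy data, one float per local gene family

    def propose(self, current):
        # Global proposal from the common stream: identical across jobs.
        return current + self.common.gauss(0.0, 0.3)

    def partial_log_ratio(self, current, proposal):
        # Toy per-gene term: Gaussian log-likelihood ratio over local genes only.
        return sum(0.5 * (g - current) ** 2 - 0.5 * (g - proposal) ** 2
                   for g in self.genes)

    def accept(self, total_log_ratio):
        # Common uniform draw, so every job makes the same decision.
        return self.common.random() < math.exp(min(0.0, total_log_ratio))

def global_update(jobs, current):
    proposals = [job.propose(current) for job in jobs]
    # Stand-in for MPI AllReduce: sum the partial log ratios of all jobs.
    total = sum(job.partial_log_ratio(current, p) for job, p in zip(jobs, proposals))
    decisions = [job.accept(total) for job in jobs]
    states = [p if ok else current for p, ok in zip(proposals, decisions)]
    return states, decisions
```

Because the common streams advance in lockstep, every job draws the same proposal and the same uniform number, so only the reduced sum has to travel between jobs.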
