guenomu software -- model and algorithm in 2013

This is a progress report presented to the Phylogenomics Group at UVigo in May 2013, about the current status of the software guenomu and the Bayesian model implemented.

At that time I was experimenting with a mixture model, which has since been abandoned, and with the Hdist, which is still experimental. The presentation also describes the exchange algorithm for doubly-intractable distributions, the generalized Multiple-Try Metropolis, and the parallel PRNG used to minimize communication between jobs.


Transcript

  • 1. guenomu Software and Model Leonardo de O. Martins University of Vigo May, 16th 2013 Leo Martins (U Vigo) guenomu software 2013/5/16 1 / 15
  • 2. Outline: 1 The Model; 2 The Sampling; 3 The Code
  • 3. Hierarchical Bayesian model
    $$P(S, \Theta \mid D) \propto P(\theta_0)\,P(\lambda_0)\,P(\alpha_0)\,P(S) \times \prod_{i=1}^{N} P(D_i \mid G_i, \theta_i)\,P(\theta_i \mid \theta_0)\,P(G_i \mid \lambda_i, w_i, S)\,P(\lambda_i \mid \lambda_0)\,P(w_i \mid \alpha_i)\,P(\alpha_i \mid \alpha_0)$$
  • 8. The mixture of distance distributions
    $$P(G \mid \lambda, w, S) = \frac{w_1 e^{-\left(d_{DUPS}(G,S)/\lambda_{DUPS} + d_{LOSS}(G,S)/\lambda_{LOSS}\right)} + w_2 e^{-d_{ILS}(G,S)/\lambda_{ILS}} + w_3 e^{-d_{RF}(G,S)/\lambda_{RF}}}{Z(\lambda, w, S)}$$
    $$w_i \sim \mathrm{Gamma}(\alpha_{gene}, 1), \qquad \lambda_x \sim \mathrm{Exp}(\Lambda_x)$$
    Each gene has its own set of $w_i$ and $\lambda_i$.
    The distances $d_x(G, S)$ are scaled to account for different gene family sizes.
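As a minimal sketch of the numerator of the mixture above, the following Python function evaluates the unnormalised weight of one gene tree given its distances to the species tree. The function name, the dictionary keys, and the example values are assumptions made for illustration and are not taken from the guenomu code; the normalising constant $Z(\lambda, w, S)$ is deliberately left out, since it would require a sum over gene trees.

```python
import math

def mixture_weight(d, lam, w):
    """Unnormalised mixture of exponential distance penalties for one gene tree.

    d and lam are dicts keyed by "DUPS", "LOSS", "ILS" and "RF" (hypothetical
    layout); w holds the three mixture weights (w1, w2, w3).
    """
    # Component 1: duplications and losses share one exponential term.
    dup_loss = math.exp(-(d["DUPS"] / lam["DUPS"] + d["LOSS"] / lam["LOSS"]))
    # Component 2: incomplete lineage sorting distance.
    ils = math.exp(-d["ILS"] / lam["ILS"])
    # Component 3: Robinson-Foulds distance.
    rf = math.exp(-d["RF"] / lam["RF"])
    return w[0] * dup_loss + w[1] * ils + w[2] * rf

# Illustrative values only (assumed, not from the presentation).
lam = {"DUPS": 1.0, "LOSS": 2.0, "ILS": 1.5, "RF": 0.5}
near = {"DUPS": 0.0, "LOSS": 0.0, "ILS": 0.0, "RF": 0.0}
far = {"DUPS": 3.0, "LOSS": 4.0, "ILS": 2.0, "RF": 6.0}
weights = (0.2, 0.3, 0.5)
print(mixture_weight(near, lam, weights), mixture_weight(far, lam, weights))
```

A gene tree identical to the species tree under all distances receives the sum of the weights, and the value decreases as any distance grows.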
  • 9. Outline: 1 The Model; 2 The Sampling; 3 The Code
  • 15. Doubly-intractable distributions
    $$\pi(y \mid \theta) = \frac{q_\theta(y)}{Z(\theta)} = \frac{e^{\theta^t s(y)}}{Z(\theta)}; \qquad Z(\theta) = \sum_y e^{\theta^t s(y)} \tag{1}$$
    Augmented distribution:
    $$\pi(\theta', y', \theta \mid y) \propto \pi(y \mid \theta)\,\pi(\theta)\,h(\theta' \mid \theta)\,\pi(y' \mid \theta')$$
    Gibbs update of the auxiliary variables $\theta', y'$:
    I. draw $\theta' \sim h(\cdot \mid \theta)$
    II. draw $y' \sim \pi(\cdot \mid \theta')$
    Exchange ratio from $\theta$ to $\theta'$:
    $$\min\left(1, \frac{q_\theta(y')\,\pi(\theta')\,h(\theta \mid \theta')\,q_{\theta'}(y)}{q_\theta(y)\,\pi(\theta)\,h(\theta' \mid \theta)\,q_{\theta'}(y')}\right) \tag{2}$$
    We draw $y'$ (the gene tree) through a secondary MCMC starting at its current value.
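The two auxiliary draws and the exchange ratio (2) can be sketched on a toy exponential-family model. This is an illustration under stated assumptions, not the guenomu sampler: here $y$ is a vector of n independent bits with $s(y)$ equal to their sum, so the auxiliary $y'$ can be drawn exactly instead of through a secondary MCMC; the prior is an assumed Normal(0, 10²); and the proposal h is a symmetric Gaussian random walk, so it cancels in the ratio. All function names are hypothetical.

```python
import math
import random

def sample_bits_sum(theta, n, rng):
    """Exact draw of s(y') for y' ~ pi(. | theta): n bits with P(1) = logistic(theta)."""
    p = 1.0 / (1.0 + math.exp(-theta))
    return sum(rng.random() < p for _ in range(n))

def log_prior(theta, sd=10.0):
    """Log density, up to a constant, of the assumed Normal(0, sd^2) prior."""
    return -0.5 * (theta / sd) ** 2

def exchange_sampler(s_obs, n, iters=6000, step=0.3, seed=1):
    """Exchange algorithm for q_theta(y) = exp(theta * s(y)); Z(theta) is never evaluated."""
    rng = random.Random(seed)
    theta, chain = 0.0, []
    for _ in range(iters):
        theta_new = theta + rng.gauss(0.0, step)    # I.  theta' ~ h(. | theta)
        s_aux = sample_bits_sum(theta_new, n, rng)  # II. y' ~ pi(. | theta')
        # Log of ratio (2): q_theta(y') q_theta'(y) / (q_theta(y) q_theta'(y'))
        # reduces to (theta' - theta) * (s(y) - s(y')); the prior ratio is added.
        log_r = (theta_new - theta) * (s_obs - s_aux)
        log_r += log_prior(theta_new) - log_prior(theta)
        if rng.random() < math.exp(min(0.0, log_r)):
            theta = theta_new
        chain.append(theta)
    return chain

chain = exchange_sampler(s_obs=150, n=200)
posterior = chain[1000:]  # discard burn-in
print(sum(posterior) / len(posterior))
```

With 150 ones out of 200 bits, the posterior for $\theta$ should concentrate near $\log(0.75/0.25) = \log 3$, which gives a simple sanity check on the sampler.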
  • 19. Species tree proposal with the exchange algorithm
  • 22. Generalized Multiple-Try Metropolis
    MH: sample $y$ and accept it with probability $\min(1, r)$:
    $$r = \frac{\pi(y)}{\pi(x)}\,\frac{q(y, x)}{q(x, y)} = \frac{\pi(y)}{\pi(x)}\,\frac{p(x \mid y)}{p(y \mid x)}$$
    MTM: choose $y$ among several samples, according to their relative weights:
    $$r = \frac{w(y_1, x) + \cdots + w(y_k, x)}{w(x^*_1, y) + \cdots + w(x^*_k, y)}$$
    where $w(x, y) = \pi(x)\,q(x, y)\,\lambda(x, y) = \pi(x)\,p(y \mid x)\,\lambda(x, y)$.
    GMTM: the weights $w(\cdot)$ do not need to represent probability distributions:
    $$r = \frac{\pi(y)\,p_k(x \mid y)}{\pi(x)\,p_k(y \mid x)}\,\frac{W_x}{W_y}$$
    where $W_y = \frac{w_i(y_i, x)}{\sum_{j=1}^{k} w_j(y_j, x)}$ for the chosen element $i$.
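The MTM step above can be sketched for a one-dimensional toy target. The assumptions here are illustrative and not taken from the presentation: the target $\pi$ is an unnormalised standard normal, the proposal is a symmetric Gaussian kernel, and $\lambda(x, y) = 1/q(x, y)$, so that the weights reduce to $w(y_j, x) = \pi(y_j)$. The function names are hypothetical, and this is plain MTM rather than the generalized variant.

```python
import math
import random

def target(x):
    """Unnormalised standard normal density, used as a toy pi(x)."""
    return math.exp(-0.5 * x * x)

def mtm_step(x, k, step, rng):
    """One Multiple-Try Metropolis update with k trial proposals."""
    # Draw k trials from the symmetric kernel centred on the current state.
    trials = [x + rng.gauss(0.0, step) for _ in range(k)]
    weights = [target(y) for y in trials]  # w(y_j, x) = pi(y_j) under the assumed lambda
    # Choose one trial with probability proportional to its weight.
    y = rng.choices(trials, weights=weights, k=1)[0]
    # Reference set: k - 1 draws centred on y, plus the current state x.
    refs = [y + rng.gauss(0.0, step) for _ in range(k - 1)] + [x]
    r = sum(weights) / sum(target(z) for z in refs)
    return y if rng.random() < r else x

def mtm_chain(iters=20000, k=5, step=2.0, seed=7):
    rng = random.Random(seed)
    x, chain = 0.0, []
    for _ in range(iters):
        x = mtm_step(x, k, step, rng)
        chain.append(x)
    return chain

samples = mtm_chain()
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)
```

Because the reference set always contains the current state, the denominator of r is strictly positive, and the sample mean and variance should approach 0 and 1 for this target.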
  • 25. Gene tree proposal with GMTM or MTM
  • 26. Outline: 1 The Model; 2 The Sampling; 3 The Code
  • 28. RF distance, Assignment cost (Hdist)
  • 30. A parallel pseudo-random number generator (PRNG)
    Given a seed and an algorithm, we have a stream of pseudo-random numbers (PRNs).
    [Diagram labels: seed, PRNG1, PRNG2 (×4), x1, x2, x3, x4, x11, x12]
    Using a second algorithm, the first stream gives us a sequence of seeds. We use the 150 parameter sets for the Tausworthe (LFSR) generators (L'Ecuyer, Math. Comput. 1999, p. 261). Therefore, given the seed, we can predict all states of all streams.
  • 34. A parallel pseudo-random number generator (PRNG)
    In our gene/species model:
    [Diagram labels: seed, PRNG1, PRNG2 (×4), x1, x2, x3, x4, x11, x12]
    We split gene families among jobs.
    All jobs receive the seed (broadcast) and can therefore reproduce the same x1. That is cheaper than communicating the states.
    Each job uses its own x(i+1) for sampling new gene trees etc. and can work in parallel. All jobs use the common x1 for sampling e.g. a new species tree, which needs synchronization.
    The only thing that must be shared is thus the proposal values (AllReduce) when updating "global" parameters, so that all jobs can make the same acceptance/rejection decision.
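The seeding scheme described in these slides can be sketched as follows. Several things are assumptions made for illustration: Python's `random.Random` (a Mersenne Twister) stands in for the Tausworthe generators, a plain `sum` over a list stands in for the AllReduce call, the shared quantity is taken to be a per-job log-ratio contribution, and the names `JobStreams` and `synchronized_accept` are hypothetical. The jobs are simulated inside a single process.

```python
import math
import random

class JobStreams:
    """One job's view of the generators: a common stream plus a private one."""

    def __init__(self, seed, job_index, n_jobs):
        # Every job rebuilds the same first stream from the broadcast seed.
        self.common = random.Random(seed)
        # The first n_jobs values of that stream serve as seeds for the
        # per-job streams; each job draws all of them to stay in step.
        seeds = [self.common.getrandbits(64) for _ in range(n_jobs)]
        self.private = random.Random(seeds[job_index])

def synchronized_accept(jobs, local_log_ratios):
    """Each job decides on a global-parameter update using the common stream."""
    total = sum(local_log_ratios)  # stand-in for an AllReduce sum
    accept_prob = math.exp(min(0.0, total))
    # Each job consumes one value from its own copy of the common stream.
    return [job.common.random() < accept_prob for job in jobs]

jobs = [JobStreams(seed=2013, job_index=i, n_jobs=4) for i in range(4)]
private_draws = [job.private.random() for job in jobs]  # used for gene tree moves
decisions = synchronized_accept(jobs, [-0.2, 0.1, -0.3, 0.05])
print(private_draws, decisions)
```

Since every job holds an identical copy of the common stream, all jobs draw the same uniform value and reach the same decision once the summed ratio is known, while the private streams let them update their own gene families without communication.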
  • 35. Each job looks like an independent analysis
  • 36. https://bitbucket.org/leomrtns/guenomu