
Journal Club @ UVigo 2011.07.22


Discussion of the article "Bayes Estimators for Phylogenetic Reconstruction", presented by Leo Martins to the Phylogenomics Lab of the University of Vigo.

Syst. Biol. 60(4): 528–540, 2011. doi: 10.1093/sysbio/syr021


Transcript:

  1. Journal Club – Bayes Estimators for Phylogenetic Reconstruction. Syst. Biol. 60(4): 528–540, 2011. doi: 10.1093/sysbio/syr021. Leonardo de O. Martins, University of Vigo, July 22, 2011.
  2. Outline: 1. Distance as a penalty; 2. Distances, everywhere; 3. No phylogenetics, yet...; 4. Trees as points in space; 5. To the paper, then.
  3. Statistical Risk. The risk $\rho$ associated with a decision $\hat\theta$ is the expected loss of this decision ($\hat\theta$ can be, for instance, an estimate of $\theta$).
  4. Statistical Risk. The risk $\rho$ associated with a decision $\hat\theta$ is the expected loss of this decision ($\hat\theta$ can be, for instance, an estimate of $\theta$): $\rho(\hat\theta) = \int L(\theta, \hat\theta) \, P(\theta \mid \mathrm{data}) \, d\theta$ (promptly called the posterior expected loss).
  5. Statistical Risk. The risk $\rho$ associated with a decision $\hat\theta$ is the expected loss of this decision ($\hat\theta$ can be, for instance, an estimate of $\theta$): $\rho(\hat\theta) = \int L(\theta, \hat\theta) \, P(\theta \mid \mathrm{data}) \, d\theta$ (promptly called the posterior expected loss). The loss function $L(\theta, \hat\theta)$ is a penalty we pay for "deciding" away from the parameter; examples are the squared loss and the absolute loss.
  6. Statistical Risk. The risk $\rho$ associated with a decision $\hat\theta$ is the expected loss of this decision ($\hat\theta$ can be, for instance, an estimate of $\theta$): $\rho(\hat\theta) = \int L(\theta, \hat\theta) \, P(\theta \mid \mathrm{data}) \, d\theta$ (promptly called the posterior expected loss). The loss function $L(\theta, \hat\theta)$ is a penalty we pay for "deciding" away from the parameter; examples are the squared loss and the absolute loss. For some loss functions we can calculate the best decision, i.e. the one that minimizes the risk for any data. (A short R sketch of this risk calculation appears after the transcript.)
  7. Outline: 1. Distance as a penalty; 2. Distances, everywhere; 3. No phylogenetics, yet...; 4. Trees as points in space; 5. To the paper, then.
  8. How to summarise a collection of objects? Scattered points:
     library(MASS)
     # bivariate normal sample; the covariance matrix must be symmetric
     x <- mvrnorm(n = 1000, mu = c(0, 0),
                  Sigma = matrix(c(1, 0.9, 0.9, 1), 2, 2, byrow = TRUE))
     plot(x[, 1], x[, 2], pch = ".", cex = 2, xlab = "x", ylab = "y")
  9. How to summarise a collection of objects? The centroid minimizes a distance (the sum of squared distances) to all points:
     library(MASS)
     x <- mvrnorm(n = 1000, mu = c(0, 0),
                  Sigma = matrix(c(1, 0.9, 0.9, 1), 2, 2, byrow = TRUE))
     plot(x[, 1], x[, 2], pch = ".", cex = 2, xlab = "x", ylab = "y")
     points(mean(x[, 1]), mean(x[, 2]), pch = 19, col = "red")  # the centroid of the cloud
  10. How to summarise a collection of objects? The regression line minimizes a distance (the sum of squared vertical distances) to all points:
     library(MASS)
     x <- mvrnorm(n = 1000, mu = c(0, 0),
                  Sigma = matrix(c(1, 0.9, 0.9, 1), 2, 2, byrow = TRUE))
     plot(x[, 1], x[, 2], pch = ".", cex = 2, xlab = "x", ylab = "y")
     abline(lm(x[, 2] ~ x[, 1]), col = "blue")  # least-squares regression line
  11. Outline: 1. Distance as a penalty; 2. Distances, everywhere; 3. No phylogenetics, yet...; 4. Trees as points in space; 5. To the paper, then.
  12. How to summarise the posterior distribution P(X)?
  13. How to summarise the posterior distribution P(X)? Posterior mean: minimize the expected loss under a squared loss function, $L(\theta, \hat\theta) = (\theta - \hat\theta)^2$ (Euclidean distance).
  14. How to summarise the posterior distribution P(X)? Posterior median: minimize the expected loss under a linear loss function, $L(\theta, \hat\theta) = |\theta - \hat\theta|$ (Manhattan distance).
  15. How to summarise the posterior distribution P(X)? Posterior mode, a.k.a. Maximum A Posteriori (MAP) estimate: minimize the expected loss under a delta (0–1) loss function, $L(\theta, \hat\theta) = 0$ if $\hat\theta = \theta$ and $1$ if $\hat\theta \neq \theta$. (An R check of these three estimators appears after the transcript.)
  16. Outline: 1. Distance as a penalty; 2. Distances, everywhere; 3. No phylogenetics, yet...; 4. Trees as points in space; 5. To the paper, then.
  17. Distances between trees. [Figure: the two five-taxon trees from the article, corresponding to (A,B,(C,(D,E))) and (A,B,(E,(C,D))).]
  18. Distances between trees. [Same two trees.] RF (Robinson-Foulds) distance: the splits DE|ABC and CD|ABE each occur in only one of the trees, for a total of 2 branches.
  19. Distances between trees. [Same two trees.] Quartet distance: AC|DE vs AE|CD and BC|DE vs BE|CD; 4 quartets are different.
  20. Distances between trees. [Same two trees.] Quartet distance: AC|DE vs AE|CD and BC|DE vs BE|CD; 4 quartets are different.
  21. Distances between trees. [Same two trees.] Path difference (counting the number of speciations, i.e. edges, along the path between each pair of leaves): the path from A to E is one edge longer in one tree than in the other, (...), and the overall difference is 6. (These distances are reproduced in an R sketch after the transcript.)
  22. Outline: 1. Distance as a penalty; 2. Distances, everywhere; 3. No phylogenetics, yet...; 4. Trees as points in space; 5. To the paper, then.
  23. If there is a distance, there is a Bayes estimator. For points in $\mathbb{R}^n$ we know that the mean minimizes the Euclidean distance, etc. For phylogenies there are several Euclidean distances. But some distances between trees also lead to "analytical" solutions:
  24. If there is a distance, there is a Bayes estimator. For points in $\mathbb{R}^n$ we know that the mean minimizes the Euclidean distance, etc. For phylogenies there are several Euclidean distances, but the mean does not work since a tree has restrictions. But some distances between trees also lead to "analytical" solutions:
  25. If there is a distance, there is a Bayes estimator. For points in $\mathbb{R}^n$ we know that the mean minimizes the Euclidean distance, etc. For phylogenies there are several Euclidean distances, but the mean does not work since a tree has restrictions. But some distances between trees also lead to "analytical" solutions: the consensus tree minimizes the Robinson-Foulds distance to the sampled trees.
  26. If there is a distance, there is a Bayes estimator. For points in $\mathbb{R}^n$ we know that the mean minimizes the Euclidean distance, etc. For phylogenies there are several Euclidean distances, but the mean does not work since a tree has restrictions. But some distances between trees also lead to "analytical" solutions: the consensus tree minimizes the Robinson-Foulds distance to the sampled trees; quartet puzzling minimizes the quartet distance.
  27. If there is a distance, there is a Bayes estimator. For points in $\mathbb{R}^n$ we know that the mean minimizes the Euclidean distance, etc. For phylogenies there are several Euclidean distances, but the mean does not work since a tree has restrictions. But some distances between trees also lead to "analytical" solutions: the consensus tree minimizes the Robinson-Foulds distance to the sampled trees; quartet puzzling minimizes the quartet distance; the Buneman tree minimizes (I think) the dissimilarity map distance.
  28. If there is a distance, there is a Bayes estimator. For points in $\mathbb{R}^n$ we know that the mean minimizes the Euclidean distance, etc. For phylogenies there are several Euclidean distances, but the mean does not work since a tree has restrictions. But some distances between trees also lead to "analytical" solutions: the consensus tree minimizes the Robinson-Foulds distance to the sampled trees; quartet puzzling minimizes the quartet distance; the Buneman tree minimizes (I think) the dissimilarity map distance; some of these are hard to solve as well. (A brute-force R illustration of the Robinson-Foulds case appears after the transcript.)
  29. How do they find, then, the Bayes estimates? Like much other software: hill-climbing on the space of possible topologies.
  30. How do they find, then, the Bayes estimates? Like much other software: hill-climbing on the space of possible topologies. Their input data is the posterior distribution of trees from MrBayes.
  31. How do they find, then, the Bayes estimates? Like much other software: hill-climbing on the space of possible topologies. Their input data is the posterior distribution of trees from MrBayes. The starting tree can be NJ, the MAP tree, ML...
  32. How do they find, then, the Bayes estimates? Like much other software: hill-climbing on the space of possible topologies. Their input data is the posterior distribution of trees from MrBayes. The starting tree can be NJ, the MAP tree, ML... They apply a branch swap (NNI) to the current optimal tree, then compute the distance to all samples.
  33. How do they find, then, the Bayes estimates? Like much other software: hill-climbing on the space of possible topologies. Their input data is the posterior distribution of trees from MrBayes. The starting tree can be NJ, the MAP tree, ML... They apply a branch swap (NNI) to the current optimal tree, then compute the distance to all samples. The distance used is the path difference (matrix subtraction).
  34. How do they find, then, the Bayes estimates? Like much other software: hill-climbing on the space of possible topologies. Their input data is the posterior distribution of trees from MrBayes. The starting tree can be NJ, the MAP tree, ML... They apply a branch swap (NNI) to the current optimal tree, then compute the distance to all samples. The distance used is the path difference (matrix subtraction). They don't need to recalculate the distance to all samples, only the distance to a matrix of average values. (A sketch of this search appears after the transcript.)
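
Referenced from slide 6 above: a minimal R sketch of the posterior expected loss, approximating the integral by an average over draws from the posterior. The Gamma posterior here is made up for illustration; it is not from the article.

    # Monte Carlo approximation of the risk rho(d): average the loss over posterior draws
    set.seed(1)
    theta <- rgamma(2e4, shape = 3, rate = 1)       # stand-in draws from P(theta | data)
    sq_loss <- function(theta, d) (theta - d)^2     # squared loss L(theta, d)
    risk <- function(loss, d) mean(loss(theta, d))  # approximates the integral on slide 4
    risk(sq_loss, 2.5)   # expected loss of deciding theta-hat = 2.5
    risk(sq_loss, 3.0)   # expected loss of deciding theta-hat = 3.0 (the posterior mean): lower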
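
Referenced from slide 15 above: a sketch, on the same kind of made-up posterior draws, checking which decision minimizes each risk. The grid minimizer under squared loss should land near the posterior mean, under absolute loss near the posterior median, and, once the parameter is discretized, the 0-1 loss picks the posterior mode (MAP).

    set.seed(2)
    theta <- rgamma(2e4, shape = 3, rate = 1)   # skewed "posterior" sample (illustrative)
    grid  <- seq(0, 10, by = 0.01)              # candidate decisions theta-hat
    best_sq  <- grid[which.min(sapply(grid, function(d) mean((theta - d)^2)))]
    best_abs <- grid[which.min(sapply(grid, function(d) mean(abs(theta - d))))]
    c(best_sq,  mean(theta))     # squared loss  -> close to the posterior mean
    c(best_abs, median(theta))   # absolute loss -> close to the posterior median
    bins <- cut(theta, breaks = seq(0, ceiling(max(theta)), by = 0.25))
    names(which.max(table(bins)))  # 0-1 loss on the discretized parameter -> the mode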
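
Referenced from slide 21 above: a sketch reproducing the Robinson-Foulds and path-difference numbers of slides 18 and 21 with the ape package, assuming the two trees are the topologies given on slide 17 (the quartet distance is left out because it needs an extra package).

    library(ape)
    # The two five-taxon trees, with unit branch lengths so that patristic
    # distances count the number of edges (speciations) between leaves.
    t1 <- read.tree(text = "(A,B,(C,(D,E)));")
    t2 <- read.tree(text = "(A,B,(E,(C,D)));")
    t1$edge.length <- rep(1, nrow(t1$edge))
    t2$edge.length <- rep(1, nrow(t2$edge))
    dist.topo(t1, t2)                                 # Robinson-Foulds distance: 2
    d1 <- cophenetic(t1)                              # pairwise path lengths, in edges
    d2 <- cophenetic(t2)[rownames(d1), colnames(d1)]  # align the leaf ordering
    sum(abs(d1 - d2)) / 2                             # path difference: 6 (pairs counted twice)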
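
Referenced from slide 28 above: a brute-force illustration of the "a distance gives a Bayes estimator" idea for the Robinson-Foulds case. With five taxa we can score all 15 unrooted topologies against a sample of trees and keep the one with the smallest total RF distance. The "posterior sample" below is simulated (the weights and random trees are arbitrary choices, not the article's data), and phangorn::allTrees supplies the candidate topologies.

    library(ape)
    library(phangorn)
    taxa <- c("A", "B", "C", "D", "E")
    t1 <- read.tree(text = "(A,B,(C,(D,E)));")
    t2 <- read.tree(text = "(A,B,(E,(C,D)));")
    set.seed(3)
    # Fake posterior sample: 60 copies of t1, 30 of t2, 10 random topologies
    sample_trees <- c(rep(list(t1), 60), rep(list(t2), 30),
                      replicate(10, unroot(rtree(5, tip.label = taxa)), simplify = FALSE))
    class(sample_trees) <- "multiPhylo"
    # Total RF distance from each candidate topology to the whole sample;
    # the minimizer is the Bayes estimate under the RF loss.
    candidates <- allTrees(5, rooted = FALSE, tip.label = taxa)
    total_rf <- sapply(candidates, function(tr)
      sum(sapply(sample_trees, function(s) dist.topo(tr, s))))
    write.tree(candidates[[which.min(total_rf)]])
    write.tree(consensus(sample_trees, p = 0.5))   # majority-rule consensus, for comparison

With this sample both outputs should come out as the dominant topology, in line with the slide's statement that the consensus tree minimizes the Robinson-Foulds distance to the sampled trees.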
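
Referenced from slide 34 above: a rough sketch, in the spirit of the search described on slides 29 to 34 but not the authors' code, of NNI hill-climbing that scores each candidate tree against the average path-length matrix of a (here simulated) sample. Scoring against the average is enough for the squared path difference because $\frac{1}{n}\sum_i \lVert d(T) - d_i \rVert^2 = \lVert d(T) - \bar d \rVert^2 + \mathrm{const}$, which is presumably the shortcut the last bullet refers to; phangorn::nni supplies the rearrangements.

    library(ape)
    library(phangorn)
    taxa <- c("A", "B", "C", "D", "E")
    # Matrix of leaf-to-leaf path lengths (in edges), rows/columns in a fixed order.
    path_mat <- function(tr) {
      tr$edge.length <- rep(1, nrow(tr$edge))
      cophenetic(tr)[taxa, taxa]
    }
    # Simulated stand-in for a posterior sample of topologies (not the article's data).
    set.seed(4)
    t1 <- read.tree(text = "(A,B,(C,(D,E)));")
    sample_trees <- c(rep(list(t1), 70),
                      replicate(30, unroot(rtree(5, tip.label = taxa)), simplify = FALSE))
    avg_mat <- Reduce(`+`, lapply(sample_trees, path_mat)) / length(sample_trees)
    score <- function(tr) sum((path_mat(tr) - avg_mat)^2)   # distance to the average matrix
    # Hill-climbing: start somewhere (the authors mention NJ, MAP or ML starting trees;
    # a random topology is used here) and accept any NNI neighbour that lowers the score.
    current <- unroot(rtree(5, tip.label = taxa))
    repeat {
      neighbours <- nni(current)                 # all nearest-neighbour interchanges
      scores <- sapply(neighbours, score)
      if (min(scores) >= score(current)) break   # local optimum reached
      current <- neighbours[[which.min(scores)]]
    }
    write.tree(current)   # with this sample, typically the dominant topology t1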
