  1. Analysis and visualization of large collections of trees: a case study in Chalcidoidea (Insecta: Hymenoptera)
     Ana Dal Molin, Suzanne Matthews, James Munro, John Heraty, Jim Woolley
  2. The tree space
     • The number of possible trees is given
     • Criteria exist to determine which ones are better hypotheses
     • Heuristics
  3. The tree space
     • The number of possible trees is given
     • Criteria exist to determine which ones are better hypotheses
     • Heuristics
  4. Case study
     • 525 terminals
     • 2992 characters, rDNA (18S and 28S D2–D5) sequences
     • Structural alignment + MAFFT (E-INS-i) alignment of the regions of ambiguous alignment (RAAs)
  5. Secondary structure is characterized by stems (paired bases) and loops (unpaired bases): alignment
  6. Case study: symptoms
     • Inconsistencies across repeated analyses
     • Spurious relationships
     • Why?
  7. • “low support”
     • “results highly sensitive to the method used”
     • Recognized tribes and subfamilies group together, but not in a plausible place
  8. Problems
     1. Growing data sets lead to a growing number of trees, sometimes too many to be compared by eye
     2. Tens of thousands of trees with hundreds of terminals = really large files
        • Can I even load them?
     3. Inconsistencies and polytomies in consensus trees:
        • Do we have rogue taxa?
        • Has the search run long enough?
        • Do we have enough signal?
  9. Methods
     • TNT, 5 seeds, unweighted parsimony
     • The 5 different seeds resulted in 30,000 trees each, 20,061 steps, CI=0.165, RI=0.62
     • Portability: TreeZip
     • Set operations: TreeZip
     • Comparison via matrices of Robinson-Foulds (RF) distances: MrsRF
       − Heatmaps of the distance matrices plotted using R
  10. 1. Portability and set operations: file size comparison
      • a screenshot of the file structure
      • [hashing?]
      • Reference for details
  11. Set operations
      • All trees were unique within every set, but
      • Union = 32,300 (unique) trees, not 150,000
      • Intersection = 28,422 trees
      • Consensus…
  12. 2. Comparisons: MrsRF
      • 5x5 heatmap
  13. Large heatmap
  14. Distance between consensus trees = 0
  15. Color strict consensus tree
  16. More Information
  17. Acknowledgements
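
Slide 2 says only that the number of possible trees is given. Assuming this refers to the standard count of fully resolved (binary) topologies, (2n − 5)!! unrooted and (2n − 3)!! rooted trees for n terminals, a minimal Python sketch shows how quickly the tree space grows. The function names are illustrative and do not come from the slides.

```python
import math


def odd_double_factorial(k: int) -> int:
    """Return k * (k - 2) * ... * 3 * 1 for odd k, and 1 when k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted_binary_trees(n_terminals: int) -> int:
    """Number of unrooted, fully resolved topologies: (2n - 5)!! for n >= 3."""
    if n_terminals < 3:
        return 1
    return odd_double_factorial(2 * n_terminals - 5)


def count_rooted_binary_trees(n_terminals: int) -> int:
    """Number of rooted, fully resolved topologies: (2n - 3)!! for n >= 2."""
    if n_terminals < 2:
        return 1
    return odd_double_factorial(2 * n_terminals - 3)


for n in (4, 5, 10):
    print(n, count_unrooted_binary_trees(n), count_rooted_binary_trees(n))

# Order of magnitude for the 525 terminals of the case study (slide 4).
# math.log10 accepts arbitrarily large Python integers.
magnitude = math.log10(count_unrooted_binary_trees(525))
print(f"525 terminals: roughly 10^{magnitude:.0f} unrooted binary trees")
```

With counts of this size an exhaustive search is out of reach, which is why the slide lists heuristics next to the optimality criteria.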
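
Slides 9 to 14 rely on set operations over tree collections and on Robinson-Foulds (RF) distances. The sketch below illustrates both ideas on toy trees written as nested tuples. It is not how TreeZip or MrsRF are implemented: the tuple representation, the function names, and the five-terminal examples are all assumptions made for illustration. Each tree is reduced to its set of non-trivial bipartitions, so two trees written differently but with the same topology compare as equal.

```python
def leaf_set(tree):
    """Return the terminals under a node (a str leaf or a tuple of children)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Non-trivial bipartitions of a tree, treated as unrooted.

    Each bipartition is stored as the side that does NOT contain the
    alphabetically smallest terminal, which gives one canonical form.
    """
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - clade if anchor in clade else clade
        if 1 < len(side) < len(all_leaves) - 1:  # skip trivial splits
            found.add(side)
        return clade

    visit(tree)
    return frozenset(found)


def rf_distance(tree_a, tree_b):
    """RF distance: number of bipartitions found in exactly one of the trees."""
    return len(splits(tree_a) ^ splits(tree_b))


def unique_topologies(trees):
    """Collapse a collection of trees into its set of distinct topologies."""
    return {splits(tree) for tree in trees}


def strict_consensus(trees):
    """Bipartitions shared by every tree in the collection."""
    return frozenset.intersection(*(splits(tree) for tree in trees))


# Toy data: t1_rewritten has the same topology as t1, written differently.
t1 = (("A", "B"), ("C", ("D", "E")))
t1_rewritten = ((("E", "D"), "C"), ("B", "A"))
t2 = (("A", "C"), ("B", ("D", "E")))
t3 = (("A", "B"), ("D", ("C", "E")))

run_a = unique_topologies([t1, t2])
run_b = unique_topologies([t1_rewritten, t3])
print("union:", len(run_a | run_b), "intersection:", len(run_a & run_b))

# Pairwise RF matrix, the kind of table a heatmap would be drawn from.
trees = [t1, t2, t3]
matrix = [[rf_distance(a, b) for b in trees] for a in trees]
for row in matrix:
    print(row)
```

In this toy example each run holds two distinct trees, yet the union holds three topologies rather than four, the same effect slide 11 reports at a much larger scale.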
