Community content building for evolutionary biology: Lessons learned from LepTree and Encyclopedia of Life

Presented at iEvoBio: Informatics for Phylogeny, Evolution, and Biodiversity, in Portland, OR, on 29 June 2010

  • I’m going to do a compare-and-contrast talk, so I have two projects to introduce. I apologize in advance if I go a bit quickly. Please feel free to catch me anytime in the next two days for a demonstration of either project.
  • The conclusion is that these are complementary approaches that can be pursued in parallel. Focus on community-driven databases that can be customized for the needs of the users of the data; these result in highly atomized specialist data. Then allow that information to be aggregated on EOL, where it might find broader reuse and reinterpretation.
  • LepTree is an Assembling the Tree of Life project whose major goal is to use nuclear genetic sequences to resolve deep nodes at the family and superfamily level in the Lepidoptera. This tree on the left shows our initial published findings, which are not the point of this talk. I’ll just note that our analysis suggests that the macrolepidoptera (the very large moths and all butterflies, shown by these orange bars) are clearly not a monophyletic group. The subject of today’s talk is the set of website tools we’ve created at leptree.net. These include features such as an interactive matrix visualization of the project’s sequencing status, where the columns are the genes being sequenced, the rows are the hundreds of samples used by the project, and the colors show our progress for each gene. We also have a fossil project and a morphology project that are represented on our pages.
  • The LepTree website is built on a core of the open source Drupal platform and includes a number of the out-of-the-box community features: blog, discussion forum, commenting, and the ability to create private working areas. In addition, we have added new modules that allow community members to add information about their own projects and to post the protocols they are using, so that they can link to them and other people can use the same protocols. Finally, we have a references module that lists about 800 articles on lepidopteran systematics. Rather than using the relational database that is the backend of Drupal, these modules store data semantically, as RDF triples linked to rich ontologies.
  • And finally, we also set up a custom module that presents a user with a complex template for describing taxa. The checkboxes and data fields are the result of months of consultation with lepidopterists and are intended to cover the kinds of morphological and ecological variation across the group. Like the projects, protocols, and references modules, the data are stored in a Sesame triple store repository. We can use this semantic representation to link our knowledge to that generated by other projects and use machine reasoning to come up with new results. This is the kind of data that would be appropriate to “decorate” a phylogenetic tree to look for patterns. The goal is to produce about 150 of these taxon pages, but we designed the system to be expandable.
  • So to summarize, LepTree built some semantics-enabled tools and combined them with data and links from a couple of other projects to create the taxonomic information pages you can see on LepTree.net under “Knowledge project”. In addition, the taxon information is now being exported as text objects and also appears on the Encyclopedia of Life taxon pages.
  • Objects such as these are essentially chunks of text sorted by topic. Each of these credits the source and can receive comments or ratings, or can be trusted or untrusted by curators.
  • So, the approach of EOL is rather different. EOL is a giant mashup that creates pages that are then available for curators to assess and rate, or for anybody to provide comments or tags. LepTree has focused on data entry tools while EOL has not, though I should note that we have also developed a Drupal-based system called LifeDesks, which is one of the many ways that data flows to the central EOL.
  • On LepTree, the burden is on users to learn a new system. On EOL, the burden is on programming staff, not on users.
  • The effort we went to in LepTree to add semantics to the tools likely just slowed us down and distracted us from developing a community effort. But once we had tools with lots of checkboxes, we were able to accumulate a lot of potentially useful atomized data. By divide and conquer I mean that it should be possible to continue to promote community databases; these can be tailored to the specific needs of a scientific community and its audiences, with data as structured as possible. Then the data from these projects can be aggregated, essentially cross-indexed, so that they are accessible from a common portal, EOL. If EOL had tried to structure or semanticize from the beginning, we never would have achieved the growth we have.
  • Build content. Expose triples. Share data.
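
The notes above describe storing taxon data as triples and exporting it as topic-sorted text objects that credit their source. The following is a minimal sketch of that idea in plain Python. It is not the Sesame or Drupal API, and every name in it (`TripleStore`, `export_text_objects`, the `ex:` identifiers, the sample values) is a hypothetical placeholder chosen for illustration.

```python
from collections import defaultdict
from typing import Iterator, NamedTuple, Optional


class Triple(NamedTuple):
    """One subject-predicate-object statement."""
    subject: str
    predicate: str
    obj: str


class TripleStore:
    """Toy in-memory triple store (an assumption, not a real RDF library)."""

    def __init__(self) -> None:
        self._triples = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add(Triple(subject, predicate, obj))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield triples matching the pattern; None acts as a wildcard."""
        for t in sorted(self._triples):
            if ((s is None or t.subject == s)
                    and (p is None or t.predicate == p)
                    and (o is None or t.obj == o)):
                yield t


def export_text_objects(store: TripleStore, taxon: str, source: str) -> list:
    """Flatten one taxon's triples into one text object per topic."""
    by_topic = defaultdict(list)
    for t in store.match(s=taxon):
        by_topic[t.predicate].append(t.obj)
    return [
        {"taxon": taxon, "topic": topic, "text": "; ".join(values), "source": source}
        for topic, values in sorted(by_topic.items())
    ]


# Made-up example data; real entries would come from the taxon template.
store = TripleStore()
store.add("ex:TaxonA", "ex:habitat", "forest")
store.add("ex:TaxonA", "ex:habitat", "meadow")
store.add("ex:TaxonA", "ex:wingPattern", "striped")
store.add("ex:TaxonB", "ex:habitat", "forest")

objects = export_text_objects(store, "ex:TaxonA", source="LepTree")
for obj in objects:
    print(obj["topic"], "->", obj["text"], "(credit:", obj["source"] + ")")
```

The atomized form supports pattern queries such as `store.match(p="ex:habitat", o="forest")`, while the exported text objects lose that structure. This is the trade-off the talk contrasts between the two projects.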

    1. Community content building for evolutionary biology: Lessons learned from LepTree and Encyclopedia of Life
       Cynthia Parr, Smithsonian Institution, University of Maryland
    2. Today’s story
         • LepTree and Encyclopedia of Life built a couple of websites
         • LepTree: slow for social content-building but highly useful content
         • EOL: quick for content aggregation, but now need to atomize and semanticize
         • Conclusion: Best of both worlds
    3. LepTree: http://leptree.net
    4. Community features
         • Blog
         • Commenting
         • Forum
         • Working Groups
    5. Complex LepTree taxon template
    6. LepTree built semantic tools, then invited data entry
         • Export
    7. http://www.eol.org
         • All species known to science
         • Freely accessible: open access, open source
         • Available from a single portal in a common format
         • Quality
         • Always growing as new species are discovered and new knowledge is generated
    8. Typical species page
    9. http://www.eol.org/content_partner
         • Objects can come from many partners
         • Objects are sorted by topic
         • Each partner gets credit
    10. EOL aggregates, then annotates
         • Catalogue of Life, IUCN, Content providers, Databases, LifeDesks, Public contribution, Curating, Commenting, Tagging, GBIF, Biodiversity Heritage Library
         • http://www.eol.org/content_partner
    11. LepTree’s data approach is more complex and customized
       LepTree
         • Highly structured: 348 content “fields”
         • Big S semantics (OWL, RDF triple store). Tied to people and project ontologies
         • Custom data entry: required new workflow
       EOL
         • Very coarsely structured: 33 subjects (TDWG Species Profile Model)
         • XML schema
         • Variety of data paths: avoid changes in workflow
    12. Comparing stats . . .
       LepTree
         • 2 content partners; 23 contributors/260 members
         • 1750 pages (107 rich pages + ~450 fossils) + ~1600 images
         • 75 thousand triples: 43 per taxon, 3300 per contributor
       EOL
         • >45 content partners; 430 curators/1000s contributors/~43,000 members
         • 2.4 million pages; 390 thousand pages with objects
         • 1.7 million data objects and 889K taxa with BHL links, BUT 0 are atomized
    13. What LepTree has done with triples
         • Nothing.
         • So far.
    14. Community areas of LepTree are flat
    15. EOL’s content trajectory is promising
         • Species pages with a vetted object
         • Year
    16. Lessons learned
         • Semanticizing tools wasn’t productive, but structuring was…
         • Communities are hard!
         • Divide and conquer should scale
    17. Future plans
       LepTree: Build, Expose, Share
       EOL: Partner, Atomize, Semanticize, Delight
         • APIs
         • Phylogenies
         • Visualizations
    18. Thank you
       http://leptree.net
         • Leadership: Mike Cummings, Don Davis, Charlie Mitter, Jerry Regier, Susan Weller
         • Developers: John Park, Joshua Kim, Phuong Nguyen, Matt Chan, Matthew Conte, Danh Luong, Adam Bazinet, Erica Olson
         • Biologists: John Brown, Dana Campbell, Soowon Cho, Amanda Roe, Jennifer Zaspel, Jae-Cheon Sohn, Akito Kawahara, Andreas Zwick, Kim Mitter, April Dinwiddie
         • Funding: National Science Foundation AToL
       http://www.eol.org
         • Leadership: Jim Edwards, David Patterson, Nathan Wilson, Bob Corrigan, Mark Westneat, Marie Studer, Tom Garnett
         • Developers: Peter Mangiafico, Patrick Leary, Jeremy Rice, Dimitri Mozzherin, David Shorthouse, Lisa Whalley and others
         • Biologists: Katja Schulz, Jennifer Hammock, Tanya Dewey, Audrey Aronowsky, Leo Shapiro, R. Allen
         • Funding: John D. and Catherine T. MacArthur Foundation, Alfred P. Sloan Foundation, Cornerstone Institutions, Private Donors
    19. EOL Cornerstone Institutions
       Sample Content Partners
         • AmphibiaWeb
         • Animal Diversity Web
         • AntWeb
         • Catalogue of Life
         • FishBase
         • Global Biodiversity Information Facility (GBIF)
         • International Union for the Conservation of Nature
         • Tree of Life Web Project
         • The Biodiversity Heritage Library
         • The Field Museum of Natural History
         • The Missouri Botanical Garden
         • The Marine Biological Laboratory
         • Harvard University
         • The Smithsonian Institution
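
The per-taxon and per-contributor figures on the “Comparing stats” slide appear to be simple ratios of the triple count. A quick check, assuming the denominators are the 1750 pages and the 23 contributors listed on the same slide (the slide does not state this explicitly):

```python
# Figures quoted on the "Comparing stats" slide for LepTree.
triples = 75_000
pages = 1_750        # assumed denominator for "per taxon"
contributors = 23    # assumed denominator for "per contributor"

per_taxon = triples / pages
per_contributor = triples / contributors

print(round(per_taxon))            # rounds to the slide's 43
print(round(per_contributor, -2))  # rounds to the slide's 3300
```

Both ratios round to the values shown on the slide, which supports reading “43 per taxon” as triples per page and “3300 per contributor” as triples per contributor.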
