Encyclopedia of Life: Use cases for phenotypes


Published in: Technology
  • Whirlwind tour of EOL. As you may know, the Encyclopedia of Life is a website providing global access to knowledge about life on Earth. Global: the whole world. Access: free, and freely reusable. Knowledge: synthesized, not raw. Life on Earth: biological diversity.
  • EOL takes information from about 200 sources so far, mostly scientific databases but also including Flickr and Wikipedia, and automatically sorts it onto taxon pages. Our curators can then trust or untrust it, and anybody can provide comments or ratings. About a thousand credentialed scientists have already volunteered to help with quality control. Actions and comments get fed back to the original providers, and the material on EOL is also available to other applications via an Application Programming Interface, which I’ll talk more about in a moment. We’re partnering with over two hundred scientific databases as well as public contribution sites like Flickr and Wikipedia. Statistics: 100+ partner databases; 700 curators, 1000s of contributors, 46,000 members; 2.8 million pages; 500 thousand pages with Creative Commons content; over 2 million data objects and more than 1 million pages with links to research literature. Traffic in the past year: 1.7 million unique users, 6.2 million page views.
  • I want to emphasize that EOL deals in summarized knowledge, not raw specimen data. For example, for the serpent’s head cowrie, we have images like this from the Moorea Biocode project, but instead of serving the individual specimen records, we get the overall distribution of specimens on a map from GBIF. We also get a summary of the environmental data associated with specimens in the Ocean Biogeographic Information System (OBIS) database. Imagine if we could do a summary like this across databases.
  • But what we don’t have is data that is analysis-ready: numeric values, controlled vocabularies, etc.
  • Good for the general public, to the extent that the concepts have understandable labels. These are from the Animal Diversity Web; put these in the reproduction part of the page, along with any other reproduction data we get from other sources. Some problems: some of our audiences aren’t interested in the fine detail, but you never know. How do you decide what to hide?
  • If querying interfaces or APIs are not your thing, we could easily make the whole web page browsable by semantic web browsers. You could do whatever you want with that.
  • This is a graphical way of presenting the summarized data from OBIS, which Jen Hammock on my staff worked on with Edward Vanden Berghe and our team at the Marine Biological Laboratory. The salinity range for the species is shown here as just a small, specific slice between the global ocean minimum and maximum. Looking just at 15 content providers we already work with, it is possible that numeric data such as lifespan or average body weight are already available for more than 800,000 species.
  • Think Neuroscience Information Framework, but for non-model organisms and traits across ecology, evolution, morphology and behavior. The second challenge is to propose research work using computable data and EOL in some concrete way, perhaps, as I suggested, by using collections to harvest computable data, or perhaps by using text mining. Here the deadline for the idea is next month, and then we’re providing funds to accomplish the pilot project over the next year.
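
The notes above ask what a summary of environmental ranges would look like if it were pooled across databases rather than taken from a single provider. A minimal sketch of that idea follows. Everything in it is an assumption made for illustration: the record layout, the provider names, the variable names, and the values are invented and do not come from OBIS, GBIF, or EOL.

```python
from collections import defaultdict

# Hypothetical sample records pooled from several providers. The values are
# invented for illustration and are not real OBIS or GBIF data.
SAMPLES = [
    {"provider": "provider_a", "variable": "salinity_pps", "value": 34.1},
    {"provider": "provider_a", "variable": "salinity_pps", "value": 35.6},
    {"provider": "provider_b", "variable": "salinity_pps", "value": 33.9},
    {"provider": "provider_a", "variable": "temperature_c", "value": 23.4},
    {"provider": "provider_b", "variable": "temperature_c", "value": 28.2},
]


def summarize_ranges(samples):
    """Return {variable: {"min", "max", "n", "providers"}} pooled over providers."""
    grouped = defaultdict(list)
    for record in samples:
        grouped[record["variable"]].append(record)

    summary = {}
    for variable, records in grouped.items():
        values = [r["value"] for r in records]
        summary[variable] = {
            "min": min(values),
            "max": max(values),
            "n": len(values),
            # Keep track of which providers contributed to each range.
            "providers": sorted({r["provider"] for r in records}),
        }
    return summary


if __name__ == "__main__":
    for variable, stats in sorted(summarize_ranges(SAMPLES).items()):
        print(
            f"{variable}: {stats['min']} - {stats['max']} "
            f"(n={stats['n']}, providers={', '.join(stats['providers'])})"
        )
```

Keeping the sample count and the contributing providers next to each range mirrors the way the slides report ranges together with the number of samples they are based on.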

    1. Cynthia Parr (@cydparr), Phenotype RCN, NESCent, 25 February 2013, @eol
    2. EOL aggregates and curates across topics, across the tree of life
       Sources: scientific databases, including BHL, GBIF, ALA, INBio, COL, Scratchpads, LifeDesks; scientific journals
       Workflow on eol.org: Aggregate; Curate; Comment, Rate, Collect; Quality control
       Outputs: API; third-party apps
    3. EOL summarizes knowledge
       Erosaria caputserpentis, Serpent’s Head Cowrie
       Image: from Moorea Biocode
       Map: from GBIF
       Environmental ranges (from OBIS). Depth range based on 51 specimens in 2 taxa. Water temperature and chemistry ranges based on 40 samples.
         • Depth range (m): -5 - 67
         • Temperature range (°C): 23.011 - 28.496
         • Nitrate (umol/l): 0.048 - 0.923
         • Salinity (PPS): 33.821 - 35.837
         • Oxygen (ml/l): 4.349 - 4.825
         • Phosphate (umol/l): 0.088 - 0.228
         • Silicate (umol/l): 0.983 - 4.026
    4. Statistics
       2 years ago:
         • 2.8 million pages, one (or more) per taxon
         • 2 million data objects
         • 500 thousand pages with objects
         • 100+ partner databases
         • 700 curators / 1000s contributors / ~46,000 members
       Today:
         • 3.3 million pages, one (or more) per taxon
         • 5 million CC-licensed data objects
         • Over 1 million pages with objects
         • 200+ partner databases
         • 1200 curators / 1000s contributors / ~64,000 members
    5. We have an infrastructure . . .
         • Aggregation mechanisms
         • Names resolution
         • Curation mechanisms
         • Public and machine interfaces
         • User-created collections
       What are the next use cases to tackle? How could ontologies and annotations help?
    6. See structured info on EOL pages
       Discover and identify: “find taxa with these characteristics”
    7. Browse the whole page semantically, link to related resources (LOD: linked open data)
         • Google Summer of Code with Phenoscape (Alex Ginsca)
         • Using DBpedia Spotlight to extract associations among taxa and add to the Linked Open Data cloud (Devries and Thessen)
         • Linking names, literature, phylogeny (Page)
         • Resolving archeological data on animal domestication in the Near East (Alexandria Archive Institute)
    8. Promote NLP text mining and crowdsourcing
         • Altitude Specificity of Flower Coloration (Wright)
         • Species Interaction Datasets: Integration, Visualization, and Analysis (Poelen and Mungall)
         • Crowd-sourced data to examine morphological impacts of extinction risk in ray-finned fishes (Chang)
         • Macroecological patterns in butterfly-hostplant associations (Ferrer-Parris)
         • Discovering habitat terms in EOL contents (Pafilis)
    9. Easy access to analyzable data
       “Are blue organisms more common in high altitudes?”
       “How can I predict vulnerability to climate change based on life history characteristics?”
       “What organisms should I collect to fill in gaps in genome quality data?”
         • Look for data, download for all taxa
         • Create a collection of taxa, download all data
         • Use Reol: an R interface to EOL (Banbury, O’Meara) http://barbbanbury.info/barbbanbury/Reol.html
         • Find more specialized data repositories
    10. Dynamic online knowledge
         • Support summaries with networks of evidence, e.g. Bergmann’s rule: animals living in higher latitudes have larger body size
         • As evidence grows or changes, change the knowledge summary
         • Flag evidence that is in conflict with the summary
    11. Summarize data across providers
         • Flag outlier data
       Salinity envelope (n=40), Erosaria caputserpentis, Serpent’s Head Cowrie. From OBIS
    12. The big picture
         • In progress: marine computable data
         • Draft phylogenetic tree from Open Tree of Life project
         • TraitBank: access to computable descriptive information across the tree of life
    13. Thanks to
         • Our funders: John D. and Catherine T. MacArthur Foundation; Alfred P. Sloan Foundation; Smithsonian Institution; Marine Biological Laboratory; Harvard University; David Rubenstein and other funders and donors
         • All our content providers and global partners
         • Volunteer curators and individual contributors via Flickr, Wikimedia, and members of EOL
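
Slide 11 proposes flagging outlier data when summarizing a range, such as a salinity envelope, across providers. The slides do not say how outliers would be detected, so the sketch below swaps in one common choice, Tukey's fences based on the interquartile range, purely as an assumption. The function name `flag_outliers` and the readings are hypothetical and invented for illustration.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Split values into (inside, flagged) using Tukey's fences.

    A value is flagged when it lies more than k * IQR below the first
    quartile or above the third quartile. This rule is an assumption,
    not a method described in the slides.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if low <= v <= high]
    flagged = [v for v in values if v < low or v > high]
    return inside, flagged


# Invented salinity readings; the last one stands in for a suspect record.
READINGS = [34.0, 34.2, 34.5, 34.7, 35.0, 35.1, 35.3, 35.6, 12.0]

if __name__ == "__main__":
    inside, flagged = flag_outliers(READINGS)
    print(f"Envelope: {min(inside)} - {max(inside)} (n={len(inside)})")
    print(f"Flagged for review: {flagged}")
```

Flagged values are returned rather than discarded, so a curator could still review them, which fits the curation workflow described earlier in the deck.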