Challenge of Semantics for the Encyclopedia of Life



An introduction to EOL (http://www.eol.org) and some of the challenges and possible applications for structured, semantic information about biological organisms. Presented at the kick-off meeting of the NSF-funded Phenotype Ontology Research Coordination Network.

  • So, the approach of EOL is rather different from that of many other sites. EOL is a giant mashup that creates pages, which are then available for curators to assess and rate, and for anybody to comment on or tag.
  • Objects such as these are essentially chunks of text sorted by topic. Each of these credits its source, can receive comments or ratings, and can be marked trusted or untrusted by curators.
  • From this page from LepTree: Epipyropidae, planthopper parasites.
  • Given this scale, I think this was the ONLY way we could start. Imagine how large an ontology we’d need to fully describe organisms ranging from this tiny pelagic diatom, 50 microns long, to whales. In this case it is a humpback, many orders of magnitude larger and also pelagic, but physiologically and morphologically quite different. Then there is Pixie's Parasol, a saprophytic organism with a complex life cycle (note the collembola on it). Animal Diversity Web characterizes an animal like a humpback with an ontology of about 400 concepts, which just scratches the surface. Similarly, we characterized this saturniid moth in the LepTree project with a few hundred more concepts, some of which overlap with the whale’s but most of which don’t. The ontologies discussed here are on the order of 5,000 to 70,000 concepts. Think about what characters you’d need to describe this halobacterium, an archaeon! But a scientist studying food webs might want to know characteristics across a wide swath of life.
  • This represents about 2,200 projects and 1,000 instances of data flow or hyperlinks between them. There are hundreds of partners, each with its own ontology (in many cases for good reason!), and you can see that the ontology space itself, much less the way you… Most of these are NOT using ontologies.
  • One of the things that may be valuable about EOL is the ability to assess the amount of information available for a group of taxa. Family Corvidae, represented here by the hooded crow, is where I curate. It has reasonably rich content: 74% of pages have some text, though only 27% have images. There are also a large number of unreviewed images (from Wikipedia and Flickr) and text (mostly from Wikipedia) that I am working through. This could be expanded to highlight gaps in what we know about organisms, for example which areas of biology lack information. Funding agencies could use it to prioritize grants, and students could use it to decide what needs to be studied. Might show how to find content summaries on current pages.
  • Not biologically relevant concepts but it is a start
  • Hand-wavy: we aren’t actually doing this just yet, but we could… Note that by referring to the URIs for the concepts we can take advantage of the relationship assertions among the terms without having to manage them ourselves. So these might be pointers to the EQ statements described earlier. They would carry enough information for us to display to humans, and also the formalisms scientists and ontologists need for reasoning.
  • Let’s say we figure out HOW to do it; should we do it?
  • Good for the general public, to the extent that the concepts have understandable labels. These are from the Animal Diversity Web; put them in the reproduction part of the page, along with any other reproduction data we get from other sources. Some problems: some of our audiences aren’t interested in the fine detail, but you never know… how do you decide what to hide?
  • For scientists, let them download or access the data. Provide not only the source the information came from but also machine-readable URIs that define the concepts, so that they can integrate the data and perform analyses on it. Download data like this, combine it with a phylogeny of rodents, and you might be able to test evolutionary hypotheses. Middleman.
  • If querying interfaces or APIs are not your thing, we could easily make the whole web page browsable by semantic web browsers. You could do whatever you want with that…
  • Most ambitious, pie in the sky
  • Informatics for evolution, systematics, and biodiversity
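The page-richness idea in the notes (what share of a group's pages have text or images, and which topics nobody has covered yet) comes down to simple counting. The following is a minimal sketch that assumes a hypothetical per-page summary listing which SPM subjects have text and how many images the page has. The `Page` class, the function names, the subject labels, and the toy records are illustrative only. They are not EOL's schema or API, and the output is not a real EOL statistic.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical summary of one taxon page (not EOL's real schema)."""
    taxon: str
    text_subjects: set[str] = field(default_factory=set)  # SPM subjects that have text
    image_count: int = 0                                   # images on the page


def coverage(pages: list[Page]) -> dict[str, float]:
    """Percentage of pages in the group with any text and with any image."""
    n = len(pages)
    if n == 0:
        return {"text": 0.0, "images": 0.0}
    with_text = sum(1 for p in pages if p.text_subjects)
    with_images = sum(1 for p in pages if p.image_count > 0)
    return {
        "text": round(100 * with_text / n, 1),
        "images": round(100 * with_images / n, 1),
    }


def subject_gaps(pages: list[Page], subjects: list[str]) -> list[str]:
    """Subjects for which no page in the group has any text."""
    covered = set().union(*(p.text_subjects for p in pages))
    return sorted(s for s in subjects if s not in covered)


# Toy records for illustration only; these are not real EOL counts.
corvid_pages = [
    Page("Corvus cornix", {"Morphology", "Behaviour"}, image_count=2),
    Page("Corvus corax", {"Morphology"}),
    Page("Pica pica", {"Reproduction"}),
    Page("Garrulus glandarius"),
]
# A small, assumed subset of SPM subject labels.
SUBJECTS = ["Behaviour", "LifeCycle", "Morphology", "Reproduction"]

print(coverage(corvid_pages))                # {'text': 75.0, 'images': 25.0}
print(subject_gaps(corvid_pages, SUBJECTS))  # ['LifeCycle']
```

Reporting gaps per subject is one way the "what areas of biology lack information" question could be made concrete for a funding agency or a student.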

    1. Challenges for semantics in EOL. Phenotype Ontology RCN, NESCent, 25 February 2011. Cynthia Parr, National Museum of Natural History, Smithsonian Institution.
    2. http://www.eol.org: all species known to science
    3. Summary descriptions across biology domains
    4. Freely accessible
    5. Available from a single portal in a common format
    6. Quality
    7. Always growing. EOL is a Content Curation Community: Catalogue of Life; IUCN; content providers (databases, journals, LifeDesks); public contribution (curating, commenting, tagging); GBIF; Biodiversity Heritage Library.
    8. Typical species page
    9. http://www.eol.org/content_partner: objects can come from many partners; objects are sorted by topic and by taxon; each partner gets credit.
    10.
    11. Curation, comments, tags
    12. Not
    13. Statistics: 2.8 million pages, one (or more) per taxon; 2 million data objects; 500 thousand pages with objects; 100+ partner databases; 700 curators, 1000s of contributors, ~46,000 members.
    14.
    15. http://NodeXL.codeplex.com
    16. Schema: very coarsely structured; 33 subjects (TDWG Species Profile Model); no numeric data; minimal controlled vocabularies; API.
    17. Corvidae
    18. We have an infrastructure: aggregation mechanisms; names resolution; curation mechanisms; public and machine interfaces. Version 2 (August) vastly improved support for community interaction. Version 3 (???).
    19. Rich page calculations
    20. Possible path to semantics
    21. What could we do?
    22. Organize info on EOL pages: index by taxon; sort into one of the 33 SPM subjects; improve discoverability.
    23. Serve data by API or query interface: “Give me all the information you have about the elbow joint and life histories in rodents.”
    24. Make the whole page semantically browsable (LOD: linked open data): taxon; text blobs; character data; metadata.
    25. Consistency checks: curators; crowd-sourcing; reasoning… inferring summaries… mining for patterns? … hypothesis testing?
    26. ievobio.org
    27. Image credits: Michal Koupý, Lorraine Phelan, David J Patterson, Dmitry Mozzherin.
