Semantic Annotation Dc 2009


  • With the advent of Web 2.0, social networking sites have become very common, and students especially spend increasing amounts of time on sites such as Facebook, Orkut and MySpace. Even within the science and education community, researchers discuss findings and network within scientific social communities. This is the home page of Alzforum, one of the oldest such communities. It has over 4000 registered Alzheimer's researchers networking to find a cure for the neurological disorder. Alzforum became very popular and is known as the CNN for AD researchers. In fact, it became necessary to clone the site. Alzforum was developed 10 years ago and features were added over time, making it difficult to replicate the platform.
  • To meet these needs we developed SCF, the Science Collaboration Framework. SCF can be used to replicate Alzforum-like communities. It is based on Drupal, an open source CMS, and contains integrated Web 2.0 collaborative tools. One of our key contributions was to adapt Drupal to the Semantic Web, so that we can leverage existing linked data and shared ontologies/vocabularies. SCF communities are interoperable with other SCF or Semantic Web communities. Finally, SCF provides powerful “semantic search” capabilities.
  • Our pilot project was StemBook, an online review of stem cell biology for stem cell researchers. We then took advantage of features in StemBook to develop PDOnline, a site for Parkinson’s researchers. Alzforum has come full circle and is redeveloping its site on SCF. A site for neuropathic pain and other sites are in the planning stages. The idea is that every site contributes features to the SCF toolkit as well as reusing existing ones, and we hope to achieve asymptotic convergence.
  • To link and integrate the communities developed with SCF, we annotate their content with ontologies, controlled vocabularies and linked data. The articles and comments on a community site are tagged with resources that have stable URIs or with terms from controlled vocabularies. The tags have meaning, and other details such as provenance and status are also captured. The details of the semantic annotation ontology can be found at our website.
  • Suppose a document discusses the gene amyloid beta. We annotate the document with the gene resource “AB”, not just the term “AB”. The resource information is obtained from a SPARQL endpoint provided by Science Commons that contains the gene synonyms and other information. Thus, a search using any of these synonyms returns the document.
  • As another example, a search for BACE1 returns documents annotated with beta secretase, beta-site APP cleaving enzyme and so on.
  • In principle, a search for BACE1 could also bring up the structure for BACE1. This feature has not yet been implemented.
  • Such searches are made possible by semantic annotation of site content, and semantic annotation is facilitated by semi-automatic text mining. Text-mining algorithms suggest terms for annotation, and the editor of the community site then manually reviews them before attaching the annotations to the document. Currently we mine documents for gene names, Gene Ontology terms, tissues, organs and cell types.
  • Screen shot of SCF annotation editor. The editor facilitates the manual review process. The terms identified by the algorithm are highlighted and any term can be accepted, changed or deleted.
  • So, to recap: “algorithm finds core terms”.
  • Relationships to other entities are established automatically. The gene points to the protein, the protein to the antibody, and so on.
  • Thus, powerful searches across communities become possible.
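
The notes above describe three steps: a text miner suggests resources, an editor reviews them, and search then works on resources rather than raw strings. The following is a minimal Python sketch of that flow, not the SCF implementation. The `RESOURCES` table stands in for the SPARQL endpoint lookup, the `example.org` URIs are placeholders, and the names `Annotation`, `suggest`, `accept`, `resolve` and `search` are hypothetical. The synonyms and relation names are taken from the slides.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: a local table standing in for the synonym data that the
# notes say comes from a SPARQL endpoint. URIs are placeholders.
RESOURCES = {
    "http://example.org/resource/amyloid-beta": {
        "beta-amyloid", "abeta", "amyloid-beta", "Aβ", "Ab1-40", "Ab1-42",
    },
    "http://example.org/resource/BACE1": {
        "BACE1", "beta secretase", "beta-site APP cleaving enzyme",
        "membrane-associated aspartic protease",
    },
}


@dataclass
class Annotation:
    document: str                    # identifier of the annotated document
    relation: str                    # e.g. "discusses", "majorTopic", "cites"
    resource: str                    # URI of the tagged resource
    creator: str                     # provenance: who or what proposed the tag
    status: str = "suggested"        # becomes "accepted" after editor review
    reviewer: Optional[str] = None   # provenance: who accepted the tag


def resolve(term):
    """Return the URI whose synonym set contains term (case-insensitive)."""
    needle = term.lower()
    for uri, synonyms in RESOURCES.items():
        if needle in {s.lower() for s in synonyms}:
            return uri
    return None


def suggest(document, text, creator="text-miner"):
    """Propose one annotation per resource with a synonym found in text."""
    lowered = text.lower()
    return [
        Annotation(document, "discusses", uri, creator)
        for uri, synonyms in RESOURCES.items()
        if any(s.lower() in lowered for s in synonyms)
    ]


def accept(annotation, reviewer):
    """Record the editor's decision to keep a suggested annotation."""
    annotation.status = "accepted"
    annotation.reviewer = reviewer


def search(term, annotations):
    """Return documents with an accepted annotation for term's resource."""
    uri = resolve(term)
    if uri is None:
        return []
    return sorted({
        a.document for a in annotations
        if a.resource == uri and a.status == "accepted"
    })


if __name__ == "__main__":
    found = suggest("doc1", "Levels of Ab1-42 rise in plasma.")
    found += suggest("doc2", "Beta secretase inhibitors were tested.")
    for annotation in found:
        accept(annotation, reviewer="editor")
    print(search("beta-amyloid", found))
    print(search("BACE1", found))
```

Because the tag is a URI rather than a string, a query for any synonym resolves to the same resource, so a document that only mentions “Ab1-42” is still found by a search for beta-amyloid. Substring matching is a deliberate simplification here; the notes do not say which text-mining algorithms SCF uses.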

    1. Semantic Annotation of Scientific Articles. DC-2009 "Semantic Interoperability of Linked Data". Sudeshna Das 1,2 & Tim Clark 1,2, [email_address]. 1 MIND, Massachusetts General Hospital; 2 Harvard Medical School
    2. Alzforum: The Pioneer in Biomedical Web Communities
    3. Problem Statement: Shared terminology; Linked open data sources; Reusable software; Web 3.0
    4. What is the SCF?
         • Science Collaboration Framework
         • Replicate Alzforum-like communities
         • Based on Drupal
             • low barrier to entry
         • Integrated Collaborative tools
             • Web 2.0
         • Adapted to the Semantic Web
             • Leverages existing linked data
             • Uses shared ontologies/vocabularies
             • Interoperable with other SCF or Semantic Web communities
         • Provides powerful “semantic search” capabilities
    5. SCF Overview
    6. PDOnline, Alzforum, Pain, StemBook; SCF Toolkit
    7. Semantic Annotation
         • Tagging with a term or resource
             • belongs to a defined class
             • that has a URI (Uniform Resource Identifier)
             • controlled vocabularies or ontologies
         • The tag has meaning!
             • Defined relationship between document and tag
             • Document “discusses” PhenomenonA
             • Document “majorTopic” Gene123
             • DocumentA “cites” DocumentB
         • Provenance and status
         • Tag - annotation
         • (ontology)
    8. Semantic Search: search for beta-amyloid; retrieve content with “abeta”, “amyloid-beta”, “Aβ”, “Ab1-40”, “Ab1-42”, ...
    9. Semantic Search: search for “BACE1”; retrieve content with “beta secretase”, “beta-site APP cleaving enzyme”, “membrane-associated aspartic protease”, etc.
    10. Search for “BACE1”: associate database content
    11. Enabling semantic annotation
         • Semi-automatic text-mining
             • Use machines to “suggest terms”
             • Use editors to refine choice
         • Currently mining for
             • Gene names
             • Gene Ontology terms
             • Tissue, Organs, Cell types
         • High recall; High precision
    15. Powerful search across communities
    16. Team
         • SCF
         • Software Engineer
             • Mark Goetz
             • Stéphane Corlosquet
             • Sam Brin
         • Project Management
             • Sudeshna Das
             • Tim Clark
         • Agaric Design
         • Ben Melancon
         • HSCI
         • Lisa Girard
         • Brock Reeve
         • NeuroCommons
         • Alan Ruttenberg
         • Jonathan Rees
         • Alzforum
         • June Kinoshita
         • Elizabeth Wu