SemTechBiz 2012: Domeo: a web-based tool for semantic annotation of online documents
1. SemTechBiz 2012, San Francisco, June 4th 2012
Domeo: a web-based tool for semantic
annotation of online documents
http://www.annotationframework.org/
Paolo Ciccarese, PhD
http://www.paolociccarese.info/
paolo.ciccarese@gmail.com
Mass General Hospital Harvard Medical School
2. About Me
• Assistant in Neurology at Mass General Hospital
• Research faculty at Harvard Medical School
• Author of 30+ scientific publications
• Senior software and knowledge engineer
• Member of W3C HCLS Interest Group
• Co-chair of the W3C Open Annotation
Community Group
http://www.paolociccarese.info/
Paolo Ciccarese, PhD SemTechBiz 2012, June 4th 2012
3. As (biomedical) scientists…
• We deal with an increasing amount of digital
resources: documents, images, videos,
datasets, vocabularies, databases, software…
– About 150-200 articles a week
– 10 min/article ≈ 25-33 hours/week?
– How can we manage it?
http://www.ncbi.nlm.nih.gov/pubmed/
4. … we commonly use annotation
• We annotate prints, HTML and PDFs
• We bookmark/tag web pages…
• … and publications (citations/references)
• We comment on web pages, blogs, forums
and emails
• We tweet…
• …
5. Are we efficient and effective?
• Can we integrate our annotations?
• Can we leverage machine computation?
• Can we share them easily with our colleagues?
• Can we capitalize on the work of colleagues?
• Can we integrate them with other resources?
• Can we easily observe how science evolves?
• Can we easily identify the most up-to-date science?
• Can we discover valuable resources?
6. A ‘semantic’ view of a publication
Semantic Web Applications in Neuromedicine
(SWAN) project [2007]
classic publication
scientific discourse ‘semantic’ representation
http://tinyurl.com/cgyna2m
9. How do we empower ‘Joe Scientist’?
• Even simple linking tasks are not ‘standardized’; they are hard
to share and not easy to perform
http://antibodyregistry.org/antibody17/antibodyform.html?gui_type=advanced&ab_id=2266850
antibodyregistry.org
10. Enable manual annotation
of digital resources
• Visually and effectively annotate (or, better,
semantically annotate) any digital resource
or resource fragment while performing our
regular browsing/reading activities
http://www.ncbi.nlm.nih.gov/pubmed/19822029 ≈ http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2874257/
11. Leverage text mining and
community curation
• Run text mining and entity recognition
algorithms on scientific documents and persist
the results in a standard format
• Benefit from crowdsourcing by supporting
curation of manual and automatic annotation
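As a rough illustration of the first bullet, the sketch below runs a naive dictionary lookup over a sentence and serializes the matches with character offsets. The lexicon, the `find_entities` helper, the `status` field, and the JSON layout are all assumptions made for illustration; a real pipeline would rely on a proper text mining service and a standard annotation format.

```python
import json
import re

# Hypothetical lexicon mapping surface forms to ontology term URIs.
LEXICON = {
    "amyloid beta A4 protein": "http://purl.obolibrary.org/obo/PR_000004168",
    "APP": "http://purl.obolibrary.org/obo/PR_000004168",
}


def find_entities(text, lexicon=LEXICON):
    """Return one record per whole-word lexicon match, sorted by offset."""
    hits = []
    for surface, uri in lexicon.items():
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, text):
            hits.append({
                "start": match.start(),
                "end": match.end(),
                "text": match.group(0),
                "term": uri,
                "status": "automatic",  # not yet reviewed by a curator
            })
    return sorted(hits, key=lambda hit: hit["start"])


if __name__ == "__main__":
    sentence = "Mutations in APP alter processing of the amyloid beta A4 protein."
    # Persist the results as JSON so that they can be stored and curated later.
    print(json.dumps(find_entities(sentence), indent=2))
```

Keeping the offsets and a status flag with each match is what would let a curator later accept or reject individual results.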
12. Enable semantic tagging (ontologies)
http://purl.obolibrary.org/obo/PR_000004168
  Label: ‘amyloid beta A4 protein’
  Exact synonyms: ‘APP’, ‘amyloidogenic glycoprotein’, …
  Related synonyms: ‘A4’, ‘ABPP’,
  Is a: http://purl.obolibrary.org/obo/PR_000000001
    Label: ‘protein’
    Definition: ‘An amino acid chain that…’
Source: Protein Ontology (PRO) https://pir5.georgetown.edu/wiki/PRO
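To make the structure of such a term concrete, here is a minimal sketch that models it as a small Python record. The `OntologyTerm` class and its field names are hypothetical, and only the values visible on the slide are filled in; the elided synonyms and the truncated definition are left out.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Set


@dataclass
class OntologyTerm:
    """Hypothetical in-memory view of one ontology term."""
    uri: str
    label: str
    exact_synonyms: List[str] = field(default_factory=list)
    related_synonyms: List[str] = field(default_factory=list)
    is_a: Optional["OntologyTerm"] = None

    def names(self) -> Set[str]:
        """All strings that may be used to refer to this term."""
        return {self.label, *self.exact_synonyms, *self.related_synonyms}

    def ancestors(self) -> Iterator["OntologyTerm"]:
        """Walk up the 'is a' chain, nearest parent first."""
        node = self.is_a
        while node is not None:
            yield node
            node = node.is_a


PROTEIN = OntologyTerm(
    uri="http://purl.obolibrary.org/obo/PR_000000001",
    label="protein",
)
APP = OntologyTerm(
    uri="http://purl.obolibrary.org/obo/PR_000004168",
    label="amyloid beta A4 protein",
    exact_synonyms=["APP", "amyloidogenic glycoprotein"],
    related_synonyms=["A4", "ABPP"],
    is_a=PROTEIN,  # as shown on the slide
)
```

Tagging text with the URI rather than with a free-text label is what allows different spellings ('APP', 'A4', 'ABPP') to resolve to the same entity.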
13. APPs for the Semantic Resources Project, May 2010
14. Zooming in
APPs for the Semantic Resources Project, May 2010
15. …and more
• Share the annotation in a common format
• Efficiently search (inference, rules) the annotation
• Reuse/integrate the annotation
• Exercise access control
• Subscribe to feeds related to topics of interest
– Proteins, Cells, Authors, Papers…
• Retrieve additional content (mashups)
• Find new resources
• Find collaborators
16. Annotation Ontology (AO)
• OWL vocabulary for representing and sharing
annotation of digital resources and their fragments
• Not only for biomedicine!
Ciccarese et al, 2011
An open annotation ontology for science on web 3.0
http://www.jbiomedsem.com/content/2/S2/S4
http://purl.org/ao/home (Website/Wiki)
17. AO Overview
AO allows annotating:
Resources: Documents (HTML, PDF, Word, Excel), Images,
Databases, Web Services... (and their fragments)
Specifying (or not) an:
Annotation Type: through one of the already available
types (errata, highlight, qualifiers...) or the ones the users
will define.
With (or without) a:
Topic: free text, structured text, URIs, RDF entities,
RDF graphs, domain ontologies…
Tracing:
Provenance: who created what, when, with which
software, with what expectations…
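The four ingredients listed above (resource, optional type, optional topic, provenance) can be sketched as a plain record. This is only an illustration: the class and field names below are hypothetical and are not the property names defined by AO, and the example values are made up for the sketch.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Provenance:
    """Who created the annotation, when, and with which software."""
    created_by: str
    created_on: str
    created_with: str


@dataclass
class Annotation:
    """Hypothetical record; field names are NOT AO property names."""
    resource: str                           # what is being annotated
    provenance: Provenance                  # always traced
    annotation_type: Optional[str] = None   # e.g. errata, highlight, qualifier
    topic: Optional[str] = None             # free text, URI, ...


# A typed annotation whose topic is an ontology term URI.
qualifier = Annotation(
    resource="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2874257/",
    provenance=Provenance(
        created_by="http://www.paolociccarese.info/",  # illustrative value
        created_on="2012-06-04",                       # illustrative value
        created_with="Domeo",
    ),
    annotation_type="qualifier",
    topic="http://purl.obolibrary.org/obo/PR_000004168",
)

# A highlight needs neither a topic nor any other payload.
highlight = Annotation(
    resource=qualifier.resource,
    provenance=qualifier.provenance,
    annotation_type="highlight",
)

if __name__ == "__main__":
    print(asdict(qualifier))
```

Making the type and the topic optional mirrors the "(or not)" and "(or without)" wording: the same record shape covers both a bare highlight and a fully qualified semantic tag.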
19. Annotating a document fragment
Protein Ontology – PRO: http://purl.org/obo/owl/PRO
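The slide itself is a figure, so the following is only a sketch of one common way to pin an annotation to a text fragment: quote the exact text together with a little context before and after it. The `TextSelector` class and the `locate` helper are hypothetical names, and this approach is an assumption rather than a description of what the figure shows.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TextSelector:
    """Identify a fragment by its exact text plus surrounding context."""
    prefix: str
    exact: str
    suffix: str


def locate(text: str, selector: TextSelector) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first occurrence of `exact` that is
    immediately preceded by `prefix` and followed by `suffix`, else None."""
    start = text.find(selector.exact)
    while start != -1:
        end = start + len(selector.exact)
        if (text[:start].endswith(selector.prefix)
                and text[end:].startswith(selector.suffix)):
            return start, end
        start = text.find(selector.exact, start + 1)
    return None


if __name__ == "__main__":
    document = "APP is cleaved. Mutant APP accumulates."
    selector = TextSelector(prefix="Mutant ", exact="APP", suffix=" accumulates")
    print(locate(document, selector))
```

The context strings disambiguate repeated mentions: without them, the selector above would resolve to the first 'APP' in the sentence rather than the second.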
20. HyQue triples
Experiments
Workflows
SWAN Ontology 2.0: http://code.google.com/p/swan-ontology/
21. Annotation Ontology Network
Biotea
The Living Document
Project
22. Open Annotation Community Group
• Annotation Ontology is going to be replaced in
our applications by the Open Annotation
Model developed through the W3C Open
Annotation Community Group
– Website http://www.w3.org/community/openannotation/
– Core Model http://www.openannotation.org/spec/core/
– Extensions http://www.openannotation.org/spec/extension/
23. • DOMEO Annotation Toolkit is a web
application for producing and sharing manual,
semi-automatic and automatic annotation
Ciccarese et al, 2012
Open semantic annotation of scientific publications using DOMEO
http://www.jbiomedsem.com/content/3/S1/S1
http://annotationframework.org
25. Semantic Tags or Qualifiers [1]
26. Semantic Tags or Qualifiers [2]
27. Semantic Tags or Qualifiers [3]
28. Domeo and the NCBO Annotator
http://www.bioontology.org/annotator-service
• Domeo allows automatic/manual annotation with
terms coming from selected ontologies managed by
the BioPortal
29. Running NCBO Annotator
Additional text mining services
will be listed here
30. NCBO Annotator Results in Domeo
List of recognized
entities
31. Results Curation
Customizable
32. Cumulative Results Curation
• One item only
• All instances with the same text match
• All instances, regardless of the text
match
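The three curation scopes can be sketched as a single function that propagates one curator decision to a chosen set of results. This is an illustrative reading of the slide, not Domeo's actual implementation: the `Scope`, `Hit`, and `curate` names are hypothetical, and the third scope is interpreted here as "every instance tagged with the same term, whatever text was matched".

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Scope(Enum):
    ONE_ITEM = "one item only"
    SAME_TEXT = "all instances with the same text match"
    SAME_TERM = "all instances regardless of the text match"


@dataclass
class Hit:
    """One text mining result: matched text plus the term it was tagged with."""
    text: str
    term: str
    status: str = "pending"


def curate(hits: List[Hit], index: int, decision: str, scope: Scope) -> List[Hit]:
    """Apply `decision` to hits[index] and, depending on `scope`, to its siblings."""
    chosen = hits[index]
    for i, hit in enumerate(hits):
        if scope is Scope.ONE_ITEM:
            selected = i == index
        elif scope is Scope.SAME_TEXT:
            selected = hit.term == chosen.term and hit.text == chosen.text
        else:  # Scope.SAME_TERM
            selected = hit.term == chosen.term
        if selected:
            hit.status = decision
    return hits


if __name__ == "__main__":
    results = [
        Hit("APP", "PR_000004168"),
        Hit("APP", "PR_000004168"),
        Hit("amyloid beta A4 protein", "PR_000004168"),
        Hit("protein", "PR_000000001"),
    ]
    curate(results, 0, "rejected", Scope.SAME_TEXT)
    print([hit.status for hit in results])
```

Widening the scope trades precision for speed: a single click can clear many false positives, at the risk of rejecting a correct match that happens to share the same term.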
34. UIMA, Clerezza and AO
http://www.slideshare.net/paolociccarese/domeo-and-text-mining
[Diagram labels: Text Mining Results (AO RDF); Curated Text Mining Results (AO RDF); Publishing; Applications: Evaluating Performance, Comparing Algorithms, Learning, …]
35. SemTechBiz 2012, San Francisco, June 4th 2012
Thank you!
Paolo Ciccarese, PhD
http://www.paolociccarese.info/
paolo.ciccarese@gmail.com
Mass General Hospital Harvard Medical School
Editor's Notes
The topic can be an antibody (NIF Antibody registry)