Hypotheses management (SWAN)
graph representation
Paolo Ciccarese, PhD NFAIS Workshop 2013
Infinite possibilities
• Integration of Nanopubs, HyBrow, HyQue, BEL
• Capturing microdata and metadata
• Annotating video...
Text mining
Paolo Ciccarese, PhD DILS 2013
Reflect
http://reflect.ws/
Domeo Text Mining Selection
Domeo can trigger external text mining services and tr...
Text Mining Results
Text mining services comparison and improvement
Text Mining Results and social-curation
Support for comments/discussions
Domeo supports extraction pipelines
Self Reference
References
References are annotations!
Virtual bibliography
Extend your reading
Search example
Serialization in AO/RDF working on OA
Utopia for PDF
http://getutopia.com
Integration through APIs (ex NIF)
PubMed LinkOuts!!
Stemcell
http://www.stembook.org/
Stembook.org and Domeo
Integration with Drupal 7 (Biblio module)
Thanks to Stephane Corlosquet, Drupal Core developer
In conclusion…
• Consider annotation as first class citizen for
your projects… annotation is a great
ubiquitous way to kee...
annotator.js (Text)
• Open Knowledge Foundation Project for text
annotation: easy to integrate and supports
extensions
annotorious.js (Images)
• Image annotation: to add drawing and
commenting to images in web pages
Shared Canvas (Manuscripts)
www.shared-canvas.org/
MapHub (Maps)
• Maps annotation
http://maphub.github.io/
Keep annotating… and sharing!
Thank you

Paolo ciccarese DILS 2013 keynote

Slides presented for the Keynote at DILS 2013 in Montreal, Canada

  • As it is done in class…. Journal clubs….
  • And talking about students, MOOCs are another amazing opportunity for annotation.
  • Hypertension study with 3-4 different databases to be ultimately cleaned up by hand. It was no fun at all.
  • Before and during my PhD I then focused on evidence-based decision support. We still have the problem of accessing patient data, but now we also have the problem of accessing organizational data and evidence-based guidelines/protocols. Normally every ward had a different database, most of them produced by small companies in a very fragmented market. I saw XML as an easier way to convey knowledge. The problem is how the data are generated in the first place.
  • Clinical practice guideline, domain ontologies, a view of patient data (virtual medical record), and other entities (e.g. those that define roles in an organization). 17min
  • SIMILE sought to enhance inter-operability among digital assets, schemata/vocabularies/ontologies, metadata, and services. A key challenge it solved was to make collections that are distributed across individual, community, and institutional stores interoperable, by drawing on the assets, schemata/vocabularies/ontologies, and metadata held in such stores. Partners: MIT Libraries and MIT CSAIL (founding partners also included HP Laboratories and the World Wide Web Consortium), with support from the Andrew W. Mellon Foundation.
    So, what's the difference? Wikipedia says "Interoperability: the capability of different programs to exchange data via a common set of business procedures, and to read and write the same file formats and use the same protocols" and "Integration allows data from one device or software to be read or manipulated by another, resulting in ease of use." Yuck, those aren't much help.
    To me, interoperability means that two (or more) systems work together unchanged even though they weren't necessarily designed to work together. Integration means that you've written some custom code to connect two (or more) systems together. So integrating two systems which are already interoperable is trivial; you just configure them to know about each other. Integrating non-interoperable systems takes more work.
    The beauty of interoperability is that two systems developed completely independently can still work together. Magic? No, standards (or at least specifications, open or otherwise); see Open Standards in Everyday Life. Consider a Web services consumer that wants to invoke a particular WSDL, and a provider that implements the same WSDL; they'll work together, even if they were implemented independently. Why? Because they agree on the same WSDL (which may have come from a third party) and a protocol (such as SOAP over HTTP) discovered in the binding. How does the consumer discover the provider? Some registry, perhaps one that implements UDDI (which sucks, BTW). So SOAP, HTTP, WSDL, UDDI (all that good WS-I stuff) make Web services interoperable.
    Another example I like is the "X/Open Distributed Transaction Processing (DTP) model" (aka the XA spec); see "Configuring and using XA distributed transactions in WebSphere Studio." With it, a transaction manager by one vendor can use resource managers by other vendors. Even though they weren't all written for each other, they still work together because they follow the same spec. They're interoperable.
    Now consider two systems that weren't designed to be interoperable, or perhaps interoperable but with different specs. This requires integration. The integration code (could be Java, Message Broker, etc.; I co-authored a whole book on this) takes the interface one system expects and converts it to the one the other system provides. This is why WPS has stuff like Interface Maps and Business Object Maps.
    So, you want interoperable systems; integrating them is simple. Otherwise, you have to integrate them yourself.
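The interoperability-versus-integration distinction in the note above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every name below (`PatientSource`, `StandardSource`, `LegacyWardDb`, `LegacyWardAdapter`, `systolic_report`) is hypothetical and does not come from the talk or from any of the systems it mentions. A shared interface plays the role of the agreed spec, and an adapter plays the role of the custom integration code.

```python
from typing import Protocol


class PatientSource(Protocol):
    """The shared 'spec': any object with this method works with the consumer."""

    def get_patient(self, patient_id: str) -> dict: ...


class StandardSource:
    """A system written against the shared spec (interoperable as-is)."""

    def get_patient(self, patient_id: str) -> dict:
        return {"id": patient_id, "systolic_mmHg": 140}


class LegacyWardDb:
    """A hypothetical system with its own interface, unaware of the spec."""

    def fetch_row(self, key: int) -> tuple:
        return (key, "140/90")  # blood pressure stored as free text


class LegacyWardAdapter:
    """Integration code: converts the legacy interface into the expected one."""

    def __init__(self, db: LegacyWardDb) -> None:
        self._db = db

    def get_patient(self, patient_id: str) -> dict:
        key, pressure = self._db.fetch_row(int(patient_id))
        systolic = int(pressure.split("/")[0])
        return {"id": str(key), "systolic_mmHg": systolic}


def systolic_report(source: PatientSource, patient_id: str) -> str:
    """The consumer knows only the shared spec, never a concrete system."""
    patient = source.get_patient(patient_id)
    return f"{patient['id']}: {patient['systolic_mmHg']} mmHg"


# Interoperable system: plugged in unchanged.
print(systolic_report(StandardSource(), "7"))
# Non-interoperable system: needs the adapter written by hand.
print(systolic_report(LegacyWardAdapter(LegacyWardDb()), "7"))
```

Both calls print the same report; the difference is that the second one only works because someone wrote and now has to maintain `LegacyWardAdapter`.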
  • 26mins
  • Developing cures for highly complex diseases, such as neurodegenerative disorders, requires extensive interdisciplinary collaboration and exchange of biomedical information in context. Our ability to exchange such information across sub-specialties today is limited by the current scientific knowledge ecosystem’s inability to properly contextualize and integrate data and discourse in machine-interpretable form. This inherently limits the productivity of research and the progress toward cures for devastating diseases such as Alzheimer’s and Parkinson’s. The SWAN (Semantic Web Applications in Neuromedicine) ontology is an ontology for modeling scientific discourse and has been developed in the context of building a series of applications for biomedical researchers, as well as extensive discussions and collaborations with the larger bio-ontologies community. This document describes the SWAN ontology of scientific discourse.
  • http://www.nature.com/news/2010/100127/full/463416a.html
  • But no scientist is an island. We know we cannot scale very well, so we normally organize ourselves in groups.
  • Scientists are connected and so is science. Credits: http://www.tnca.org/2012/08/30/for-immediate-release-secretary-of-state-has-authority-to-stop-certification-of-election-and-should-use-it/
  • People are connected and so is science
  • For a resource we recognize we can FIND many other connected ones. FIND because most of the time these links are not there. We SPEND TIME searching and putting the network together, and how do we keep track of it?
  • 26 mins + 12 mins = 38 mins. 31mins
  • Transcript of "Paolo ciccarese DILS 2013 keynote"

    1. Open Annotation (in Biomedicine): Annotation, Semantic Annotation and Keeping the right crowd in the loop. Paolo Ciccarese, PhD (@paolociccarese), Mass General Hospital, Harvard Medical School
    2. Research Questions • How do we get the best up-to-date knowledge to the final users* while preserving the historical record? • How do we involve experts in the knowledge creation/extraction process? * healthcare providers, researchers, scientists, scholars, librarians, students… Paolo Ciccarese, PhD DILS 2013
    3. Salesman: Answer is simple • By crowd-sourcing annotation and semantic annotation • Annotation – intuitive and agile – micro data integration – traceable – large scale – unstructured/structured – manual/automatic/semi-automatic – supports disagreement – personal/groups/public – velocity and fast turn – …
    4. Scientist: Answer not that simple but slowly things are getting better • Growing interest in annotation • Annotation is an important tool to be combined with other methods • It nicely allows us to keep knowledgeable human agents in the loop • Still lots of research to be done but we have a standard and tools are improving fast • Right time to annotate!!!
    5. Annotation in teaching: learning from the experts • Greg Nagy, Professor of Classics at Harvard University, Director of the Harvard Center for Hellenic Studies in Washington DC • Gary King, Professor of Government, Director for the Institute for Quantitative Social Science at Harvard University • MOOCs, edX, HarvardX, MITx http://www.annotations.harvard.edu/
    6. Annotation Convergence Workshop 2013 • More than 100 participants from Harvard (plus visitors) • More than 25 annotation-related presentations • Morning session videos are online • Big interest from libraries http://www.annotations.harvard.edu/
    7. Harvard Library Cloud • Harvard Libraries: how do we make them discoverable and how do we integrate such a great variety of resources? • Data integration gets more value out of existing records. • David Weinberger, writer, senior researcher at the Berkman Center and co-director of the Harvard Library Innovation Lab. • There is only so much you can do at the record level. When you have scholars and students… they are doing the work of discovering the relationships between the parts. • Annotation is the platform http://www.librarycloud.org/
    8. Filtered Push (Biodiversity) • There are 2-3 billion specimens and it has been estimated [1] that no more than 3% have any digital record • Bob Morris, Emeritus Professor, University of Massachusetts Boston; IT Research Staff, Harvard University Herbaria http://wiki.filteredpush.org/ References: 1. Arturo H. Ariño, Approaches to estimating the universe of natural history collections data, Biodiversity Informatics, 7, 2010, pp. 81–92. 2. Nelson et al., Five task clusters that enable efficient and effective digitization of biological collections, ZooKeys 209: 19–45, doi: 10.3897/zookeys.209.3135
    9. Research Objects • How can we record research for anticipated but also unanticipated re-use? • Stian Soiland-Reyes, Researcher, University of Manchester, UK • Carole Goble, Full Professor, School of Computer Science, University of Manchester, UK http://wiki.myexperiment.org/index.php/Research_Objects
    10. Neuroscience Information Framework (NIF) • A dynamic inventory of Web-based neuroscience resources: data, materials, and tools accessible via any computer connected to the Internet. • Annotation can be used to link scientific literature with NIF resources such as antibodies and animal strains and mutants • Maryann Martone, PhD, Professor in Residence, Department of Neurosciences, UCSD; Co-Director, National Center for Microscopy and Imaging Research (NCMIR) http://neuinfo.org
    11. A (few?) years back…
    12. Data integration learned in College • University of Pavia (Italy) mid/late-Nineties • Software engineering: Databases integration • Knowledge
    13. Hypertension databases integration • Electronic Patient Records from several institutions and departments • Creating a normalized database for analysis of patient data • ‘Classic’ integration issues – Nature of the columns – Formats (names, dates and units of measure) – Unstructured content – Social interactions (assisted annotation of records) • Tacit → Explicit knowledge/semantics • Annotation of patient records • After 15 years I still get at least an email a month on this topic
    14. Data integration during my PhD • University of Pavia (Italy) 2001-2004 • PhD in Bioengineering and Bioinformatics • Evidence Based Clinical Decision Support • Knowledge
    15. Hypothesis (EBM) • If we deliver up-to-date computerized clinical practice guidelines to the point of care – We will provide decision support, reducing errors, malpractice and costs – We will improve the quality of care by leveraging the best scientific evidence – We will be able to collect structured data for updating the guidelines, speeding up the guideline creation/dissemination process.
    16. CPG representation and enactment • Annotation of clinical guidelines • After 12 years I still review ‘innovative’ papers on the topic
    17. The Guide Project* (1999-2004) • Beyond Evidence Based clinical decision support – integrates a formalized model of the medical knowledge expressed in clinical guidelines and protocols with both WorkFlow Management Systems and Electronic Patient Record technologies *Guide on OpenClinical: http://www.openclinical.org/gmm_guide.html P. Ciccarese, E. Caffi, S. Quaglini, M. Stefanelli, Architectures and tools for innovative health information systems: the Guide Project, International Journal of Medical Informatics 74 (7-8), 553-562, 2005
    18. The Guide Project (1999-2004) • Integrated Clinical Knowledge Management infrastructure through separation of concerns (SoC) • Integration: – Datatypes system – Terminologies – Contracts (XML) – Web Services (WSDL) – Social interaction
    19. Guide: lessons learned (1) • Guidelines are semi-structured knowledge that is hard for medical operators or knowledge engineers to formalize alone (we needed both) • Interaction between health care providers and knowledge engineers causes behavioral modifications for both • Annotation was a big part of the process and it made the physicians feel in control
    20. Guide: lessons learned (2) • Knowledge extraction and encoding in a three-step process 1. From paper to a list of recommendations (possibly using markup/annotation tools?) 2. From the recommendations to a flow-chart-like model where all the entities (agents, patient variables, drugs) were explicit (< semantics) 3. From the flow-chart-like model to a formal model
    21. Guide: lessons learned (3) • The architecture proved to be robust and scalable – Datatypes, Terminologies, Contracts, Web Services and XML were good for components to communicate • But the semantics was still not completely explicit – XML is not ideal for representing knowledge and graphs – Data integration relied on tacit knowledge – Low quality of patient data in the EPRs • How about ontologies… and RDF? • Prof. Barry Smith
    22. Semantics at work… Protégé, EON, Sage • Frame-based logic with Protégé for knowledge representation – Clinical practice guidelines – Domain ontologies – Virtual medical record – Organizational entities • Samson Tu, Stanford University • Prof. Mark Musen, Stanford University http://www.openclinical.org/gmm_eon.html http://www.openclinical.org/gmm_sage.html
    23. Growing Interest in Semantic Technologies led me to Boston • Simile (2003-2006): Semantic Interoperability of Metadata and Information in unLike Environments – to enhance inter-operability among digital assets, schemata/vocabularies/ontologies, metadata, and services. • PIs: Eric Miller (Zepheira), David Karger (MIT) and MacKenzie Smith (UC Davis)
    24. Simile widgets • Exhibit • Timeline • Timeplot • Welkin and Vicino • Piggy Bank • Potluck • Playground • Stefano Mazzocchi, Google Inc • David Huynh, PhD, Google Inc
    25. Piggy Bank http://simile.mit.edu/wiki/Piggy_Bank
    26. Simile Potluck http://simile.mit.edu/potluck/
    27. Simile Playground • Combined most of the Simile technologies • Data extraction, semantic integration, annotation and publishing in the same platform… in the browser!!! http://simile.mit.edu/wiki/Playground
    28. Boston (Summer 2006) Clinical Space -> Neurology Research
    29. SWAN (Semantic Web Applications in Neuromedicine) (2004-2010) • Developing cures for highly complex diseases requires extensive interdisciplinary collaboration and exchange of biomedical information in context. • Our ability to exchange such information across sub-specialties today is limited by the current scientific knowledge ecosystem’s inability to properly contextualize and integrate data and discourse in machine-interpretable form. • June Kinoshita • Tim Clark, Director of MIND Informatics, Mass General Hospital
    30. A ‘structured’ view of a publication • classic publication • scientific discourse • ‘semantic’ representation • Annotation of scientific papers http://tinyurl.com/cgyna2m Semantic Web Applications in Neuromedicine (SWAN) project [2007]
    31. AlzSWAN Curation Process http://hypothesis.alzforum.org
    32. AlzSWAN: the SWAN-Alzheimer KB http://hypothesis.alzforum.org/
    33. Golde hypothesis
    34. A claim
    35. Nature News: Literature mining: Speed reading (27 January 2010)
    36. Nature http://hypothesis.alzforum.org
    37. SWAN in numbers (1.5 years) • 2398 Research Statements – 184 Hypotheses • 60 deeply annotated • 124 simply annotated – 2214 Claims • 61 Research Questions • 48 Comments • 2825 Journal Articles • Fewer papers than those published in a week on the topic
    38. SWAN, data integration and interoperability • RDF, Triple Store and SPARQL • Integration of data from PubMed, UniProt, PRO, GO, data repositories • Ontologies (OWL DL) – SWAN (Scientific Discourse) – PAV (Provenance, Authoring and Versioning) – CO (Collections) • ≈ Linked Data • PROV, Nanopublications, Elsevier Satellite, Research Objects…
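As a rough illustration of what "RDF, Triple Store and SPARQL" buys for data integration, the sketch below stores statements as subject-predicate-object triples and answers a two-pattern query by joining on a shared subject. It is a toy in plain Python, not SWAN's actual store or schema: all identifiers (`ex:Claim`, `ex:mentionsProtein`, `claim:1`, `uniprot:X00001`, and so on) are made up for the example, and a real system would use an RDF library and a SPARQL endpoint.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical identifiers, for illustration only.
TRIPLES: List[Triple] = [
    ("claim:1", "rdf:type", "ex:Claim"),
    ("claim:1", "ex:citesArticle", "pubmed:0000001"),
    ("claim:1", "ex:mentionsProtein", "uniprot:X00001"),
    ("claim:2", "rdf:type", "ex:Claim"),
    ("claim:2", "ex:mentionsProtein", "uniprot:X00001"),
    ("note:9", "ex:mentionsProtein", "uniprot:X00001"),
]


def match(s: Optional[str], p: Optional[str], o: Optional[str]) -> List[Triple]:
    """Return the triples matching a pattern; None behaves like a query variable."""
    return [
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def claims_about(protein: str) -> List[str]:
    """Join two patterns on the subject:
    ?c rdf:type ex:Claim .  ?c ex:mentionsProtein <protein> ."""
    claims = {t[0] for t in match(None, "rdf:type", "ex:Claim")}
    mentions = match(None, "ex:mentionsProtein", protein)
    return sorted(t[0] for t in mentions if t[0] in claims)


print(claims_about("uniprot:X00001"))
```

Because every source is reduced to the same triple shape, data from a literature database and a protein database can be queried together as long as they use the same identifier for the same thing.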
    39. W3C HCLS Working Group Notes
    40. SWAN: lessons learned (1) • Labor intensive + subjectivity + loss of context (missed links back to the original content) • Full article representation not attractive: scientists want to ‘formalize’ only what is interesting for them at that very moment (during their normal activities) • Form-based approach not efficient (too much copy and paste involved)
    41. SWAN: lessons learned (2) • Discourse elements can be further structured (relationships provided value but text is not actionable) – see nanopublications, HyBrow, HyQue, BEL • Integration with external sources not trivial (normalized models)… and we needed more!
    42. Semantic Resources Project • Antibodies • Mouse Models • Protein Ontology extensions for APP • Ontology Broker (adding new temporary terms to the ontologies during the activities) • Alan Ruttenberg, Jonathan Rees, Timothy Danford http://neurocommons.org/page/Semantic_resources_project
    43. … thinking of SWAN 2… But wait a minute… • Unstructured Knowledge / Annotation / Structured Knowledge • Structured Knowledge / Annotation / Better Structured Knowledge • How can we build SWAN, Guide and, at the same time, be helpful to a larger crowd?
    44. Science is big • As (biomedical) scientists we deal with an increasing amount of digital/online resources: publications, datasets/databases, big data, reports, grants, images, videos, guidelines, protocols, vocabularies, linked data, software… • Journal publications are still the tip of the iceberg (bottleneck?) of science: • About 150-250 articles a week • 10mins/article ≈ 34 hours/week
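The reading-time estimate on this slide is simple arithmetic, spelled out below as a small Python sketch. The function name `weekly_reading_hours` is just a label for the example; the inputs (150-250 articles a week, 10 minutes per article) are the slide's own figures. The slide does not say which article count produced its "≈ 34 hours/week"; the midpoint of the range, 200 articles, gives about 33.3 hours, which is close.

```python
MINUTES_PER_ARTICLE = 10  # figure quoted on the slide


def weekly_reading_hours(articles_per_week: int,
                         minutes_per_article: int = MINUTES_PER_ARTICLE) -> float:
    """Hours per week needed to read every article at a fixed pace."""
    return articles_per_week * minutes_per_article / 60


# Low end, midpoint and high end of the slide's 150-250 range.
for n in (150, 200, 250):
    print(f"{n} articles/week -> {weekly_reading_hours(n):.1f} hours/week")
```

Across the quoted range this comes to roughly 25 to 42 hours a week, most of a working week spent on reading alone.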
    45. 45. Science is social • We publish and participate to conferences in order to contribute to and be part of science • We belong to formal/informal and vertical/horizontal scientific communities • We communicate with colleagues via emails, voice, video; we broadcast to colleagues through publications, blogs, screencasts, twitter, social networks… • We build on each other’s work! Paolo Ciccarese, PhD DILS 2013
    46. 46. Science is connected CourtesyofTimClark Paolo Ciccarese, PhD DILS 2013
    47. 47. … and with the new technologies The Journal of Laryngology, Rhinology, and Otology Volume 29 / Issue 10 / October 1914, pp 500-510 Better access and links Paolo Ciccarese, PhD DILS 2013
    48. 48. Network of knowledge How do we keep track of it? Paolo Ciccarese, PhD DILS 2013
    49. 49. … we commonly use annotation • We annotate prints, HTML and PDFs • We bookmark/tag web pages… • … and publications (citations/references) • We comment on web pages, blogs, forums and emails • youtube, vimeo, flickrslideshare,twitter… Paolo Ciccarese, PhD DILS 2013
    50. 50. How is that working out for you? • Can you integrate annotations? • Can you leverage machine computation? • Can you share it easily with your colleagues? • Can you capitalize on the work of colleagues? • Can you easily discover valuable resources? • Can you integrate it with other resources? • Can you detect the up-to-date science? • …
    51. 51. Annotation and Semantics And Open!!! A generic model and platform for creating annotation and semantic annotation on any online content
    52. 52. Annotation Ontology (AO) - 2009 • OWL vocabulary for representing and sharing annotation of digital resources (text, images, audio, video, …) and their fragments in RDF format • Focus on biomedicine and sciences. But desire to make the AO framework more broadly usable. Ciccarese et al, 2011 An open annotation ontology for science on web 3.0 J Biomed Semantics 2011, 2(Suppl 2):S4 (17 May 2011)
    53. 53. Annotation Ontology crowd The Living Document Project Biotea
    54. 54. Open Annotation Collaboration • Focus on interoperability for annotations in order to allow sharing of annotations across: – Annotation clients; – Content collections; – Services that leverage annotations. • Focus on annotation for scholarly purposes. But desire to make the OAC framework more broadly usable. http://openannotation.org/
    55. 55. Interoperability starts from people • OA started with the reconciliation of – Open Annotation Collaboration (OAC) – Annotation Ontology (AO)
    56. 56. W3C Open Annotation Community Group • 93 participants from around the world: 5th of 132 groups http://www.w3.org/community/openannotation/
    57. 57. Open Annotation Model (Feb 2013) http://www.openannotation.org/spec/core/
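To make the Open Annotation Model slide concrete, here is a minimal sketch of a comment on a text fragment, written as JSON-LD from Python. The class and property names (oa:Annotation, oa:hasBody, oa:hasTarget, oa:SpecificResource, oa:TextQuoteSelector, cnt:ContentAsText) come from the Open Annotation vocabulary; the function name, the example.org URLs and the sample text are assumptions made up for illustration, and this is not how Domeo itself serializes annotations.

```python
import json

OA_NS = "http://www.w3.org/ns/oa#"
CNT_NS = "http://www.w3.org/2011/content#"


def make_comment_annotation(annotation_id, target_url, exact, prefix, suffix, comment):
    """Build an Open Annotation style comment on a text fragment as a JSON-LD dict.

    The function name and argument layout are hypothetical; only the
    oa:/cnt: terms are taken from the Open Annotation vocabulary.
    """
    return {
        "@context": {"oa": OA_NS, "cnt": CNT_NS},
        "@id": annotation_id,
        "@type": "oa:Annotation",
        "oa:motivatedBy": {"@id": "oa:commenting"},
        # Body: the comment itself, embedded as plain text.
        "oa:hasBody": {
            "@type": "cnt:ContentAsText",
            "cnt:chars": comment,
        },
        # Target: a fragment of the source page, located by quoted text
        # plus a little surrounding context.
        "oa:hasTarget": {
            "@type": "oa:SpecificResource",
            "oa:hasSource": {"@id": target_url},
            "oa:hasSelector": {
                "@type": "oa:TextQuoteSelector",
                "oa:exact": exact,
                "oa:prefix": prefix,
                "oa:suffix": suffix,
            },
        },
    }


if __name__ == "__main__":
    annotation = make_comment_annotation(
        "http://example.org/annotations/1",   # hypothetical annotation URI
        "http://example.org/paper.html",      # hypothetical annotated page
        exact="amyloid",
        prefix="the ",
        suffix=" precursor",
        comment="Check this claim against the latest review.",
    )
    print(json.dumps(annotation, indent=2))
```

Because the body and the target are separate resources, the same target fragment can carry several bodies (a note, a tag, a reference) without touching the annotated page.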
    58. 58. Web Annotation Tool • Domeo is a web application for producing and sharing stand-off annotation • Science and semantics linked in a few clicks • Domeo is open source and designed as an open system… we are working to make it easier to customize. – http://annotationframework.org – https://twitter.com/DomeoTool
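Stand-off annotation means the annotation lives outside the document and points into it, so the fragment has to be located again each time the page is loaded. The sketch below shows one simple way to do that with a prefix/exact/suffix quote; the function name and the sample sentence are assumptions, and real tools typically add fuzzy matching that is not shown here.

```python
def resolve_quote(text, exact, prefix="", suffix=""):
    """Return (start, end) offsets of `exact` in `text`, or None if not found.

    A candidate match is accepted only when it is immediately preceded by
    `prefix` and immediately followed by `suffix`. Hypothetical helper for
    illustration only.
    """
    start = text.find(exact)
    while start != -1:
        end = start + len(exact)
        if text[:start].endswith(prefix) and text[end:].startswith(suffix):
            return start, end
        # Try the next occurrence of the quoted text.
        start = text.find(exact, start + 1)
    return None


if __name__ == "__main__":
    page_text = "The APP gene encodes amyloid precursor protein. APP is cleaved."
    # Without context the first occurrence wins; with context the second is selected.
    print(resolve_quote(page_text, "APP"))
    print(resolve_quote(page_text, "APP", prefix="protein. ", suffix=" is"))
```

The annotated page is never modified: only the offsets are computed, which is what lets several people annotate the same content independently.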
    59. 59. Annotating while we are reading
    60. 60. Manual and automatic annotation • URL I am annotating • Manual annotation tools • Automatic annotation tools • Exploration panels
    61. 61. Manual annotation: notes/comments
    62. 62. Semantic tagging NCBO BioPortal NIF Registry Domeo can query external services and use as qualifiers anything that has a unique identifier.
    63. 63. Semantic tagging We could refer to historic figures, galaxies, places, events…
    64. 64. Semantic Tag on text Links to further readings and additional resources Annotation and Pop-up
    65. 65. Image annotation
    66. 66. Image annotation By semantically tagging figures in a paper, I make them discoverable… And we can integrate inference capabilities
    67. 67. Defining permissions (annotation sets)
    68. 68. Support for extensions: antibodies Contributed to PubMedLinkOut through NIF (http://neuinfo.org) Translates into a formal OWL/RDF representation Antibodyregistry.org
    69. 69. Hypotheses management (v1) Translates into a formal OWL/RDF representation (SWAN Ontology) Possibility for integrating Nanopublications and BEL Data as evidence
    70. 70. Hypotheses management (SWAN) classic publication scientific discourse ‘semantic’ representation Semantic Web Applications in Neuromedicine (SWAN) project [2007]
    71. 71. Hypotheses management (SWAN) graph representation
    72. 72. Infinite possibilities • Integration of Nanopubs, HyBrow, HyQue, BEL • Capturing microdata and metadata • Annotating videos, audios, 3D models, database records • Plug-ins for: Clinical guidelines, Clinical trials, Drug-drug interaction, Protocols, Databases curation • Legislation, Astronomy, Humanities • …
    73. 73. Text mining
    74. 74. Reflect http://reflect.ws/
    75. 75. Domeo Text Mining Selection • Domeo can trigger external text mining services and transform the results into annotation (that can itself be annotated): NCBO Annotator, NIF Annotator, Textpresso, UIMA-based algorithms • Many other possibilities: SADI services, WhatIzIt, DBpedia Spotlight
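The step "transform the results into annotation" can be sketched as follows. The input format (a list of dicts with start, end and concept_uri) is an assumption invented for this example; each real service (NCBO Annotator, NIF Annotator, and so on) returns its own format that would first need mapping into it. The output uses Open Annotation terms (oa:SemanticTag, oa:TextQuoteSelector) and is not Domeo's actual serialization.

```python
def mentions_to_annotations(text, mentions, source_url, context=20):
    """Turn text mining hits into Open Annotation style semantic tags.

    `mentions` is a hypothetical, service-neutral format: a list of dicts
    with character offsets ("start", "end") and a "concept_uri".
    `context` is how many characters of prefix/suffix to keep so the
    fragment can be found again if offsets shift.
    """
    annotations = []
    for mention in mentions:
        start, end = mention["start"], mention["end"]
        annotations.append({
            "@type": "oa:Annotation",
            "oa:motivatedBy": {"@id": "oa:tagging"},
            # Body: the concept identified by the text mining service.
            "oa:hasBody": {
                "@id": mention["concept_uri"],
                "@type": "oa:SemanticTag",
            },
            # Target: the matched span, stored as quoted text with context.
            "oa:hasTarget": {
                "@type": "oa:SpecificResource",
                "oa:hasSource": {"@id": source_url},
                "oa:hasSelector": {
                    "@type": "oa:TextQuoteSelector",
                    "oa:prefix": text[max(0, start - context):start],
                    "oa:exact": text[start:end],
                    "oa:suffix": text[end:end + context],
                },
            },
        })
    return annotations


if __name__ == "__main__":
    sentence = "Mutations in APP cause disease."
    hits = [{"start": 13, "end": 16,
             "concept_uri": "http://example.org/terms/APP"}]  # hypothetical URI
    for annotation in mentions_to_annotations(
            sentence, hits, "http://example.org/paper.html"):
        print(annotation["oa:hasTarget"]["oa:hasSelector"])
```

Once the hits are ordinary annotations, they can be reviewed, corrected or commented on like any manual annotation, which is what makes the comparison and social curation of text mining results possible.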
    76. 76. Text Mining Results
    77. 77. Text mining services comparison and improvement Text Mining Results and social-curation
    78. 78. Support for comments/discussions
    79. 79. Domeo supports extraction pipelines
    80. 80. Self Reference
    81. 81. References
    82. 82. References are annotations!
    83. 83. Virtual bibliography
    84. 84. Extend your reading
    85. 85. Search example
    86. 86. Serialization in AO/RDF working on OA
    87. 87. Utopia for PDF http://getutopia.com
    88. 88. Integration through APIs (ex NIF) PubMedLinkouts!!
    89. 89. Stemcell http://www.stembook.org/
    90. 90. Stembook.org and Domeo
    91. 91. Integration with Drupal 7 (Biblio module) Thanks to Stephane Corlosquet, Drupal Core developer
    92. 92. In conclusion… • Consider annotation as a first-class citizen for your projects… annotation is a great, ubiquitous way to keep the crowd in the loop • Consider using the Open Annotation Model and joining the community… we can help! • Domeo is a complete playground/framework for creating and sharing semantic annotation • There are lots of other open source tools…
    93. 93. annotator.js (Text) • Open Knowledge Foundation Project for text annotation: easy to integrate and supports extensions http://okfnlabs.org/annotator/
    94. 94. annotorious.js (Images) • Image annotation: to add drawing and commenting to images in web pages http://annotorious.github.io/
    95. 95. Shared Canvas (Manuscripts) www.shared-canvas.org/
    96. 96. MapHub (Maps) • Maps annotation http://maphub.github.io/
    97. 97. Paolo Ciccarese, PhD DILS 2013
    98. 98. Keep annotating… and sharing! Thank you
