DH2012: Enriching digital libraries contents with Pundit system

Transcript

  • 1. ENRICHING DIGITAL LIBRARIES CONTENTS WITH SEMLIB SEMANTIC ANNOTATION SYSTEM (PUNDIT!) Michele Nucci, Marco Grassi, Christian Morbidoni and Francesco Piazza. Semedia (Semantic Web and Multimedia) http://semedia.dii.univpm.it DII - Department of Information Engineering, Polytechnic University of Le Marche, Ancona, Italy. Tuesday, July 24, 2012
  • 2. DIGITAL EVOLUTION • Most of the resources of interest for the Humanities are: • in digital format (digitized or born digital) • available on the Web • Information is multiplying faster and faster: • classification and management are an increasingly complex task • well-structured metadata is a key requirement • Semantic Web technologies in Digital Libraries: • publish DL content as Linked Data • define ontologies or vocabularies for metadata encoding (Europeana Data Model, OAI-ORE, …) Enriching digital libraries contents with Pundit, m.grassi@univpm.it, Tuesday, July 24, 2012
  • 3. THE WEB SCENARIO • The Web (> 2.0) has become more social and interactive • Annotation of Web content is beneficial: • more engaging and productive user experience • exploit social engagement to improve resource ranking and classification • Annotating web content has become a common task • Comments and tags are widely supported by mainstream applications • Facebook picture tags, Flickr picture comments, etc. • Many tools to bookmark, highlight and comment web page fragments • E.g. sharedcopy.com, annotateit.org, diigo.com • Some tools support collaborative annotations
  • 4. DL SCENARIO • Digital Libraries (DL) are no longer simple “expositions” of digital objects but provide users with more interaction [Figure: user interaction under three models (Expert model, Social Engagement, Crowdsourcing), with experts and users creating and consuming contents, commenting, tagging, linking, and adding content and annotations to the Digital Library] • Crowdsourcing experiments for enriching DLs, curating contents or uploading digital material of interest for the DL (BBC WW2 People’s War, …)
  • 5. SEMANTICALLY STRUCTURED ANNOTATIONS • ... so what’s missing? • Most existing annotation tools are limited to simple textual tags and comments. • limited by the ambiguity of natural language (is “orange” a fruit or a color?) • their semantics are not machine-interpretable • Semantically structured annotations make smart use of such added knowledge: • Unambiguously express semantics to be processed by software agents (e.g. annotations can be harvested and used by recommender systems, search engines, etc.) • Power Digital Libraries (improving browsing, search, automatic content classification, ...) • Reuse such collaborative knowledge in different contexts and different applications
  • 6. SEMANTICALLY STRUCTURED ANNOTATIONS • Enable users to create knowledge graphs where web content fragments, concepts and entities are meaningfully connected.
  • 7. SEMANTICALLY STRUCTURED ANNOTATIONS • Rely on controlled vocabularies and ontologies • share the same terminology and “talk about the same things” • annotations can be meaningfully mashed up • Link to the emerging Web of Data • software can automatically retrieve additional, useful semantic data (e.g. date and place of birth, pictures, citations, multi-language data) • Augment the information of the original annotation content to support smarter applications • Example: we have discovered that the two images show American film actors expressing anger!
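The mash-up idea on this slide can be sketched in a few lines. This is only an illustration, not Pundit's code: every `ex:` identifier, the sample triples and the `find_images` helper are invented for the example, and a real system would more likely run a SPARQL query over the annotation store and a Linked Data source.

```python
# Hypothetical data: every "ex:" name below is invented for illustration.
# Triples created by users while annotating three images.
annotations = [
    ("ex:image1", "ex:depicts", "ex:ActorA"),
    ("ex:image1", "ex:showsEmotion", "ex:Anger"),
    ("ex:image2", "ex:depicts", "ex:ActorB"),
    ("ex:image2", "ex:showsEmotion", "ex:Anger"),
    ("ex:image3", "ex:depicts", "ex:PainterC"),
    ("ex:image3", "ex:showsEmotion", "ex:Anger"),
]

# Extra facts of the kind a Linked Data source could supply about the entities.
linked_data = [
    ("ex:ActorA", "ex:occupation", "ex:FilmActor"),
    ("ex:ActorA", "ex:nationality", "ex:American"),
    ("ex:ActorB", "ex:occupation", "ex:FilmActor"),
    ("ex:ActorB", "ex:nationality", "ex:American"),
    ("ex:PainterC", "ex:occupation", "ex:Painter"),
    ("ex:PainterC", "ex:nationality", "ex:American"),
]


def find_images(triples, occupation, nationality, emotion):
    """Return images depicting a person with the given traits and emotion."""
    facts = set(triples)
    matches = set()
    for subject, prop, obj in facts:
        if prop != "ex:depicts":
            continue
        if (
            (obj, "ex:occupation", occupation) in facts
            and (obj, "ex:nationality", nationality) in facts
            and (subject, "ex:showsEmotion", emotion) in facts
        ):
            matches.add(subject)
    return sorted(matches)


# Merging both sources is what makes the combined question answerable.
angry_actor_images = find_images(
    annotations + linked_data, "ex:FilmActor", "ex:American", "ex:Anger"
)
print(angry_actor_images)
```

Neither source alone can answer the question: the annotations know nothing about occupations, and the linked data knows nothing about the images.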
  • 8. • Pundit is a novel semantic annotation tool: • developed by: Semedia (Semantic Web and Multimedia) http://semedia.dii.univpm.it with the collaboration of NET7 • funded by: Semlib Project (EU Project) http://semedia.dii.univpm.it • supported and further developed in: DM2E EU Project http://dm2e.edu/ and AGORA EU Project http://project-agora.eu/
  • 9. SEMLIB PROJECT • Semlib Project: Semantic Web Tools for DL http://www.semlibproject.eu/ • R&D project supported by EU FP7 Theme: Research for SMEs (no. FP7-SME-2010-01-262301 - SEMLIB) • 24 months (commenced in January 2011, currently at month 19) • Partners: www.semedia.dii.univpm.it/ www.deri.ie/ www.in-two.com www.liberologico.com/ www.knowledgehives.com/ www.netseven.it/
  • 10. ANNOTATION MODEL • Based on Open Annotation Collaboration (OAC) ontology* [Diagram: Contextual Information]
  • 11. ANNOTATION MODEL • Based on Open Annotation Collaboration (OAC) ontology* [Diagram: Contextual Information, Annotation Content]
  • 12. ANNOTATION MODEL • Based on Open Annotation Collaboration (OAC) ontology* • Semantically Structured Content [Diagram: Contextual Information, Annotation Content]
  • 13. ANNOTATION MODEL • Based on Open Annotation Collaboration (OAC) ontology* • SPARQL support to query slices of knowledge [Diagram: Named Graph, Contextual Information, Annotation Content]
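One way to picture the model above: each annotation carries contextual information plus a named graph holding its semantically structured content, and a "slice of knowledge" is the union of some of those graphs. The sketch below is an assumption-laden illustration in plain Python; the class names, field names and `ex:` identifiers are invented and are not the OAC vocabulary or Pundit's schema. A real deployment would keep RDF in a triple store and select graphs with SPARQL.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Annotation:
    uri: str        # identifies the annotation itself
    author: str     # contextual information
    created: str    # contextual information
    target: str     # the annotated resource
    graph_uri: str  # name of the graph holding the annotation content
    content: List[Triple] = field(default_factory=list)


class AnnotationStore:
    """Toy quad store: every content triple is tagged with its graph name."""

    def __init__(self) -> None:
        self._quads: List[Tuple[str, str, str, str]] = []
        self._annotations: Dict[str, Annotation] = {}

    def add(self, annotation: Annotation) -> None:
        self._annotations[annotation.uri] = annotation
        for s, p, o in annotation.content:
            self._quads.append((annotation.graph_uri, s, p, o))

    def graphs_by_author(self, author: str) -> List[str]:
        """Use contextual information to pick the graphs of interest."""
        return [a.graph_uri for a in self._annotations.values() if a.author == author]

    def query_slice(self, graph_uris: List[str]) -> List[Triple]:
        """Return only the triples stored in the selected named graphs."""
        wanted = set(graph_uris)
        return [(s, p, o) for g, s, p, o in self._quads if g in wanted]


store = AnnotationStore()
store.add(Annotation(
    uri="ex:ann1", author="ex:alice", created="2012-07-24T10:00:00",
    target="http://example.org/contents/123", graph_uri="ex:graph1",
    content=[("ex:fragment1", "ex:cites", "ex:BookA")],
))
store.add(Annotation(
    uri="ex:ann2", author="ex:bob", created="2012-07-24T11:00:00",
    target="http://example.org/contents/123", graph_uri="ex:graph2",
    content=[("ex:fragment2", "ex:mentions", "ex:PersonB")],
))

alice_slice = store.query_slice(store.graphs_by_author("ex:alice"))
print(alice_slice)
```

Keeping content in per-annotation graphs is what lets the contextual information (who, when, on what) act as a filter over the collective knowledge base.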
  • 14. NOTEBOOKS • Annotations are collected in notebooks • Provide users with the capability to organize their annotations • each user has a default notebook • can create more • Put together annotations so that they can be retrieved and queried • Different UNIX-style read/write privileges (from private to completely public) • Identified by a URI [Diagram: NotebookURI with dcterms:creator, dcterms:created “2011-01-27 10:30:56”, rdfs:label “My Example Notebook”, rdfs:comment “An Example Notebook used to show the model”]
  • 15. NOTEBOOKS • Notebooks allow annotation sharing • Sharing a notebook is as easy as sharing its URL on the web (similarly to popular file sharing platforms) [Diagram: the example notebook’s NotebookURI shared with a single user, a wiki, communities, or the public]
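A notebook as described on the two slides above could be modelled roughly as follows. This is a sketch under stated assumptions: the `Notebook` class, its two permission flags and the `ex:` user identifiers are hypothetical simplifications of the "UNIX-style read/write privileges" the slides mention, and only the label, comment and creation date come from the slide's example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Notebook:
    uri: str       # the NotebookURI that identifies (and shares) the notebook
    label: str     # rdfs:label
    comment: str   # rdfs:comment
    creator: str   # dcterms:creator
    created: str   # dcterms:created
    # Assumed permission model: the creator can always read and write;
    # these flags open the notebook to everybody else.
    others_read: bool = False
    others_write: bool = False
    annotation_uris: List[str] = field(default_factory=list)

    def can_read(self, user: str) -> bool:
        return user == self.creator or self.others_read

    def can_write(self, user: str) -> bool:
        return user == self.creator or self.others_write

    def add_annotation(self, user: str, annotation_uri: str) -> None:
        if not self.can_write(user):
            raise PermissionError(f"{user} cannot write to {self.uri}")
        self.annotation_uris.append(annotation_uri)


notebook = Notebook(
    uri="http://example.org/notebooks/1",  # hypothetical URI
    label="My Example Notebook",
    comment="An Example Notebook used to show the model",
    creator="ex:alice",                    # hypothetical user
    created="2011-01-27 10:30:56",
)
notebook.add_annotation("ex:alice", "ex:ann1")

private_view = notebook.can_read("ex:bob")  # notebook is private by default
notebook.others_read = True                 # make it publicly readable
public_view = notebook.can_read("ex:bob")
print(private_view, public_view)
```

With a model like this, sharing reduces to handing out the notebook URI: whoever receives it can read the annotations only if the flags allow it.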
  • 16. USER AUTHENTICATION • Authentication is based on OpenID: • No need to store users’ credentials • Already implemented by mainstream companies (Google, Yahoo, ...) • Can spare users multiple registrations (waste of time, another password) • A single identity can be used across different Pundit-enabled Digital Libraries • Adding an OpenID provider is easy and transparent to the Pundit server.
  • 17. ANNOTATION SHARING SCENARIO • Create structured annotations [Diagram: Annotation Clients send structured annotations to the Annotation Server through the Annotation Authoring API; the server also exposes an Annotation Consuming API]
  • 18. ANNOTATION SHARING SCENARIO • Create structured annotations • Store them into a unique knowledge base [Diagram: as before, with the Annotation Server holding a COLLECTIVE KB]
  • 19. ANNOTATION SHARING SCENARIO • Create structured annotations • Store them into a unique knowledge base • ...whose slices can be accessed not only by their creator... [Diagram: as before, with Annotation Clients below the Annotation Consuming API]
  • 20. ANNOTATION SHARING SCENARIO • Create structured annotations • Store them into a unique knowledge base • ...whose slices can be accessed not only by their creator... • ...but also by other users and third party applications! [Diagram: as before, with a Third Party Application next to the consuming Annotation Clients]
  • 21. ANNOTATION SHARING SCENARIO • Create structured annotations • Store them into a unique knowledge base • ...whose slices can be accessed not only by their creator... • ...but also by other users and third party applications! • DL administrator can select annotations and publish them back as trusted annotations to enrich DL content [Diagram: as before, with an Annotation Client turning selected annotations from the COLLECTIVE KB into trusted/official annotations]
  • 22. NAMED CONTENT • DLs change over time • Presentation can be restyled and content can be re-organized • Same content in different pages • Some parts of the page should not be annotated (menu, ...) • Specific markup can be added in the pages to allow Pundit to: • identify atomic pieces of content (by means of a URI) • attach the annotations to such contents • avoid the annotation of accessory page components • Example markup:
      <div class="pundit-content" about="http://example.org/contents/123">
        <!-- HTML goes here. -->
        <p>This is a named content and contains both text and a picture</p>
        <img src="http://example.org/pictires/pictire123.png" />
        <p><em>Caption:</em> this is a caption.</p>
      </div>
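To make the named-content idea concrete, the sketch below scans a page for elements carrying `class="pundit-content"` and an `about` URI, and collects the text found inside each of them while ignoring everything else (such as a menu). It uses Python's standard `html.parser` purely for illustration; the real Pundit client is a set of JavaScript modules, and the parser class, the void-tag list and the sample menu are assumptions of this example. It also assumes reasonably well-formed markup.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class NamedContentParser(HTMLParser):
    """Collects the text inside elements marked as named content."""

    def __init__(self):
        super().__init__()
        self.contents = {}  # about URI -> list of text chunks
        self._stack = []    # one entry per open element: its about URI or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        about = attributes.get("about")
        if "pundit-content" in classes and about:
            self._stack.append(about)
            self.contents.setdefault(about, [])
        else:
            self._stack.append(None)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Attach the text to the innermost enclosing named content, if any.
        for uri in reversed(self._stack):
            if uri is not None:
                self.contents[uri].append(text)
                break


page = """
<div id="menu"><p>Home</p></div>
<div class="pundit-content" about="http://example.org/contents/123">
  <!-- HTML goes here. -->
  <p>This is a named content and contains both text and a picture</p>
  <img src="http://example.org/pictires/pictire123.png" />
  <p><em>Caption:</em> this is a caption.</p>
</div>
"""

parser = NamedContentParser()
parser.feed(page)
parser.close()
named_text = {uri: " ".join(chunks) for uri, chunks in parser.contents.items()}
print(named_text)
```

Because annotations would be keyed by the `about` URI rather than by the page address, the same block can move to another page, or be restyled, without losing its annotations.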
  • 24. NAMED CONTENT • The same content in different pages shows the same annotations!
  • 26. PUNDIT ARCHITECTURE • CLIENT • Set of JavaScript modules (Dojo Framework) • Easily extendable • Highly customizable • Open Source • SERVER • RESTful Web Service (Java Jersey framework) • Cross-origin requests • CORS (Cross-Origin Resource Sharing) • JSONP • Sesame triple store • SPARQL and inference • Different Sails are provided to implement different storage backends (BigOWLIM, MySQL, PostgreSQL, Virtuoso, ...) • MySQL for user data
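The two cross-origin techniques named on this slide differ in how the server shapes its response: with CORS it returns plain JSON plus an `Access-Control-Allow-Origin` header, while with JSONP it wraps the JSON in a call to a function named by the client. The helper below is a generic sketch of that difference, not the Pundit server's code (which the slide says is a Java Jersey service); the function name, parameters and example origin are assumptions.

```python
import json
import re

# Accept only plain identifiers as JSONP callback names to avoid script injection.
_CALLBACK_RE = re.compile(r"^[A-Za-z_$][A-Za-z0-9_$]*$")


def build_response(payload, origin=None, callback=None):
    """Return (headers, body) for a cross-origin request.

    callback given -> JSONP: a script body that calls `callback` with the data.
    otherwise      -> JSON, with a CORS header when the request has an origin.
    """
    body = json.dumps(payload)
    if callback is not None:
        if not _CALLBACK_RE.match(callback):
            raise ValueError("invalid JSONP callback name")
        return {"Content-Type": "application/javascript"}, f"{callback}({body});"
    headers = {"Content-Type": "application/json"}
    if origin is not None:
        # Simplification: a real server would check the origin against a whitelist.
        headers["Access-Control-Allow-Origin"] = origin
    return headers, body


cors_headers, cors_body = build_response({"ok": True}, origin="http://dl.example.org")
jsonp_headers, jsonp_body = build_response({"ok": True}, callback="handleAnnotations")
print(cors_headers, cors_body)
print(jsonp_headers, jsonp_body)
```

JSONP only supports GET-style reads, which is one reason a server may offer both mechanisms.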
  • 27. DIFFERENT ANNOTATABLE CONTENTS • Pundit allows the annotation of different types of content at different levels of granularity • Text fragments • Images • Image fragments (under development) • Videos and video fragments (tested experimentally in Semtube)
  • 28. • Semantic annotation of YouTube videos (alpha state) based on Pundit JavaScript libraries and annotation server http://semedia.dii.univpm.it/semtube
  • 29. DIFFERENT TYPES OF ANNOTATIONS Annotation with different levels of expressivity and structure [Screenshot: Comment/Tag Panel]
  • 31. DIFFERENT TYPES OF ANNOTATIONS Annotation with different levels of expressivity and structure • Textual comments [Screenshot: Comment/Tag Panel]
  • 32. DIFFERENT TYPES OF ANNOTATIONS Annotation with different levels of expressivity and structure • Textual comments • Semantic Tags • Automatically extracted from textual comments (DBpedia Spotlight) [Screenshot: Comment/Tag Panel]
  • 33. DIFFERENT TYPES OF ANNOTATIONS Annotation with different levels of expressivity and structure • Textual comments • Semantic Tags • Automatically extracted from textual comments (DBpedia Spotlight) • Popular Linked Data services (DBpedia, Freebase, WordNet, ...) • Define your own (SPARQL endpoint) [Screenshot: Comment/Tag Panel]
  • 34. DIFFERENT TYPES OF ANNOTATIONS Annotation with different levels of expressivity and structure • Textual comments • Semantic Tags • Popular Linked Data services (DBpedia, Freebase, WordNet, ...) • Automatically extracted from textual comments (DBpedia Spotlight) • Define your own (SPARQL endpoint) • Semantic Relations • Subject-Property-Object statements • Drag & drop and suggestions • Connect different resources (user selection, linked data entities, ...) with semantically defined properties [Screenshot: Triple Composer]
  • 37. CUSTOM VOCABULARIES • Pundit allows the use of custom vocabularies/taxonomies (and relations): • Create a JSONP file (manually or automatically from an ontology) • Put it online • Add its URL to the configuration to import and use it
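The first step above, producing a JSONP file from a list of terms, might look like the sketch below. The slide does not show the file layout Pundit expects, so the structure used here (`name`, `items`, `uri`, `label`), the callback name and the sample terms are all hypothetical; only the general JSONP shape, a JSON document wrapped in a function call, is standard.

```python
import json


def vocabulary_to_jsonp(name, terms, callback="loadVocabulary"):
    """Serialize (uri, label) pairs as a JSONP document.

    The field names and the default callback are illustrative assumptions,
    not the format documented by Pundit.
    """
    document = {
        "name": name,
        "items": [{"uri": uri, "label": label} for uri, label in terms],
    }
    return f"{callback}({json.dumps(document, indent=2)});"


# Hypothetical taxonomy terms.
emotion_terms = [
    ("http://example.org/vocab/anger", "Anger"),
    ("http://example.org/vocab/joy", "Joy"),
]

jsonp_text = vocabulary_to_jsonp("Emotions", emotion_terms)
print(jsonp_text)
```

The resulting text would then be saved to a file, published at a URL, and that URL added to the client configuration, as the remaining two steps describe.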
  • 38. CROSS PAGE / DOMAIN ANNOTATIONS • A special bookmarklet allows launching Pundit on any Web page to perform annotations • Selected resources (text fragments, images, ...) on different pages and domains can be added to “My Items” to be stored on the server and reused on different pages
  • 39. CROSS PAGE / DOMAIN ANNOTATIONS • A special bookmarklet allows launching Pundit on any Web page to perform annotations • Selected resources (text fragments, images, ...) on different pages and domains can be added to “My Items” to be stored on the server and reused on different pages • Create cross-page semantic relations [Diagram: Add to My Items, Use in another page, linked by a “cites” relation]
  • 40. DEMO TIME! http://thepund.it
  • 41. THANK YOU! http://thepund.it • Semedia (Semantic Web and Multimedia) http://semedia.dii.univpm.it • Semlib Project (EU Project) http://www.semlibproject.eu/ • DM2E EU Project http://dm2e.edu/ • AGORA EU Project http://project-agora.eu/