Bringing semantic publishing into TEI: ideas and pointers


Published in: Technology, Education

  1. Bringing semantic publishing into TEI: ideas and pointers
     Silvio Peroni, Fabio Vitali
     Department of Computer Science and Engineering, University of Bologna, Italy
  2. Outline
     •  Semantic publishing
     •  SPAR ontologies and semantic lenses
     •  TEI and EARMARK
  3. Semantic Web / Open Linked Data
     Yet another definition of Semantic Web: the evolution of the World Wide Web encompassing the integration of the WWW with formal semantics to:
     •  enable visualisation and elaboration of complex data
     •  provide languages (e.g., OWL) to formalise the meaning of data (e.g., using description logics)
     Yet another definition of Open Linked Data: the incremental implementation of many layers of semantics of data released to the Commons:
     •  Structured and semi-structured data
     •  Abstraction and conceptualisation of data
     •  Inferences on data
  4. Semantic publishing
     « anything that
     •  enhances the meaning of a published journal article,
     •  facilitates its automated discovery,
     •  enables its linking to semantically related articles,
     •  provides access to data within the article in actionable form, or
     •  facilitates integration of data between papers.
     Among other things, it involves enriching the article with appropriate metadata that
     •  are amenable to automated processing and analysis,
     •  allowing enhanced verifiability of published information and
     •  providing the capacity for automated discovery and summarization »
     Shotton, D. (2009). Semantic publishing: the coming revolution in scientific journal publishing. Learned Publishing, 22(2): 85–94. DOI: 10.1087/2009202
  5. Why Semantic Publishing?
     •  Increase the intrinsic value of publications
     •  Increase the richness of information, understanding and knowledge that can be extracted from publications
     •  Enable the development of additional services
     •  Integrate information from multiple enhanced articles
     •  Provide additional business opportunities for the publishers
  6. Goals of semantic publishing
     •  Evaluating the pertinence of a document to a scientific field
     •  Discovering research trends and the propagation of research findings
     •  Tracking research activities, institutions and disciplines
     •  Analysing quantitative aspects of the output of researchers
     •  Evaluating the multi-disciplinarity of the output of scholars
     •  Measuring positive/negative citations to a particular work
     •  Designing and including algorithms to compute metric indicators
     •  Helping end users find materials related to a topic and/or article
     •  Evaluating the social acceptability of scientific production
     •  Enabling users to annotate documents with related semantic data
     •  Querying (semantic) bibliographic data
  7. SPAR
     •  One of the most complete sets of ontologies to describe scholarly objects
     •  It uses:
        –  Common vocabulary of terms
        –  External metadata schemas (SKOS, PRISM, DC)
        –  FRBR concepts to distinguish between work, version, edition and copy
        –  Document components
        –  Roles of people, status of documents and publishing workflows
        –  Citations, citation contexts, reference lists
  8. Semantic lenses
     •  Particular points of view on scholarly entities
     •  Contextual data:
        –  Research context
        –  Roles and contribution
        –  Publishing context
     •  Content data:
        –  Text:
           •  Text structure
           •  Rhetoric
        –  Message:
           •  Argumentation
           •  Citation network
           •  Textual semantics
  9. An example
     The Tempest by William Shakespeare as available in the Oxford Text Archive

     Closed view:

         :work a fabio:Play ;
             frbr:realization :expression ;
             dcterms:creator [ a foaf:Person ; foaf:name "William Shakespeare" ] .

         :expression a fabio:Book ;
             frbr:embodiment :manifestation .

         :manifestation a fabio:DigitalManifestation ;
             frbr:exemplar :item ;
             dcterms:format [ a dcterms:MediaType ; dcterms:description "application/tei+xml" ] ;
             dcterms:publisher [ a foaf:Organization ; foaf:name "OUCS" ] .

         :item a fabio:ComputerFile ;
             fabio:storedOn fabio:web .

     Open (Linked Data) view:

         dbpedia:The_Tempest a fabio:Play ;
             frbr:realization <> ;
             dcterms:creator dbpedia:William_Shakespeare .

         <> a fabio:Book ;
             frbr:embodiment <> .

         <> a fabio:DigitalManifestation ;
             frbr:exemplar <> ;
             dcterms:format application:tei+xml ;
             dcterms:publisher dbpedia:Oxford_University_Computing_Services .

         <> a fabio:ComputerFile ;
             fabio:storedOn fabio:web .
  10. Annotating the content

          <body>
            ...
            <sp>
              <speaker rend="italic">Ari.</speaker>
              <ab>
                All haile, great Master, graue Sir, haile: I come<lb n="301"/>
                To answer thy best pleasure; be’t to fly,<lb n="302"/>
                To swim, to diue into the fire: to ride<lb n="303"/>
                On the curld clowds: to thy strong bidding,taske<lb n="304"/>
                <hi rend="italic">Ariel,</hi> and all his Qualitie.<lb n="305"/>
              </ab>
            </sp>
            <sp>
              <speaker rend="italic">Pro.</speaker>
              <ab>
                Hast thou, Spirit,<lb n="306"/>
                Performd to point, the Tempest that I
                <seg type="homograph">bad</seg> thee.<lb n="307"/>
              </ab>
            </sp>
            ...
          </body>

      •  “Ari.”, “Ariel”, “Spirit” refer to the same entity
      •  “Master”, “Pro.” refer to the same entity
      •  Both are defined in DBpedia
      How can I annotate such an XML document without having permission to modify it?
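One way to picture annotation without modification is a stand-off table kept outside the XML. The sketch below is only an illustration in Python, not the authors' tooling: it reads an abridged copy of the fragment above, flattens it to plain text and records where each mention occurs. The names `plain_text`, `find_mentions` and `MENTIONS`, and the entity labels, are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the TEI fragment from the slide; it is only read, never changed.
TEI = """<body>
<sp>
<speaker rend="italic">Ari.</speaker>
<ab>All haile, great Master, graue Sir, haile: I come<lb n="301"/>
<hi rend="italic">Ariel,</hi> and all his Qualitie.<lb n="305"/></ab>
</sp>
<sp>
<speaker rend="italic">Pro.</speaker>
<ab>Hast thou, Spirit,<lb n="306"/>
Performd to point, the Tempest that I
<seg type="homograph">bad</seg> thee.<lb n="307"/></ab>
</sp>
</body>"""

# Hypothetical stand-off table stored outside the XML: mention -> entity label.
# The labels are illustrative placeholders, not verified DBpedia identifiers.
MENTIONS = {
    "Ari.": "Ariel",
    "Ariel": "Ariel",
    "Spirit": "Ariel",
    "Master": "Prospero",
    "Pro.": "Prospero",
}


def plain_text(tei: str) -> str:
    """Concatenate all text nodes of the fragment, dropping the markup."""
    return "".join(ET.fromstring(tei).itertext())


def find_mentions(text: str, mentions: dict) -> list:
    """Return sorted (start, end, mention, entity) tuples for every occurrence."""
    found = []
    for mention, entity in mentions.items():
        start = text.find(mention)
        while start != -1:
            found.append((start, start + len(mention), mention, entity))
            start = text.find(mention, start + 1)
    return sorted(found)


if __name__ == "__main__":
    text = plain_text(TEI)
    for start, end, mention, entity in find_mentions(text, MENTIONS):
        print(f"{start}-{end}: {mention!r} -> {entity}")
```

The source string is never rewritten; everything the annotator asserts lives in the separate table, which is the property the question on the slide asks for.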
  11. •  The Extremely Annotational RDF Markup, a.k.a. EARMARK, is an OWL 2 DL ontology that defines document meta-markup
      •  It is an ontologically precise definition of markup that instantiates the markup of a text document as an independent OWL document outside of the text strings it annotates
      •  It can define structures such as trees or graphs (i.e. overlapping markup) and can be used to generate validity constraints (including co-constraints currently unavailable in most validation languages)
      •  Using the Linguistic Meta-Model, it becomes possible to express and assess facts, constraints and rules about the markup structure as well as about the semantics of the content of the document
      –  URIDocuverse: to define the whole textual content of the document to annotate, in this case the Oxford Text Archive TEI version of the play The Tempest, available at a particular URL
      –  PointerRange: to define textual ranges upon it
      –  LinguisticAct: to represent annotations made on ranges by someone at a certain time
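The relation between a docuverse and a range over it can be sketched in a few lines of Python. This is a simplified model and not the EARMARK API: the class and field names only mirror the ontology terms, the docuverse holds its text directly instead of a URL, and offsets are assumed to be begin-inclusive and end-exclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Docuverse:
    """The whole text being annotated (held inline here; EARMARK points at a URL)."""
    content: str


@dataclass(frozen=True)
class PointerRange:
    """A span of a docuverse addressed by character offsets (assumed end-exclusive)."""
    refers_to: Docuverse
    begins: int
    ends: int

    def text(self) -> str:
        return self.refers_to.content[self.begins:self.ends]


def range_for(docuverse: Docuverse, needle: str) -> PointerRange:
    """Hypothetical helper: build a range over the first occurrence of `needle`."""
    begins = docuverse.content.find(needle)
    if begins == -1:
        raise ValueError(f"{needle!r} does not occur in the docuverse")
    return PointerRange(docuverse, begins, begins + len(needle))


if __name__ == "__main__":
    content = Docuverse("All haile, great Master, graue Sir, haile: I come")
    master = range_for(content, "Master")
    print(master.begins, master.ends, master.text())
```

Because the range stores only offsets and a reference to the docuverse, any number of ranges, including overlapping ones, can coexist without touching the text.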
  12. Multiple interpretations

          <ab>
            All haile, great Master, graue Sir, haile: I come<lb n="301"/>
            ...
          </ab>

          # The textual content of the document to annotate
          :content a earmark:URIDocuverse ;
              earmark:hasContent ""^^xsd:anyURI .

          # The string "Master"
          :master-string a earmark:PointerRange ;
              earmark:refersTo :content ;
              earmark:begins "34023"^^xsd:nonNegativeInteger ;
              earmark:ends "34029"^^xsd:nonNegativeInteger .

          # Silvio’s interpretation
          :prospero-as-person a la:LinguisticAct ;
              la:hasInformationEntity :master-string ;
              la:hasReference dbpedia:Prospero ;
              la:hasMeaning foaf:Person ;
              prov:wasAttributedTo :silvio ;
              prov:generatedAtTime "2013-06-18T17:23:23Z"^^xsd:dateTime .

          # Fabio’s interpretation
          :prospero-as-character a la:LinguisticAct ;
              la:hasInformationEntity :master-string ;
              la:hasReference dbpedia:Prospero ;
              la:hasMeaning yago:ShakespeareanCharacters ;
              prov:wasAttributedTo :fabio ;
              prov:generatedAtTime "2013-07-23T17:45:23Z"^^xsd:dateTime .
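The point of the slide, that several people can attach different meanings to the same range, can be illustrated with a small Python sketch. It is not part of EARMARK or the Linguistic Meta-Model: the `LinguisticAct` dataclass, `to_turtle` and `meanings_by_range` are assumptions made for the example, and the values are copied from the Turtle above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinguisticAct:
    """One interpretation of a range, made by one agent at one time."""
    name: str
    information_entity: str
    reference: str
    meaning: str
    agent: str
    time: str

    def to_turtle(self) -> str:
        """Serialise the act in the same shape as the statements on the slide."""
        return (
            f":{self.name} a la:LinguisticAct ;\n"
            f"    la:hasInformationEntity {self.information_entity} ;\n"
            f"    la:hasReference {self.reference} ;\n"
            f"    la:hasMeaning {self.meaning} ;\n"
            f"    prov:wasAttributedTo {self.agent} ;\n"
            f'    prov:generatedAtTime "{self.time}"^^xsd:dateTime .'
        )


# The two interpretations from the slide, both attached to the same range.
ACTS = [
    LinguisticAct("prospero-as-person", ":master-string", "dbpedia:Prospero",
                  "foaf:Person", ":silvio", "2013-06-18T17:23:23Z"),
    LinguisticAct("prospero-as-character", ":master-string", "dbpedia:Prospero",
                  "yago:ShakespeareanCharacters", ":fabio", "2013-07-23T17:45:23Z"),
]


def meanings_by_range(acts: list) -> dict:
    """Group (agent, meaning) pairs by the range they interpret."""
    grouped = {}
    for act in acts:
        grouped.setdefault(act.information_entity, []).append((act.agent, act.meaning))
    return grouped


if __name__ == "__main__":
    for act in ACTS:
        print(act.to_turtle(), end="\n\n")
    print(meanings_by_range(ACTS))
```

Keeping the agent and timestamp on each act is what lets the two readings coexist: neither overwrites the other, and a consumer can choose whose interpretation to trust.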
  13. Conclusions
      •  Semantic publishing is a natural and inevitable evolution of the technological advances of the publishing industry
      •  Shared ontologies are the only way to provide interoperability of data between publishers
      •  SPAR and EARMARK provide interesting contact points between metadata hidden in XML vocabularies and shared publishing ontologies
      •  TEI, which is orthogonal to these languages, can and should work well with them
  14. Thank you for your attention
      Emails: