3. www.sti-innsbruck.at
Research question
We want to annotate Springfield with a URI to make sure that the
computer understands we mean the Springfield in Massachusetts.
HTML:
<p>It is well known, that Springfield has mild summers and short, but hard
winters.</p>
HTML with annotation (something like that):
<p>It is well known, that
<span about="http://sws.geonames.org/4951788/">Springfield</span>
has mild summers and short, but hard winters.</p>
We don't want to add whole triples, but just annotate the HTML and say
"this element refers to the following URI".
From: Denny Vrandečić
Sent: Wednesday, April 24, 2013 1:59 PM
To: semantic-web at W3C
Subject: How to put an annotation in HTML?
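To make the question concrete, here is a minimal Python sketch of what a consumer of such markup might do: walk the HTML and pair each annotated element's text with its URI. It assumes the `about` attribute from the "something like that" example above, which is only the asker's placeholder, and it assumes well-nested markup without void elements such as <br>. The class name AboutCollector is hypothetical.

```python
from html.parser import HTMLParser


class AboutCollector(HTMLParser):
    """Collect (text, URI) pairs for elements that carry an `about` attribute."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of [uri_or_None, text_parts]
        self.annotations = []    # finished (text, uri) pairs

    def handle_starttag(self, tag, attrs):
        # Every open element gets a stack entry so end tags pop the right one.
        self.open_elements.append([dict(attrs).get("about"), []])

    def handle_data(self, data):
        # Text belongs to every element that is currently open.
        for _, parts in self.open_elements:
            parts.append(data)

    def handle_endtag(self, tag):
        if not self.open_elements:
            return
        uri, parts = self.open_elements.pop()
        if uri is not None:
            self.annotations.append(("".join(parts).strip(), uri))


html = (
    "<p>It is well known, that "
    '<span about="http://sws.geonames.org/4951788/">Springfield</span> '
    "has mild summers and short, but hard winters.</p>"
)

collector = AboutCollector()
collector.feed(html)
collector.close()
annotations = collector.annotations
print(annotations)
```

The result is a plain list of (text, URI) pairs, which matches the stated goal: no full triples, just "this element refers to the following URI".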
ITS 2.0
• International Tag Set (ITS) [2]
– enhances the foundation to integrate automated processing of human language into
core Web technologies;
– focuses on HTML, XML-based formats in general, and can leverage processing
based on the XML Localization Interchange File Format (XLIFF), as well as the
Natural Language Processing Interchange Format (NIF);
– is a technology to add metadata to Web content, for the benefit of localization,
language technologies, and internationalization (see more in [5] regarding localization
(l10n) and internationalization (i18n))
ITS 2.0
• Potential Users of ITS [2]:
– Schema developers starting a schema from the ground up
(proposals for attribute and element names to be included in their new schema)
– Schema developers working with an existing schema
(should check whether their schemas support the markup proposed in this
specification, and, where appropriate, add the markup proposed here to their schema)
– Vendors of content-related tools (e.g. tools for authoring, translation, etc.)
– Content producers (may be used by them to mark up specific bits of content)
– Machine Translation Systems
– Text Analytics (automatically generated metadata for improving localization, data
integration or knowledge management workflows)
– Localization Workflow Managers
ITS 2.0
The Text Analysis use case:
• This data category is used to annotate content with lexical or conceptual
information for the purpose of contextual disambiguation.
• It carries three pieces of annotation:
– Confidence: The confidence of the agent (that produced the annotation) in its own
computation – XSD double data type (e.g. 0.63)
– Entity type: The type of entity, or concept class of the text analysis target – IRI (e.g.
http://nerd.eurecom.fr/ontology#Location [8])
– Entity identifier: A unique identifier for the text analysis target – IRI or String (e.g.
http://dbpedia.org/page/Innsbruck or the identifier for “Capital” from WordNet [9])
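The three pieces of annotation can be sketched as a small Python data structure that renders the corresponding ITS 2.0 HTML attributes (its-ta-confidence, its-ta-class-ref, its-ta-ident-ref). This is an illustrative sketch, not part of any ITS tooling: the class TextAnalysisAnnotation and its method to_span are hypothetical names, only the IRI form of the entity identifier is covered, and the 0 to 1 range check reflects my reading of the specification [2].

```python
from dataclasses import dataclass
from html import escape
from typing import Optional


@dataclass
class TextAnalysisAnnotation:
    """Hypothetical container for the three Text Analysis values."""

    ident_ref: Optional[str] = None     # entity identifier (IRI form only)
    class_ref: Optional[str] = None     # entity type (IRI)
    confidence: Optional[float] = None  # agent's confidence in its own result

    def to_span(self, text: str) -> str:
        """Render `text` as a <span> carrying the ITS 2.0 HTML attributes."""
        if self.confidence is not None and not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence is assumed to lie between 0 and 1")
        pairs = [
            ("its-ta-confidence", self.confidence),
            ("its-ta-class-ref", self.class_ref),
            ("its-ta-ident-ref", self.ident_ref),
        ]
        # Only emit attributes that were actually set.
        attrs = "".join(
            f' {name}="{escape(str(value), quote=True)}"'
            for name, value in pairs
            if value is not None
        )
        return f"<span{attrs}>{escape(text)}</span>"


annotation = TextAnalysisAnnotation(
    ident_ref="http://dbpedia.org/page/Innsbruck",
    class_ref="http://nerd.eurecom.fr/ontology#Location",
    confidence=0.63,
)
span = annotation.to_span("Innsbruck")
print(span)
```

As far as I understand the specification, a confidence value additionally requires an its-annotators-ref attribute on an enclosing element to identify the producing tool; the sketch leaves that out.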
ITS 2.0
Rendered HTML: Welcome to Innsbruck in Austria!
HTML with ITS metadata:
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<h2 translate="yes">Welcome to
<span its-ta-ident-ref="http://dbpedia.org/page/Innsbruck"
its-within-text="yes" translate="no">Innsbruck</span> in
<b translate="no" its-within-text="yes">Austria</b>!</h2>
</body>
</html>
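The translate attributes in this markup can be read as a simple inheritance rule: an element's setting applies to its text until a descendant overrides it. The Python sketch below splits the example into text runs and flags each one as translatable or not. It is a simplification that assumes well-nested markup, treats content as translatable by default, and ignores ITS global rules; the class name TranslateScanner is hypothetical.

```python
from html.parser import HTMLParser


class TranslateScanner(HTMLParser):
    """Split a document into (text, translatable) runs."""

    def __init__(self):
        super().__init__()
        self.flags = [True]  # assumed default: content is translatable
        self.runs = []

    def handle_starttag(self, tag, attrs):
        value = dict(attrs).get("translate")
        if value == "no":
            self.flags.append(False)
        elif value in ("yes", ""):
            self.flags.append(True)
        else:
            # No (or unknown) value: inherit from the parent element.
            self.flags.append(self.flags[-1])

    def handle_endtag(self, tag):
        if len(self.flags) > 1:
            self.flags.pop()

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace and line breaks
        if text:
            self.runs.append((text, self.flags[-1]))


document = """<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<h2 translate="yes">Welcome to
<span its-ta-ident-ref="http://dbpedia.org/page/Innsbruck"
its-within-text="yes" translate="no">Innsbruck</span> in
<b translate="no" its-within-text="yes">Austria</b>!</h2>
</body>
</html>"""

scanner = TranslateScanner()
scanner.feed(document)
scanner.close()
runs = scanner.runs
for text, translatable in runs:
    print(f"{text!r}: {'translate' if translatable else 'keep as-is'}")
```

For the slide's example, only the two place names end up flagged as non-translatable, while the surrounding heading text remains available to a translation tool.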
ITS 2.0
• Conversion to NIF [2]:
– XML or HTML documents that contain ITS metadata can be converted to an
RDF representation based on NIF.
– The conversion algorithm to generate NIF consists of seven steps. The output of the
algorithm uses the ITS RDF ontology [7].
– The conversion to NIF is a possible basis for a natural language processing (NLP)
application that creates, for example, named entity annotations.
– The procedure for integrating the RDF annotations back into the original input
document is given in [6] (NIF2ITS).
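To give a feel for the shape of the result, the Python sketch below builds a small Turtle document for one annotated mention. It is not the seven-step algorithm from the specification: it starts from already extracted plain text, handles a single mention, and uses json.dumps as an approximation of Turtle string escaping. The function name to_nif_turtle and the base IRI http://example.com/doc.html are assumptions made for illustration; the nif: and itsrdf: terms come from the NIF Core and ITS RDF ontologies.

```python
import json

NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
ITSRDF = "http://www.w3.org/2005/11/its/rdf#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def to_nif_turtle(base_iri, text, mention, ident_ref):
    """Describe one mention in `text` as NIF-style Turtle (illustrative only)."""
    begin = text.index(mention)  # first occurrence only
    end = begin + len(mention)
    # NIF addresses strings by character offsets in the IRI fragment.
    context = f"<{base_iri}#char=0,{len(text)}>"
    target = f"<{base_iri}#char={begin},{end}>"
    text_literal = json.dumps(text, ensure_ascii=False)
    mention_literal = json.dumps(mention, ensure_ascii=False)
    lines = [
        f"@prefix nif: <{NIF}> .",
        f"@prefix itsrdf: <{ITSRDF}> .",
        f"@prefix xsd: <{XSD}> .",
        "",
        f"{context} a nif:Context, nif:RFC5147String ;",
        f"    nif:isString {text_literal} .",
        "",
        f"{target} a nif:RFC5147String ;",
        f"    nif:referenceContext {context} ;",
        f'    nif:beginIndex "{begin}"^^xsd:nonNegativeInteger ;',
        f'    nif:endIndex "{end}"^^xsd:nonNegativeInteger ;',
        f"    nif:anchorOf {mention_literal} ;",
        f"    itsrdf:taIdentRef <{ident_ref}> .",
    ]
    return "\n".join(lines)


turtle = to_nif_turtle(
    "http://example.com/doc.html",
    "Welcome to Innsbruck in Austria!",
    "Innsbruck",
    "http://dbpedia.org/page/Innsbruck",
)
print(turtle)
```

The mention "Innsbruck" occupies characters 11 to 20 of the 32-character context string, so the ITS identifier ends up attached to the resource whose IRI ends in #char=11,20.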
NLP Interchange Format (NIF)
• NIF is an RDF/OWL-based format that aims to achieve interoperability
between Natural Language Processing (NLP) tools, language resources
and annotations.
• NIF will soon be a normative part of ITS 2.0.
• NIF and its community project NLP2RDF serve as an umbrella project
liaising with other communities of practice, especially:
– LOD2 FP7 EU project
– MultilingualWeb-LT Working Group
– Best Practices for Multilingual Linked Open Data Community Group
– Ontology-Lexica Community Group
– Named Entity Recognition and Disambiguation (NERD)
– Ontologies of Linguistic Annotation (OLiA)
• University of Leipzig
How is it different to Microdata annotations?
What is the latitude and longitude of
the <span ?=?>Empire State Building</span>?
ITS 2.0 + DBpedia resource:
<span its-ta-ident-ref="http://live.dbpedia.org/page/Empire_State_Building">
Empire State Building</span>
Microdata + schema.org:
<div itemscope itemtype="http://schema.org/Place">
What is the latitude and longitude of the
<span itemprop="name">Empire State Building</span>?
</div>
How is it different to Microdata annotations?
What is the latitude and longitude of
the <span ?=?>Empire State Building</span>?
Semantics of ITS2.0 annotations:
Specify entity identifiers (IRIs) for the presented information item.
Semantics of Microdata annotations:
Specify the type of information that is presented.
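The difference shows up when both kinds of markup are read by a program. The Python sketch below, with the hypothetical names AnnotationSummary and summarize, reports what each snippet tells a consumer: Microdata yields a type and named properties, while ITS 2.0 yields the IRI of one specific entity. It only looks at the itemtype, itemprop and its-ta-ident-ref attributes and is not a complete Microdata or ITS processor.

```python
from html.parser import HTMLParser


class AnnotationSummary(HTMLParser):
    """Record the type, properties and entity IRI found in a snippet."""

    def __init__(self):
        super().__init__()
        self.item_type = None   # Microdata: what kind of thing is described
        self.entity_iri = None  # ITS 2.0: which thing is meant
        self.properties = {}    # Microdata: property name -> text
        self._open = []         # itemprop name (or None) per open element

    def handle_starttag(self, tag, attrs):
        values = dict(attrs)
        self.item_type = values.get("itemtype", self.item_type)
        self.entity_iri = values.get("its-ta-ident-ref", self.entity_iri)
        prop = values.get("itemprop")
        self._open.append(prop)
        if prop:
            self.properties[prop] = ""

    def handle_data(self, data):
        for prop in self._open:
            if prop:
                self.properties[prop] += data

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()


def summarize(snippet):
    parser = AnnotationSummary()
    parser.feed(snippet)
    parser.close()
    props = {k: " ".join(v.split()) for k, v in parser.properties.items()}
    return {"type": parser.item_type, "properties": props,
            "entity": parser.entity_iri}


microdata = summarize(
    '<div itemscope itemtype="http://schema.org/Place">'
    "What is the latitude and longitude of the "
    '<span itemprop="name">Empire State Building</span>?</div>'
)
its = summarize(
    '<span its-ta-ident-ref="http://live.dbpedia.org/page/Empire_State_Building">'
    "Empire State Building</span>"
)
print(microdata)
print(its)
```

The Microdata summary knows that a Place named "Empire State Building" is described but not which resource it is; the ITS summary knows exactly which resource is meant but carries no type or property information.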
Hands-on / Demo
• HTML with ITS metadata
• Transformation of HTML with ITS metadata to NIF
Notes:
• Based on the XSLT files shared by the W3C Working Group member
Felix Sasaki (@fsasaki) [4]
• The Java internal XSLTC processor fails to compile the XSLTs. Use
Saxon 9 HE.
References
[1] W3C semantic web list thread:
http://lists.w3.org/Archives/Public/semantic-web/2013Apr/0218.html
[2] ITS 2.0 W3C working draft: http://www.w3.org/TR/its20/
[3] NIF Core Ontology: http://persistence.uni-leipzig.org/nlp2rdf/
[4] Felix Sasaki ITS 2.0 extractor (github): https://github.com/fsasaki/its20-extractor
[5] W3C, Localization vs. Internationalization: http://www.w3.org/International/questions/qa-i18n
[6] W3C, Conversion NIF2ITS: http://www.w3.org/TR/its20/#nif-backconversion
[7] W3C, ITS 2.0 / RDF Ontology: http://www.w3.org/2005/11/its/rdf-content/its-rdf.html
[8] Named Entity Recognition and Disambiguation (NERD): http://nerd.eurecom.fr/ontology
[9] WordNet Search 3.1: http://wordnetweb.princeton.edu/perl/webwn