Sasaki – LLD Datathon – Cercedilla, Spain, May 2015
Roundtripping of NIF-based Linguistic Linked Data with non-linked data sources
Felix Sasaki
DFKI / W3C Fellow
Slides:
http://de.slideshare.net/atcfsenzoku/sasaki-datathonmadrid2015
What is NIF?
• Natural Language Processing Interchange Format
– See http://nlp2rdf.org/
• LLD format to store annotations & to organize NLP pipelines
• API specification to create NIF workflows
• More details: after the coffee break 
• Following slides: main roles for NIF
Example (Partial; JSON-LD Syntax)
{ "@graph" : [ {
"@id" : "p:char=0,18",
"@type" : [ "nif:Context", "nif:Sentence", "nif:RFC5147String" ],
"anchorOf" : "Welcome to Prague.",
"beginIndex" : "0",
"endIndex" : "18",
"isString" : "Welcome to Prague.",
"referenceContext" : "p:char=0,18"
}, {
"@id" : "p:char=11,17",
"@type" : [ "nif:RFC5147String", "nif:Word" ], …
"referenceContext" : "p:char=0,18",
"taIdentRef" : "http://dbpedia.org/resource/Prague" }, …] }
• Identifying and typing annotations
• Identifying annotation offsets
• Adding knowledge, e.g. a named entity identifier
• Interrelating annotations
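The four roles listed above can be reproduced with plain Python data structures. This is a minimal sketch, not a NIF library: it keeps the `p:` prefix from the example without expanding it, locates the word with `str.find` as a stand-in for a real tokenizer, and fills in `anchorOf`, `beginIndex` and `endIndex` for the word node, which the slide elides; that choice is an assumption.

```python
import json
from typing import Dict, Optional

def rfc5147_id(prefix: str, begin: int, end: int) -> str:
    """Build an RFC 5147-style identifier such as 'p:char=11,17'."""
    return f"{prefix}char={begin},{end}"

def nif_context(text: str, prefix: str = "p:") -> Dict:
    """Describe the whole text as a nif:Context that is its own reference context."""
    node_id = rfc5147_id(prefix, 0, len(text))
    return {
        "@id": node_id,
        "@type": ["nif:Context", "nif:Sentence", "nif:RFC5147String"],
        "anchorOf": text,
        "beginIndex": "0",
        "endIndex": str(len(text)),
        "isString": text,
        "referenceContext": node_id,
    }

def nif_word(context: Dict, word: str, entity: Optional[str] = None, prefix: str = "p:") -> Dict:
    """Describe the first occurrence of `word` in the context as a nif:Word."""
    text = context["isString"]
    begin = text.find(word)
    if begin < 0:
        raise ValueError(f"{word!r} does not occur in the context")
    # RFC 5147 positions count the gaps between characters, so [begin, end) matches Python slicing
    end = begin + len(word)
    node = {
        "@id": rfc5147_id(prefix, begin, end),
        "@type": ["nif:RFC5147String", "nif:Word"],
        "anchorOf": text[begin:end],
        "beginIndex": str(begin),
        "endIndex": str(end),
        "referenceContext": context["@id"],  # interrelating the word with its context
    }
    if entity is not None:
        node["taIdentRef"] = entity  # additional knowledge: a named entity identifier
    return node

ctx = nif_context("Welcome to Prague.")
word = nif_word(ctx, "Prague", "http://dbpedia.org/resource/Prague")
print(json.dumps({"@graph": [ctx, word]}, indent=2))
```

Keeping the offsets as strings mirrors the example; a full JSON-LD document would normally declare datatypes in its `@context` instead.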
A NIF workflow
Existing content → Conversion to NIF → Content analytics, e.g. named entity recognition, deploying knowledge from the LLD cloud
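The workflow can be sketched in Python as two small functions: one converting existing plain-text content into a NIF context, and one standing in for named entity recognition that emits annotation nodes with a `taIdentRef` link. The `GAZETTEER` dictionary is a hypothetical placeholder for a real NER component that draws on the LLD cloud, and the dictionaries only mirror the shape of the JSON-LD example.

```python
import re
from typing import Dict, List

# Hypothetical gazetteer standing in for a real NER component backed by the LLD cloud
GAZETTEER: Dict[str, str] = {
    "Prague": "http://dbpedia.org/resource/Prague",
}

def to_nif(text: str) -> Dict:
    """Conversion to NIF: wrap existing plain-text content as a context node."""
    return {"@id": f"p:char=0,{len(text)}", "@type": ["nif:Context"], "isString": text}

def recognize_entities(context: Dict) -> List[Dict]:
    """Content analytics: one annotation node per whole-word gazetteer match."""
    text = context["isString"]
    nodes = []
    for surface, uri in GAZETTEER.items():
        # re.escape guards against regex metacharacters; \b restricts matches to whole words
        for match in re.finditer(rf"\b{re.escape(surface)}\b", text):
            nodes.append({
                "@id": f"p:char={match.start()},{match.end()}",
                "anchorOf": match.group(0),
                "referenceContext": context["@id"],
                "taIdentRef": uri,
            })
    return nodes

context = to_nif("Welcome to Prague!")
for node in recognize_entities(context):
    print(node["@id"], node["taIdentRef"])
```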
Potential scenario: roundtripping
Existing content → Conversion to NIF → Content analytics, e.g. named entity recognition, deploying knowledge from the LLD cloud → Storing annotations in the original content
Roundtripping
• Roundtripping: storing the outcome of content processing (analytics) tasks in the original content
• Not always needed, but sometimes required. Examples:
– Enriching Web content with named entity information, e.g. generating Schema.org markup via NIF pipelines. Format: HTML
– Enriching localisation content to add value beyond translation. Format: XLIFF
Example: HTML
Example roundtripping workflow: 1) Conversion to NIF, 2) NER processing, 3) Back conversion to HTML
Before: … <p>Welcome to Prague!</p> …
After: … <p>Welcome to <span … itemtype="http://schema.org/Place">Prague</span>!</p> …
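Step 3, back conversion, amounts to inserting markup at the NIF offsets. This minimal Python sketch assumes the element content is plain text without existing child markup and walks the annotations left to right. The slide elides the exact `<span>` attributes, so the `itemscope`/`itemtype` microdata pair used here is an assumption, as is rejecting any overlap (including nesting) between annotations.

```python
import html
from typing import List, Tuple

# (begin, end, schema.org type URI); end is exclusive, as in NIF offsets
Annotation = Tuple[int, int, str]

def back_convert_html(text: str, annotations: List[Annotation], tag: str = "p") -> str:
    """Re-serialize plain-text element content with one microdata <span> per annotation."""
    pieces: List[str] = []
    cursor = 0
    for begin, end, type_uri in sorted(annotations):
        if begin < cursor:
            # Overlapping spans cannot all become sibling <span> elements
            raise ValueError("overlapping annotations are not supported in this sketch")
        pieces.append(html.escape(text[cursor:begin]))
        pieces.append(f'<span itemscope itemtype="{html.escape(type_uri)}">'
                      f"{html.escape(text[begin:end])}</span>")
        cursor = end
    pieces.append(html.escape(text[cursor:]))
    return f"<{tag}>{''.join(pieces)}</{tag}>"

print(back_convert_html("Welcome to Prague!", [(11, 17, "http://schema.org/Place")]))
```

Escaping every text piece keeps the output well-formed even when the content contains `<` or `&`.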
Example: XLIFF
Example roundtripping workflow: 1) Conversion to NIF, 2) NER processing, 3) Back conversion to XLIFF
Before: … <xlf:source>Welcome to Prague!</xlf:source> …
After: … <xlf:source>Welcome to <mrk … its:taClassRef="http://schema.org/Place">Prague</mrk>!</xlf:source> …
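For XLIFF, the same step can operate on the parsed tree instead of concatenating strings, which keeps the output well-formed by construction. The sketch below uses Python's `xml.etree.ElementTree`. It assumes the XLIFF 2.0 namespace for the `xlf` prefix, the ITS 2.0 namespace for `its:taClassRef`, a `<source>` without existing inline children, and a hypothetical `id` value on `<mrk>` (XLIFF 2.0 requires one; the slide elides the attributes). It has not been validated against the XLIFF schema.

```python
import xml.etree.ElementTree as ET

XLF = "urn:oasis:names:tc:xliff:document:2.0"  # assumption: XLIFF 2.0
ITS = "http://www.w3.org/2005/11/its"          # ITS 2.0 namespace
ET.register_namespace("xlf", XLF)
ET.register_namespace("its", ITS)

def annotate_source(source: ET.Element, begin: int, end: int, class_ref: str, mrk_id: str) -> None:
    """Wrap source text [begin, end) in <mrk> carrying an ITS text analysis class reference."""
    if len(source):
        raise ValueError("sketch only handles <source> elements without inline children")
    text = source.text or ""
    source.text = text[:begin]
    mrk = ET.SubElement(source, f"{{{XLF}}}mrk", {"id": mrk_id, f"{{{ITS}}}taClassRef": class_ref})
    mrk.text = text[begin:end]
    mrk.tail = text[end:]  # text after the annotation continues as the tail of <mrk>

source = ET.fromstring(f'<xlf:source xmlns:xlf="{XLF}">Welcome to Prague!</xlf:source>')
annotate_source(source, 11, 17, "http://schema.org/Place", "m1")
print(ET.tostring(source, encoding="unicode"))
```

Because the text is only redistributed between `source.text`, `mrk.text` and `mrk.tail`, the concatenated text content stays exactly the original segment.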
Example usage scenario: FREME project
• See http://www.freme-project.eu/
• Developing interfaces for multilingual and semantic enrichment of digital content
• Relies on NIF-based enrichment workflows
– See FREME API version 0.1: http://api.freme-project.eu/doc/0.1/
• Deploys aspects of the LIDER reference architecture for LLD processing
– See D3.1.1 at http://lider-project.eu/?q=doc/deliverables
• Focuses on four business cases (BCs), among them:
– The localisation BC requires XLIFF roundtripping
– The Web content personalisation BC requires HTML roundtripping
Challenges for roundtripping
• Source format
– How to store enrichment information (annotations)
– How to handle existing information
• Annotation model
– NIF is a general graph-based annotation model
– The source format and the motivation for annotating may require restricting the model
How to store annotations in various source formats
• Solvable for markup languages like HTML or XLIFF
• Challenge: preserving existing markup, e.g. “<p>Welcome to <b>Prague</b>!</p>”
• General issue with complex and proprietary formats:
– “My own” storage mechanism = no tool support
– Using existing storage mechanisms may mean overloading their semantics
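Preserving existing markup means mapping NIF offsets, which refer to the extracted plain text, back onto the text nodes of the original tree. A minimal ElementTree sketch: it records where each text or tail node starts in the concatenated text, wraps the annotated span if it lies inside a single node, and otherwise reports a conflict (the overlap problem discussed in the following slides). The wrapper element and its `itemtype` attribute are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

def text_segments(root: ET.Element) -> List[Tuple[ET.Element, str, int]]:
    """(element, 'text' or 'tail', start offset) for each non-empty text node, in document order."""
    segments: List[Tuple[ET.Element, str, int]] = []
    offset = 0

    def visit(elem: ET.Element, is_root: bool) -> None:
        nonlocal offset
        if elem.text:
            segments.append((elem, "text", offset))
            offset += len(elem.text)
        for child in elem:
            visit(child, False)
        if elem.tail and not is_root:  # the root's tail lies outside the content
            segments.append((elem, "tail", offset))
            offset += len(elem.tail)

    visit(root, True)
    return segments

def wrap_span(root: ET.Element, begin: int, end: int, tag: str,
              attrib: Optional[Dict[str, str]] = None) -> ET.Element:
    """Wrap plain-text offsets [begin, end) in a new element without disturbing existing markup."""
    parents = {child: parent for parent in root.iter() for child in parent}
    for elem, kind, start in text_segments(root):
        value = getattr(elem, kind)
        if start <= begin and end <= start + len(value):
            local_begin, local_end = begin - start, end - start
            new = ET.Element(tag, attrib or {})
            new.text = value[local_begin:local_end]
            new.tail = value[local_end:]
            setattr(elem, kind, value[:local_begin])
            if kind == "text":
                elem.insert(0, new)  # becomes the first child, before existing children
            else:
                parent = parents[elem]  # a tail belongs to the parent's content
                parent.insert(list(parent).index(elem) + 1, new)
            return new
    raise ValueError("span crosses existing markup boundaries")

root = ET.fromstring("<p>Welcome to <b>Prague</b>!</p>")
wrap_span(root, 11, 17, "span", {"itemtype": "http://schema.org/Place"})
print(ET.tostring(root, encoding="unicode"))
```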
Source format example: Word
Content storage (Word file unzipped), before enrichment:
… <w:t>Welcome to Prague!</w:t> …
Enrichment process: storing the enrichment as comments
Content storage after enrichment. Change of original content: creation of an anchor:
… <w:commentRangeStart w:id="0"/><w:t>Prague</w:t>
<w:commentRangeEnd w:id="0"/>
<w:r w:rsidR="00987079"> …
Comment storage. The comment is stored separately and refers to the anchor (“standoff approach”):
<w:p w:rsidRPr="00987079">… Enrichment: type "http://schema.org/Place"…</w:p>
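The Word variant is a stand-off approach: the content part only gains an anchor, while the enrichment lives in a separate comment part linked by a shared id. A minimal ElementTree sketch, assuming the WordprocessingML main namespace and the usual Word pattern of a `w:commentReference` run after the range end. It omits everything else a real .docx package needs (for example registering the comments part in the package relationships and content types), and the author value "enrichment" is hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
ET.register_namespace("w", W)

def w(name: str) -> str:
    """Qualify a WordprocessingML element or attribute name."""
    return f"{{{W}}}{name}"

def make_run(text: str) -> ET.Element:
    run = ET.Element(w("r"))
    t = ET.SubElement(run, w("t"))
    t.text = text
    t.set(XML_SPACE, "preserve")  # keep leading/trailing spaces such as in "Welcome to "
    return run

def anchor_run(paragraph: ET.Element, run: ET.Element, comment_id: str) -> None:
    """Content part: surround an existing run with a comment range (the anchor)."""
    index = list(paragraph).index(run)
    paragraph.insert(index, ET.Element(w("commentRangeStart"), {w("id"): comment_id}))
    paragraph.insert(index + 2, ET.Element(w("commentRangeEnd"), {w("id"): comment_id}))
    reference = ET.Element(w("r"))
    ET.SubElement(reference, w("commentReference"), {w("id"): comment_id})
    paragraph.insert(index + 3, reference)

def comment_part(comment_id: str, text: str) -> ET.Element:
    """Comment part: the enrichment, stored separately, points back to the anchor via its id."""
    comments = ET.Element(w("comments"))
    comment = ET.SubElement(comments, w("comment"), {w("id"): comment_id, w("author"): "enrichment"})
    ET.SubElement(comment, w("p")).append(make_run(text))
    return comments

# The single run "Welcome to Prague!" has to be split first: a change of the original content
body_paragraph = ET.Element(w("p"))
runs = [make_run(chunk) for chunk in ("Welcome to ", "Prague", "!")]
body_paragraph.extend(runs)
anchor_run(body_paragraph, runs[1], "0")
comments = comment_part("0", 'Enrichment: type "http://schema.org/Place"')
print(ET.tostring(body_paragraph, encoding="unicode"))
print(ET.tostring(comments, encoding="unicode"))
```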
Annotation models
• NIF: like RDF = general graph model
– Consisting of nodes and arcs
– Example: p:char=11,17 --taIdentRef--> dbp:Prague
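In graph terms, this annotation is a single arc from the string node to the entity node. A dependency-free sketch that stores arcs as triples and writes them as N-Triples. It assumes a hypothetical document URI `http://example.org/doc#` for the `p:` prefix, reads `dbp:` as `http://dbpedia.org/resource/` (consistent with the JSON-LD example), and expands `taIdentRef` to the ITS 2.0 RDF namespace that NIF reuses.

```python
from typing import NamedTuple, Set

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

DOC = "http://example.org/doc#"                # hypothetical expansion of the p: prefix
ITSRDF = "http://www.w3.org/2005/11/its/rdf#"  # namespace of taIdentRef
DBR = "http://dbpedia.org/resource/"           # expansion of the dbp: prefix

def to_ntriples(graph: Set[Triple]) -> str:
    """Serialize IRI-only triples as N-Triples, one sorted line per arc."""
    return "\n".join(f"<{t.subject}> <{t.predicate}> <{t.obj}> ." for t in sorted(graph))

def objects(graph: Set[Triple], subject: str, predicate: str) -> Set[str]:
    """Follow all arcs with the given predicate out of one node."""
    return {t.obj for t in graph if t.subject == subject and t.predicate == predicate}

graph = {Triple(DOC + "char=11,17", ITSRDF + "taIdentRef", DBR + "Prague")}
print(to_ntriples(graph))
```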
Restricting graphs: tree-structured annotations on several layers
• Tree structures for syntactic annotations
• Several annotation layers for the same text
• Concurrent hierarchies
• Roundtripping with XML can represent only one of these hierarchies
Example taken from TEI: http://www.tei-c.org/release/doc/tei-p5-doc/en/html/NH.html
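Whether two annotation layers fit into one XML tree comes down to whether their spans nest. A small Python check, treating spans as half-open character offsets as NIF does: a pair conflicts when it overlaps without one span containing the other. The layer names and offsets in the usage lines are invented for illustration.

```python
from itertools import combinations
from typing import List, Tuple

Span = Tuple[str, int, int]  # (label, begin, end), end exclusive

def crossing_pairs(spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Return pairs of spans that overlap without nesting (they cannot both be XML elements)."""
    conflicts = []
    for a, b in combinations(spans, 2):
        _, a_begin, a_end = a
        _, b_begin, b_end = b
        overlap = a_begin < b_end and b_begin < a_end
        nested = (a_begin <= b_begin and b_end <= a_end) or (b_begin <= a_begin and a_end <= b_end)
        if overlap and not nested:
            conflicts.append((a, b))
    return conflicts

# A sentence layer and a line layer over the same 30 characters
layers = [("sentence", 0, 18), ("sentence", 18, 30), ("line", 0, 12), ("line", 12, 30)]
print(crossing_pairs(layers))
```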
Representing overlapping hierarchies with markup (1/2)
Solutions proposed by the TEI
• Multiple encoding of the same information
– One XML document per annotation layer
• Boundary marking with empty “milestone” elements
– Also used by XLIFF
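Empty milestone elements sidestep the nesting constraint because they only mark positions, so overlapping spans can coexist. A sketch that serializes possibly overlapping spans with markers modelled on XLIFF 2.0's start and end markers, `<sm id="m1"/>` and `<em startRef="m1"/>`. The generated ids are illustrative, other attributes are omitted, and the output is a fragment rather than schema-valid XLIFF.

```python
from typing import List, Tuple
from xml.sax.saxutils import escape

def milestones(text: str, spans: List[Tuple[int, int]]) -> str:
    """Mark each half-open span [begin, end) with empty start/end marker elements."""
    events = []  # (offset, order, markup); at equal offsets, end markers precede start markers
    for n, (begin, end) in enumerate(spans, start=1):
        if begin >= end:
            raise ValueError("empty or inverted spans are not supported")
        events.append((begin, 1, f'<sm id="m{n}"/>'))
        events.append((end, 0, f'<em startRef="m{n}"/>'))
    events.sort(key=lambda event: (event[0], event[1]))  # stable sort keeps span order on ties
    out, cursor = [], 0
    for offset, _, markup in events:
        out.append(escape(text[cursor:offset]))
        out.append(markup)
        cursor = offset
    out.append(escape(text[cursor:]))
    return "".join(out)

# "to Prague" and "Prague!" overlap without nesting
print(milestones("Welcome to Prague!", [(8, 17), (11, 18)]))
```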
Representing overlapping hierarchies with markup (2/2)
Solutions proposed by the TEI
• Fragmentation and reconstitution of virtual elements
– One hierarchy is explicit; the others are expressed as interrelated marked-up spans
• Stand-off markup
– Text and annotations are separated and interlinked via anchors and references
– Cf. the Word example
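Fragmentation keeps one hierarchy explicit and splits any span that crosses its boundaries into pieces sharing an identifier, from which the virtual element can be reconstituted. A minimal sketch on character offsets; the `part_of` key is a hypothetical stand-in for TEI linking mechanisms such as `next`/`prev`, and reconstitution assumes the fragments of one element are contiguous.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def fragment(span: Tuple[int, int], containers: List[Tuple[int, int]], span_id: str) -> List[Dict]:
    """Split a span into pieces that each fit inside one element of the explicit hierarchy."""
    begin, end = span
    pieces = []
    for c_begin, c_end in containers:
        lo, hi = max(begin, c_begin), min(end, c_end)
        if lo < hi:  # non-empty intersection with this container
            pieces.append({"begin": lo, "end": hi, "part_of": span_id})
    return pieces

def reconstitute(pieces: List[Dict]) -> Dict[str, Tuple[int, int]]:
    """Rebuild each virtual element's extent from its (contiguous) fragments."""
    groups: Dict[str, List[Dict]] = defaultdict(list)
    for piece in pieces:
        groups[piece["part_of"]].append(piece)
    return {
        span_id: (min(p["begin"] for p in parts), max(p["end"] for p in parts))
        for span_id, parts in groups.items()
    }

sentences = [(0, 18), (18, 30)]                     # the explicit hierarchy
quote_pieces = fragment((12, 25), sentences, "q1")  # a span crossing the sentence boundary
print(quote_pieces)
print(reconstitute(quote_pieces))
```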
Representing overlapping hierarchies in RDF
POWLA (cf. Chiarcos, 2012)
• RDF representation for corpus annotation, based on the PAULA XML standoff format
• Allows representing hierarchical, multi-layer corpora in RDF and querying them with SPARQL
• Not relevant for roundtripping, but relevant for representing and processing linguistic annotations in RDF
Lessons learned
• Choose the overlap solution that fits your roundtripping modelling and processing needs
• Consider off-the-shelf tooling
– For purely hierarchical data: XPath / CSS selectors, DOM, …
• Consider libraries
– For extraction only: Tika http://tika.apache.org/
– For roundtripping: Okapi http://okapi.opentag.com/, currently being adapted in FREME for roundtripping in selected formats
• Make sure the annotation survives in the original format (cf. the Word example)
– Soon to be made easier by using Okapi
Trump fist pump t shirts Trump fist pump t shirts
exgf28
 
Lordsexch ID: An Ultimate Online Cricket ID Provider In India
Lordsexch ID: An Ultimate Online Cricket ID Provider In IndiaLordsexch ID: An Ultimate Online Cricket ID Provider In India
Lordsexch ID: An Ultimate Online Cricket ID Provider In India
exchangeid32
 
DASH, presented by Elly Tawhai at PacNOG 33
DASH, presented by Elly Tawhai at PacNOG 33DASH, presented by Elly Tawhai at PacNOG 33
DASH, presented by Elly Tawhai at PacNOG 33
APNIC
 
Study of international anticancer research trends.pdf
Study of international anticancer research trends.pdfStudy of international anticancer research trends.pdf
Study of international anticancer research trends.pdf
Preston University
 

Recently uploaded (20)

Week 1 - Pendidikan Pancasila - Gr 1.docx
Week 1 - Pendidikan Pancasila - Gr 1.docxWeek 1 - Pendidikan Pancasila - Gr 1.docx
Week 1 - Pendidikan Pancasila - Gr 1.docx
 
Enhancing seamless access using TIGERfed
Enhancing seamless access using TIGERfedEnhancing seamless access using TIGERfed
Enhancing seamless access using TIGERfed
 
Effective Tips for Creating the Best Rich Media Ads .pptx
Effective Tips for Creating the Best Rich Media Ads .pptxEffective Tips for Creating the Best Rich Media Ads .pptx
Effective Tips for Creating the Best Rich Media Ads .pptx
 
SisAi World - Software is AI - Providing AI as Software - Protecting the Inte...
SisAi World - Software is AI - Providing AI as Software - Protecting the Inte...SisAi World - Software is AI - Providing AI as Software - Protecting the Inte...
SisAi World - Software is AI - Providing AI as Software - Protecting the Inte...
 
202254.com香蕉影视,在线观看《我才不要和你做朋友呢》在线观看最新电影,香蕉影视在线观看《我才不要和你做朋友呢》在线观看高清电影
202254.com香蕉影视,在线观看《我才不要和你做朋友呢》在线观看最新电影,香蕉影视在线观看《我才不要和你做朋友呢》在线观看高清电影202254.com香蕉影视,在线观看《我才不要和你做朋友呢》在线观看最新电影,香蕉影视在线观看《我才不要和你做朋友呢》在线观看高清电影
202254.com香蕉影视,在线观看《我才不要和你做朋友呢》在线观看最新电影,香蕉影视在线观看《我才不要和你做朋友呢》在线观看高清电影
 
Top 50 Telephone Conversation Sample Examples For IT Industries.pdf
Top 50 Telephone Conversation Sample Examples For IT Industries.pdfTop 50 Telephone Conversation Sample Examples For IT Industries.pdf
Top 50 Telephone Conversation Sample Examples For IT Industries.pdf
 
Do it again anti Republican shirt Do it again anti Republican shirt
Do it again anti Republican shirt Do it again anti Republican shirtDo it again anti Republican shirt Do it again anti Republican shirt
Do it again anti Republican shirt Do it again anti Republican shirt
 
Saint Louis University diploma
Saint Louis University diplomaSaint Louis University diploma
Saint Louis University diploma
 
upgrade to zabbix-7 0 como atualiza lts1
upgrade to zabbix-7 0 como atualiza lts1upgrade to zabbix-7 0 como atualiza lts1
upgrade to zabbix-7 0 como atualiza lts1
 
Portugal Dreamin 24 - How to easily use an API with Flows
Portugal Dreamin 24  - How to easily use an API with FlowsPortugal Dreamin 24  - How to easily use an API with Flows
Portugal Dreamin 24 - How to easily use an API with Flows
 
Girls Call Mahipalpur 000XX00000 Provide Best And Top Girl Service And No1 in...
Girls Call Mahipalpur 000XX00000 Provide Best And Top Girl Service And No1 in...Girls Call Mahipalpur 000XX00000 Provide Best And Top Girl Service And No1 in...
Girls Call Mahipalpur 000XX00000 Provide Best And Top Girl Service And No1 in...
 
Female Service Girls Call Delhi 9873940964 Provide Best And Top Girl Service ...
Female Service Girls Call Delhi 9873940964 Provide Best And Top Girl Service ...Female Service Girls Call Delhi 9873940964 Provide Best And Top Girl Service ...
Female Service Girls Call Delhi 9873940964 Provide Best And Top Girl Service ...
 
Best Skills to Learn for Freelancing.pdf
Best Skills to Learn for Freelancing.pdfBest Skills to Learn for Freelancing.pdf
Best Skills to Learn for Freelancing.pdf
 
Network Security version1.0 - Module 3.pptx
Network Security version1.0 - Module 3.pptxNetwork Security version1.0 - Module 3.pptx
Network Security version1.0 - Module 3.pptx
 
Geolocation and Geofeed Implementation bdNOG18
Geolocation and Geofeed Implementation bdNOG18Geolocation and Geofeed Implementation bdNOG18
Geolocation and Geofeed Implementation bdNOG18
 
University of California, Riverside diploma
University of California, Riverside diplomaUniversity of California, Riverside diploma
University of California, Riverside diploma
 
Trump fist pump t shirts Trump fist pump t shirts
Trump fist pump t shirts Trump fist pump t shirtsTrump fist pump t shirts Trump fist pump t shirts
Trump fist pump t shirts Trump fist pump t shirts
 
Lordsexch ID: An Ultimate Online Cricket ID Provider In India
Lordsexch ID: An Ultimate Online Cricket ID Provider In IndiaLordsexch ID: An Ultimate Online Cricket ID Provider In India
Lordsexch ID: An Ultimate Online Cricket ID Provider In India
 
DASH, presented by Elly Tawhai at PacNOG 33
DASH, presented by Elly Tawhai at PacNOG 33DASH, presented by Elly Tawhai at PacNOG 33
DASH, presented by Elly Tawhai at PacNOG 33
 
Study of international anticancer research trends.pdf
Study of international anticancer research trends.pdfStudy of international anticancer research trends.pdf
Study of international anticancer research trends.pdf
 

Sasaki datathon-madrid-2015

  • 1. Sasaki – LLD Datathon – Cercedilla, Spain, May 2015
    Roundtripping of NIF-based Linguistic Linked Data with non-linked data sources
    Felix Sasaki, DFKI / W3C Fellow
    Slides: http://de.slideshare.net/atcfsenzoku/sasaki-datathonmadrid2015
  • 2. What is NIF?
    • Natural Language Processing Interchange Format
      – See http://nlp2rdf.org/
    • LLD format to store annotations & to organize NLP pipelines
    • API specification to create NIF workflows
    • More details: after the coffee break
    • Following slides: main roles for NIF
  • 3. Example (Partial; JSON-LD Syntax)
    { "@graph" : [ {
      "@id" : "p:char=0,18",
      "@type" : [ "nif:Context", "nif:Sentence", "nif:RFC5147String" ],
      "anchorOf" : "Welcome to Prague.",
      "beginIndex" : "0",
      "endIndex" : "18",
      "isString" : "Welcome to Prague.",
      "referenceContext" : "p:char=0,18"
    }, {
      "@id" : "p:char=11,17",
      "@type" : [ "nif:RFC5147String", "nif:Word" ], …
      "referenceContext" : "p:char=0,18",
      "taIdentRef" : "http://dbpedia.org/resource/Prague" }, …] }
  • 4. Example (Partial; JSON-LD Syntax)
    { "@graph" : [ {
      "@id" : "p:char=0,18",
      "@type" : [ "nif:Context", "nif:Sentence", "nif:RFC5147String" ],
      "anchorOf" : "Welcome to Prague.",
      "beginIndex" : "0",
      "endIndex" : "18",
      "isString" : "Welcome to Prague.",
      "referenceContext" : "p:char=0,18"
    }, {
      "@id" : "p:char=11,17",
      "@type" : [ "nif:RFC5147String", "nif:Word" ], …
      "referenceContext" : "p:char=0,18",
      "taIdentRef" : "http://dbpedia.org/resource/Prague" }, …] }
    • Identifying and typing annotations
    • Identifying annotation offsets
    • Adding additional knowledge, e.g. named entity identifier
    • Interrelating annotations
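The four roles listed with the example (typing, offsets, added knowledge, interrelation) can be sketched in Python with plain dictionaries. This is a minimal sketch under stated assumptions, not a NIF library: the `p:` prefix is the placeholder from the slide, the helper names (`nif_id`, `make_context`, `make_word`, `offsets_consistent`) are hypothetical, no JSON-LD `@context` is emitted, and the word node fills in `anchorOf`, `beginIndex` and `endIndex`, which the slide elides with "…". Index values stay strings, mirroring the slide.

```python
import json

PREFIX = "p:"  # placeholder prefix, kept as in the slide example


def nif_id(begin: int, end: int) -> str:
    """Offset-based identifier in the char=begin,end style (end is exclusive)."""
    return f"{PREFIX}char={begin},{end}"


def make_context(text: str) -> dict:
    """The full text becomes a nif:Context that references itself."""
    cid = nif_id(0, len(text))
    return {
        "@id": cid,
        "@type": ["nif:Context", "nif:Sentence", "nif:RFC5147String"],
        "anchorOf": text,
        "beginIndex": "0",
        "endIndex": str(len(text)),
        "isString": text,
        "referenceContext": cid,
    }


def make_word(context: dict, begin: int, end: int, entity: str) -> dict:
    """A word annotation: typed, anchored by offsets, linked to its context and an entity."""
    return {
        "@id": nif_id(begin, end),
        "@type": ["nif:RFC5147String", "nif:Word"],
        "anchorOf": context["isString"][begin:end],
        "beginIndex": str(begin),
        "endIndex": str(end),
        "referenceContext": context["@id"],
        "taIdentRef": entity,
    }


def offsets_consistent(node: dict, context: dict) -> bool:
    """anchorOf must equal the slice of the context selected by the offsets."""
    begin, end = int(node["beginIndex"]), int(node["endIndex"])
    return context["isString"][begin:end] == node["anchorOf"]


ctx = make_context("Welcome to Prague.")
word = make_word(ctx, 11, 17, "http://dbpedia.org/resource/Prague")
print(json.dumps({"@graph": [ctx, word]}, indent=2))
```

Deriving `@id` from the offsets, rather than choosing it freely, is what lets other tools address the same string without a shared registry.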
  • 8. A NIF workflow
    Diagram:
    • Existing content
    • Content analytics, e.g. named entity recognition
    • Conversion to NIF
    • Deploying knowledge from the LLD cloud
  • 9. Potential scenario: roundtripping
    Diagram:
    • Existing content
    • Content analytics, e.g. named entity recognition
    • Conversion to NIF
    • Storing annotations in original content
    • Deploying knowledge from the LLD cloud
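The roundtripping scenario can be read as a chain of stages that each take and return the same state. The sketch below follows the step order numbered on slide 11 (conversion to NIF, NER, storage back in the content) and adds an enrichment step standing in for the LLD cloud. Everything here is hypothetical: the one-entry gazetteer replaces a real NER service, the `LLD_TYPES` table replaces queries against the LLD cloud, and the bracket notation is a neutral placeholder for a real target format such as HTML or XLIFF.

```python
from functools import reduce
from typing import Any, Callable, Dict, List

State = Dict[str, Any]

# Hypothetical stand-ins: a gazetteer instead of a real NER service,
# and a local table instead of lookups against the LLD cloud.
GAZETTEER = {"Prague": "http://dbpedia.org/resource/Prague"}
LLD_TYPES = {"http://dbpedia.org/resource/Prague": "http://schema.org/Place"}


def to_nif(state: State) -> State:
    """Treat the existing content as the context; start with no annotations."""
    return {**state, "context": state["content"], "annotations": []}


def ner(state: State) -> State:
    """Record every gazetteer hit with its offsets and entity URI."""
    text, found = state["context"], []
    for surface, uri in GAZETTEER.items():
        start = text.find(surface)
        while start != -1:
            found.append({"begin": start, "end": start + len(surface), "taIdentRef": uri})
            start = text.find(surface, start + len(surface))
    return {**state, "annotations": state["annotations"] + found}


def enrich(state: State) -> State:
    """Add a class reference for every entity the lookup table knows."""
    out = []
    for a in state["annotations"]:
        cls = LLD_TYPES.get(a["taIdentRef"])
        out.append({**a, "taClassRef": cls} if cls else a)
    return {**state, "annotations": out}


def store_back(state: State) -> State:
    """Write annotations into the original content, last offset first.

    Annotations are assumed not to overlap; the brackets are a placeholder format.
    """
    text = state["content"]
    for a in sorted(state["annotations"], key=lambda x: x["begin"], reverse=True):
        label = a.get("taClassRef", a["taIdentRef"])
        text = text[: a["begin"]] + f"[{text[a['begin']:a['end']]}|{label}]" + text[a["end"]:]
    return {**state, "roundtripped": text}


PIPELINE: List[Callable[[State], State]] = [to_nif, ner, enrich, store_back]


def run_pipeline(content: str) -> State:
    """Feed the state through every stage in order."""
    return reduce(lambda state, stage: stage(state), PIPELINE, {"content": content})


print(run_pipeline("Welcome to Prague!")["roundtripped"])
```

Inserting from the last offset backwards keeps earlier offsets valid while the string grows. Keeping each stage a pure function of the state also makes it easy to swap one, e.g. replacing the gazetteer with a call to a real NER service.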
  • 10. Roundtripping
    • Roundtripping: storing the outcome of content processing (analytics) tasks in the original content
    • Not always needed, but sometimes; examples:
      – Enriching Web content with named entity information, e.g. generating Schema.org markup via NIF pipelines. Format: HTML
      – Enriching localisation content to add value beyond translation. Format: XLIFF
  • 11. Example: HTML
    Example roundtripping workflow
    Before: … <p>Welcome to Prague!</p> …
    After:  … <p>Welcome to <span … itemtype="http://schema.org/Place">Prague</span>!</p> …
    1) Conversion to NIF
    2) NER processing
    3) Back conversion to HTML
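Step 3, the back conversion, can be sketched for the simplest case: a paragraph that contains only text. Assumptions: the `(begin, end, type)` input shape and the function name `to_html_paragraph` are hypothetical, and `itemscope` plus `itemtype` is one plausible reading of the attributes the slide elides with "…" (schema.org microdata). Existing inline markup is not handled here.

```python
from html import escape
from typing import List, Tuple

# (begin, end, schema.org type) as taken from the NIF annotations; hypothetical shape.
Annotation = Tuple[int, int, str]


def to_html_paragraph(text: str, annotations: List[Annotation]) -> str:
    """Serialise plain text as <p>, wrapping annotated ranges in microdata spans.

    Crossing annotations are rejected, since one HTML tree cannot nest them.
    """
    parts, pos = [], 0
    for begin, end, itemtype in sorted(annotations):
        if begin < pos:
            raise ValueError("overlapping annotations cannot be serialised as nested spans")
        parts.append(escape(text[pos:begin]))  # unannotated text before the span
        parts.append(f'<span itemscope itemtype="{escape(itemtype)}">{escape(text[begin:end])}</span>')
        pos = end
    parts.append(escape(text[pos:]))  # text after the last span
    return "<p>" + "".join(parts) + "</p>"


html = to_html_paragraph("Welcome to Prague!", [(11, 17, "http://schema.org/Place")])
print(html)
```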
  • 12. Example: XLIFF
    Example roundtripping workflow
    Before: … <xlf:source>Welcome to Prague!</xlf:source> …
    After:  … <xlf:source>Welcome to <mrk … its:taClassRef="http://schema.org/Place">Prague</mrk>!</xlf:source> …
    1) Conversion to NIF
    2) NER processing
    3) Back conversion to XLIFF
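The XLIFF variant can be sketched with Python's `xml.etree.ElementTree`. Assumptions, filled in for illustration: the XLIFF 2.0 namespace for `source` and `mrk`, the ITS namespace for `its:taClassRef`, and the marker `id` value `m1`. The slide elides the other `mrk` attributes, and a complete file would also need the enclosing `xliff`, `file`, `unit` and `segment` elements.

```python
import xml.etree.ElementTree as ET

XLF = "urn:oasis:names:tc:xliff:document:2.0"
ITS = "http://www.w3.org/2005/11/its"
ET.register_namespace("xlf", XLF)
ET.register_namespace("its", ITS)


def annotate_source(text: str, begin: int, end: int, class_ref: str) -> ET.Element:
    """Build <xlf:source> with one <xlf:mrk> around text[begin:end]."""
    source = ET.Element(f"{{{XLF}}}source")
    source.text = text[:begin]  # text before the marker
    mrk = ET.SubElement(source, f"{{{XLF}}}mrk", {
        "id": "m1",  # hypothetical marker id
        f"{{{ITS}}}taClassRef": class_ref,
    })
    mrk.text = text[begin:end]  # the annotated string
    mrk.tail = text[end:]  # text after the marker
    return source


src = annotate_source("Welcome to Prague!", 11, 17, "http://schema.org/Place")
print(ET.tostring(src, encoding="unicode"))
```

Because the marker's text and tail hold the original characters, the source text can always be recovered by concatenating all text nodes.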
  • 13. Example usage scenario: FREME project
    • See http://www.freme-project.eu/
    • Developing interfaces for multilingual and semantic enrichment of digital content
    • Relies on NIF-based enrichment workflows
      – See FREME API version 0.1: http://api.freme-project.eu/doc/0.1/
    • Deploys aspects of the LIDER reference architecture for LLD processing
      – See D3.1.1 at http://lider-project.eu/?q=doc/deliverables
    • Focuses on four business cases (BCs)
      – Localisation BC requires XLIFF roundtripping
      – Web content personalisation BC requires HTML roundtripping
  • 14. Challenges for roundtripping
    • Source format
      – How to store enrichment information (annotations)
      – How to handle existing information
    • Annotation model
      – NIF = a general graph-based annotation model
      – The source format and the annotation motivation may require restricting the model
  • 15. How to store annotations in various source formats
    • Solvable for markup languages like HTML or XLIFF
    • Challenge: preserving existing markup, e.g. “<p>Welcome to <b>Prague</b>!</p>”
    • General issue with complex and proprietary formats:
      – “My own” storage mechanism = no tool support
      – Using existing storage mechanisms may mean overloading their semantics
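NIF offsets refer to the plain text, not to the markup, so storing an annotation back into `<p>Welcome to <b>Prague</b>!</p>` first needs a map from text offsets to markup positions. A minimal sketch, assuming simple markup: tags are found with a regular expression, and entities, comments and CDATA are not handled. The helper names are hypothetical, and the microdata span is the same illustrative choice as for the HTML example.

```python
import re
from typing import List

TAG = re.compile(r"<[^>]+>")


def text_offset_map(markup: str) -> List[int]:
    """Map each character of the extracted text to its index in the markup.

    Simplification: tags are found with a regular expression; entities,
    comments and CDATA sections are not handled.
    """
    mapping, pos = [], 0
    for m in TAG.finditer(markup):
        mapping.extend(range(pos, m.start()))  # text before this tag
        pos = m.end()
    mapping.extend(range(pos, len(markup)))  # text after the last tag
    return mapping


def wrap_text_range(markup: str, begin: int, end: int, open_tag: str, close_tag: str) -> str:
    """Wrap the text offsets [begin, end) without touching existing tags.

    Refuses ranges that span a tag, since wrapping them would break nesting.
    """
    mapping = text_offset_map(markup)
    start, stop = mapping[begin], mapping[end - 1] + 1
    if stop - start != end - begin:
        raise ValueError("range crosses existing markup")
    return markup[:start] + open_tag + markup[start:stop] + close_tag + markup[stop:]


original = "<p>Welcome to <b>Prague</b>!</p>"
span = '<span itemscope itemtype="http://schema.org/Place">'
print(wrap_text_range(original, 11, 17, span, "</span>"))
```

The new span lands inside the existing `<b>` element, so the original markup survives unchanged around it. A range that would straddle `<b>` is rejected rather than producing ill-formed output.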
  • 16. Source format example: Word
    Content storage (Word file unzipped), before enrichment:
      … <w:t>Welcome to Prague!</w:t> …
    Enrichment process; storing enrichment as comments
    Content storage, after enrichment (change of original content: creation of anchor):
      … <w:commentRangeStart w:id="0"/><w:t>Prague</w:t> <w:commentRangeEnd w:id="0"/> <w:r w:rsidR="00987079"> …
    Comment storage (comment stored separately; refers to anchor: “standoff approach”):
      <w:p w:rsidRPr="00987079">… Enrichment: type "http://schema.org/Place"…</w:p>
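The standoff approach on this slide can be sketched with `xml.etree.ElementTree`: anchors go into the content, the enrichment text into a separate comments part, and a reader resolves the link through the shared `w:id`. The element names `w:commentRangeStart`, `w:commentRangeEnd` and `w:comment` come from WordprocessingML; the helper functions are hypothetical, and the output is a simplified fragment, not a valid .docx (no comment reference run, relationships or content types).

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
ET.register_namespace("w", W)


def w(tag: str) -> str:
    """Qualify a WordprocessingML element or attribute name."""
    return f"{{{W}}}{tag}"


def add_run(parent: ET.Element, text: str) -> None:
    """Append <w:r><w:t>text</w:t></w:r>, preserving leading/trailing spaces."""
    t = ET.SubElement(ET.SubElement(parent, w("r")), w("t"), {XML_SPACE: "preserve"})
    t.text = text


def annotate(text: str, begin: int, end: int, note: str):
    """Anchors go into the content; the note goes into a separate comments part."""
    p = ET.Element(w("p"))
    add_run(p, text[:begin])
    ET.SubElement(p, w("commentRangeStart"), {w("id"): "0"})
    add_run(p, text[begin:end])
    ET.SubElement(p, w("commentRangeEnd"), {w("id"): "0"})
    add_run(p, text[end:])
    comments = ET.Element(w("comments"))
    comment = ET.SubElement(comments, w("comment"), {w("id"): "0"})
    add_run(ET.SubElement(comment, w("p")), note)
    return p, comments


def read_back(p: ET.Element, comments: ET.Element):
    """Resolve the standoff links: plain text plus (begin, end, note) per range."""
    notes = {c.get(w("id")): "".join(c.itertext()) for c in comments.iter(w("comment"))}
    text, starts, ranges = "", {}, []
    for el in p:
        if el.tag == w("r"):
            text += "".join(el.itertext())
        elif el.tag == w("commentRangeStart"):
            starts[el.get(w("id"))] = len(text)
        elif el.tag == w("commentRangeEnd"):
            cid = el.get(w("id"))
            ranges.append((starts.pop(cid), len(text), notes[cid]))
    return text, ranges


para, notes_part = annotate("Welcome to Prague!", 11, 17, 'Enrichment: type "http://schema.org/Place"')
print(ET.tostring(para, encoding="unicode"))
print(read_back(para, notes_part))
```

Reading back recovers the same offsets that NIF used, which is what makes the comment-based storage roundtrippable.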
  • 17. Annotation models
    • NIF: like RDF, a general graph model
      – Consisting of nodes and arcs
    Diagram: p:char=11,17 --taIdentRef--> dbp:Prague
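A graph of nodes and arcs can be sketched as a set of (subject, arc, object) triples with a wildcard pattern match. The arc from the slide is written with the `itsrdf:` prefix, since `taIdentRef` comes from the ITS 2.0 RDF vocabulary used by NIF; prefixed names are kept as plain strings and not expanded to IRIs. A real application would use an RDF library such as rdflib; this sketch uses only built-in sets, and `GRAPH` and `match` are hypothetical names.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject node, arc label, object node)

# The arc from the slide, plus the context links from the JSON-LD example.
GRAPH: Set[Triple] = {
    ("p:char=11,17", "itsrdf:taIdentRef", "dbp:Prague"),
    ("p:char=11,17", "nif:referenceContext", "p:char=0,18"),
    ("p:char=0,18", "nif:referenceContext", "p:char=0,18"),
}


def match(s: Optional[str] = None, p: Optional[str] = None, o: Optional[str] = None) -> Set[Triple]:
    """Return the arcs matching a pattern; None acts as a wildcard."""
    return {
        t for t in GRAPH
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    }


print(match(p="itsrdf:taIdentRef"))
print(len(match(o="p:char=0,18")))  # 2: the context node is reached by two arcs
```

The context node has a self-loop and two incoming arcs, neither of which a tree allows; this is the generality that the next slides have to restrict for roundtripping.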
  • 18. Restricting graphs: tree-structured annotations on several layers
    • Tree structures for syntactic annotations
    • Several annotation layers for the same text
    • Concurrent hierarchies
    • Only one of these hierarchies can be represented directly when roundtripping with XML
    Example taken from TEI: http://www.tei-c.org/release/doc/tei-p5-doc/en/html/NH.html
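Whether several annotation layers fit one XML tree can be tested directly: the spans must nest, i.e. no two spans may partially overlap. The sketch below uses a hypothetical text, "Welcome to Prague. Enjoy the city.", with a sentence layer, an entity layer and a hypothetical layout-line layer; the function names are illustrative.

```python
from itertools import combinations
from typing import List, Tuple

Span = Tuple[int, int]  # [begin, end) character offsets


def crosses(a: Span, b: Span) -> bool:
    """True if the spans partially overlap, i.e. neither contains the other."""
    (a0, a1), (b0, b1) = a, b
    return a0 < b0 < a1 < b1 or b0 < a0 < b1 < a1


def fits_one_tree(*layers: List[Span]) -> bool:
    """Spans from all layers can share one XML hierarchy only if no two cross."""
    spans = [s for layer in layers for s in layer]
    return not any(crosses(a, b) for a, b in combinations(spans, 2))


text = "Welcome to Prague. Enjoy the city."
sentences = [(0, 18), (19, 34)]
entities = [(11, 17)]  # "Prague"
lines = [(0, 11), (11, 24)]  # hypothetical layout lines: "Welcome to ", "Prague. Enjoy"

print(fits_one_tree(sentences, entities))  # True
print(fits_one_tree(sentences, lines))  # False: (11, 24) crosses (0, 18)
```

Adjacent spans such as (0, 11) and (11, 24) do not cross; only a genuine partial overlap forces one of the solutions on the following slides.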
  • 19. Representing overlapping hierarchies with markup (1/2)
    Solutions proposed by the TEI:
    • Multiple encoding of the same information
      – One XML document per annotation
    • Boundary marking with empty “milestone” elements
      – Also used by XLIFF
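Boundary marking works because empty elements never need to nest: crossing spans become a flat sequence of start and end markers. A minimal sketch: the marker names are borrowed from XLIFF 2.0's `<sm/>` (start marker) and `<em/>` (end marker), with namespaces and further attributes omitted; the function name and the layout-line span are hypothetical.

```python
from html import escape
from typing import List, Tuple

Marked = Tuple[str, int, int]  # (identifier, begin, end), end exclusive


def with_milestones(text: str, spans: List[Marked]) -> str:
    """Serialise possibly crossing spans as empty start and end markers."""
    events = []
    for sid, begin, end in spans:
        if begin >= end:
            raise ValueError(f"empty or inverted span {sid!r}")
        events.append((begin, 1, f'<sm id="{sid}"/>'))
        events.append((end, 0, f'<em startRef="{sid}"/>'))
    # Sort by offset; at the same offset, end markers (0) come before start markers (1).
    events.sort(key=lambda e: (e[0], e[1]))
    out, pos = [], 0
    for offset, _, marker in events:
        out.append(escape(text[pos:offset]))
        out.append(marker)
        pos = offset
    out.append(escape(text[pos:]))
    return "".join(out)


text = "Welcome to Prague. Enjoy the city."
# A sentence and a hypothetical layout line that cross each other.
print(with_milestones(text, [("s1", 0, 18), ("line1", 11, 24)]))
```

The output stays well-formed XML even though the two spans overlap; the price is that a consumer has to pair markers by identifier instead of relying on the element tree.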
  • 20. Representing overlapping hierarchies with markup (2/2)
    Solutions proposed by the TEI:
    • Fragmentation and reconstitution of virtual elements
      – One hierarchy explicit, others as interrelated marked-up spans
    • Stand-off markup
      – Separation of text and annotations, interlinked via anchors and references
      – Cf. Word example
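Fragmentation and reconstitution can be sketched on offsets: a span from a secondary hierarchy is cut at the boundaries of the primary one, so each fragment nests, and the virtual element is rebuilt as the hull of its fragments. In markup the fragments would carry a shared identifier; here they are just a list. The function names and the crossing "line" span are hypothetical, and reconstitution via the hull assumes the original element was contiguous.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # [begin, end) character offsets


def fragment(span: Span, primary: List[Span]) -> List[Span]:
    """Cut a span from a secondary hierarchy at the primary hierarchy's boundaries."""
    begin, end = span
    pieces = []
    for p0, p1 in sorted(primary):
        lo, hi = max(begin, p0), min(end, p1)
        if lo < hi:  # keep only non-empty intersections
            pieces.append((lo, hi))
    return pieces


def reconstitute(pieces: List[Span]) -> Span:
    """Rebuild the virtual element as the hull of its fragments."""
    return min(b for b, _ in pieces), max(e for _, e in pieces)


sentences = [(0, 18), (19, 34)]  # primary hierarchy for "Welcome to Prague. Enjoy the city."
line = (11, 24)  # hypothetical secondary element crossing the sentence boundary
parts = fragment(line, sentences)
print(parts)  # [(11, 18), (19, 24)]
print(reconstitute(parts))  # (11, 24)
```

The space at offset 18 lies between the sentences and so belongs to no fragment; the hull restores it when the virtual element is reconstituted.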
  • 21. Representing overlapping hierarchies in RDF
    POWLA (cf. Chiarcos, 2012)
    • RDF representation for corpus annotation, based on the PAULA XML standoff format
    • Allows representing hierarchical, multi-layer corpora in RDF and querying them with SPARQL
    • Not relevant for roundtripping, but for representing and processing linguistic annotation in RDF
  • 22. Lessons learned
    • Choose the overlap solution that fits your roundtripping modelling and processing needs
    • Consider off-the-shelf tooling
      – For 100% hierarchical data: XPath / CSS selectors, DOM, …
    • Consider libraries
      – For extraction only: Tika http://tika.apache.org/
      – For roundtripping: Okapi http://okapi.opentag.com/; in FREME currently being adapted for roundtripping in selected formats
    • Make sure the annotation survives in the original format
      – Cf. Word example
      – Soon to be made easier by using Okapi
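"Make sure the annotation survives" can be turned into an automatic check after back conversion. A minimal sketch for HTML using the standard library's `html.parser`: the two criteria (identical text content before and after, expected `itemtype` present) are assumptions about what "survives" means for microdata, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class TextAndTypes(HTMLParser):
    """Collect character data and every itemtype attribute value."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []
        self.itemtypes: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        for name, value in attrs:
            if name == "itemtype" and value:
                self.itemtypes.append(value)

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def parse(doc: str) -> TextAndTypes:
    parser = TextAndTypes()
    parser.feed(doc)
    parser.close()  # flush any buffered text
    return parser


def roundtrip_ok(before: str, after: str, expected_type: str) -> bool:
    """The text content must be unchanged and the annotation must be present."""
    b, a = parse(before), parse(after)
    return "".join(b.chunks) == "".join(a.chunks) and expected_type in a.itemtypes


before = "<p>Welcome to <b>Prague</b>!</p>"
after = '<p>Welcome to <b><span itemscope itemtype="http://schema.org/Place">Prague</span></b>!</p>'
print(roundtrip_ok(before, after, "http://schema.org/Place"))  # True
print(roundtrip_ok(before, "<p>Welcome to Prague</p>", "http://schema.org/Place"))  # False
```

Comparing extracted text catches back conversions that silently drop or alter content; checking for the attribute catches ones that drop the annotation itself.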
  • 23. Roundtripping of NIF-based Linguistic Linked Data with non-linked data sources
    Felix Sasaki, DFKI / W3C Fellow