We describe the NIF approach to representing annotations and focus on roundtripping: the conversion of existing digital content from formats such as Word or HTML into NIF, and the re-integration of annotations into the original file format. Such roundtripping is needed for many industry applications of linguistic linked data and natural language processing. Roundtripping is not always possible; it is constrained by 1) the possibilities for storing annotations in the original format while preserving existing information (e.g. HTML inline tags) and 2) the annotation model, which is basically tree-structured for markup languages like generic XML or HTML. There is no general solution to this problem. Developers of roundtripping applications should use existing libraries as much as possible and adapt them to their needs.
1. Sasaki – LLD Datathon – Cercedilla, Spain, May 2015
Roundtripping of NIF-based Linguistic Linked Data with non-linked data sources
Felix Sasaki
DFKI / W3C Fellow
Slides:
http://de.slideshare.net/atcfsenzoku/sasaki-datathonmadrid2015
2. Sasaki – LLD Datathon – Cercedilla, Spain, May 2015
What is NIF?
• Natural Language Processing Interchange Format
– See http://nlp2rdf.org/
• LLD format to store annotations & to organize NLP pipelines
• API specification to create NIF workflows
• More details: after the coffee break
• Following slides: main roles for NIF
8. Sasaki – LLD Datathon – Cercedilla, Spain, May 2015
A NIF workflow
[Diagram: boxes “Existing content”, “Conversion to NIF” and “Content analytics, e.g. named entity recognition”; label “Deploying knowledge from the LLD cloud”]
9. Sasaki – LLD Datathon – Cercedilla, Spain, May 2015
Potential scenario: roundtripping
[Diagram: boxes “Existing content”, “Conversion to NIF” and “Content analytics, e.g. named entity recognition”; labels “Storing annotations in original content” and “Deploying knowledge from the LLD cloud”]
10. Sasaki – LLD Datathon – Cercedilla, Spain, May 2015
Roundtripping
• Roundtripping: storing the outcome of content processing (analytics) tasks in the original content
• Not always needed, but sometimes; examples:
– Enriching Web content with named entity information; generating Schema.org markup via NIF pipelines. Format: HTML
– Enriching localisation content, to add value beyond translation. Format: XLIFF
11. Sasaki – LLD Datathon – Cercedilla, Spain, May 2015
Example: HTML
Example roundtripping workflow
… <p>Welcome to Prague!</p> …
… <p>Welcome to <span … itemtype="http://schema.org/Place">Prague</span>!</p> …
1) Conversion to NIF 2) NER processing 3) Back conversion to HTML
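The back conversion in step 3 can be sketched in a few lines of Python. This is a minimal illustration, not the FREME implementation: it assumes the paragraph contains plain text only, that the NER step returned character offsets into that text, and that the elided span attributes can be stood in for by `itemscope` (an assumption). The helper name `wrap_span` is hypothetical.

```python
from html import escape

def wrap_span(text: str, begin: int, end: int, item_type: str) -> str:
    """Wrap text[begin:end] in a span carrying a Schema.org itemtype."""
    if not 0 <= begin < end <= len(text):
        raise ValueError("annotation offsets fall outside the text")
    before, target, after = text[:begin], text[begin:end], text[end:]
    # 'itemscope' is an assumed stand-in for the attributes elided on the slide.
    return (
        f"{escape(before)}"
        f'<span itemscope itemtype="{escape(item_type)}">{escape(target)}</span>'
        f"{escape(after)}"
    )

text = "Welcome to Prague!"
begin = text.index("Prague")          # offset as it would come from the NER step
end = begin + len("Prague")
html_out = "<p>" + wrap_span(text, begin, end, "http://schema.org/Place") + "</p>"
print(html_out)
```

The sketch works only because the paragraph has no inline markup of its own; as soon as existing tags are present, the offsets have to be mapped onto individual text nodes.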
12. Sasaki – LLD Datathon – Cercedilla, Spain, May 2015
Example: XLIFF
Example roundtripping workflow
… <xlf:source>Welcome to Prague!</xlf:source> …
… <xlf:source>Welcome to <mrk … its:taClassRef="http://schema.org/Place">Prague</mrk>!</xlf:source> …
1) Conversion to NIF 2) NER processing 3) Back conversion to XLIFF
13. Sasaki – LLD Datathon – Cercedilla, Spain, May 2015
Example usage scenario: FREME project
• See http://www.freme-project.eu/
• Developing interfaces for multilingual and semantic enrichment of digital content
• Relies on NIF-based enrichment workflows
– See FREME API version 0.1 http://api.freme-project.eu/doc/0.1/
• Deploys aspects of the LIDER reference architecture for LLD processing
– See D3.1.1 at http://lider-project.eu/?q=doc/deliverables
• Focuses on four business cases
– Localization BC requires XLIFF roundtripping
– Web content personalisation BC requires HTML roundtripping
14. Sasaki – LLD Datathon – Cercedilla, Spain, May 2015
Challenges for roundtripping
• Source format
– How to store enrichment information (annotations)
– How to handle existing information
• Annotation model
– NIF = a general graph-based annotation model
– Source format and annotation motivation may require restriction of the model
15. Sasaki – LLD Datathon – Cercedilla, Spain, May 2015
How to store annotations in various source formats
• Solvable for markup languages like HTML or XLIFF
• Challenge to preserve existing markup: “<p>Welcome to <b>Prague</b>!</p>”
• General issue with complex and proprietary formats:
– “My own” storage mechanism = no tool support
– Using existing storage mechanisms may mean: overloading semantics
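One way to preserve existing markup is to map the annotation's character offsets onto the text nodes of the parsed document and only split the node that contains the annotation. The following is a rough sketch under several assumptions: the content is well-formed XML (Python's `xml.etree.ElementTree` is used as a stand-in for an HTML parser), offsets count characters of the concatenated text, and annotations that cross an element boundary are rejected rather than fragmented. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def text_slots(elem):
    """Yield (element, slot, string) for every text run, in document order."""
    if elem.text:
        yield elem, "text", elem.text
    for child in elem:
        yield from text_slots(child)
        if child.tail:
            yield child, "tail", child.tail

def annotate(root, begin, end, item_type):
    """Wrap characters [begin, end) of the plain text in a new span element."""
    parents = {child: parent for parent in root.iter() for child in parent}
    offset = 0
    for elem, slot, string in list(text_slots(root)):
        start, stop = offset, offset + len(string)
        if start <= begin and end <= stop:
            span = ET.Element("span", {"itemtype": item_type})
            span.text = string[begin - start:end - start]
            span.tail = string[end - start:] or None
            head = string[:begin - start] or None
            if slot == "text":
                # The run is the leading text of elem: span becomes its first child.
                elem.text = head
                elem.insert(0, span)
            else:
                # The run follows elem: span becomes elem's next sibling.
                elem.tail = head
                parent = parents[elem]
                parent.insert(list(parent).index(elem) + 1, span)
            return root
        offset = stop
    raise ValueError("annotation crosses an element boundary")

root = ET.fromstring("<p>Welcome to <b>Prague</b>!</p>")
annotate(root, 11, 17, "http://schema.org/Place")
print(ET.tostring(root, encoding="unicode"))
```

Here the existing `<b>` element survives and the new span is nested inside it. An annotation such as characters 8 to 17 (“to Prague”) would cross the `<b>` boundary and raise an error, which is exactly the overlap problem discussed on the later slides.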
16. Sasaki – LLD Datathon – Cercedilla, Spain, May 2015
Source format example: Word
… <w:t>Welcome to Prague!</w:t> …
… <w:commentRangeStart w:id="0"/><w:t>Prague</w:t>
<w:commentRangeEnd w:id="0"/>
<w:r w:rsidR="00987079"> …
<w:p w:rsidRPr="00987079">… Enrichment: type "http://schema.org/Place"…</w:p>
[Diagram labels: “Enrichment process; storing enrichment as comments”. “Content storage (Word file unzipped)”: change of original content by creation of an anchor. “Comment storage”: comment stored separately; refers to the anchor (“standoff approach”).]
17. Sasaki – LLD Datathon – Cercedilla, Spain, May 2015
Annotation models
• NIF: like RDF = general graph model
– Consisting of nodes and arcs
[Graph: node p:char=11,17 --taIdentRef--> node dbp:Prague]
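As a rough sketch, the node-and-arc structure for “Prague” in “Welcome to Prague!” can be written out as N-Triples with plain string formatting. Several things here are assumptions rather than facts from the slides: the document URI `http://example.org/doc1`, the expansion of `dbp:` to `http://dbpedia.org/resource/`, typing the indices as `xsd:nonNegativeInteger`, and the helper names. The NIF and ITS RDF namespaces are the published ones.

```python
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
ITSRDF = "http://www.w3.org/2005/11/its/rdf#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"

def iri(value):
    return f"<{value}>"

def literal(value, datatype=None):
    # Minimal escaping only; enough for short single-line strings.
    escaped = (str(value).replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    suffix = f"^^{iri(datatype)}" if datatype else ""
    return f'"{escaped}"{suffix}'

def annotation_to_ntriples(doc_uri, text, begin, end, ident_ref):
    """Return N-Triples lines for one context and one annotated substring."""
    context = iri(f"{doc_uri}#char=0,{len(text)}")
    mention = iri(f"{doc_uri}#char={begin},{end}")
    index_type = XSD + "nonNegativeInteger"   # assumed datatype
    triples = [
        (context, iri(RDF_TYPE), iri(NIF + "Context")),
        (context, iri(NIF + "isString"), literal(text)),
        (mention, iri(NIF + "referenceContext"), context),
        (mention, iri(NIF + "beginIndex"), literal(begin, index_type)),
        (mention, iri(NIF + "endIndex"), literal(end, index_type)),
        (mention, iri(NIF + "anchorOf"), literal(text[begin:end])),
        (mention, iri(ITSRDF + "taIdentRef"), iri(ident_ref)),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]

lines = annotation_to_ntriples(
    "http://example.org/doc1",             # hypothetical document URI
    "Welcome to Prague!", 11, 17,
    "http://dbpedia.org/resource/Prague",  # assumed expansion of dbp:Prague
)
print("\n".join(lines))
```

Because each annotation is just another node with its own arcs, nothing in this model prevents two annotations from overlapping; the restriction only appears when the graph has to be written back into a tree-structured format.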
18. Sasaki – LLD Datathon – Cercedilla, Spain, May 2015
Restricting graphs: Tree structured annotations on several layers
• Tree structures for syntactic annotations
• Several annotation layers for the same text
• Concurrent hierarchies
• Representation only of one of these in roundtripping with XML
Example taken from TEI http://www.tei-c.org/release/doc/tei-p5-doc/en/html/NH.html
19. Sasaki – LLD Datathon – Cercedilla, Spain, May 2015
Representing overlapping hierarchies with markup (1/2)
Solutions advertised by the TEI
• Multiple encoding of the same information
– One XML document per annotation
• Boundary marking with empty “milestone” elements
– Also used by XLIFF
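The milestone idea can be illustrated with a small sketch: instead of wrapping a span in one element, an empty element marks its start and another marks its end, so two overlapping spans still yield well-formed XML. The element and attribute names (`start`, `end`, `ref`) are hypothetical and are not taken from TEI or XLIFF.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def with_milestones(text, spans):
    """Insert empty start/end markers for each (begin, end) span into text."""
    events = []
    for span_id, (begin, end) in spans.items():
        events.append((begin, 1, f'<start ref="{span_id}"/>'))
        events.append((end, 0, f'<end ref="{span_id}"/>'))
    events.sort()  # at equal offsets, end markers come before start markers
    out, cursor = [], 0
    for pos, _, tag in events:
        out.append(escape(text[cursor:pos]))
        out.append(tag)
        cursor = pos
    out.append(escape(text[cursor:]))
    return "".join(out)

text = "Welcome to Prague!"
# Two spans that overlap without nesting: "Welcome to Prague" and "Prague!"
marked = with_milestones(text, {"a": (0, 17), "b": (11, 18)})
print(marked)

# The result is well-formed even though the spans cross each other.
ET.fromstring(f"<p>{marked}</p>")
```

The price is that the spans are no longer elements: a consumer has to pair the markers up again by their `ref` values before it can treat a span as a unit.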
20. Sasaki – LLD Datathon – Cercedilla, Spain, May 2015
Representing overlapping hierarchies with markup (2/2)
Solutions advertised by the TEI
• Fragmentation and reconstitution of virtual elements
– One hierarchy explicit, others with interrelated marked-up spans
• Stand-off markup
– Separation of text and annotations, interlinked via anchor and reference
– Cf. Word example
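Stand-off markup can be sketched as a data structure in which the text stays untouched and every annotation points into it by offsets. The class and function names below are hypothetical; the `crossing` check is one simple way, under the assumption of half-open character ranges, to detect pairs of annotations that could not both be written inline into a single XML tree.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Standoff:
    begin: int   # offset of the first character
    end: int     # offset one past the last character
    layer: str   # annotation layer, e.g. "entity" or "phrase"
    value: str

def resolve(text: str, ann: Standoff) -> str:
    """Follow the reference from an annotation back to the annotated text."""
    return text[ann.begin:ann.end]

def crossing(a: Standoff, b: Standoff) -> bool:
    """True if the two ranges overlap without one containing the other."""
    first, second = sorted((a, b), key=lambda ann: (ann.begin, -ann.end))
    return first.begin < second.begin < first.end < second.end

text = "Welcome to Prague!"
entity = Standoff(11, 17, "entity", "http://schema.org/Place")
phrase = Standoff(8, 17, "phrase", "PP")          # "to Prague"
clause = Standoff(0, 13, "chunk", "illustrative")  # "Welcome to Pr"

print(resolve(text, entity), crossing(entity, phrase), crossing(entity, clause))
```

Nested annotations such as `entity` inside `phrase` can be serialised as nested elements; crossing ones such as `entity` and `clause` need milestones, fragmentation or a stand-off store.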
21. Sasaki – LLD Datathon – Cercedilla, Spain, May 2015
Representing overlapping hierarchies in RDF
POWLA (cf. Chiarcos, 2012)
• RDF representation for corpus annotation, based on the PAULA XML standoff format
• Allows hierarchical, multi-layer corpora to be represented in RDF and queried in SPARQL
• Not relevant for roundtripping, but for linguistic annotation representation and processing in RDF
22. Sasaki – LLD Datathon – Cercedilla, Spain, May 2015
Lessons learned
• Choose the overlap solution that fits your roundtripping modelling and processing needs
• Consider off-the-shelf tooling
– For 100% hierarchical data: XPath / CSS selectors, DOM, …
• Consider libraries
– For extraction only: Tika http://tika.apache.org/
– For roundtripping: Okapi http://okapi.opentag.com/ (in FREME currently being adapted for roundtripping in selected formats)
• Make sure the annotation survives in the original format (cf. Word example)
– Soon to be made easier by using Okapi
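For the fully hierarchical case, off-the-shelf tooling is usually enough to get annotations back out of enriched content. The sketch below uses the limited XPath support in Python's standard `xml.etree.ElementTree`; the input string is an assumed example of enriched output in which the span carries only an `itemtype` attribute.

```python
import xml.etree.ElementTree as ET

enriched = '<p>Welcome to <span itemtype="http://schema.org/Place">Prague</span>!</p>'
doc = ET.fromstring(enriched)

# Select every span that carries an itemtype attribute, anywhere in the tree.
found = [(el.text, el.get("itemtype")) for el in doc.findall(".//span[@itemtype]")]
print(found)
```

This only works as long as each annotation corresponds to exactly one element; once milestones or stand-off storage are involved, a selector alone is no longer sufficient.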